Rust implementations of string similarity metrics:
- Hamming
- Levenshtein - distance & normalized
- Optimal string alignment
- Damerau-Levenshtein - distance & normalized
- Jaro and Jaro-Winkler
- Sørensen-Dice
The normalized versions return values between 0.0
and 1.0
, where 1.0
means
an exact match.
There are also generic versions of the functions for non-string inputs.
strsim
is available on crates.io. Add it to
your project:
cargo add strsim
Go to Docs.rs for the full documentation. You can
also clone the repo, and run $ cargo doc --open
.
extern crate strsim;
use strsim::{hamming, levenshtein, normalized_levenshtein, osa_distance,
damerau_levenshtein, normalized_damerau_levenshtein, jaro,
jaro_winkler, sorensen_dice};
fn main() {
match hamming("hamming", "hammers") {
Ok(distance) => assert_eq!(3, distance),
Err(why) => panic!("{:?}", why)
}
assert_eq!(levenshtein("kitten", "sitting"), 3);
assert!((normalized_levenshtein("kitten", "sitting") - 0.571).abs() < 0.001);
assert_eq!(osa_distance("ac", "cba"), 3);
assert_eq!(damerau_levenshtein("ac", "cba"), 2);
assert!((normalized_damerau_levenshtein("levenshtein", "löwenbräu") - 0.272).abs() <
0.001);
assert!((jaro("Friedrich Nietzsche", "Jean-Paul Sartre") - 0.392).abs() <
0.001);
assert!((jaro_winkler("cheeseburger", "cheese fries") - 0.911).abs() <
0.001);
assert_eq!(sorensen_dice("web applications", "applications of the web"),
0.7878787878787878);
}
Using the generic versions of the functions:
extern crate strsim;
use strsim::generic_levenshtein;
fn main() {
assert_eq!(2, generic_levenshtein(&[1, 2, 3], &[0, 2, 5]));
}
If you don't want to install Rust itself, you can run $ ./dev
for a
development CLI if you have Docker installed.
Benchmarks require a Nightly toolchain. Run $ cargo +nightly bench
.