Package strmetric
provides functions for measuring string similarity and distance in Go.
Currently, the package provides the following similarity metrics:
The Dice-Sørensen coefficient gauges the similarity of two samples. The score ranges from 0 (nothing in common) to 1 (identical).
strmetric.DiceSorensonMetric("night", "nacht") // 0.6
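As a rough illustration of the formula 2*|A and B| / (|A| + |B|), here is a minimal sketch that compares the sets of n-grams of two strings. The names ngramSet and diceCoefficient and the n parameter are hypothetical and are not part of this package. With n = 1 the sketch yields 0.6 for the example above, and with n = 2 (bigrams) it yields 0.25; which variant the package itself uses is not stated here, so treat the choice of n as an assumption.

```go
package main

import "fmt"

// ngramSet returns the set of distinct n-grams (counted in runes) of s.
// Hypothetical helper; n is assumed to be >= 1.
func ngramSet(s string, n int) map[string]struct{} {
	r := []rune(s)
	set := make(map[string]struct{})
	for i := 0; i+n <= len(r); i++ {
		set[string(r[i:i+n])] = struct{}{}
	}
	return set
}

// diceCoefficient returns twice the number of shared n-grams divided by
// the total number of distinct n-grams in both strings.
func diceCoefficient(a, b string, n int) float64 {
	x, y := ngramSet(a, n), ngramSet(b, n)
	if len(x)+len(y) == 0 {
		return 0 // neither string has an n-gram; returning 0 is an arbitrary choice
	}
	common := 0
	for g := range x {
		if _, ok := y[g]; ok {
			common++
		}
	}
	return 2 * float64(common) / float64(len(x)+len(y))
}

func main() {
	fmt.Println(diceCoefficient("night", "nacht", 1)) // 0.6
	fmt.Println(diceCoefficient("night", "nacht", 2)) // 0.25
}
```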
The Hamming distance between two strings of equal length is the minimum number of substitutions required to change one string into the other, or equivalently the minimum number of errors that could have transformed one string into the other. It is used for error detection.
strmetric.HammingMetric("11011001", "10011101") // 2
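The idea can be sketched in a few lines: walk both strings in step and count the positions that differ. The function name hammingDistance and the error returned for unequal lengths are assumptions made for this illustration; the package's own handling of unequal lengths may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// hammingDistance counts the positions at which a and b differ.
// It compares runes, so multi-byte characters count as one position.
// Hypothetical sketch: unequal lengths are reported as an error.
func hammingDistance(a, b string) (int, error) {
	ra, rb := []rune(a), []rune(b)
	if len(ra) != len(rb) {
		return 0, errors.New("hamming: strings must have equal length")
	}
	d := 0
	for i := range ra {
		if ra[i] != rb[i] {
			d++
		}
	}
	return d, nil
}

func main() {
	d, err := hammingDistance("11011001", "10011101")
	fmt.Println(d, err) // 2 <nil>
}
```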
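Jaro similarity scores two strings by the number of matching characters and the number of transpositions among them. The score ranges from 0 (no similarity) to 1 (exact match); the higher the Jaro similarity, the more similar the strings are.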
strmetric.JaroMetric("JELLYFISH", "SMELLYFISH") // 0.896296
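For readers who want to see how such a score can be computed, the following is a minimal sketch of the textbook Jaro formula, (m/|a| + m/|b| + (m - t/2)/m) / 3, where m is the number of matches inside a sliding window and t is the number of matched characters that appear in a different order. The function name jaro is hypothetical, and the sketch is not necessarily how this package implements the metric.

```go
package main

import "fmt"

// jaro is a hypothetical sketch of the textbook Jaro similarity.
func jaro(a, b string) float64 {
	ra, rb := []rune(a), []rune(b)
	la, lb := len(ra), len(rb)
	if la == 0 && lb == 0 {
		return 1
	}
	if la == 0 || lb == 0 {
		return 0
	}
	// Characters match only if they are equal and at most `window` apart.
	longer := la
	if lb > longer {
		longer = lb
	}
	window := longer/2 - 1
	if window < 0 {
		window = 0
	}
	matchedA := make([]bool, la)
	matchedB := make([]bool, lb)
	matches := 0
	for i := 0; i < la; i++ {
		lo, hi := i-window, i+window+1
		if lo < 0 {
			lo = 0
		}
		if hi > lb {
			hi = lb
		}
		for j := lo; j < hi; j++ {
			if !matchedB[j] && ra[i] == rb[j] {
				matchedA[i], matchedB[j] = true, true
				matches++
				break
			}
		}
	}
	if matches == 0 {
		return 0
	}
	// Count matched characters that are out of order; half of that
	// count is the number of transpositions.
	outOfOrder, j := 0, 0
	for i := 0; i < la; i++ {
		if !matchedA[i] {
			continue
		}
		for !matchedB[j] {
			j++
		}
		if ra[i] != rb[j] {
			outOfOrder++
		}
		j++
	}
	m := float64(matches)
	return (m/float64(la) + m/float64(lb) + (m-float64(outOfOrder)/2)/m) / 3
}

func main() {
	fmt.Printf("%.6f\n", jaro("JELLYFISH", "SMELLYFISH")) // 0.896296
	fmt.Printf("%.6f\n", jaro("martha", "marhta"))        // 0.944444
}
```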
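Jaro-Winkler similarity builds on the Jaro similarity but also takes the common prefix of the two strings (up to 4 characters) into account and factors it into the score, so strings that match from the beginning are rated higher.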
strmetric.JaroMetric("martha", "marhta") // 0.944444
strmetric.JaroWinklerMetric("martha", "marhta") // 0.961111
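The prefix adjustment on its own is small enough to sketch. Assuming a Jaro score has already been computed, the usual formula is jw = j + l*p*(1 - j), where l is the length of the common prefix (capped at 4) and p is a scaling factor. The function name winklerBoost and the value p = 0.1 are assumptions for this illustration (0.1 is the conventional choice); the package's actual constants and any similarity threshold it applies are not documented here.

```go
package main

import "fmt"

// winklerBoost applies the Winkler prefix adjustment to an existing
// Jaro score. Hypothetical sketch: p = 0.1 and the cap of 4 prefix
// characters are assumed values.
func winklerBoost(a, b string, jaroScore float64) float64 {
	const (
		maxPrefix = 4
		p         = 0.1
	)
	ra, rb := []rune(a), []rune(b)
	l := 0
	for l < len(ra) && l < len(rb) && l < maxPrefix && ra[l] == rb[l] {
		l++
	}
	return jaroScore + float64(l)*p*(1-jaroScore)
}

func main() {
	// 17.0/18.0 is the Jaro score of "martha" and "marhta" (0.944444).
	fmt.Printf("%.6f\n", winklerBoost("martha", "marhta", 17.0/18.0)) // 0.961111
}
```

Because the adjustment only ever adds a fraction of the remaining gap to 1, the result never exceeds 1 as long as the input score is within [0, 1].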
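Levenshtein distance is a string metric for measuring the difference between two sequences: the minimum number of single-character insertions, deletions, or substitutions required to change one into the other. It is used in spell checkers, error detection and correction in optical character recognition, and similar applications.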
strmetric.LevenshteinMetric("kitten", "sitting") // 3
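A common way to compute this distance is dynamic programming over prefixes of the two strings, keeping only two rows of the table in memory. The sketch below illustrates that technique; the function name levenshtein is hypothetical and the code is not taken from this package.

```go
package main

import "fmt"

// levenshtein returns the minimum number of single-rune insertions,
// deletions, and substitutions needed to turn a into b.
// Hypothetical sketch using the two-row dynamic programming approach.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	// Turning an empty prefix of a into the first j runes of b takes j insertions.
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i // deleting the first i runes of a
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			best := prev[j] + 1 // deletion
			if v := curr[j-1] + 1; v < best {
				best = v // insertion
			}
			if v := prev[j-1] + cost; v < best {
				best = v // substitution or match
			}
			curr[j] = best
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

func main() {
	fmt.Println(levenshtein("kitten", "sitting")) // 3
}
```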
Use of this source code is governed by an MIT license that can be found in the LICENSE file.