edu.northwestern.at.utils.corpuslinguistics
Class LevensteinDistance

java.lang.Object
  extended by edu.northwestern.at.utils.corpuslinguistics.LevensteinDistance

public class LevensteinDistance
extends java.lang.Object

Computes the Levenstein edit distance between two strings.

The Levenstein edit distance is the number of insertions, deletions, substitutions, and adjacent transpositions required to transform one string into another. The larger the Levenstein distance, the more different the strings are.

The edit distance between two strings s1 and s2 can be converted to a similarity measure as follows:

max_length = max( length of s1 , length of s2 ) edit_distance = edit distance between s1 and s2 similarity = 1.0 - ( edit_distance / max_length )

This implementation of Levenstein distance is based upon one by Michael Gilleland and Charles Emerick.


Constructor Summary
LevensteinDistance()
           
 
Method Summary
static boolean areAlike(java.lang.String s1, java.lang.String s2)
          Are two strings alike based upon edit distance.
static int editDistance(java.lang.String s1, java.lang.String s2)
          Compute Levenstein edit distance between two strings.
static double similarity(java.lang.String s1, java.lang.String s2)
          Compute similarity between two strings.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LevensteinDistance

public LevensteinDistance()
Method Detail

editDistance

public static int editDistance(java.lang.String s1,
                               java.lang.String s2)
Compute Levenstein edit distance between two strings.

Parameters:
s1 - First string.
s2 - Second string.,
Returns:
Edit distance between strings s1 and s2.

similarity

public static double similarity(java.lang.String s1,
                                java.lang.String s2)
Compute similarity between two strings.

Parameters:
s1 - First string.
s2 - Second string.
Returns:
Similarity as a value from 0.0 (no similarity) to 1.0 (perfect similarity).

The similarity is computed from the edit distance between s1 and s2 as follows:

max_length = max( length of s1 , length of s2 ) edit_distance = edit distance between s1 and s2 similarity = 1.0 - ( edit_distance / max_length )


areAlike

public static boolean areAlike(java.lang.String s1,
                               java.lang.String s2)
Are two strings alike based upon edit distance.

Parameters:
s1 - First string.
s2 - Second string.
Returns:
True if strings are alike.