Package edu.northwestern.at.utils.corpuslinguistics

Methods and interfaces for corpus linguistics, including comparative frequency analysis and collocation.


Interface Summary
Pretokenizer Prepares a string for tokenization.
StringSimilarity Interface defining a method for computing string similarity.
 

Class Summary
BigramLogLikelihood Computes Dunning's log-likelihood for bigrams.
Collocation Computes bigram collocation measures.
DefaultPretokenizer Prepares a string for tokenization.
DoubleMetaphone Computes Double Metaphone phonetic codes. Based on an implementation by Ed Parrish, obtained from: http://www.cse.ucsc.edu/~eparrish/toolbox/search.html
FileTokenizer Tokenizes a text file.
Frequency Computes frequency-based statistics for comparing corpora.
LevensteinDistance Computes the Levenshtein edit distance between two strings.
NGramExtractor Extracts n-grams from text.
Soundex Implements the Soundex algorithm.
WordCountExtractor Counts words in a text.
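
As an illustration of the kind of computation the edit-distance class above performs, here is a minimal sketch of the standard dynamic-programming algorithm for Levenshtein distance. The class name EditDistanceSketch and its distance method are hypothetical and are not part of this package; the package's own LevensteinDistance class may expose a different signature.

```java
// Hypothetical illustration only; not the package's LevensteinDistance class.
final class EditDistanceSketch {
    private EditDistanceSketch() {}

    /** Returns the minimum number of single-character insertions,
     *  deletions, and substitutions needed to turn a into b. */
    static int distance(String a, String b) {
        // prev holds the previous row of the DP table, curr the row being filled.
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Turning the empty prefix of a into b[0..j) takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            // Turning a[0..i) into the empty string takes i deletions.
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Swap rows so curr becomes the previous row for the next pass.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

The two-row layout keeps memory proportional to the length of the second string rather than to the product of both lengths.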
 

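
For collocation analysis, a commonly used association score is the log-likelihood ratio (G2) computed from a 2x2 contingency table of bigram counts. The following is a minimal sketch under that assumption; the class name BigramLogLikelihoodSketch, the method name, and the parameter names are hypothetical and do not describe this package's actual API.

```java
// Hypothetical illustration only; not the package's BigramLogLikelihood class.
final class BigramLogLikelihoodSketch {
    private BigramLogLikelihoodSketch() {}

    /**
     * Computes G2 = 2 * sum(observed * ln(observed / expected)) over a
     * 2x2 contingency table for a bigram (w1, w2).
     *
     * @param k11 count of bigrams (w1, w2)
     * @param k12 count of bigrams (w1, not w2)
     * @param k21 count of bigrams (not w1, w2)
     * @param k22 count of bigrams (not w1, not w2)
     */
    static double logLikelihood(long k11, long k12, long k21, long k22) {
        double n = (double) k11 + k12 + k21 + k22;
        double row1 = (double) k11 + k12;
        double row2 = (double) k21 + k22;
        double col1 = (double) k11 + k21;
        double col2 = (double) k12 + k22;

        // Expected count of each cell under independence: row total * column total / n.
        return 2.0 * (term(k11, row1 * col1 / n)
                    + term(k12, row1 * col2 / n)
                    + term(k21, row2 * col1 / n)
                    + term(k22, row2 * col2 / n));
    }

    // A cell with zero observations contributes nothing to the sum.
    private static double term(double observed, double expected) {
        return observed == 0.0 ? 0.0 : observed * Math.log(observed / expected);
    }
}
```

A score near zero means the two words co-occur about as often as independence would predict; larger scores indicate a stronger association.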