edu.northwestern.at.utils.corpuslinguistics
Class Frequency

java.lang.Object
  extended by edu.northwestern.at.utils.corpuslinguistics.Frequency

public class Frequency
extends java.lang.Object

Computes frequency-based statistics for comparing corpora.


Constructor Summary
protected Frequency()
          Prevents direct instantiation while still allowing subclasses.
 
Method Summary
static double[] logLikelihoodFrequencyComparison(int sampleCount, int refCount, int sampleSize, int refSize)
          Compute log-likelihood statistic for comparing frequencies in two corpora.
static double[] logLikelihoodFrequencyComparison(int sampleCount, int refCount, int sampleSize, int refSize, boolean computeLLSig)
          Compute log-likelihood statistic for comparing frequencies in two corpora, optionally with its significance.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Frequency

protected Frequency()
Prevents direct instantiation while still allowing subclasses.

Method Detail

logLikelihoodFrequencyComparison

public static double[] logLikelihoodFrequencyComparison(int sampleCount,
                                                        int refCount,
                                                        int sampleSize,
                                                        int refSize,
                                                        boolean computeLLSig)
Compute log-likelihood statistic for comparing frequencies in two corpora, optionally with its significance.

Parameters:
sampleCount - Count of word/lemma appearance in sample.
refCount - Count of word/lemma appearance in reference corpus.
sampleSize - Total words/lemmas in the sample.
refSize - Total words/lemmas in reference corpus.
computeLLSig - true to compute the significance of the log-likelihood value.
Returns:
A double array containing frequency comparison statistics.

The contents of the result array are as follows.

(0) Count of word/lemma appearances in sample.
(1) Word/lemma count as a percentage of sample size.
(2) Count of word/lemma appearances in reference corpus.
(3) Word/lemma count as a percentage of reference corpus size.
(4) Log-likelihood measure.
(5) Significance of log-likelihood.

Any result that would require division by zero is set to zero.
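
This page does not state the formula behind the log-likelihood measure. As a minimal sketch, assuming the two-corpus G2 statistic commonly used in corpus linguistics (expected counts derived from the pooled frequency), the documented result layout could be filled roughly as follows. The class LogLikelihoodSketch and its methods compare, safeDivide and term are hypothetical names for illustration, not part of Frequency, and the significance slot is simply left at zero.

```java
/** Illustrative sketch only; not the Frequency class's actual implementation. */
class LogLikelihoodSketch {

    /** Returns x / y, or 0 when y is zero, mirroring the documented zero-divide rule. */
    static double safeDivide(double x, double y) {
        return y == 0.0 ? 0.0 : x / y;
    }

    /** One term of G2: observed * ln(observed / expected), taken as 0 when undefined. */
    static double term(double observed, double expected) {
        if (observed <= 0.0 || expected <= 0.0) {
            return 0.0;
        }
        return observed * Math.log(observed / expected);
    }

    /** Fills a six-element array using the layout documented above (assumed formula). */
    static double[] compare(int sampleCount, int refCount, int sampleSize, int refSize) {
        double a = sampleCount;
        double b = refCount;
        double c = sampleSize;
        double d = refSize;

        // Expected counts if the word were equally frequent in both corpora.
        double pooledRate = safeDivide(a + b, c + d);
        double expectedSample = c * pooledRate;
        double expectedRef = d * pooledRate;

        double[] result = new double[6];
        result[0] = a;
        result[1] = 100.0 * safeDivide(a, c);
        result[2] = b;
        result[3] = 100.0 * safeDivide(b, d);
        result[4] = 2.0 * (term(a, expectedSample) + term(b, expectedRef));
        result[5] = 0.0; // significance is not computed in this sketch
        return result;
    }

    public static void main(String[] args) {
        double[] stats = compare(10, 5, 1000, 1000);
        System.out.println("sample %: " + stats[1]);
        System.out.println("reference %: " + stats[3]);
        System.out.println("log-likelihood: " + stats[4]);
    }
}
```

A word that occurs at the same rate in both corpora yields a log-likelihood of zero under this formula, and larger values indicate a larger difference in relative frequency.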


logLikelihoodFrequencyComparison

public static double[] logLikelihoodFrequencyComparison(int sampleCount,
                                                        int refCount,
                                                        int sampleSize,
                                                        int refSize)
Compute log-likelihood statistic for comparing frequencies in two corpora.

Parameters:
sampleCount - Count of word/lemma appearance in sample.
refCount - Count of word/lemma appearance in reference corpus.
sampleSize - Total words/lemmas in the sample.
refSize - Total words/lemmas in reference corpus.
Returns:
A double array containing frequency comparison statistics.

The contents of the result array are as follows.

(0) Count of word/lemma appearances in sample.
(1) Word/lemma count as a percentage of sample size.
(2) Count of word/lemma appearances in reference corpus.
(3) Word/lemma count as a percentage of reference corpus size.
(4) Log-likelihood measure.
(5) Significance of log-likelihood.
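
Because the result is a bare double array, callers have to remember which index holds which statistic. One way to keep that readable is to name the indices, as in the sketch below. FrequencyResultReader, its constants and describe are hypothetical names for illustration only, and the array in main holds placeholder values rather than output from Frequency.

```java
import java.util.Locale;

/** Hypothetical helper for reading the documented six-element result array. */
class FrequencyResultReader {
    static final int SAMPLE_COUNT = 0;
    static final int SAMPLE_PERCENT = 1;
    static final int REF_COUNT = 2;
    static final int REF_PERCENT = 3;
    static final int LOG_LIKELIHOOD = 4;
    static final int SIGNIFICANCE = 5;

    /** Summarizes whether the word is relatively more frequent in the sample or the reference. */
    static String describe(double[] stats) {
        if (stats == null || stats.length < 6) {
            throw new IllegalArgumentException("expected a six-element result array");
        }
        String direction;
        if (stats[SAMPLE_PERCENT] > stats[REF_PERCENT]) {
            direction = "overused";
        } else if (stats[SAMPLE_PERCENT] < stats[REF_PERCENT]) {
            direction = "underused";
        } else {
            direction = "equally frequent";
        }
        return String.format(Locale.ROOT, "%s in sample (%.2f%% vs %.2f%%), LL=%.2f",
                direction, stats[SAMPLE_PERCENT], stats[REF_PERCENT], stats[LOG_LIKELIHOOD]);
    }

    public static void main(String[] args) {
        // Placeholder values arranged in the documented layout.
        double[] stats = {10.0, 1.0, 5.0, 0.5, 1.7, 0.0};
        System.out.println(describe(stats));
    }
}
```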