edu.northwestern.at.utils.corpuslinguistics
Class Collocation

java.lang.Object
  extended by edu.northwestern.at.utils.corpuslinguistics.Collocation

public class Collocation
extends java.lang.Object

Computes bigram collocation measures.


Field Summary
static int DICE
          Index of the Dice coefficient in the array returned by association; the other constants index the remaining measures.
static int LOGLIKE
           
static int PHISQUARED
           
static int SCP
           
static int SMI
           
static int T
           
static int Z
           
 
Constructor Summary
protected Collocation()
          Protected so that the class cannot be instantiated directly but can still be subclassed.
 
Method Summary
static double[] association(int sampleCount, int refCount, int sampleSize, int refSize)
          Computes collocation measures.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DICE

public static final int DICE
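Index of the Dice coefficient in the array returned by association. The other constants in this class index the remaining measures in the same array.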

See Also:
Constant Field Values

LOGLIKE

public static final int LOGLIKE
See Also:
Constant Field Values

PHISQUARED

public static final int PHISQUARED
See Also:
Constant Field Values

SMI

public static final int SMI
See Also:
Constant Field Values

SCP

public static final int SCP
See Also:
Constant Field Values

T

public static final int T
See Also:
Constant Field Values

Z

public static final int Z
See Also:
Constant Field Values

Constructor Detail

Collocation

protected Collocation()
Protected so that the class cannot be instantiated directly but can still be subclassed.

Method Detail

association

public static double[] association(int sampleCount,
                                   int refCount,
                                   int sampleSize,
                                   int refSize)
Computes collocation measures.

Parameters:
sampleCount - Count of collocation appearance in sample.
refCount - Count of collocation appearance in reference corpus.
sampleSize - Number of words/lemmas in the sample.
refSize - Number of words/lemmas in reference corpus.
Returns:
A double array containing the following measures of collocational association:
(0) Dice coefficient
(1) Log likelihood
(2) Phi squared
(3) Specific mutual information score
(4) Symmetric conditional probability
(5) z score
(6) t score
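
Example (illustrative only): this page does not give the formulas behind each measure. As a rough sketch of how one of them could be derived from the four arguments, the code below computes a log likelihood score using one common two-corpus formulation, in which each observed count is compared with the count expected if the collocation occurred at the same rate in the sample and the reference corpus. The class name LogLikelihoodSketch and its methods are hypothetical and not part of this package, and the formula is an assumption that may differ from what association actually computes.

```java
// Illustrative sketch only: NOT the library's implementation.
// LogLikelihoodSketch and its methods are hypothetical names.
final class LogLikelihoodSketch {

    private LogLikelihoodSketch() {
        // Static helper only; no instances.
    }

    /** observed * ln(observed / expected), using the convention 0 * ln(0) = 0. */
    private static double term(double observed, double expected) {
        return observed == 0.0 ? 0.0 : observed * Math.log(observed / expected);
    }

    /**
     * Log likelihood for a count observed in a sample versus a reference corpus.
     * Assumes both sizes are positive and each count does not exceed its size.
     */
    static double logLikelihood(int sampleCount, int refCount,
                                int sampleSize, int refSize) {
        double totalCount = (double) sampleCount + refCount;
        double totalSize = (double) sampleSize + refSize;

        // Counts expected if the rate were identical in both corpora.
        double expectedSample = sampleSize * totalCount / totalSize;
        double expectedRef = refSize * totalCount / totalSize;

        return 2.0 * (term(sampleCount, expectedSample)
                    + term(refCount, expectedRef));
    }

    public static void main(String[] args) {
        // A collocation seen 20 times in a 1,000-word sample and
        // 5 times in a 10,000-word reference corpus.
        System.out.println(logLikelihood(20, 5, 1000, 10000));
    }
}
```

When calling the real method, it is probably safest to read the result through the named constants, for example result[Collocation.LOGLIKE], rather than through hard-coded positions, since the constants are documented as the indices into the result array.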