edu.northwestern.at.wordhoard.swing.calculator.analysis
Class MultiwordUnitData

java.lang.Object
  extended by edu.northwestern.at.wordhoard.swing.calculator.analysis.MultiwordUnitData

public class MultiwordUnitData
extends java.lang.Object

Multiword unit data.

Holds data on counts and association measure values for one multiword unit.


Field Summary
protected  double dice
           
protected  NGramExtractor[] extractors
           
protected  java.lang.String leftSuccessorPattern
           
protected  double logLikelihood
           
protected  java.lang.String mwu
           
protected  int mwuCount
           
protected  int mwuLength
           
protected  double phiSquared
           
protected  java.lang.String rightSuccessorPattern
           
protected  double scp
           
protected  double si
           
protected  double sigLogLikelihood
           
protected  int totalWordCount
           
protected  java.util.Map wordCountMap
           
protected  int[] wordCounts
           
protected  java.lang.String[] words
           
 
Constructor Summary
MultiwordUnitData(java.lang.String mwu, java.util.Map wordCountMap, int totalWordCount, NGramExtractor[] extractors)
           
 
Method Summary
 void calculateAssociationMeasures()
          Calculate the association measures.
 double freq(java.lang.String[] words, int i1, int i2)
          Calculate the frequency for a portion of a set of words.
protected  double getAvp()
          Get the fair dispersion point normalization.
protected  double getAvp2()
          Get the fair dispersion point normalization.
 double getAvx()
          Calculate fair probability for the left hand side of a pseudo-bigram.
 double getAvy()
          Calculate fair probability for the right hand side of a pseudo-bigram.
 double getDice()
          Return the Dice coefficient.
 double getLogLikelihood()
          Return log likelihood.
 java.lang.String getMWUText()
          Get the multiword unit text.
 int getMWUTextCount()
          Get the count for this multiword unit text.
 int getMWUTextLength()
          Get the number of words in this multiword unit.
 double getPhiSquared()
          Return phi squared.
 double getSCP()
          Return the symmetric conditional probability.
 double getSI()
          Return the specific mutual information.
 double getSigLogLikelihood()
          Return significance of log likelihood.
 int getWordCount(java.lang.String word)
          Get count for a specific word from the count map.
 int[] getWordCounts()
          Get the count for each word in this multiword unit.
 java.lang.String[] getWords()
          Get the words in this multiword unit.
 java.lang.String leftAntecedent()
          Get the left antecedent of the current multiword unit.
 java.lang.String[] leftSuccessors()
          Get the left successors of the current multiword unit.
 double prob(java.lang.String[] words, int i1, int i2)
          Calculate the probability for a portion of a set of words.
 java.lang.String rightAntecedent()
          Get the right antecedent of the current multiword unit.
 java.lang.String[] rightSuccessors()
          Get the right successors of the current multiword unit.
 java.lang.String[] successors()
          Get the successors of the current multiword unit.
 java.lang.String toString()
          Return mwu as a displayable string.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

mwu

protected java.lang.String mwu

mwuCount

protected int mwuCount

mwuLength

protected int mwuLength

words

protected java.lang.String[] words

wordCounts

protected int[] wordCounts

dice

protected double dice

logLikelihood

protected double logLikelihood

phiSquared

protected double phiSquared

scp

protected double scp

si

protected double si

sigLogLikelihood

protected double sigLogLikelihood

extractors

protected NGramExtractor[] extractors

leftSuccessorPattern

protected java.lang.String leftSuccessorPattern

rightSuccessorPattern

protected java.lang.String rightSuccessorPattern

totalWordCount

protected int totalWordCount

wordCountMap

protected java.util.Map wordCountMap

Constructor Detail

MultiwordUnitData

public MultiwordUnitData(java.lang.String mwu,
                         java.util.Map wordCountMap,
                         int totalWordCount,
                         NGramExtractor[] extractors)

Method Detail

getMWUText

public java.lang.String getMWUText()
Get the multiword unit text.

Returns:
The multiword unit text.

getMWUTextCount

public int getMWUTextCount()
Get the count for this multiword unit text.

Returns:
Count of appearances of this multiword unit.

getMWUTextLength

public int getMWUTextLength()
Get the number of words in this multiword unit.

Returns:
Number of words in this multiword unit.

getWords

public java.lang.String[] getWords()
Get the words in this multiword unit.

Returns:
Words in this multiword unit.

getWordCounts

public int[] getWordCounts()
Get the count for each word in this multiword unit.

Returns:
Count for each word in this multiword unit.

leftAntecedent

public java.lang.String leftAntecedent()
Get the left antecedent of the current multiword unit.

Returns:
The left antecedent as a string.

rightAntecedent

public java.lang.String rightAntecedent()
Get the right antecedent of the current multiword unit.

Returns:
The right antecedent as a string.

successors

public java.lang.String[] successors()
Get the successors of the current multiword unit.

Returns:
The successors as an array of strings.

leftSuccessors

public java.lang.String[] leftSuccessors()
Get the left successors of the current multiword unit.

Returns:
The left successors as an array of strings.

rightSuccessors

public java.lang.String[] rightSuccessors()
Get the right successors of the current multiword unit.

Returns:
The right successors as an array of strings.

getAvx

public double getAvx()
Calculate fair probability for the left hand side of a pseudo-bigram.

Returns:
Fair probability for left hand side of pseudo-bigram.

getAvy

public double getAvy()
Calculate fair probability for the right hand side of a pseudo-bigram.

Returns:
Fair probability for right hand side of pseudo-bigram.
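
The source of getAvx() and getAvy() is not shown on this page, so the following is only a sketch under an assumption: that a "pseudo-bigram" treats an n-word unit as a left part w1..wi and a right part w(i+1)..wn, and that the "fair" value for each side is the average over all n-1 possible split points. The class name PseudoBigramSketch and its parameters are hypothetical and are not part of WordHoard.

```java
/** Hypothetical illustration only; not the WordHoard implementation. */
final class PseudoBigramSketch {

    /**
     * Average of the values for the left parts w1..wi, i = 1..n-1.
     * prefixValues[i] holds the probability (or frequency) of the
     * left part at the i-th split point.
     */
    static double avx(double[] prefixValues) {
        return mean(prefixValues);
    }

    /**
     * Average of the values for the right parts w(i+1)..wn, i = 1..n-1.
     * suffixValues[i] holds the probability (or frequency) of the
     * right part at the i-th split point.
     */
    static double avy(double[] suffixValues) {
        return mean(suffixValues);
    }

    /** Arithmetic mean; returns 0 for an empty array (assumed convention). */
    private static double mean(double[] values) {
        if (values.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }
}
```

For a three-word unit there are two split points, so each array would hold two values.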

getAvp

protected double getAvp()
Get the fair dispersion point normalization.

Returns:
Fair dispersion point normalization.
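
One common formulation of a fair dispersion point normalization averages, over every split point of the n-word unit, the product of the left-part probability and the right-part probability. Whether getAvp() computes exactly this quantity is an assumption, since its source is not shown here. The class AvpSketch and its parameter names are hypothetical.

```java
/** Hypothetical illustration only; not the WordHoard implementation. */
final class AvpSketch {

    /**
     * Averages p(left part) * p(right part) over all split points.
     * prefixProbs[i] and suffixProbs[i] must describe the same split,
     * so both arrays have length n-1 for an n-word unit.
     */
    static double avp(double[] prefixProbs, double[] suffixProbs) {
        if (prefixProbs.length != suffixProbs.length) {
            throw new IllegalArgumentException(
                "prefix and suffix arrays must have the same length");
        }
        int splits = prefixProbs.length;
        if (splits == 0) {
            return 0.0; // assumed convention for a one-word unit
        }
        double sum = 0.0;
        for (int i = 0; i < splits; i++) {
            sum += prefixProbs[i] * suffixProbs[i];
        }
        return sum / splits;
    }
}
```

Under this assumption, a normalized measure such as the symmetric conditional probability for an n-gram would divide the squared n-gram probability by this average rather than by a single product p(x) * p(y).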

getAvp2

protected double getAvp2()
Get the fair dispersion point normalization.

Returns:
Fair dispersion point normalization.

calculateAssociationMeasures

public void calculateAssociationMeasures()
Calculate the association measures.
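
As a point of reference, the textbook two-word (bigram) forms of four of the measures this class reports can be sketched as follows. This is not the WordHoard code: the class BigramMeasuresSketch is hypothetical, the base-2 logarithm for specific mutual information is an assumption, and the actual method may generalize these formulas to longer units.

```java
/** Hypothetical illustration of textbook bigram measures. */
final class BigramMeasuresSketch {

    /** Dice coefficient from frequencies: 2 f(xy) / (f(x) + f(y)). */
    static double dice(double fxy, double fx, double fy) {
        return 2.0 * fxy / (fx + fy);
    }

    /** Symmetric conditional probability: p(xy)^2 / (p(x) p(y)). */
    static double scp(double pxy, double px, double py) {
        return (pxy * pxy) / (px * py);
    }

    /** Specific mutual information: log2(p(xy) / (p(x) p(y))). Base 2 is assumed. */
    static double si(double pxy, double px, double py) {
        return Math.log(pxy / (px * py)) / Math.log(2.0);
    }

    /**
     * Phi squared from the 2x2 contingency table built from
     * f(xy), f(x), f(y) and the total number of bigrams n.
     */
    static double phiSquared(double fxy, double fx, double fy, double n) {
        double a = fxy;                 // x followed by y
        double b = fx - fxy;            // x followed by something else
        double c = fy - fxy;            // something else followed by y
        double d = n - fx - fy + fxy;   // neither x nor y
        double diff = a * d - b * c;
        return (diff * diff) / ((a + b) * (a + c) * (b + d) * (c + d));
    }
}
```

For example, two words that each occur 20 times and co-occur 10 times give a Dice coefficient of 0.5.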


prob

public double prob(java.lang.String[] words,
                   int i1,
                   int i2)
Calculate the probability for a portion of a set of words.

Parameters:
words - The words.
i1 - Starting index.
i2 - Ending index.
Returns:
Probability from word counts.

The probability is the maximum likelihood estimate. For a single word, this is the number of times the word appears divided by the total number of words. For an n-gram, it is the number of times the n-gram appears divided by the total number of n-grams containing the same number of words.
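
The maximum likelihood estimate described above amounts to a single division. The sketch below illustrates it with a hypothetical helper class, MleSketch. Treating i1 and i2 as inclusive indices is an assumption, since the page does not say whether the ending index is inclusive.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not the WordHoard implementation. */
final class MleSketch {

    /**
     * Maximum likelihood estimate: occurrences divided by the number of
     * items of the same size (all words for a single word, all n-grams
     * of length n for an n-gram).
     */
    static double mle(int count, int total) {
        if (total <= 0) {
            return 0.0; // assumed convention for an empty corpus
        }
        return (double) count / total;
    }

    /** Joins words[i1..i2] with single spaces; both indices inclusive (assumed). */
    static String span(String[] words, int i1, int i2) {
        return String.join(" ", Arrays.copyOfRange(words, i1, i2 + 1));
    }
}
```

With these helpers, a two-word portion seen 3 times among 12 bigrams has an estimated probability of 0.25.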


freq

public double freq(java.lang.String[] words,
                   int i1,
                   int i2)
Calculate the frequency for a portion of a set of words.

Parameters:
words - The words.
i1 - Starting index.
i2 - Ending index.
Returns:
Frequency from ngram frequencies.

getDice

public double getDice()
Return the Dice coefficient.

Returns:
The Dice coefficient.

getLogLikelihood

public double getLogLikelihood()
Return log likelihood.

Returns:
The log likelihood.
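
For orientation, the standard log-likelihood ratio statistic (G squared) for a two-word unit compares observed and expected cell counts in a 2x2 contingency table. Whether this class computes the value in exactly this way, in particular for units longer than two words, is an assumption. The class LogLikelihoodSketch is hypothetical.

```java
/** Hypothetical illustration of the 2x2 log-likelihood ratio. */
final class LogLikelihoodSketch {

    /**
     * G2 = 2 * sum(observed * ln(observed / expected)) over the four cells
     * of the table built from f(xy), f(x), f(y) and the total n.
     */
    static double g2(double fxy, double fx, double fy, double n) {
        double[] observed = {
            fxy,                // x with y
            fx - fxy,           // x without y
            fy - fxy,           // y without x
            n - fx - fy + fxy   // neither
        };
        double[] expected = {
            fx * fy / n,
            fx * (n - fy) / n,
            (n - fx) * fy / n,
            (n - fx) * (n - fy) / n
        };
        double sum = 0.0;
        for (int i = 0; i < observed.length; i++) {
            // A zero cell contributes nothing (limit of x ln x as x -> 0).
            if (observed[i] > 0.0) {
                sum += observed[i] * Math.log(observed[i] / expected[i]);
            }
        }
        return 2.0 * sum;
    }
}
```

When the two words are statistically independent, observed and expected counts coincide and the statistic is 0.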

getPhiSquared

public double getPhiSquared()
Return phi squared.

Returns:
Phi squared.

getSCP

public double getSCP()
Return the symmetric conditional probability.

Returns:
The symmetric conditional probability.

getSI

public double getSI()
Return the specific mutual information.

Returns:
The specific mutual information.

getSigLogLikelihood

public double getSigLogLikelihood()
Return significance of log likelihood.

Returns:
The significance of the log likelihood.

getWordCount

public int getWordCount(java.lang.String word)
Get count for a specific word from the count map.

Parameters:
word - The word text.
Returns:
The count for the specified word. 0 if the word does not occur.
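
The "0 if the word does not occur" behavior can be sketched as a map lookup with a default. The documented field is a raw java.util.Map, so the Map<String, Integer> typing and the class WordCountSketch below are assumptions made for illustration.

```java
import java.util.Map;

/** Hypothetical illustration only; not the WordHoard implementation. */
final class WordCountSketch {

    /** Returns the stored count for word, or 0 when the word is absent. */
    static int getWordCount(Map<String, Integer> wordCountMap, String word) {
        Integer count = wordCountMap.get(word);
        return (count == null) ? 0 : count;
    }
}
```

Returning 0 instead of throwing lets callers compute frequencies without first checking whether each word is present in the map.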

toString

public java.lang.String toString()
Return mwu as a displayable string.

Overrides:
toString in class java.lang.Object
Returns:
The mwu as a displayable string.