edu.northwestern.at.wordhoard.swing.calculator.modelutils
Class WordCounter

java.lang.Object
  extended by edu.northwestern.at.wordhoard.swing.calculator.modelutils.WordCounter

public class WordCounter
extends java.lang.Object

A word form counter.

WordCounter wraps a WordHoard model object that can count "words": specifically spellings, lemmata, speaker gender, parts of speech, the prose-versus-poetry flag, and (eventually) semantic category. Wrappable objects currently include Corpus, PhraseSet, WordSet, WorkSet, Work, and WorkPart. Every part of WordHoard that needs word or work counts should obtain them through a WordCounter object, which hides the details of getting counts for the different object types.
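The wrapping idea can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in (CountableDemo, DemoCorpus, DemoWork, DemoCounter), not the WordHoard classes, and it assumes that the integer type constants index into the class table, which this page suggests but does not state.

```java
import java.util.List;

/** Hypothetical stand-in for a countable model object; not a WordHoard type. */
interface CountableDemo {
    int totalWords();
}

/** Hypothetical work holding a fixed number of words. */
final class DemoWork implements CountableDemo {
    private final int words;
    DemoWork(int words) { this.words = words; }
    public int totalWords() { return words; }
}

/** Hypothetical corpus: its count is the sum over its works. */
final class DemoCorpus implements CountableDemo {
    private final List<DemoWork> works;
    DemoCorpus(List<DemoWork> works) { this.works = works; }
    public int totalWords() {
        int sum = 0;
        for (DemoWork w : works) sum += w.totalWords();
        return sum;
    }
}

/** Wrapper that gives callers one way to ask for counts. */
final class DemoCounter {
    // Assumption: each type constant is an index into this class table.
    static final Class<?>[] CLASSES = { DemoCorpus.class, DemoWork.class };
    static final int CORPUS = 0;
    static final int WORK = 1;

    private final CountableDemo object;
    private final int objectType;

    DemoCounter(CountableDemo object) {
        this.object = object;
        this.objectType = typeOf(object);
    }

    /** Find the type constant by scanning the class table. */
    private static int typeOf(Object o) {
        for (int i = 0; i < CLASSES.length; i++) {
            if (CLASSES[i].isInstance(o)) return i;
        }
        throw new IllegalArgumentException("Unsupported type: " + o.getClass());
    }

    boolean isCorpus() { return objectType == CORPUS; }
    boolean isWork() { return objectType == WORK; }
    int getTotalWordCount() { return object.totalWords(); }
}
```

Callers only see DemoCounter, so adding another countable type means extending the class table rather than changing every call site.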


Field Summary
static int CORPUS
          Word counter type constants for the classes in wordFormCounterClasses.
protected  java.lang.Object object
          The object containing countable words.
protected  int objectType
          The type of word form counter object.
static int PHRASESET
           
static java.lang.Class[] wordFormCounterClasses
          The word counter classes.
static int WORDSET
           
static int WORK
           
static int WORKPART
           
static int WORKSET
           
 
Constructor Summary
WordCounter(CanCountWords canCountWords)
          Create a word form counter for a CanCountWords object.
WordCounter(int anObjectType, java.lang.Long anObjectId)
          Create a word form counter for a specified object type and ID.
WordCounter(int anObjectType, java.lang.String aTag)
          Create a word form counter for a specified object type and tag.
 
Method Summary
 boolean equals(java.lang.Object object)
          Compare another word form object to this one for equality.
 WorkPart[] getActualWorkParts()
          Return all actual work parts in this object.
 int getDistinctWordFormCount(int wordForm)
          Get distinct word count for word form type.
 java.lang.Object getObject()
          Get the word form counter object.
 java.lang.Long getObjectId()
          Get the word form counter object's persistence Id.
 int getObjectType()
          Get the word form counter object type.
 java.util.Map[] getPhrasesAndCounts(int wordForm)
          Get phrases and their counts of a specific word form type.
 Word[] getSpan(Word word, int leftSpan, int rightSpan)
          Get surrounding words of a specified word.
 java.lang.String getTag()
          Get the tag for an object.
 int getTotalWordFormCount(int wordForm)
          Get total word count for word form type.
static WordCounter[] getWordCounters()
          Get all available word counter types as WordCounter objects.
static WordCounter[] getWordCounters(boolean includeCorpora, boolean includeWorks, boolean includeWorkSets, boolean includeWordSets, boolean includePhraseSets)
          Get the available corpora, works, work sets, word sets, and phrase sets selected by the flags as WordCounter objects.
 java.util.Map getWordFormCount(Spelling[] words, int wordForm)
          Get counts for several words.
 int getWordFormCount(Spelling word, int wordForm)
          Get count of a word form.
 java.util.Map[] getWordFormCountByYear(Spelling word, int wordForm, boolean usePhrases)
          Get word form and its counts by year.
 Word[] getWordOccurrences(Spelling word, int wordForm)
          Get word occurrences for a word.
 java.util.Map[] getWordsAndCounts(int wordForm)
          Get words and their counts of a specific word form type.
 java.util.Map[] getWordsAndCounts(int wordForm, boolean getWorkCounts)
          Get words and their counts of a specific word form type.
 java.util.Map[] getWordsAndCounts(WordCounter otherCounter, int wordForm)
          Get words and counts for two WordCounter objects.
 int getWorkCount()
          Get the number of works represented in this word counter.
 int getWorkCount(WordCounter otherCounter)
          Get the number of works represented in this word counter and another.
 WorkPart[] getWorkParts()
          Return all work parts in this object.
 Work[] getWorks()
          Return all works represented in this object.
 int hashCode()
          Returns a hash code for the object.
protected  java.lang.String htmlizeTitle(java.lang.String title)
          HTMLize an object title.
 boolean isCorpus()
          Is word counter a Corpus?
 boolean isPhraseSet()
          Is word counter a PhraseSet?
 boolean isWordSet()
          Is word counter a WordSet?
 boolean isWork()
          Is word counter a Work?
 boolean isWorkPart()
          Is word counter a work part?
 boolean isWorkSet()
          Is word counter a WorkSet?
 java.lang.String toHTMLString()
          Return HTML string form of object.
 java.lang.String toHTMLString(boolean useShortName)
          Return HTML string form of object.
 java.lang.String toString()
          Return string form of object.
 java.lang.String toString(boolean useShortString)
          Return string form of object.
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

wordFormCounterClasses

public static final java.lang.Class[] wordFormCounterClasses
The word counter classes.


CORPUS

public static final int CORPUS
Word counter types for the classes above.

See Also:
Constant Field Values

PHRASESET

public static final int PHRASESET
See Also:
Constant Field Values

WORDSET

public static final int WORDSET
See Also:
Constant Field Values

WORK

public static final int WORK
See Also:
Constant Field Values

WORKPART

public static final int WORKPART
See Also:
Constant Field Values

WORKSET

public static final int WORKSET
See Also:
Constant Field Values

object

protected java.lang.Object object
The object containing countable words.


objectType

protected int objectType
The type of word form counter object.

Constructor Detail

WordCounter

public WordCounter(CanCountWords canCountWords)
Create a word form counter for a CanCountWords object.

Parameters:
canCountWords - The object implementing CanCountWords.

WordCounter

public WordCounter(int anObjectType,
                   java.lang.Long anObjectId)
Create a word form counter for a specified object type and ID.

Parameters:
anObjectType - WordCounter object type.
anObjectId - The object's ID.

WordCounter

public WordCounter(int anObjectType,
                   java.lang.String aTag)
Create a word form counter for a specified object type and tag.

Parameters:
anObjectType - WordCounter object type.
aTag - The object's tag.

Not all WordCounter objects have permanent tags. Only those with permanent tags can be created using this constructor.

Method Detail

getObject

public java.lang.Object getObject()
Get the word form counter object.

Returns:
The word form counter object.

getObjectType

public int getObjectType()
Get the word form counter object type.

Returns:
The word form counter object type.

getObjectId

public java.lang.Long getObjectId()
Get the word form counter object's persistence Id.

Returns:
The persistence Id.

getTag

public java.lang.String getTag()
Get the tag for an object.

Returns:
The tag for the object.

Not all WordCounter objects have tags. For those that do not, toString() is returned as the tag.


isCorpus

public boolean isCorpus()
Is word counter a Corpus?

Returns:
True if word counter is a Corpus.

isPhraseSet

public boolean isPhraseSet()
Is word counter a PhraseSet?

Returns:
True if word counter is a PhraseSet.

isWork

public boolean isWork()
Is word counter a Work?

Returns:
True if word counter is a Work.

isWorkPart

public boolean isWorkPart()
Is word counter a work part?

Returns:
True if word counter is a WorkPart.

isWordSet

public boolean isWordSet()
Is word counter a WordSet?

Returns:
True if word counter is a WordSet.

isWorkSet

public boolean isWorkSet()
Is word counter a WorkSet?

Returns:
True if word counter is a WorkSet.

getWordFormCount

public int getWordFormCount(Spelling word,
                            int wordForm)
Get count of a word form.

Parameters:
word - The word form whose count is desired.
wordForm - The type of word form as specified in WordForms.
Returns:
The count of times the word form appears.

getWordFormCount

public java.util.Map getWordFormCount(Spelling[] words,
                                      int wordForm)
Get counts for several words.

Parameters:
words - The words whose counts are desired.
wordForm - The type of word form as specified in WordForms.
Returns:
Map with each word as a key and count of times each word appears as a value.

getWordsAndCounts

public java.util.Map[] getWordsAndCounts(int wordForm,
                                         boolean getWorkCounts)
Get words and their counts of a specific word form type.

Parameters:
wordForm - The word form as specified in WordForms.
getWorkCounts - true to get work counts instead of work list in second result map (see below).
Returns:
Array of two maps.

The first map contains each word of the specified word form in the first set of work parts as a key and the count of the appearance of the word in the first set of work parts as a value.

The second map also has the word as the key. If getWorkCounts is true, the value for each word is the number of works (NOT work parts) in which the word appears. If getWorkCounts is false, the value is a hash set containing the work IDs of the works in which the word appears.
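The shape of the two result maps can be sketched as follows. This is an illustration under stated assumptions, not the WordHoard implementation: Token and WordsAndCountsSketch are hypothetical names, words are plain strings rather than Spelling objects, and the counts are computed in memory rather than by database queries.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class WordsAndCountsSketch {
    /** Hypothetical token: a word plus the ID of the work it occurs in. */
    static final class Token {
        final String word;
        final long workId;
        Token(String word, long workId) { this.word = word; this.workId = workId; }
    }

    /**
     * Returns two maps keyed by word. The first holds occurrence counts.
     * The second holds the number of works (if getWorkCounts is true)
     * or the set of work IDs (if it is false).
     */
    static Map<?, ?>[] wordsAndCounts(Token[] tokens, boolean getWorkCounts) {
        Map<String, Integer> counts = new HashMap<>();
        Map<String, Set<Long>> workIds = new HashMap<>();
        for (Token t : tokens) {
            counts.merge(t.word, 1, Integer::sum);
            workIds.computeIfAbsent(t.word, k -> new HashSet<>()).add(t.workId);
        }
        Map<?, ?> second = workIds;
        if (getWorkCounts) {
            // Collapse each set of work IDs to its size.
            Map<String, Integer> workCounts = new HashMap<>();
            for (Map.Entry<String, Set<Long>> e : workIds.entrySet()) {
                workCounts.put(e.getKey(), e.getValue().size());
            }
            second = workCounts;
        }
        return new Map<?, ?>[] { counts, second };
    }
}
```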


getWordsAndCounts

public java.util.Map[] getWordsAndCounts(int wordForm)
Get words and their counts of a specific word form type.

Parameters:
wordForm - The word form as specified in WordForms.
Returns:
Array of two maps.

The first map contains each word of the specified word form in the first set of work parts as a key and the count of the appearance of the word in the first set of work parts as a value.

The second map also has the word as the key but provides the number of works (NOT work parts) in which the word appears as a value.


getPhrasesAndCounts

public java.util.Map[] getPhrasesAndCounts(int wordForm)
Get phrases and their counts of a specific word form type.

Parameters:
wordForm - The word form as specified in WordForms.
Returns:
Array of two maps.

The first map contains each phrase containing the specified word form as a key and the count of the appearance of the phrase as a value.

The second map also has the phrase as the key but provides the number of works (NOT work parts) in which the phrase appears as a value.


getWordsAndCounts

public java.util.Map[] getWordsAndCounts(WordCounter otherCounter,
                                         int wordForm)
Get words and counts for two WordCounter objects.

Parameters:
otherCounter - The other word counter.
wordForm - The word form as specified in WordForms.
Returns:
Array of three maps.

The first map contains each word of the specified word form in the first set of work parts as a key and the count of the appearance of the word in the first set of work parts as a value.

The second map contains each word of the specified word form in the second set of work parts as a key and the count of the appearance of the word in the second set of work parts as a value.

The third map also has the word as the key but provides the number of works (NOT work parts) in which the word appears as a value in either of the two sets of work parts.

This method significantly reduces the query load when the two sets of work parts have common entries.
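One way to picture the saving is a per-work cache that both counters share, so a work that appears in both sets is counted only once. The sketch below is an assumption about the general idea, not a description of how WordHoard issues its queries; SketchWork and TwoCounterSketch are hypothetical names and words are plain strings.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TwoCounterSketch {
    /** Hypothetical work: an ID plus its words. */
    static final class SketchWork {
        final long id;
        final String[] words;
        SketchWork(long id, String... words) { this.id = id; this.words = words; }
    }

    /** Returns three maps: first-set counts, second-set counts, work counts. */
    static Map<?, ?>[] wordsAndCounts(List<SketchWork> first, List<SketchWork> second) {
        // Cache of per-work counts, shared by both sets.
        Map<Long, Map<String, Integer>> perWork = new HashMap<>();
        Map<String, Integer> firstCounts = new HashMap<>();
        Map<String, Integer> secondCounts = new HashMap<>();
        Map<String, Set<Long>> workIds = new HashMap<>();
        accumulate(first, perWork, firstCounts, workIds);
        accumulate(second, perWork, secondCounts, workIds);
        // A work found in both sets contributes one ID, so it is counted once.
        Map<String, Integer> workCounts = new HashMap<>();
        for (Map.Entry<String, Set<Long>> e : workIds.entrySet()) {
            workCounts.put(e.getKey(), e.getValue().size());
        }
        return new Map<?, ?>[] { firstCounts, secondCounts, workCounts };
    }

    private static void accumulate(List<SketchWork> works,
                                   Map<Long, Map<String, Integer>> cache,
                                   Map<String, Integer> totals,
                                   Map<String, Set<Long>> workIds) {
        for (SketchWork w : works) {
            // Count a work's words only the first time it is seen.
            Map<String, Integer> counts = cache.computeIfAbsent(w.id, id -> countWords(w));
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
                workIds.computeIfAbsent(e.getKey(), k -> new HashSet<>()).add(w.id);
            }
        }
    }

    private static Map<String, Integer> countWords(SketchWork work) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : work.words) counts.merge(word, 1, Integer::sum);
        return counts;
    }
}
```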


getTotalWordFormCount

public int getTotalWordFormCount(int wordForm)
Get total word count for word form type.

Parameters:
wordForm - The word form as specified in WordForms.
Returns:
The total count of the word form type.

getDistinctWordFormCount

public int getDistinctWordFormCount(int wordForm)
Get distinct word count for word form type.

Parameters:
wordForm - The word form as specified in WordForms.
Returns:
The number of distinct values of the word form type.
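The difference between the total count and the distinct count is the usual token/type distinction. A minimal sketch, assuming word forms are represented as plain strings (TokenTypeCounts is a hypothetical helper, not part of WordHoard):

```java
import java.util.Arrays;
import java.util.HashSet;

final class TokenTypeCounts {
    /** Total count: every occurrence is counted. */
    static int totalCount(String[] forms) {
        return forms.length;
    }

    /** Distinct count: each different value is counted once. */
    static int distinctCount(String[] forms) {
        return new HashSet<>(Arrays.asList(forms)).size();
    }
}
```

For the six forms "to be or not to be", the total count is 6 and the distinct count is 4.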

getWordFormCountByYear

public java.util.Map[] getWordFormCountByYear(Spelling word,
                                              int wordForm,
                                              boolean usePhrases)
Get word form and its counts by year.

Parameters:
word - The word form whose count is desired.
wordForm - The type of word form as specified in WordForms.
usePhrases - Analyze phrase counts instead of word counts if the current object allows this.
Returns:
Three maps, each with the year as a key. The first has the word count in the year as a value. The second has the total word count in the year as a value. The third has the work count in the year as a value.
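The three year-keyed maps can be sketched as below. DatedWork and CountByYearSketch are hypothetical names, words are plain strings, and the sketch records a zero for years in which the word does not occur; whether the real method does the same is not stated here.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CountByYearSketch {
    /** Hypothetical work: a publication year plus its words. */
    static final class DatedWork {
        final int year;
        final String[] words;
        DatedWork(int year, String... words) { this.year = year; this.words = words; }
    }

    /** Returns three maps keyed by year: word count, total word count, work count. */
    static Map<?, ?>[] countByYear(List<DatedWork> works, String word) {
        Map<Integer, Integer> wordCount = new TreeMap<>();
        Map<Integer, Integer> totalCount = new TreeMap<>();
        Map<Integer, Integer> workCount = new TreeMap<>();
        for (DatedWork w : works) {
            int hits = 0;
            for (String s : w.words) {
                if (s.equals(word)) hits++;
            }
            wordCount.merge(w.year, hits, Integer::sum);
            totalCount.merge(w.year, w.words.length, Integer::sum);
            workCount.merge(w.year, 1, Integer::sum);
        }
        return new Map<?, ?>[] { wordCount, totalCount, workCount };
    }
}
```

Dividing the first map's value by the second's for a given year gives the word's relative frequency in that year.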

getWordCounters

public static WordCounter[] getWordCounters(boolean includeCorpora,
                                            boolean includeWorks,
                                            boolean includeWorkSets,
                                            boolean includeWordSets,
                                            boolean includePhraseSets)
Get the available corpora, works, work sets, word sets, and phrase sets selected by the flags as WordCounter objects.

Parameters:
includeCorpora - Return corpora.
includeWorks - Return works.
includeWorkSets - Return work sets.
includeWordSets - Return word sets.
includePhraseSets - Return phrase sets.
Returns:
Array of WordCounter objects.

getWordCounters

public static WordCounter[] getWordCounters()
Get all available word counter types as WordCounter objects.

Returns:
Array of WordCounter objects for all word counter types. Null if none.

getWorkParts

public WorkPart[] getWorkParts()
Return all work parts in this object.

Returns:
Array of WorkPart.

For a work, the returned array has just one entry for the work. For a work part, the returned array has just one entry for the work part. For a corpus, the returned array usually has more than one entry, one for each work in the corpus. For both Work and Corpus, all the returned WorkPart entries are actually Work objects. For a WorkSet, the returned entries may be a combination of Work and WorkPart objects.


getActualWorkParts

public WorkPart[] getActualWorkParts()
Return all actual work parts in this object.

Returns:
Array of WorkPart.

This is what getWorkParts should probably do but does not for historical reasons.

TODO (PIB): Need to rectify this with getWorkParts.

getWorks

public Work[] getWorks()
Return all works represented in this object.

Returns:
Array of Work.

getWorkCount

public int getWorkCount()
Get the number of works represented in this word counter.

Returns:
The number of works.

getWorkCount

public int getWorkCount(WordCounter otherCounter)
Get the number of works represented in this word counter and another.

Parameters:
otherCounter - The other counter.
Returns:
The number of works.

getWordOccurrences

public Word[] getWordOccurrences(Spelling word,
                                 int wordForm)
Get word occurrences for a word.

Parameters:
word - The word to look up.
wordForm - The word form.
Returns:
Array of Word entries.

getSpan

public Word[] getSpan(Word word,
                      int leftSpan,
                      int rightSpan)
Get surrounding words of a specified word.

Parameters:
word - Word for which to get span.
leftSpan - Number of words to the left of the specified word to retrieve.
rightSpan - Number of words to the right of the specified word to retrieve.
Returns:
Span of words around specified word.
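A span lookup over an in-memory word sequence might look like the sketch below. It assumes the span includes the specified word itself and is clipped at the start and end of the text; neither detail is confirmed by this page, and SpanSketch is a hypothetical helper that works on strings rather than Word objects.

```java
import java.util.Arrays;

final class SpanSketch {
    /**
     * Returns the words from index - leftSpan to index + rightSpan,
     * clipped to the bounds of the text. Assumes index is a valid
     * position in the text.
     */
    static String[] span(String[] text, int index, int leftSpan, int rightSpan) {
        int from = Math.max(0, index - leftSpan);
        int to = Math.min(text.length, index + rightSpan + 1); // exclusive end
        return Arrays.copyOfRange(text, from, to);
    }
}
```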

htmlizeTitle

protected java.lang.String htmlizeTitle(java.lang.String title)
HTMLize an object title.

Parameters:
title - The object title.
Returns:
The HTMLized title.

toHTMLString

public java.lang.String toHTMLString(boolean useShortName)
Return HTML string form of object.

Parameters:
useShortName - True to use short object name.
Returns:
HTML string form (=long or short title) of object.

toHTMLString

public java.lang.String toHTMLString()
Return HTML string form of object.

Returns:
HTML string form (=title) of object.

toString

public java.lang.String toString()
Return string form of object.

Overrides:
toString in class java.lang.Object
Returns:
String form (=title) of object.

toString

public java.lang.String toString(boolean useShortString)
Return string form of object.

Parameters:
useShortString - Return short string.
Returns:
String form of object.

At present, short strings are returned on request only for Work objects.


equals

public boolean equals(java.lang.Object object)
Compare another word form object to this one for equality.

Overrides:
equals in class java.lang.Object
Parameters:
object - The other word form object.

Two WordCounter objects are equal if they wrap objects with the same id.


hashCode

public int hashCode()
Returns a hash code for the object.

Overrides:
hashCode in class java.lang.Object
Returns:
The hash code, based upon the id of the wrapped object.
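An ID-based equals and hashCode pair can be sketched as follows. IdBackedCounter is a hypothetical class used only for illustration; in particular, this page does not say whether the real comparison also takes the object type into account, so the sketch compares IDs alone.

```java
import java.util.Objects;

final class IdBackedCounter {
    private final Long objectId;

    IdBackedCounter(Long objectId) {
        this.objectId = objectId;
    }

    /** Equal when both counters wrap objects with the same ID. */
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof IdBackedCounter)) return false;
        return Objects.equals(objectId, ((IdBackedCounter) other).objectId);
    }

    /** Derived from the same ID so equal counters share a hash code. */
    @Override
    public int hashCode() {
        return Objects.hashCode(objectId);
    }
}
```

Keeping both methods tied to the same field means counters behave consistently as keys in hash-based collections.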