edu.northwestern.at.wordhoard.swing.calculator.modelutils
Class WordSetUtils

java.lang.Object
  extended by edu.northwestern.at.wordhoard.swing.calculator.modelutils.WordSetUtils
All Implemented Interfaces:
java.io.Serializable

public class WordSetUtils
extends java.lang.Object
implements java.io.Serializable

Word set utilities.

Methods for adding, deleting, and updating word sets. Word sets are unstructured collections of words which may span many different corpora, works, and work parts.

To add a word set, call addWordSetUsingQuery. The protected versions of addWordSet each perform part of that process and should not be called on their own. New word sets are owned by the currently logged-in user, as recorded in WordHoardSettings.

See Also:
Serialized Form

Field Summary
protected static long serialVersionUID
          Serial version UID.
 
Constructor Summary
protected WordSetUtils()
          Don't allow instantiation but do allow overrides.
 
Method Summary
protected static boolean addWords(WordSet wordSet, java.util.Map words, ProgressReporter progressReporter)
          Add words to a word set.
protected static boolean addWords(WordSet wordSet, java.lang.String[] wordTags, java.lang.String[] workTags, java.lang.String[] workPartTags, ProgressReporter progressReporter)
          Add words to a word set.
protected static WordSet addWordSet(java.lang.String title, java.lang.String description, java.lang.String webPageURL, boolean isPublic, java.lang.String query)
          Add a new word set.
static WordSet addWordSet(java.lang.String title, java.lang.String description, java.lang.String webPageURL, boolean isPublic, java.lang.String query, java.util.Collection words, ProgressReporter progressReporter)
          Add a new word set.
protected static WordSet addWordSet(java.lang.String title, java.lang.String description, java.lang.String webPageURL, boolean isPublic, java.lang.String query, CountableWordDataCounter wordDataCounter, ProgressReporter progressReporter)
          Create word set given words and counts.
protected static WordSet addWordSet(java.lang.String title, java.lang.String description, java.lang.String webPageURL, boolean isPublic, java.lang.String query, java.util.Map words, ProgressReporter progressReporter)
          Add a new word set with specified words.
static WordSet addWordSet(java.lang.String title, java.lang.String description, java.lang.String webPageURL, boolean isPublic, java.lang.String query, Word[] words, ProgressReporter progressReporter)
          Add a new word set.
static WordSet addWordSetUsingQuery(java.lang.String title, java.lang.String description, java.lang.String webPageURL, boolean isPublic, WordCounter analysisText, java.lang.String query, java.awt.Window parentWindow, ProgressReporter progressReporter)
          Add a new word set using a specified query.
static java.lang.Object[] createCounts(CountableWordDataCounter wordCounter)
          Create count entries for a word set.
static boolean deleteWordSet(java.lang.String title)
          Delete a word set by title.
static boolean deleteWordSet(WordSet wordSet)
          Delete a word set.
static boolean deleteWordSets(WordSet[] wordSets)
          Delete multiple word sets.
static boolean deleteWordSets(WordSet[] wordSets, ProgressReporter progressReporter)
          Delete multiple word sets.
static int getDistinctWordFormCount(WordSet wordSet, int wordForm)
          Get distinct word form count in a word set.
static Word[] getSpan(WordSet wordSet, Word word, int leftSpan, int rightSpan)
          Get surrounding words of a specified word in a word set.
static Word[] getWord(WordSet wordSet)
          Get all available words in a word set as an array of Word.
static java.util.Map[] getWordCounts(WordSet[] wordSets, int wordForm)
          Get word form counts in a set of word sets.
static java.util.Map[] getWordCounts(WordSet[] wordSets, int wordForm, boolean getWorkCounts)
          Get word form counts in a set of word sets.
static java.util.Map getWordCounts(WordSet wordSet, int wordForm)
          Get word counts in a single word set.
static java.util.Map getWordFormCount(WordSet[] wordSets, Spelling[] words, int wordForm)
          Get word count for multiple words in a set of word sets.
static int getWordFormCount(WordSet wordSet, int wordForm)
          Get total word form count in a word set.
static int getWordFormCount(WordSet wordSet, int wordForm, Work work)
          Get total word form count for one work represented in a word set.
static java.util.Map getWordFormCount(WordSet wordSet, Spelling[] words, int wordForm)
          Get word count for multiple words in a word set.
static int getWordFormCount(WordSet wordSet, Spelling word, int wordForm)
          Get word count in a word set.
static int getWordFormCount(WordSet wordSet, Spelling word, int wordForm, Work work)
          Get word count in a word set for a specific work.
static Word[] getWordOccurrences(WordSet wordSet, int wordForm, Spelling word)
          Get word occurrences for a word in a specified word set.
static Word[] getWords(WordSet wordSet)
          Get all available words in a word set as an array.
static WordSet getWordSet(java.lang.String title)
          Get a word set by title.
static WordSet getWordSet(java.lang.String title, java.lang.String owner)
          Get a word set by title.
static WordSet[] getWordSets()
          Get all available public word sets as an array.
static WordSet[] getWordSets(java.lang.String owner)
          Get all available word sets for a specified owner as an array.
static int getWordSetsCount()
          Get count of all available word sets.
static int getWordSetsCount(java.lang.String owner)
          Get count of word sets for a user.
static WordSet[] getWordSetsForLoggedInUser()
          Get all word sets for the logged in user as an array.
static WorkPart[] getWorkParts(WordSet wordSet)
          Get array of all work parts for a word set.
static Work[] getWorks(WordSet wordSet)
          Get array of all works for a word set.
static WordSet importFromDOMDocument(org.w3c.dom.Node wordSetNode)
          Import a word set from a node in a DOM document.
static WordSet[] importWordSets(org.w3c.dom.Document importDocument)
          Import one or more word sets from an XML document.
static boolean isWordInWordSet(WordSet wordSet, Word word)
          Check if a word is in a specified word set.
static WordSet loadWordSet(WordSet wordSet)
          Load a word set if it isn't already loaded.
protected static int performBatchInserts(java.lang.String[] inserts)
          Perform batch of inserts.
protected static boolean persistCounts(WordSet wordSet, int wordForm, java.util.Map wordCountMap, java.util.Map totalWordCountMap, boolean persistingPhrases, ProgressReporter progressReporter)
          Persist counts for word set map.
protected static boolean persistWordSetCounts(WordSet wordSet, java.util.Map[] wordCountMaps, java.util.Map[] totalCountMaps, boolean persistingPhrases, ProgressReporter progressReporter)
          Persist word set counts.
static java.util.Set pruneToWordsInWordSet(WordSet wordSet, Word[] words)
          Prune words to those in a specified word set.
static java.util.List removePhraseSets(java.util.List wordSetList)
          Remove phrase sets from word set list.
static WordSet saveWordSet(javax.swing.JFrame parentWindow, WordGetter wordGetter, ProgressReporter progressReporter)
          Display the create/update word set dialog and perform the creation or update.
protected static WordSet[] udosToWordSets(UserDataObject[] udos)
          Copy UserDataObject array to WordSet array.
static boolean updateWordSet(WordSet wordSet, java.lang.String title, java.lang.String description, java.lang.String webPageURL, boolean isPublic)
          Update a word set.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

serialVersionUID

protected static final long serialVersionUID
Serial version UID.

See Also:
Constant Field Values

Constructor Detail

WordSetUtils

protected WordSetUtils()
Don't allow instantiation but do allow overrides.

Method Detail

udosToWordSets

protected static WordSet[] udosToWordSets(UserDataObject[] udos)
Copy UserDataObject array to WordSet array.

Parameters:
udos - Array of user data objects, all actually WordSet objects.
Returns:
Array of WordSet objects.

performBatchInserts

protected static int performBatchInserts(java.lang.String[] inserts)
Perform batch of inserts.

Parameters:
inserts - String array of database insert requests.
Returns:
Number of objects inserted.

persistCounts

protected static boolean persistCounts(WordSet wordSet,
                                       int wordForm,
                                       java.util.Map wordCountMap,
                                       java.util.Map totalWordCountMap,
                                       boolean persistingPhrases,
                                       ProgressReporter progressReporter)
Persist counts for word set map.

Parameters:
wordSet - The word set.
wordForm - The word form type.
wordCountMap - The word count map.
totalWordCountMap - The total word count map.
persistingPhrases - True if the counts are for phrases.
progressReporter - The progress display. May be null.
Returns:
True if all counts persisted.

createCounts

public static java.lang.Object[] createCounts(CountableWordDataCounter wordCounter)
Create count entries for a word set.

Parameters:
wordCounter - Countable word data counter.
Returns:
Three-element Object array. The first element is the word count maps, the second is the total count maps, and the third is the total number of entries in all maps.
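The positional return value of createCounts has to be unpacked and cast by the caller. The sketch below shows one way to do that. Everything in it is an assumption for illustration: fakeCreateCounts is a hypothetical stand-in that builds a result of the documented three-element shape from a plain list of spellings, and its single-map layout and "*" total key are not WordHoard's actual structure.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CreateCountsSketch {
    // Hypothetical stand-in for createCounts: returns {Map[] wordCountMaps,
    // Map[] totalCountMaps, Integer totalEntries}, mirroring the documented
    // three-element layout. The real method takes a CountableWordDataCounter.
    static Object[] fakeCreateCounts(List<String> spellings) {
        Map<String, Integer> wordCounts = new HashMap<>();
        for (String s : spellings) {
            wordCounts.merge(s, 1, Integer::sum);
        }
        Map<String, Integer> totals = new HashMap<>();
        totals.put("*", spellings.size());
        Map<?, ?>[] wordCountMaps = { wordCounts };
        Map<?, ?>[] totalCountMaps = { totals };
        int entries = wordCounts.size() + totals.size();
        return new Object[] { wordCountMaps, totalCountMaps, entries };
    }

    // Unpack the result by position, as a caller of createCounts would.
    static int totalEntries(Object[] counts) {
        Map<?, ?>[] wordCountMaps = (Map<?, ?>[]) counts[0];
        Map<?, ?>[] totalCountMaps = (Map<?, ?>[]) counts[1];
        int reported = (Integer) counts[2];
        // Sanity check: the third element should equal the entries in all maps.
        int actual = 0;
        for (Map<?, ?> m : wordCountMaps) actual += m.size();
        for (Map<?, ?> m : totalCountMaps) actual += m.size();
        if (actual != reported) {
            throw new IllegalStateException("entry count mismatch");
        }
        return reported;
    }
}
```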

persistWordSetCounts

protected static boolean persistWordSetCounts(WordSet wordSet,
                                              java.util.Map[] wordCountMaps,
                                              java.util.Map[] totalCountMaps,
                                              boolean persistingPhrases,
                                              ProgressReporter progressReporter)
Persist word set counts.

Parameters:
wordSet - The word set.
wordCountMaps - The word count maps.
totalCountMaps - The total count maps.
persistingPhrases - true if persisting phrase counts.
progressReporter - The progress display to update.
Returns:
true if all counts persisted.

addWordSet

protected static WordSet addWordSet(java.lang.String title,
                                    java.lang.String description,
                                    java.lang.String webPageURL,
                                    boolean isPublic,
                                    java.lang.String query,
                                    java.util.Map words,
                                    ProgressReporter progressReporter)
                             throws DuplicateWordSetException,
                                    BadOwnerException
Add a new word set with specified words.

Parameters:
title - Title for the new word set.
description - Description for the new word set.
webPageURL - Web page URL for the new word set.
isPublic - True if word set to be public.
query - Query text which generated the words.
words - Map of CountableWordData entries to add.
progressReporter - Progress reporter.
Returns:
WordSet object if word set added, else null.
Throws:
DuplicateWordSetException - if (title,owner) combination already exists.
BadOwnerException - if the owner is null or empty.
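A minimal in-memory sketch of the two failure cases documented for the addWordSet variants: an owner that is null or empty, and a (title, owner) pair that already exists. The exception classes and WordSetRegistrySketch are hypothetical stand-ins for DuplicateWordSetException, BadOwnerException and the database-backed store; they are not WordHoard's API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical checked exceptions standing in for WordHoard's
// DuplicateWordSetException and BadOwnerException.
class DuplicateSetException extends Exception {
    DuplicateSetException(String msg) { super(msg); }
}

class BadSetOwnerException extends Exception {
    BadSetOwnerException(String msg) { super(msg); }
}

class WordSetRegistrySketch {
    // Composite key: a title only has to be unique per owner.
    record Key(String title, String owner) { }

    private final Map<Key, String> descriptions = new HashMap<>();

    void add(String title, String owner, String description)
            throws DuplicateSetException, BadSetOwnerException {
        if (owner == null || owner.isEmpty()) {
            throw new BadSetOwnerException("owner is null or empty");
        }
        Key key = new Key(title, owner);
        if (descriptions.containsKey(key)) {
            throw new DuplicateSetException(title + " already exists for " + owner);
        }
        descriptions.put(key, description);
    }

    int size() { return descriptions.size(); }
}
```

The same title may be reused by a different owner; only an exact (title, owner) repeat is rejected.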

addWordSet

protected static WordSet addWordSet(java.lang.String title,
                                    java.lang.String description,
                                    java.lang.String webPageURL,
                                    boolean isPublic,
                                    java.lang.String query)
                             throws DuplicateWordSetException,
                                    BadOwnerException
Add a new word set.

Parameters:
title - Title for the new word set.
description - The description.
webPageURL - The web page URL.
isPublic - True if word set is to be public.
query - The query to generate the words.
Returns:
WordSet object if word set added, else null.
Throws:
DuplicateWordSetException - if (title,owner) combination already exists.
BadOwnerException - if the owner is null or empty.

addWordSet

public static WordSet addWordSet(java.lang.String title,
                                 java.lang.String description,
                                 java.lang.String webPageURL,
                                 boolean isPublic,
                                 java.lang.String query,
                                 java.util.Collection words,
                                 ProgressReporter progressReporter)
                          throws DuplicateWordSetException,
                                 BadOwnerException
Add a new word set.

Parameters:
title - Title for the new word set.
description - The description.
webPageURL - The web page URL.
isPublic - True if word set is to be public.
query - The query used to generate the words.
words - Collection of words for the word set.
progressReporter - If not null, used to display progress.
Returns:
WordSet object if word set added, else null.
Throws:
DuplicateWordSetException - if (title,owner) combination already exists.
BadOwnerException - if the owner is null or empty.

addWordSet

public static WordSet addWordSet(java.lang.String title,
                                 java.lang.String description,
                                 java.lang.String webPageURL,
                                 boolean isPublic,
                                 java.lang.String query,
                                 Word[] words,
                                 ProgressReporter progressReporter)
                          throws DuplicateWordSetException,
                                 BadOwnerException
Add a new word set.

Parameters:
title - Title for the new word set.
description - The description.
webPageURL - The web page URL.
isPublic - True if word set is to be public.
query - The query used to generate the words.
words - Array of words for the word set.
progressReporter - If not null, used to display progress.
Returns:
WordSet object if word set added, else null.
Throws:
DuplicateWordSetException - if (title,owner) combination already exists.
BadOwnerException - if the owner is null or empty.

addWordSetUsingQuery

public static WordSet addWordSetUsingQuery(java.lang.String title,
                                           java.lang.String description,
                                           java.lang.String webPageURL,
                                           boolean isPublic,
                                           WordCounter analysisText,
                                           java.lang.String query,
                                           java.awt.Window parentWindow,
                                           ProgressReporter progressReporter)
                                    throws DuplicateWordSetException,
                                           BadOwnerException
Add a new word set using a specified query.

Parameters:
title - Title for the new word set.
description - Description for the new word set.
webPageURL - Web page URL for the new word set.
isPublic - True if word set to be public.
analysisText - Text set from which to extract words.
query - Query to select words in analysis text.
parentWindow - Parent window.
progressReporter - If not null, used to display progress.
Returns:
WordSet object if word set added, else null.
Throws:
DuplicateWordSetException - if (title,owner) combination already exists.
BadOwnerException - if the owner is null or empty.

addWordSet

protected static WordSet addWordSet(java.lang.String title,
                                    java.lang.String description,
                                    java.lang.String webPageURL,
                                    boolean isPublic,
                                    java.lang.String query,
                                    CountableWordDataCounter wordDataCounter,
                                    ProgressReporter progressReporter)
                             throws DuplicateWordSetException,
                                    BadOwnerException
Create word set given words and counts.

Parameters:
title - Title for the new word set.
description - Description for the new word set.
webPageURL - Web page URL for the new word set.
isPublic - True if word set to be public.
query - Query to select words in analysis text.
wordDataCounter - Word data counter.
progressReporter - If not null, used to display progress.
Returns:
WordSet object if word set added, else null.
Throws:
DuplicateWordSetException - if (title,owner) combination already exists.
BadOwnerException - if the owner is null or empty.

addWords

protected static boolean addWords(WordSet wordSet,
                                  java.util.Map words,
                                  ProgressReporter progressReporter)
Add words to a word set.

Parameters:
wordSet - The word set.
words - Map with words to add to word set. Map entries are CountableWordData objects.
progressReporter - Progress display to update. May be null.
Returns:
true if words added successfully, else false.

If this method returns false, the caller should delete the word set.
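The rule that the caller should delete the word set when addWords returns false amounts to a create-then-roll-back pattern. The following is a hypothetical in-memory sketch; the addWords step is passed in as a Predicate so success and failure can be simulated, and none of these names come from WordHoard.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class AddWordsRollbackSketch {
    // Hypothetical in-memory store; the real code persists to a database.
    final List<String> sets = new ArrayList<>();

    // Creates the set, then tries to add its words. If adding fails, the
    // half-built set is deleted, as the addWords contract asks of callers.
    String createWithWords(String title, List<String> words, Predicate<List<String>> addWords) {
        sets.add(title);
        if (!addWords.test(words)) {
            sets.remove(title);   // roll back the half-built set
            return null;          // mirrors "WordSet object if added, else null"
        }
        return title;
    }
}
```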


addWords

protected static boolean addWords(WordSet wordSet,
                                  java.lang.String[] wordTags,
                                  java.lang.String[] workTags,
                                  java.lang.String[] workPartTags,
                                  ProgressReporter progressReporter)
Add words to a word set.

Parameters:
wordSet - The word set.
wordTags - Array of word tags.
workTags - Array of work tags.
workPartTags - Array of work part tags.
progressReporter - Progress display to update. May be null.
Returns:
true if words added successfully, else false.

If this method returns false, the caller should delete the word set.


deleteWordSet

public static boolean deleteWordSet(WordSet wordSet)
Delete a word set.

Parameters:
wordSet - The word set to delete.
Returns:
true if word set deleted, false otherwise.

deleteWordSet

public static boolean deleteWordSet(java.lang.String title)
Delete a word set by title.

Parameters:
title - The title of the word set to delete.
Returns:
true if word set deleted, false otherwise. If the word set didn't exist, true is returned.

deleteWordSets

public static boolean deleteWordSets(WordSet[] wordSets)
Delete multiple word sets.

Parameters:
wordSets - The word sets to delete.
Returns:
true if word sets deleted, false otherwise.

deleteWordSets

public static boolean deleteWordSets(WordSet[] wordSets,
                                     ProgressReporter progressReporter)
Delete multiple word sets.

Parameters:
wordSets - The word sets to delete.
progressReporter - A progress reporter.
Returns:
true if word sets deleted, false otherwise.
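One plausible shape for a batch delete that reports progress and returns true only when every deletion succeeded. The Set of titles and the IntConsumer callback are assumptions standing in for persisted WordSet objects and ProgressReporter. Whether the real method stops or continues after a failure is not documented; this sketch continues.

```java
import java.util.List;
import java.util.Set;
import java.util.function.IntConsumer;

class DeleteWordSetsSketch {
    // Deletes each title from the store, reporting percent complete after each
    // one. Returns true only if every deletion succeeded. The IntConsumer is a
    // hypothetical stand-in for WordHoard's ProgressReporter and may be null.
    static boolean deleteWordSets(Set<String> store, List<String> titles, IntConsumer progress) {
        boolean allDeleted = true;
        for (int i = 0; i < titles.size(); i++) {
            // Non-short-circuit &= so every remaining title is still attempted.
            allDeleted &= store.remove(titles.get(i));
            if (progress != null) {
                progress.accept((i + 1) * 100 / titles.size());
            }
        }
        return allDeleted;
    }
}
```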

getWordSet

public static WordSet getWordSet(java.lang.String title,
                                 java.lang.String owner)
Get a word set by title.

Parameters:
title - The title of the word set to fetch.
owner - The owner of the word set to fetch.
Returns:
The word set with the requested title, or null if not found.

getWordSet

public static WordSet getWordSet(java.lang.String title)
Get a word set by title.

Parameters:
title - The title of the word set to fetch.
Returns:
The word set with the requested title, or null if not found.

getWordSets

public static WordSet[] getWordSets()
Get all available public word sets as an array.

Returns:
All available public word sets as an array of WordSet.

removePhraseSets

public static java.util.List removePhraseSets(java.util.List wordSetList)
Remove phrase sets from word set list.

Parameters:
wordSetList - The word set list which may also include phrase sets.
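The entry does not document whether removePhraseSets returns a new list or filters its argument in place. The sketch below assumes a new list is returned and the input is left unchanged; WordSetInfo and its isPhraseSet flag are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class RemovePhraseSetsSketch {
    // Hypothetical minimal word set: the real WordSet/PhraseSet types differ.
    record WordSetInfo(String title, boolean isPhraseSet) { }

    // Returns a new list containing only the entries that are not phrase sets,
    // preserving the original order and leaving the input list unchanged.
    static List<WordSetInfo> removePhraseSets(List<WordSetInfo> wordSetList) {
        List<WordSetInfo> result = new ArrayList<>();
        for (WordSetInfo ws : wordSetList) {
            if (!ws.isPhraseSet()) {
                result.add(ws);
            }
        }
        return result;
    }
}
```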

getWordSets

public static WordSet[] getWordSets(java.lang.String owner)
Get all available word sets for a specified owner as an array.

Parameters:
owner - The owner.
Returns:
All available word sets as an array of WordSet.

getWordSetsForLoggedInUser

public static WordSet[] getWordSetsForLoggedInUser()
Get all word sets for the logged in user as an array.

Returns:
All word sets for logged in user as an array of WordSet.

getWordSetsCount

public static int getWordSetsCount(java.lang.String owner)
Get count of word sets for a user.

Parameters:
owner - The owner.
Returns:
Count of word sets owned by "owner".

getWordSetsCount

public static int getWordSetsCount()
Get count of all available word sets.

Returns:
Count of all available word sets.

getWords

public static Word[] getWords(WordSet wordSet)
Get all available words in a word set as an array.

Parameters:
wordSet - The word set.
Returns:
All available words in the word set as an array of Word.

Returns null if word set is null.


getWord

public static Word[] getWord(WordSet wordSet)
Get all available words in a word set as an array of Word.

Parameters:
wordSet - The word set.
Returns:
All available words in the word set as an array of Word.

Returns null if word set is null.


updateWordSet

public static boolean updateWordSet(WordSet wordSet,
                                    java.lang.String title,
                                    java.lang.String description,
                                    java.lang.String webPageURL,
                                    boolean isPublic)
                             throws DuplicateWordSetException,
                                    BadOwnerException
Update a word set.

Parameters:
wordSet - The word set to update.
title - Title for the word set.
description - Description for the word set.
webPageURL - Web page URL for the word set.
isPublic - True if word set is public.
Returns:
true if the update succeeds, false otherwise.
Throws:
DuplicateWordSetException - if new (title,owner) combination already exists.
BadOwnerException - if user is not logged in or is not the word set owner.

We do not provide for updating the words in a word set. If the words change, create a new word set instead.


getWorkParts

public static WorkPart[] getWorkParts(WordSet wordSet)
Get array of all work parts for a word set.

Parameters:
wordSet - The word set.
Returns:
Array of WorkPart for all work parts represented in word set.

getWorks

public static Work[] getWorks(WordSet wordSet)
Get array of all works for a word set.

Parameters:
wordSet - The word set.
Returns:
Array of Work for all works represented in word set.

getWordFormCount

public static int getWordFormCount(WordSet wordSet,
                                   int wordForm,
                                   Work work)
Get total word form count for one work represented in a word set.

Parameters:
wordSet - The word set.
wordForm - The word form.
work - The work.
Returns:
Count of the word form in the word set for the specified work.

getWordFormCount

public static int getWordFormCount(WordSet wordSet,
                                   int wordForm)
Get total word form count in a word set.

Parameters:
wordSet - The word set.
wordForm - The word form.
Returns:
Count of the word form in the word set.

getDistinctWordFormCount

public static int getDistinctWordFormCount(WordSet wordSet,
                                           int wordForm)
Get distinct word form count in a word set.

Parameters:
wordSet - The word set.
wordForm - The word form.
Returns:
Count of the distinct word forms in the word set.
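The difference between getWordFormCount(wordSet, wordForm) and getDistinctWordFormCount(wordSet, wordForm) is the difference between counting occurrences and counting distinct values. A sketch that assumes the selected word forms have already been extracted from the word set as strings (a hypothetical simplification of the int wordForm selector):

```java
import java.util.HashSet;
import java.util.List;

class FormCountSketch {
    // Total count: every occurrence of the word form is counted.
    static int wordFormCount(List<String> forms) {
        return forms.size();
    }

    // Distinct count: each different form is counted once.
    static int distinctWordFormCount(List<String> forms) {
        return new HashSet<>(forms).size();
    }
}
```

For the forms "love", "love", "loves", the total count is 3 and the distinct count is 2.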

getWordFormCount

public static int getWordFormCount(WordSet wordSet,
                                   Spelling word,
                                   int wordForm)
Get word count in a word set.

Parameters:
wordSet - The word set.
word - The word.
wordForm - The word form.
Returns:
Count of the word form in the word set.

getWordFormCount

public static int getWordFormCount(WordSet wordSet,
                                   Spelling word,
                                   int wordForm,
                                   Work work)
Get word count in a word set for a specific work.

Parameters:
wordSet - The word set.
word - The word.
wordForm - The word form.
work - The work.
Returns:
Count of the word form in the word set for the specified work.

getWordFormCount

public static java.util.Map getWordFormCount(WordSet[] wordSets,
                                             Spelling[] words,
                                             int wordForm)
Get word count for multiple words in a set of word sets.

Parameters:
wordSets - The word sets.
words - The words.
wordForm - The word form.
Returns:
Map with words as keys and counts of each word in the word sets as values.

getWordFormCount

public static java.util.Map getWordFormCount(WordSet wordSet,
                                             Spelling[] words,
                                             int wordForm)
Get word count for multiple words in a word set.

Parameters:
wordSet - The word set.
words - The words.
wordForm - The word form.
Returns:
Map with words as keys and counts of each word in the word set as values.

getWordCounts

public static java.util.Map getWordCounts(WordSet wordSet,
                                          int wordForm)
Get word counts in a single word set.

Parameters:
wordSet - The word set.
wordForm - The word form to count.
Returns:
Map containing each word in the word set as a key and the count of the word as the value.

getWordCounts

public static java.util.Map[] getWordCounts(WordSet[] wordSets,
                                            int wordForm,
                                            boolean getWorkCounts)
Get word form counts in a set of word sets.

Parameters:
wordSets - The word sets.
wordForm - The word form to count.
getWorkCounts - if true, work counts are returned in the second result map (see below). If false, hashsets of work IDs are returned in the second result map.
Returns:
Array of two maps. The first map contains each word of the specified word form in the word sets as a key and the count of the appearances of the word in the word sets as a value. The second map also has the word as the key. If "getWorkCounts" is true, the value for each word is the count of the works in which the word appears. If "getWorkCounts" is false, the value is a hash set of the IDs of the works in which the word appears.
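A sketch of how the two result maps relate, computed from a hypothetical list of (word, work ID) occurrences rather than from persisted word sets. With getWorkCounts true the second map holds the number of works per word; with it false, the set of work IDs. The Occurrence record and the use of long work IDs are assumptions.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class WordCountsSketch {
    // Hypothetical occurrence: one word form seen in one work.
    record Occurrence(String word, long workId) { }

    // Returns {counts, second}, where second maps each word either to the
    // number of works containing it (getWorkCounts == true) or to the set of
    // those work IDs (getWorkCounts == false).
    static Map<?, ?>[] getWordCounts(List<Occurrence> occurrences, boolean getWorkCounts) {
        Map<String, Integer> counts = new HashMap<>();
        Map<String, Set<Long>> works = new HashMap<>();
        for (Occurrence o : occurrences) {
            counts.merge(o.word(), 1, Integer::sum);
            works.computeIfAbsent(o.word(), w -> new HashSet<>()).add(o.workId());
        }
        if (!getWorkCounts) {
            return new Map<?, ?>[] { counts, works };
        }
        Map<String, Integer> workCounts = new HashMap<>();
        works.forEach((word, ids) -> workCounts.put(word, ids.size()));
        return new Map<?, ?>[] { counts, workCounts };
    }
}
```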

getWordCounts

public static java.util.Map[] getWordCounts(WordSet[] wordSets,
                                            int wordForm)
Get word form counts in a set of word sets.

Parameters:
wordSets - The word sets.
wordForm - The word form to count.
Returns:
Array of two maps. The first map contains each word of the specified word form in the word sets as a key and the count of the appearances of the word in the word sets as a value. The second map also has the word as the key but provides the number of parent works for the word sets in which the word appears as a value.

getWordOccurrences

public static Word[] getWordOccurrences(WordSet wordSet,
                                        int wordForm,
                                        Spelling word)
Get word occurrences for a word in a specified word set.

Parameters:
wordSet - The word set.
wordForm - The word form.
word - The word to look up.
Returns:
Array of Word entries for the selected word in the word set. The search word is converted to a case- and diacritic-insensitive form before the search.
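The entry does not show how the case- and diacritic-insensitive form is computed. One common approach in Java, sketched here with java.text.Normalizer, is to decompose to NFD, strip combining marks, and lower-case. WordHoard's own conversion may differ, particularly for non-Latin scripts.

```java
import java.text.Normalizer;
import java.util.Locale;

class InsensitiveFormSketch {
    // Decompose accented characters (NFD) so each accent becomes a separate
    // combining mark, drop those marks, then lower-case. An accented capital
    // E therefore matches a plain lower-case e.
    static String toInsensitive(String word) {
        String decomposed = Normalizer.normalize(word, Normalizer.Form.NFD);
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        return stripped.toLowerCase(Locale.ROOT);
    }
}
```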

getSpan

public static Word[] getSpan(WordSet wordSet,
                             Word word,
                             int leftSpan,
                             int rightSpan)
Get surrounding words of a specified word in a word set.

Parameters:
wordSet - The word set.
word - Word for which to get span.
leftSpan - Number of words to the left of the specified word to retrieve.
rightSpan - Number of words to the right of the specified word to retrieve.
Returns:
Span of words in the word set for the specified word.
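A sketch of span extraction over a plain list of words. It assumes the returned span includes the specified word itself and is clipped at the ends of the list; neither behavior is stated in the entry above, and the index-based signature is hypothetical.

```java
import java.util.List;

class SpanSketch {
    // Returns the words from leftSpan positions before `index` through
    // rightSpan positions after it, clipped to the ends of the list.
    static List<String> getSpan(List<String> words, int index, int leftSpan, int rightSpan) {
        if (index < 0 || index >= words.size()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int from = Math.max(0, index - leftSpan);
        int to = Math.min(words.size(), index + rightSpan + 1);
        return words.subList(from, to);
    }
}
```

Note that subList returns a view of the original list; copy it if the caller may modify either list afterwards.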

isWordInWordSet

public static boolean isWordInWordSet(WordSet wordSet,
                                      Word word)
Check if a word is in a specified word set.

Parameters:
wordSet - The word set.
word - Word to look up.
Returns:
true if word is in the word set, else false.

pruneToWordsInWordSet

public static java.util.Set pruneToWordsInWordSet(WordSet wordSet,
                                                  Word[] words)
Prune words to those in a specified word set.

Parameters:
wordSet - The word set.
words - Word entries to look up in word set.
Returns:
Set of words which exist in the word set.
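Pruning is a set intersection. The following sketch works over strings instead of Word objects, and uses LinkedHashSet (an assumption, not documented behavior) so the result keeps the order of the candidate words.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class PruneSketch {
    // Keeps only the candidates that are members of the word set, i.e. the
    // intersection of the two collections, in the candidates' original order.
    static Set<String> pruneToWordsInWordSet(Set<String> wordSet, List<String> words) {
        Set<String> result = new LinkedHashSet<>(words);
        result.retainAll(wordSet);
        return result;
    }
}
```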

importFromDOMDocument

public static WordSet importFromDOMDocument(org.w3c.dom.Node wordSetNode)
Import a word set from a node in a DOM document.

Parameters:
wordSetNode - The DOM node which is the root of the word set to import.
Returns:
The imported word set, or null if the import fails.
Throws:
BadOwnerException - if the user is not logged in.

importWordSets

public static WordSet[] importWordSets(org.w3c.dom.Document importDocument)
Import one or more word sets from an XML document.

Parameters:
importDocument - The DOM document containing the word sets to import.
Returns:
The imported word sets. May be empty.

Note: The word sets are not persisted here. That is the responsibility of the caller.


loadWordSet

public static WordSet loadWordSet(WordSet wordSet)
Load a word set if it isn't already loaded.

Parameters:
wordSet - The word set to load.

saveWordSet

public static WordSet saveWordSet(javax.swing.JFrame parentWindow,
                                  WordGetter wordGetter,
                                  ProgressReporter progressReporter)
                           throws java.lang.Exception
Display the create/update word set dialog and perform the creation or update.

Parameters:
parentWindow - Parent window for dialog.
wordGetter - WordGetter to retrieve list of words to add to word set.
progressReporter - Progress reporter to display progress of save word set operation. May be null.
Returns:
The new or updated word set.
Throws:
java.lang.Exception - if something went wrong.

If progressReporter is not null, call this method from a separate thread so that the GUI keeps updating while the word set is being saved.
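A sketch of running a save off the calling thread while forwarding progress. It uses a plain ExecutorService so it stays self-contained; in a Swing application, javax.swing.SwingWorker, whose process() method runs on the event dispatch thread, is the usual choice. saveInBackground, its four fixed steps, and the IntConsumer progress callback are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.IntConsumer;

class BackgroundSaveSketch {
    // Runs a hypothetical save task on the executor's thread and forwards its
    // progress to `progress`. The callback is invoked on the worker thread, so
    // a real GUI callback would have to hand updates to the event dispatch thread.
    static Future<String> saveInBackground(ExecutorService executor, String title, IntConsumer progress) {
        return executor.submit(() -> {
            for (int step = 1; step <= 4; step++) {
                // ... persist one batch of words here (omitted) ...
                progress.accept(step * 25);
            }
            return title;
        });
    }
}
```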