WordHoard - Version History

Version History

Version 1.4.4. March 1, 2011.

We added line numbers to the Early Modern Drama texts.

You may now add the publication date for a work to each of the table of contents displays.

Version 1.4.3. January 26, 2011.

A few minor revisions were made to the NUPOS tag set.

Lemma, rather than spelling, is now the default word form in the "Analysis" menu procedures.

The homonym markers have been restored to the Shakespeare corpus in the main site. Homonym markers are not present in the Early Modern Drama corpus.

The morphological adornments in the Early Modern Drama site have been extensively revised by hand. The adornments have been removed from paratext such as stage directions and front and back matter.

Proper nouns now have their own major word class, "name". This allows you to display a table of (single word) proper names using the Lexicon facility.

Version 1.4.2. February 1, 2010.

The word class is now displayed in a separate column, rather than joined to the spelling or lemma, in the following analyses:

Create Word Form List

Find Collocates

Find Multiword Units

Compare Single Word Form

Compare Many Word Forms

Compare Collocates
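
As a rough illustration of moving the word class into its own column, the sketch below splits a combined label into a word form and a word class. It assumes the combined format looks like the lemma labels quoted elsewhere in this history, such as "i (pn)" and "cold (aj)". The function name split_label is hypothetical and is not part of WordHoard.

```python
import re

# Assumed label format, based on examples such as "i (pn)" and "cold (aj)":
# the word form, optional whitespace, then the word class in parentheses.
LABEL_PATTERN = re.compile(r"^(?P<form>.+?)\s*\((?P<word_class>[^()]+)\)$")


def split_label(label):
    """Split a combined label into (word form, word class).

    Returns (label, "") when no parenthesized word class is present.
    """
    cleaned = label.strip()
    match = LABEL_PATTERN.match(cleaned)
    if match is None:
        return cleaned, ""
    return match.group("form"), match.group("word_class")


# Each label becomes a two-column row instead of one joined string.
rows = [split_label(label) for label in ["i (pn)", "cold (aj)", "pardon"]]
```

A label without a parenthesized suffix is passed through with an empty word class, so the two-column layout stays rectangular.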

When the contents of a table were saved to a file, the column titles were not matched correctly with the data in the columns. This has been corrected.

Tab-separated file output now uses the extension ".tab" rather than ".tsv".

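The table-saving behavior in version 1.4.2 can be pictured with a minimal sketch: write the column titles first, then the rows, separated by tabs, and refuse any row whose width differs from the title row. This is not WordHoard's own code. The function name save_table, the file name, and the sample counts are all hypothetical.

```python
import csv
import os
import tempfile


def save_table(path, column_titles, rows):
    """Write a table as tab-separated text, with the titles on the first line."""
    # Guard against titles and data drifting out of step.
    for row in rows:
        if len(row) != len(column_titles):
            raise ValueError("row width does not match the column titles")
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(column_titles)
        writer.writerows(rows)


# Hypothetical data, for illustration only.
titles = ["Lemma", "Word class", "Count"]
data = [["i", "pn", 12], ["cold", "aj", 3]]
out_path = os.path.join(tempfile.mkdtemp(), "word-forms.tab")
save_table(out_path, titles, data)
```

The resulting ".tab" file can be opened in a spreadsheet or read back with csv.reader using the same tab delimiter.
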
Version 1.4.1. January 18, 2010.

We fixed a problem with the lemma tagging on the main public site. In WordHoard, lemma tags are case-sensitive, and in version 1.4 some of the lemmas were inconsistently tagged. For example, the lemma "i (pn)" was sometimes tagged as "i (pn)" (lower case) and sometimes as "I (pn)" (upper case). In version 1.4.1 all lemmas are tagged consistently. For example, there is now only a single lemma "i (pn)" (lower case).
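
Because lemma tags are case-sensitive, two tags that differ only in letter case count as two different lemmas. The sketch below shows one way such inconsistencies could be detected in a list of tags. It is an illustration only, not the procedure used to correct the WordHoard data, and the function name find_case_variants is hypothetical.

```python
from collections import defaultdict


def find_case_variants(lemma_tags):
    """Return groups of lemma tags that differ only by letter case."""
    groups = defaultdict(set)
    for tag in lemma_tags:
        # Tags that lowercase to the same key are candidates for one lemma.
        groups[tag.lower()].add(tag)
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


tags = ["i (pn)", "I (pn)", "cold (aj)", "i (pn)"]
variants = find_case_variants(tags)
```

Here variants holds a single group for the key "i (pn)", listing both the upper case and lower case spellings of the tag.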

Version 1.4. December 17, 2009.

The tagging data for the Early Modern English Drama (EMD) site has been greatly improved. Many errors have been corrected.

All non-dramatic texts, including the works of Chaucer and Spenser, as well as Shakespeare's poems, have been removed from the EMD site. They are now available only on the main public site.

We dropped the Nineteenth Century Fiction site.

The NUPOS tagset has been revised. There are two significant changes from the previous version. Forms of the verb 'be' have been tagged in a more granular fashion, recognizing the morphological differences between 'art' and 'beest' or 'wast' and 'wert'. Words beginning with the prefix un- have been lemmatized as negative forms of their positive roots. For instance, 'unnatural' is the negative form (j-u) of 'natural' (j).

We introduced a new tool named "BuildWorkSets" that is used to create the built-in (pre-defined) system work sets. The tool uses a new data input file named "work-sets.xml".

Version 1.3.2. July 28, 2009.

Apple introduced a nasty bug in their recent Java security update version 1.5.0_19 which caused the WordHoard menu bar at the top of the screen to get truncated after some dialogs and alerts were dismissed. This version of WordHoard introduces a workaround for the bug.

Version 1.3.1. April 1, 2009.

The length of the drop-down lists for selecting analysis and reference texts has been increased in several dialogs.

Version 1.3. December 30, 2008.

WordHoard now presents a "site selection" dialog on startup from which you select the site you want to use. There is now only a single binary distribution of the WordHoard client.

There are three sites:

Main Public Site: This site contains Early Greek Epic and the works of Shakespeare, Spenser and Chaucer. It is open to the world.

Nineteenth Century Fiction Site: This site contains the works of Shakespeare, Spenser and Chaucer, plus 250 British novels from 1780 to 1900. It is accessible only at Northwestern University.

Early Modern English Drama Site: This site contains approximately 330 Early Modern English plays including all of Shakespeare. It also includes the works of Chaucer and Spenser. It is accessible only at Northwestern University and at other institutions that are subscribers to the Text Creation Partnership (TCP).

The NCF and EMD drama texts are new.

We added simple tag clouds to provide visual displays of the results of several analysis types.

We added documentation for the Find Collocates and Find Multiword Units analyses.

We changed the lemmatization of personal possessive pronouns.

There is a new "Display authors" option in the table of contents view for corpora which have more than one author.

Version 1.2.9. September 2, 2008.

We upgraded to a new version of NUPOS, the part of speech tagset. Separate tags have been created for each morphological form of "be", "have", and "do", which now appear with the prefixes "vb", "vh", and "vd", where previously they were grouped under "va" (verb auxiliary). In addition, some inconsistencies from the old tagset have been corrected and there are some new tags for rare phenomena such as repurposing conjunctions as nouns ("n2-acp").

Version 1.2.8. May 7, 2008.

We corrected some tagging errors and made some structural changes to the new works by Spenser that were added in version 1.2.7.

Version 1.2.7. April 22, 2008.

We added ten more works by Edmund Spenser.

Version 1.2.6. February 15, 2008.

Shakespeare's The Rape of Lucrece did not properly appear in the "By Genre" table of contents view. This bug has been fixed.

Version 1.2.5. January 3, 2008.

Accounts with the "canManageAccounts" privilege can now view, edit, or delete any user annotation.

Version 1.2.4. December 21, 2007.

We corrected some Java 6 specific compilation problems in the source code.

We corrected Print Preview for Windows and Linux.

We added a "nar()" term to the corpus query language to allow searching for words in narrative or speech.

We corrected some problems with creating and editing saved word and work sets.

We corrected the handling of the "Break down by work parts" option for work sets in the "Compare Texts" procedure.

Version 1.2.3. November 30, 2007.

We made several changes to the way the public version of WordHoard is deployed on the host wordhoard.northwestern.edu.

Version 1.2.2. October 15, 2007.

We fixed some bugs in the speaker tagging data for King Henry IV Part 1.

We fixed a bug in the "by author" table of contents view that caused some works to not appear in the table of contents.

Version 1.2.1. September 21, 2007.

We fixed a number of bugs in the tagging data.

We made some special-case changes to books 9-12 of The Odyssey. The extended narration by Odysseus in these books is now treated as narration rather than speech. In the "More" tab of the "Get Info" window for words in these books, the speaker is now identified as "Spoken by the poet or by Odysseus as narrator." Speeches by other characters within this narration are still tagged as speeches in the usual way.

In Greek text, the sigma character (σ) is no longer considered to be the case-insensitive and diacritical-insensitive version of the terminal sigma character (ς).
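
The effect of keeping the two sigmas distinct can be sketched as follows. This is a minimal illustration of case-insensitive and diacritical-insensitive folding in general, not WordHoard's implementation, and the function name fold_greek is hypothetical. It uses lower() rather than casefold(), because casefold() would map the terminal sigma to the ordinary sigma.

```python
import unicodedata


def fold_greek(text):
    """Fold case and strip diacritics, keeping terminal sigma distinct."""
    # Decompose so that accents and breathings become separate combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category Mn).
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # lower() leaves an existing terminal sigma unchanged.
    return unicodedata.normalize("NFC", stripped.lower())


folded_word = fold_greek("λόγος")
sigmas_differ = fold_greek("ς") != fold_greek("σ")
```

With this folding, a search key ending in σ does not match a word ending in ς, which corresponds to the behavior described for this release.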

The remaining changes are all for developers and text developers.

There is a new "by author" table of contents view that may be used by text developers. See The Corpora XML File for details.

Long work and work part titles are now truncated to 50 characters.

Text lines can now be right justified. See Work XML Files for details.

Individual words within tagged lines of text can now be marked as "untagged". See Work XML Files for details.

Temporary files created by the various build programs are now created inside a subdirectory named temp of the WordHoard development directory, rather than in the system's default temporary file directory (e.g., /tmp on many UNIX systems).

There is a new experimental tool named ConvertMorph and a new shell script named convert-morph.csh which are useful for converting TEI and other XML files which have been tagged by the MorphAdorner tool into WordHoard ingest format.

Version 1.2. June 12, 2007.

We fixed a bug in the "Find Lemmata" command which computed and displayed incorrect document frequency counts in the results panel.

We also fixed a few minor errors in the tagging data.

Version 1.2b1. May 23, 2007.

In this major new release we have switched to a new part of speech tagset named "NUPOS". In NUPOS, parts of speech are characterized by, and are largely defined by, how they use the grammatical categories of syntax, tense, mood, voice, case, gender, person, number, degree, and negative. This makes the Greek and English part of speech taxonomies more similar and uniform.

There is now a single "Parts of Speech" window instead of separate windows for English and Greek. The window has new controls at the top for showing all the parts of speech or only the English or Greek parts of speech, and for ordering the displayed parts of speech by the various category values.

We also fixed a few minor errors in the texts.

Version 1.1.9. January 26, 2007.

We fixed an error that incorrectly tagged about 5,000 words as having the lemma "cold (aj)".

Version 1.1.8. January 23, 2007.

We made several text and annotation formatting improvements to Shepheardes Calender.

Version 1.1.7. December 22, 2006.

We added the new work Shepheardes Calender to the Spenser corpus, along with the accompanying E. K. annotations.

Text developers may now use the indent attribute to apply additional indentation to individual lines of text. For static annotations, the rend, align, and indent attributes may also now be used with p elements.

Version 1.1.6. October 27, 2006.

Word info windows no longer display standard spellings. There were a number of problems with our notion of "standard spelling", enough so that we judged this feature to be less than useful.

We fixed a bug that in some circumstances could cause unexpected errors in Lemma search result windows.

We fixed a bug that in some circumstances could cause unexpected errors when doing searches or analyses with large word sets.

We fixed a bug with lemma searches which have collection frequency criteria.

We fixed a bug that caused analysis procedures to return incorrect results if capital letters were used in lemma and spelling parameters.

We fixed a bug that failed to display some of the morphology information for English parts of speech in word info windows.

We improved the performance of searches and other operations involving word and work sets. Some searches which used to take many minutes now take only a few seconds, or even less than a second.

The scenes of Act 3 of Twelfth Night or What You Will were numbered incorrectly. This bug has been fixed.

The last speech of King Lear was incorrectly attributed to Albany rather than Edgar. This bug has been fixed.

We added new chapters on Scripting to this user manual.

Version 1.1.5. September 7, 2006.

For Mac users who are using Apple's J2SE 5.0 Release 4 (Java version 1.5.0_06) or later, WordHoard's menu bar now once again appears at the top of the screen rather than at the top of each window. We were able to restore this feature thanks to a new release of Hibernate which fixed a serious bug. The new Hibernate release also makes the program a little bit faster and fixes problems on some kinds of Linux systems.

Version 1.1.4. August 3, 2006.

This version of WordHoard introduces improved support for user annotations. Users with accounts may attach annotations to passages of text. Annotations may be private or shared with other users. For details, see the chapter on Annotations.

System administrators may now create and maintain user groups. Users who create annotations may extend read access for their annotations to any group or groups of which they are a member. For details, see the chapter on Managing Accounts.

User annotations are now implemented as an integral part of WordHoard, rather than in a separate server. They use the same internal architecture that is used for saving user-defined word and work sets.

Version 1.1.3. July 27, 2006.

In version 1.1.2, the change made to fix serious problems on Intel Macintoshes and on some Linux systems unfortunately caused the "WordHoard" menu on Macintoshes to be improperly named "com.sun.javaws.Main", which was very confusing. To fix this problem, we have moved the menu bar on the Macintosh back from the top of the screen to inside each window. This reverses a change we made in version 1.1.1.

Version 1.1.2. July 25, 2006.

We fixed a bug which caused Early Greek Epic work windows to fail to open properly on Apple's Intel-based Macintosh systems, and which caused a variety of failures on some Intel-based Linux systems.

We fixed a bug which caused unexpected errors on some text copy operations.

We added additional script interfaces to expose more of WordHoard's utility functions.

We improved the performance of "Find collocates" in the Analysis menu. Finding collocates runs about two to three times faster than before. Finding collocate contexts runs ten to fifty times faster than before.

We corrected a bug in the WordHoard corpus query language which prevented regular expressions from working in CQL terms.

The WordHoard corpus query language now allows you to search by word class, major word class, and part of speech. The search term "pos" which used to refer to major word class now refers to part of speech. The search term "wc" is now used for word class, and "mwc" for major word class.

The WordHoard corpus query language now allows you to search for patterns of adjacent words. You can use such queries to create phrase lists for processing in scripts.

Version 1.1.1. June 9, 2006.

This version introduces "short work titles." Each work now has both a full title (e.g., "Hamlet, Prince of Denmark") and a short title (e.g., "Hamlet"). Short titles are used in concordances and other contexts.

You can now use the wild card character "*" in spelling criteria in the "Find Words" dialog. For example, searching for the spelling "pardon*" finds the words "pardon" (both the noun and the verb), "pardoned", "pardons", etc.
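
The wild card behavior can be approximated with a short sketch in which "*" stands for any run of characters, including none, and every other character matches literally. This is an illustration of the idea only. The function name spelling_matches and the sample word list are hypothetical, and the sketch assumes case-sensitive matching, which the text above does not specify.

```python
import re


def spelling_matches(pattern, spelling):
    """Return True if spelling matches pattern, where "*" is the only wild card."""
    # Escape the literal pieces so that only "*" has a special meaning.
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.fullmatch(".*".join(parts), spelling) is not None


words = ["pardon", "pardoned", "pardons", "parson"]
hits = [word for word in words if spelling_matches("pardon*", word)]
```

In this sample list, "pardon", "pardoned", and "pardons" match the pattern, while "parson" does not.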

For Mac users who are using Apple's recently released J2SE 5.0 Release 4 (Java version 1.5.0_06) or later, WordHoard's menu bar now appears at the top of the screen, rather than at the top of each window. This makes WordHoard look and behave like other Mac OS X applications. Apple finally fixed a bug in their implementation of Web Start, and that fix makes this change possible in WordHoard. With this change, the About and Quit commands now appear only in the application menu named "WordHoard." They no longer appear in the "File" menu. For Mac users who are using older versions of Java, the menu bar still appears at the top of each window, as in previous versions of WordHoard.

We fixed a bug which caused scripts to fail to compile in the Web Start environment.

We fixed a bug in scripting computations.

We now compress all of the network data traffic between WordHoard and the MySQL database server. This improves the performance of the program, especially on slower network connections.

We made several improvements to decrease the time required to open and display the analysis and Open Work Set dialogs.

Calculator windows are now closed in many contexts when they are not needed (e.g., after running an analysis).

Version 1.1. May 17, 2006.

This is the "developer release" of WordHoard.

The source code for WordHoard is now available under the GNU General Public License. See the Files and Setup chapter in the "Notes for Developers" section to get a copy of the source code.

We fixed a bug which caused garbage to appear in the Calculator window "Edit" menu.

This version also fixes the "Collection Frequency" constraint used in "Find Words" and "Find Lemmata," and a bug in the behavior of the "Copy" menu item for the lemma search results panel.

Version 1.0.4. May 12, 2006.

Saved word sets, work sets, and queries are now written to the WordHoard database by the WordHoard server rather than by the client. This improves the security of the program, and it makes it possible for us to offer scholars accounts on our NU WordHoard system. To request an account, send email to Martin Mueller.

If you are logged in, you can now save the result of a "Find Words" command as a word set.

We added a new "Collection frequency" search criterion to the "Find Words" and "Find Lemmata" commands.

Table of contents views for corpora are now defined in the corpora.xml file. See the Corpora XML File chapter in the "Adding New Texts" section for details. There is now no corpus-specific code left in the WordHoard client program. All features are data driven by definitions supplied by text developers in the various XML data source files.

In Spenser's Faerie Queene, proper names are italicized in the text. In concordances, lines containing such names had the punctuation between words scrambled. In addition, if you tried to get information for a word following such a name on the same line, you would erroneously get information for the wrong word (an earlier word on the same line), or get an alert claiming that "no information is available." Both of these bugs have been fixed.

Chaucer's Book of the Duchess was incorrectly titled "Book of the Dutchess." This bug has been fixed.

We fixed a few spelling errors in labels and error messages.

We now use zip archives instead of tarballs to distribute various sets of files to developers.

We added this "Version History" chapter to the user manual, along with a new chapter for developers on the changes we had to make to the open source Hibernate persistence product.

Version 1.0.3. April 21, 2006.

We fixed a bad bug in the tagging data in Spenser that was introduced in the changes to the database made in the April 19 release. Approximately 3,000 occurrences of the lemma "come" were improperly tagged. Only the database was changed, not any of the code in the program. Note that the version number of the program was not changed - it remains version 1.0.3.

Version 1.0.3. April 19, 2006.

King Richard the Second had Act 5, Scene 3 incorrectly titled Act 5, Scene 2. This bug has been fixed. This version also includes a number of fixes for the tagging data.

We made another internal improvement to the connection pool provider.

Version 1.0.2. April 14, 2006.

We made some improvements to the new connection pool provider, and added some debugging and logging code to try to track down another bug.

Version 1.0.1. April 12, 2006.

This version introduces a new internal "Hibernate connection pool provider" which should help eliminate most of the unexpected error messages that some users experience after network failures.

We also fixed bugs which could cause unexpected error messages when trying to type to select lemmas in lexicon windows in some rare contexts, and when clicking on an empty user text annotation.

Version 1.0. March 31, 2006.

This is the first public release of WordHoard to users in the non-Northwestern University community, and the unveiling of the new wordhoard.northwestern.edu web site.
