Utility functions

Introduction

WordHoard's scripting language gives you access to many of WordHoard's built-in utility classes and methods. These include methods for formatting strings and numbers for display, phonetic encoding, table output, text file input and output, and word stemming.

Formatting numbers and strings

WordHoard provides methods for formatting individual numbers for display.

formatIntegerWithCommas( int i ); Formats an integer i with the default thousands grouping character (a comma in the United States).
Example: s = formatIntegerWithCommas( 1234567 );
returns s = "1,234,567" in many locales.
formatLongWithCommas( long l ); Formats a long integer l with the default thousands grouping character (a comma in the United States).
Example: s = formatLongWithCommas( 1234567 );
returns s = "1,234,567" in many locales.
formatFloat( float f , int d ); Formats a floating point number f with the specified number d of decimal places. The default thousands grouping character is used for the portion of the number (if any) that appears before the radix point.
Example: s = formatFloat( 1450.879f , 2 );
returns "1,450.88" in many locales.
formatDouble( double x , int d ); Formats a double precision floating point number x with the specified number d of decimal places. The default thousands grouping character is used for the portion of the number (if any) that appears before the radix point.
Example: s = formatDouble( 1450.879D , 2 );
returns "1,450.88" in many locales.
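The phrase "in many locales" matters because the grouping character depends on the locale. The sketch below illustrates this with the standard Java NumberFormat class. It is not WordHoard's implementation; the class name GroupingSketch and its two helper methods are hypothetical and exist only for this illustration.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical helper class; not part of WordHoard.
class GroupingSketch {
    // Format an integer using the grouping separator of the given locale.
    static String withGrouping(long value, Locale locale) {
        NumberFormat nf = NumberFormat.getIntegerInstance(locale);
        nf.setGroupingUsed(true);
        return nf.format(value);
    }

    // Format a double with grouping and a fixed number of decimal places.
    static String fixedDecimals(double value, int places, Locale locale) {
        NumberFormat nf = NumberFormat.getNumberInstance(locale);
        nf.setGroupingUsed(true);
        nf.setMinimumFractionDigits(places);
        nf.setMaximumFractionDigits(places);
        return nf.format(value);
    }

    public static void main(String[] args) {
        System.out.println(withGrouping(1234567L, Locale.US));      // 1,234,567
        System.out.println(fixedDecimals(1450.879, 2, Locale.US));  // 1,450.88
        System.out.println(withGrouping(1234567L, Locale.GERMANY)); // 1.234.567
    }
}
```

Passing the locale explicitly keeps the output predictable. The WordHoard methods above take no locale argument and use the default for the system on which WordHoard runs.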

WordHoard also provides a C-like sprintf facility for formatting arrays of objects. WordHoard is currently intended to run under Java 1.4 as well as later releases, so it does not use the formatting facilities introduced in Java 1.5. Instead it uses the PrintfFormat class, written by Allan Jacobs of Sun Microsystems. The createTable script method uses PrintfFormat format descriptors for the elements stored in the table.

As an example, assume we have a Java object array the first element of which is a string, the second an integer, and the third a double precision floating point number. We can format the object array using PrintfFormat as follows.

        // Create object array to hold three values to format.

stuffToFormat = new Object[ 3 ];

        // Add a string, an integer, and a double to the object array.

stuffToFormat[ 0 ] = "Hello WordHoard World!";
stuffToFormat[ 1 ] = 10000;
stuffToFormat[ 2 ] = 12345.6789;

        // Create a PrintfFormat object to format the entries in
        // stuffToFormat.  We format the string as-is followed by a
        // blank.  We format the integer into a ten-column wide field,
        // and add the thousands grouping symbol (typically a comma) every three
        // digits.  We format the floating point value in a twelve-column
        // wide field with three decimal places and thousands grouping symbols
        // every three digits.

format = new PrintfFormat( "%s %'10d %'12.3f" );

        // Format the entries in stuffToFormat using the format we created.

formattedString = format.sprintf( stuffToFormat );

        // Display the formatted entries.

print( formattedString );

The result is:

Hello WordHoard World!     10,000   12,345.679

The javadoc for PrintfFormat details all the different formatting options. The table below lists the most commonly used format characters.

%d Format item as a decimal integer.
%e Format item as a floating point number with exponential notation.
%f Format item as a floating-point number with a decimal point but no exponent.
%g Format item as a floating point number using either the %f or %e format, whichever appears to be better.
%o Format item as an octal integer.
%s Format a string or a single character as a string.
%x Format item as a hexadecimal integer.
%% Output the '%' character.

You can add a minimum field width and, for floating point numbers, the number of decimal places. For example, "%10d" requests that an integer be formatted in a field at least ten columns wide, and "%12.3f" requests that a floating point value be formatted in a field at least twelve columns wide with three digits following the radix point. A single quote "'" before the field width requests thousands grouping before the radix point, using the default thousands grouping character.

By default the formatted value is right-justified within the output field. A minus sign before the field width, such as "%-10d", left-justifies it instead. A leading 0 before the field width, such as "%010d", pads the converted number with zeros on the left rather than blanks. A '+' before the field width, such as "%+10d", adds a plus sign to positive values.
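The effect of the width and flag characters is easiest to see side by side. The sketch below uses the standard Java String.format method rather than PrintfFormat, so it runs without WordHoard. One difference to keep in mind: java.util.Formatter requests thousands grouping with a comma flag ("%,10d"), where PrintfFormat uses a single quote. The class name FormatFlagsSketch and the demo helper are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper class; uses java.util.Formatter, not PrintfFormat.
class FormatFlagsSketch {
    // Format the arguments with a fixed locale so the output is predictable.
    static String demo(String pattern, Object... args) {
        return String.format(Locale.US, pattern, args);
    }

    public static void main(String[] args) {
        // Brackets mark the edges of the ten-column output field.
        System.out.println("[" + demo("%10d", 42) + "]");   // [        42]
        System.out.println("[" + demo("%-10d", 42) + "]");  // [42        ]
        System.out.println("[" + demo("%010d", 42) + "]");  // [0000000042]
        System.out.println("[" + demo("%+10d", 42) + "]");  // [       +42]

        // Grouping plus width and precision, as in the earlier example.
        System.out.println(
            demo("%s %,10d %,12.3f", "Hello WordHoard World!", 10000, 12345.6789));
    }
}
```

The last statement prints the same line as the PrintfFormat example shown earlier: the string, the integer right-justified in ten columns, and the floating point value in twelve columns with three decimal places.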

Phonetic Encoding

Soundex encodes words using a simple model based upon the approximate sound of the word as pronounced in American English. The Soundex encoding is a four-character string in which the first character is an uppercase letter and the remaining characters are digits. Soundex was originally intended only for encoding surnames, but occasionally finds other uses as well. The Soundex algorithm was devised and patented by Margaret K. Odell and Robert C. Russell in 1918.

To obtain the Soundex encoding of a string in WordHoard, use the soundex method. For example, to obtain the Soundex encoding for the name Burns, use:

soundexEncoding = soundex( "Burns" );

The resulting soundex value for Burns is B652.
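To show how a code such as B652 comes about, here is a minimal sketch of the common American Soundex rules in plain Java. It is an illustration under stated assumptions, not WordHoard's own soundex method, which may treat edge cases differently. The class name SoundexSketch is hypothetical.

```java
// Hypothetical illustration of American Soundex; not WordHoard's code.
class SoundexSketch {
    // Digit for each letter A..Z; '0' marks letters that are not coded.
    private static final String CODES = "01230120022455012623010202";

    static String soundex(String word) {
        StringBuilder out = new StringBuilder();
        char lastCode = '0';
        for (int i = 0; i < word.length(); i++) {
            char c = Character.toUpperCase(word.charAt(i));
            if (c < 'A' || c > 'Z') continue;      // skip non-letters
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c);                      // keep the first letter
                lastCode = code;
                continue;
            }
            if (c == 'H' || c == 'W') continue;     // H and W do not split a run
            if (code != '0' && code != lastCode) {
                out.append(code);                   // new consonant sound
                if (out.length() == 4) break;
            }
            lastCode = code;                        // a vowel resets the run
        }
        while (out.length() < 4) out.append('0');   // pad short codes
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(soundex("Burns"));   // B652
        System.out.println(soundex("Robert"));  // R163
        System.out.println(soundex("Rupert"));  // R163
    }
}
```

For Burns, the first letter B is kept, the vowel u is dropped, and r, n, and s map to 6, 5, and 2. Robert and Rupert receive the same code, which is the kind of match Soundex is designed to find.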

The Double Metaphone algorithm is another phonetic coding algorithm. It encodes English words, and foreign words heard frequently in the United States, as a string built from twelve consonant sounds. The double metaphone values can be used to group words with similar sounds and to handle various types of spelling correction. Double Metaphone improves on the earlier Metaphone algorithm by returning two possible encodings when a word appears to have two feasible pronunciations, as a foreign word might. Double Metaphone was designed primarily for use with proper names, but it has proved useful in more general contexts as well. The Double Metaphone algorithm was proposed by Lawrence Philips.

The simplest way to use Double Metaphone in a WordHoard script is to invoke the doubleMetaphone( String s ) method. For example:

metaphoneEncoding = doubleMetaphone( "happy" );

The value of metaphoneEncoding is the primary metaphone encoding, which in this case is the two-letter string HP. If you want to retrieve both the primary and alternate metaphone strings, instantiate a DoubleMetaphone object and use the following methods to retrieve the primary and alternate encodings. You should also use this approach if you intend to find the double metaphone encodings for a large number of strings.

        // Create a Double Metaphone encoder.

metaphone = new DoubleMetaphone();

        // Encode the string "smith".  This returns the primary encoding.

encoding = metaphone.encode( "smith" );

        // Explicitly retrieve the primary encoding.

primaryEncoding = metaphone.getPrimary();

        // Get the alternate encoding.

alternateEncoding = metaphone.getAlternate();

        // Print the encodings.

print( "Primary encoding for smith   : " + primaryEncoding );
print( "Alternate encoding for smith : " + alternateEncoding );

Executing these script statements yields the output below. For "smith" the primary encoding is SM0, while the alternate encoding is XMT.

Primary encoding for smith   : SM0
Alternate encoding for smith : XMT

Table output

You can create a standard WordHoard output table to display results from your script using the createTable script method. Several of the scripting examples use createTable in conjunction with addResults to display results in a new WordHoard window.

To use createTable you must store each row of table data you wish to display in a SortedTableModelRow object. You create such an object as follows:

rowData = new SortedTableModelRow( Object[] data );

Each entry in the data array must implement the standard Java Comparable interface, which allows Java to sort the values properly. Most common data value types implement Comparable: String, Double, Integer, Long, etc. By default the value of data[ 0 ] must uniquely identify the row. If the values in another column uniquely identify the row, use:

rowData = new SortedTableModelRow( Object[] data , int uniqueIDColumn );

The uniqueIDColumn value specifies the column which uniquely identifies the row data.

You may also use a separate value to identify each row uniquely. Use:

rowData = new SortedTableModelRow( Object[] data , Object uniqueIDValue );

The uniqueIDValue value uniquely identifies the row.
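The Comparable requirement exists because a sortable table has to order rows by the values in whichever column the user picks. The sketch below shows the idea in plain Java. The Row class here is a hypothetical stand-in and is not SortedTableModelRow; the sortByColumn helper is likewise an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration; Row is NOT WordHoard's SortedTableModelRow.
class RowSortSketch {
    // A table row whose cells can all be compared with one another.
    static final class Row {
        final Comparable<?>[] cells;
        Row(Comparable<?>... cells) { this.cells = cells; }
    }

    // Sort rows by the values in one column, ascending or descending.
    @SuppressWarnings("unchecked")
    static void sortByColumn(List<Row> rows, int column, boolean ascending) {
        Comparator<Row> byColumn = (a, b) ->
            ((Comparable<Object>) a.cells[column]).compareTo(b.cells[column]);
        rows.sort(ascending ? byColumn : byColumn.reversed());
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("Three", 10.0, 100.0));
        rows.add(new Row("One", 2.0, 4.0));
        rows.add(new Row("Two", 5.0, 25.0));

        // Sort by column 1 in ascending order.
        sortByColumn(rows, 1, true);
        for (Row r : rows) {
            System.out.println(r.cells[0] + " " + r.cells[1]);
        }
    }
}
```

All values in a given column should have the same type. Comparing a String cell with a Double cell in the same column would fail at run time.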

If your data consists of a unique row label (String) as the first column, and an array of double values, you may use a FrequencyAnalysisDataRow object, which is a specialized version of a SortedTableModelRow. You create a FrequencyAnalysisDataRow as follows:

rowData = new FrequencyAnalysisDataRow( String rowLabel , double[] data );

where rowLabel is a string identifying the row of data and data is an array of double precision floating point numbers. Collect all of the FrequencyAnalysisDataRow objects in a standard Java ArrayList. The following example creates three rows of data and adds them to an ArrayList.

        //  Create an ArrayList to hold the data rows.

dataList    = new ArrayList();

        //  Create three rows of data and add them
        //  to the data list.  We will add a number
        //  and its square as two data values.
        //  Each row gets its own array, so that a
        //  later row cannot overwrite an earlier one
        //  should the row object keep a reference to
        //  the array rather than copying it.

        //  First data row ...

data        = new double[ 2 ];
data[ 0 ]   = 2;
data[ 1 ]   = data[ 0 ] * data[ 0 ];

dataList.add( new FrequencyAnalysisDataRow( "One" , data ) );

        //  Second data row ...

data        = new double[ 2 ];
data[ 0 ]   = 5;
data[ 1 ]   = data[ 0 ] * data[ 0 ];

dataList.add( new FrequencyAnalysisDataRow( "Two" , data ) );

        //  Third data row ...

data        = new double[ 2 ];
data[ 0 ]   = 10;
data[ 1 ]   = data[ 0 ] * data[ 0 ];

dataList.add( new FrequencyAnalysisDataRow( "Three" , data ) );

We can now create a table to display these three rows of data using the createTable method. The resulting table is an extended version of a standard Java Swing JTable called an XTable. Please see the javadoc for XTable for more details.

                                //  Create output table.
table =
    createTable
    (
                                //  Title for table.
                                //  Used for printing purposes.
        "My table" ,
                                //  Table column titles.
        new String[]
        {
            "Label" ,
            "x" ,
            "x squared"
        } ,
                                //  Column formats are given as
                                //  "printf"-like specifications.
                                //
                                //  "%s" formats a string as is.
                                //
                                //  %26.0f formats a floating point
                                //  number in a field 26 characters wide
                                //  with no decimal places.
                                //
                                //  The "'" before the 26 says we want
                                //  the numbers displayed with a
                                //  thousands grouping character
                                //  (in the U.S., this is a comma).
        new String[]
        {
            "%s" ,
            "%'26.0f" ,
            "%'26.0f"
        } ,
                                //  ArrayList containing results
                                //  we created above.
        dataList ,
                                //  Sort results by this column.
                                //  Column indices start at 0.
                                //  Column 0 is the row labels column.
                                //  Column 1 is the "x" values.
                                //  Column 2 is the "x squared" values.
                                //  We sort by the value of "x".
                                //  To leave the columns unsorted,
                                //  set to -1.
        1 ,
                                //  Sort in ascending order.  Set to
                                //  false to sort in descending order.
        true
    );

                                //  Create a new WordHoard window
                                //  to display the table.

    addResults( "My window title" , "My header" , table );

The resulting WordHoard window looks like this.

[Figure: Sample table output]

You can print this table by selecting "Print" from the "File" menu. You can save the table output to a file by selecting "Save as" from the File menu. You can use the "Select all" and "Copy" commands to copy the table cells to the system clipboard for export to other programs.

Text file input and output

WordHoard scripting provides methods for reading the contents of a text file into a string and writing the contents of a string to a file. You can use standard Java I/O facilities to perform these operations as well.

result = readTextFile( fileName , encoding ); Reads the content of the text file named fileName, encoded in the encoding character set, as a Java string and returns it as the result. Should any error occur while attempting to read the file, the standard IOException condition is raised. You should check for this condition and handle it appropriately. You should select the proper encoding based upon the contents of the file. If you specify the empty string as the encoding, a default encoding is used which may be inappropriate.
writeTextFile( fileName , append , contents , encoding ); Writes the content of the string contents to the text file named fileName, encoded in the encoding character set. Set append to true to append the contents to the existing data in the file. Set append to false to overwrite the existing contents. Should any error occur while attempting to write the file, the standard IOException condition is raised. You should check for this condition and handle it appropriately. Most of the time you should choose "UTF8" as the encoding since that is how WordHoard stores its own data.

The following example creates five lines of text, writes them to a file, reads them back into a string, and prints them.

        //  Get the end of line character(s) for the current system.
        //  Typically this is an ASCII line feed for Unix systems
        //  and an ASCII carriage return, line feed pair for
        //  Windows systems.  Java will generally read a text file
        //  properly even if the end of line characters in the file
        //  do not match those for the current system.

String eolChars = System.getProperty( "line.separator" );

        //  Create a few lines of text.  We accumulate them in
        //  a StringBuffer since this is more efficient than
        //  using a String.

StringBuffer sb = new StringBuffer();

sb.append( "To be, or not to be: that is the question:" );
sb.append( eolChars );

sb.append( "Whether 'tis nobler in the mind to suffer" );
sb.append( eolChars );

sb.append( "The slings and arrows of outrageous fortune," );
sb.append( eolChars );

sb.append( "Or to take arms against a sea of troubles," );
sb.append( eolChars );

sb.append( "And by opposing end them?" );
sb.append( eolChars );

        //  Write the lines to a text file.  We enclose the use of
        //  writeTextFile in a "try" block in order to capture
        //  any errors which may occur when writing the file.  The
        //  variable "ok" remembers if the write operation succeeded
        //  or not.

ok  = true;

try
{
    writeTextFile( "/soliloquoy.txt" , false , sb.toString() , "utf8" );
}
catch ( IOException e )
{
        //  When the write fails, we set "ok" to false and
        //  print the reason for the failure.

    ok  = false;
    print( "Could not write output file: " + e.getMessage() );
}

        //  If the write operation succeeded, read the text
        //  we just wrote into a string, and print the string.
        //  If the read fails, we print an error message.

if ( ok )
{
    try
    {
        s   = readTextFile( "/soliloquoy.txt" , "utf8" );

        print( s );
    }
    catch ( Exception e )
    {
        print( "Could not read file: " + e.getMessage() );
    }
}
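As noted at the start of this section, standard Java I/O facilities can do the same job. The sketch below performs a similar write, append, and read round trip with the java.nio.file API, using an explicit UTF-8 encoding and a temporary file. It assumes a Java release newer than the Java 1.4 baseline mentioned earlier, and the class name TextFileSketch and its methods are hypothetical, not WordHoard methods.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration using standard Java I/O; not WordHoard's code.
class TextFileSketch {
    // Write text as UTF-8, either appending to or overwriting the file.
    static void write(Path file, boolean append, String contents) throws IOException {
        byte[] bytes = contents.getBytes(StandardCharsets.UTF_8);
        if (append) {
            Files.write(file, bytes, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } else {
            Files.write(file, bytes);   // creates or truncates by default
        }
    }

    // Read the whole file back as a UTF-8 string.
    static String read(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Overwrite with the first string, append the second, then read both back.
    static String roundTrip(String first, String second) {
        try {
            Path file = Files.createTempFile("soliloquy", ".txt");
            try {
                write(file, false, first);
                write(file, true, second);
                return read(file);
            } finally {
                Files.delete(file);     // clean up the temporary file
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(roundTrip(
            "To be, or not to be: that is the question:\n",
            "Whether 'tis nobler in the mind to suffer\n"));
    }
}
```

Reading and writing with the same explicit encoding avoids the mismatch that can occur when a platform default encoding is used on one side only.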

Word stemming

A stemmer determines a stem form of a given inflected word form. For most purposes, the stem need not be a valid morphological root of the word, but related words should map to the same stem. For example, the words "talk", "talking", and "talked" should all map to the same stem, "talk."
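The idea that related words should share a stem can be sketched with a deliberately naive suffix stripper. The code below is a toy, not the Porter or Lancaster algorithm, and it will give poor results on many words. The class name ToyStemmerSketch and the three suffixes it strips are assumptions chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy illustration only; NOT the Porter or Lancaster stemmer.
class ToyStemmerSketch {
    // Strip one common suffix, keeping at least three letters of stem.
    static String stem(String word) {
        String w = word.toLowerCase();
        for (String suffix : new String[] { "ing", "ed", "s" }) {
            if (w.endsWith(suffix) && w.length() - suffix.length() >= 3) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    // Group words under their common stem, in alphabetical order of stem.
    static Map<String, List<String>> groupByStem(List<String> words) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String word : words) {
            groups.computeIfAbsent(stem(word), k -> new ArrayList<>()).add(word);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> words =
            Arrays.asList("talk", "talking", "talked", "talks", "walked");
        // Prints {talk=[talk, talking, talked, talks], walk=[walked]}
        System.out.println(groupByStem(words));
    }
}
```

Real stemmers apply many ordered rules with conditions on the remaining stem, which is why they handle far more word forms than this sketch does.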

WordHoard provides implementations of two popular stemmers for English: the Porter stemmer, written by Martin Porter, and the Lancaster stemmer, written by Chris D. Paice and colleagues.

To use the Porter stemmer in a script, instantiate a PorterStemmer object and call the stem method with the word whose stem you want as an argument. We will try the stemmer on several variants of the word "talk."

        // Create a Porter Stemmer.

stemmer = new PorterStemmer();

        // Get the stem for a number of words related to "talk".

stem1   = stemmer.stem( "talk" );
stem2   = stemmer.stem( "talking" );
stem3   = stemmer.stem( "talked" );
stem4   = stemmer.stem( "talker" );

        // Output the stems.

print( "Stem for talk     : " + stem1 );
print( "Stem for talking  : " + stem2 );
print( "Stem for talked   : " + stem3 );
print( "Stem for talker   : " + stem4 );

Executing these script statements yields the output below. The values for stem1 through stem3 are all the same: talk. The stem for "talker" is talker instead of talk. While perhaps surprising, this is correct according to Porter's stemming algorithm.

Stem for talk     : talk
Stem for talking  : talk
Stem for talked   : talk
Stem for talker   : talker

To use the Lancaster stemmer in a script, instantiate a LancasterStemmer object and call the stem method with the word whose stem you want as an argument. Again we use the stemmer on variants of the word "talk."

        // Create a Lancaster Stemmer.

stemmer = new LancasterStemmer();

        // Get the stem for a number of words related to "talk".

stem1   = stemmer.stem( "talk" );
stem2   = stemmer.stem( "talking" );
stem3   = stemmer.stem( "talked" );
stem4   = stemmer.stem( "talker" );

        // Output the stems.

print( "Stem for talk     : " + stem1 );
print( "Stem for talking  : " + stem2 );
print( "Stem for talked   : " + stem3 );
print( "Stem for talker   : " + stem4 );

Executing these script statements yields the output below. The values for stem1 through stem3 are all the same: talk. Note that the stem for "talker" is talk which differs from the value returned by Porter's stemming algorithm.

Stem for talk     : talk
Stem for talking  : talk
Stem for talked   : talk
Stem for talker   : talk
