edu.northwestern.at.wordhoard.tools.cm
Class LineGenerator

java.lang.Object
  extended by edu.northwestern.at.wordhoard.tools.cm.LineGenerator

public class LineGenerator
extends java.lang.Object

Generates WordHoard tagged lines (paragraphs).


Constructor Summary
LineGenerator(XMLWriter out, java.util.Map posToWordClassMap, Rules rules, java.lang.String fullWorkTag)
          Creates a new line generator.
 
Method Summary
 void appendPunctuation(java.lang.String str)
          Appends punctuation.
 void appendUntaggedWord(java.lang.String str, boolean isVerse)
          Appends an untagged word.
 void endElement(java.lang.String name)
          Emit end tag for element.
static int getNumBadContractions()
          Gets the number of bad contractions.
static int getNumWords()
          Gets the number of words generated.
 void incDivCount()
          Increment div count.
 void lineBreak()
          Generates a line break.
 void normalizedText(java.lang.String str)
          Generates normalized plain text.
 void parBreak()
          Generates a paragraph break.
 void popStyle()
          Pops the style stack.
 void processC(org.w3c.dom.Element el)
          Processes a MorphAdorner c element.
 void processGap(org.w3c.dom.Element el)
          Processes a gap element.
 void processW(org.w3c.dom.Element el)
          Processes a MorphAdorner w element.
 void pushStyle(Style style)
          Adds a style and pushes it onto the style stack.
static void resetDivCount()
          Reset div count to zero.
 void startElement(java.lang.String name)
          Emit start tag for element.
 void untaggedLine(java.lang.String str)
          Generates an untagged line.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LineGenerator

public LineGenerator(XMLWriter out,
                     java.util.Map posToWordClassMap,
                     Rules rules,
                     java.lang.String fullWorkTag)
Creates a new line generator.

Parameters:
out - WordHoard XML output file writer.
posToWordClassMap - Map from pos tags to word class tags.
rules - Rules.
fullWorkTag - Full work tag for line IDs.

Method Detail

pushStyle

public void pushStyle(Style style)
Adds a style and pushes it onto the style stack.

The specified style is added to the current style on the top of the style stack, and the result is pushed onto the style stack.

The indentation level and word styles are cumulative. For example, suppose the current top style is indented 10 pixels and is bold, and a style specifying an indentation of 5 pixels and italic is pushed. The new style is indented 15 pixels and is both bold and italic.

Parameters:
style - Style.
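
The cumulative behavior described above can be illustrated with a minimal sketch. It is not the WordHoard implementation: SketchStyle and StyleStackSketch are hypothetical stand-ins, and the assumption is that a style carries only an indentation in pixels plus bold and italic flags.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for the real Style class, which may carry more attributes.
final class SketchStyle {
    final int indent;      // indentation in pixels
    final boolean bold;
    final boolean italic;

    SketchStyle(int indent, boolean bold, boolean italic) {
        this.indent = indent;
        this.bold = bold;
        this.italic = italic;
    }

    // Combines two styles: indentation adds up, word styles are OR-ed together.
    SketchStyle plus(SketchStyle other) {
        return new SketchStyle(indent + other.indent,
                               bold || other.bold,
                               italic || other.italic);
    }
}

// Hypothetical style stack mirroring the pushStyle/popStyle description.
final class StyleStackSketch {
    private final Deque<SketchStyle> stack = new ArrayDeque<>();

    StyleStackSketch() {
        // Assumption: the stack starts with a plain, unindented base style.
        stack.push(new SketchStyle(0, false, false));
    }

    // Adds the given style to the current top style and pushes the result.
    void pushStyle(SketchStyle style) {
        stack.push(stack.peek().plus(style));
    }

    // Pops the top style, keeping the base style in place.
    void popStyle() {
        if (stack.size() > 1) {
            stack.pop();
        }
    }

    SketchStyle current() {
        return stack.peek();
    }

    public static void main(String[] args) {
        StyleStackSketch styles = new StyleStackSketch();
        styles.pushStyle(new SketchStyle(10, true, false));  // 10 pixels, bold
        styles.pushStyle(new SketchStyle(5, false, true));   // 5 pixels, italic
        SketchStyle top = styles.current();
        System.out.println(top.indent + " " + top.bold + " " + top.italic); // 15 true true
    }
}
```

Because each pushed entry already holds the combined style, popping restores the previous cumulative style without any recomputation.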

popStyle

public void popStyle()
Pops the style stack.


resetDivCount

public static void resetDivCount()
Reset div count to zero.


incDivCount

public void incDivCount()
Increment div count.


processW

public void processW(org.w3c.dom.Element el)
Processes a MorphAdorner w element.

MorphAdorner sometimes emits multiple "w" elements with the same id for a single word. This typically happens when a word is marked up with multiple styles. For WordHoard, all but the first occurrence of words tagged with the same id are discarded.

All lemmas are mapped to lower case, so that WordHoard does not end up with multiple lemmas that differ only in case.

Parameters:
el - MorphAdorner w element.
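
The two rules above (keep only the first "w" element per id, and lower-case every lemma) might be sketched as follows. This is an illustration under stated assumptions, not the actual processW code: the class WordDedupSketch is hypothetical, and the attribute names "id" and "lem" are assumed for the example rather than taken from the MorphAdorner schema.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch of the id de-duplication and lemma lower-casing rules.
final class WordDedupSketch {
    private final Set<String> seenIds = new HashSet<>();
    private final List<String> lemmas = new ArrayList<>();

    // Assumption: the word id and lemma live in attributes named "id" and "lem".
    void processW(Element el) {
        String id = el.getAttribute("id");
        // Skip every repeat of an id already seen; elements without an id are kept.
        if (!id.isEmpty() && !seenIds.add(id)) {
            return;
        }
        // Lower-case the lemma so "Rome" and "rome" count as one lemma.
        lemmas.add(el.getAttribute("lem").toLowerCase(Locale.ROOT));
    }

    List<String> lemmas() {
        return lemmas;
    }

    // Parses an XML string and runs processW over its "w" elements in document order.
    static List<String> lemmasOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList words = doc.getElementsByTagName("w");
            WordDedupSketch sketch = new WordDedupSketch();
            for (int i = 0; i < words.getLength(); i++) {
                sketch.processW((Element) words.item(i));
            }
            return sketch.lemmas();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<l>"
            + "<w id=\"w1\" lem=\"Rome\">Ro</w>"
            + "<w id=\"w1\" lem=\"Rome\">me</w>"
            + "<w id=\"w2\" lem=\"fall\">fell</w>"
            + "</l>";
        System.out.println(lemmasOf(xml)); // [rome, fall]
    }
}
```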

processC

public void processC(org.w3c.dom.Element el)
Processes a MorphAdorner c element.

Space characters at the beginning of lines are discarded.

Parameters:
el - MorphAdorner c element.
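
One way to picture the leading-space rule is a line buffer that drops spaces while the current line is still empty. The sketch below is hypothetical: LineBufferSketch and its method names are invented for illustration and do not reflect how LineGenerator stores its lines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical line buffer that discards space characters at the start of a line.
final class LineBufferSketch {
    private final StringBuilder line = new StringBuilder();
    private final List<String> lines = new ArrayList<>();

    // Appends whitespace, dropping leading spaces if nothing is on the line yet.
    void appendSpace(String str) {
        if (line.length() == 0) {
            int i = 0;
            while (i < str.length() && str.charAt(i) == ' ') {
                i++;
            }
            str = str.substring(i);
        }
        line.append(str);
    }

    // Appends a word unchanged.
    void appendWord(String word) {
        line.append(word);
    }

    // Finishes the current line and starts a new, empty one.
    void lineBreak() {
        lines.add(line.toString());
        line.setLength(0);
    }

    List<String> lines() {
        return lines;
    }

    public static void main(String[] args) {
        LineBufferSketch buf = new LineBufferSketch();
        buf.appendSpace(" ");   // discarded: the line is still empty
        buf.appendWord("To");
        buf.appendSpace(" ");   // kept: the line already has content
        buf.appendWord("be");
        buf.lineBreak();
        buf.appendSpace("  ");  // discarded again at the start of the next line
        buf.appendWord("or");
        buf.lineBreak();
        System.out.println(buf.lines()); // [To be, or]
    }
}
```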

processGap

public void processGap(org.w3c.dom.Element el)
Processes a gap element.

Parameters:
el - gap element.

startElement

public void startElement(java.lang.String name)
Emit start tag for element.

Parameters:
name - The element name.

endElement

public void endElement(java.lang.String name)
Emit end tag for element.

Parameters:
name - The element name.

appendUntaggedWord

public void appendUntaggedWord(java.lang.String str,
                               boolean isVerse)
Appends an untagged word.

Parameters:
str - Word.
isVerse - True if the word is in verse.

appendPunctuation

public void appendPunctuation(java.lang.String str)
Appends punctuation.

Space characters at the beginning of lines are discarded.

Parameters:
str - Punctuation.

lineBreak

public void lineBreak()
Generates a line break.


parBreak

public void parBreak()
Generates a paragraph break.


untaggedLine

public void untaggedLine(java.lang.String str)
Generates an untagged line.

Parameters:
str - Text for line.

normalizedText

public void normalizedText(java.lang.String str)
Generates normalized plain text.

Parameters:
str - Text to generate.

getNumBadContractions

public static int getNumBadContractions()
Gets the number of bad contractions.

Returns:
Number of bad contractions.

getNumWords

public static int getNumWords()
Gets the number of words generated.

Returns:
Number of words generated.