edu.northwestern.at.wordhoard.tools.cm
Class LineGenerator

java.lang.Object
  extended by edu.northwestern.at.wordhoard.tools.cm.LineGenerator

public class LineGenerator
extends java.lang.Object

Generates WordHoard tagged lines (paragraphs).


Constructor Summary
LineGenerator(XMLWriter out, java.util.Map posToWordClassMap)
          Creates a new line generator.
 
Method Summary
 void appendPunctuation(java.lang.String str)
          Appends punctuation.
 void appendUntaggedWord(java.lang.String str)
          Appends an untagged word.
static int getNumBadContractions()
          Gets the number of bad contractions.
static int getNumWords()
          Gets the number of words generated.
 void lineBreak()
          Generates a line break.
 void parBreak()
          Generates a paragraph break.
 void popStyle()
          Pops the style stack.
 void processC(org.w3c.dom.Element el)
          Processes a MorphAdorner c element.
 void processW(org.w3c.dom.Element el)
          Processes a MorphAdorner w element.
 void pushStyle(Style style)
          Adds a style and pushes it onto the style stack.
 void untaggedLine(java.lang.String str)
          Generates an untagged line.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LineGenerator

public LineGenerator(XMLWriter out,
                     java.util.Map posToWordClassMap)
Creates a new line generator.

Parameters:
out - WordHoard XML output file writer.
posToWordClassMap - Map from pos tags to word class tags.
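The posToWordClassMap parameter maps part-of-speech (POS) tags to word class tags. The following is a minimal sketch of how such a lookup might be used. It assumes string keys and values. The helper wordClassFor, the class name, and the tag values "n1", "vvd", "n" and "v" are hypothetical and only illustrate the idea. Returning null for an unmapped tag is a choice made for this sketch, not documented behavior of LineGenerator.

```java
import java.util.HashMap;
import java.util.Map;

public class PosLookupSketch {
    // Hypothetical helper: looks up the word class tag for a POS tag.
    // Returns null when the POS tag has no mapping (a choice made for this sketch).
    static String wordClassFor(Map<String, String> posToWordClassMap, String pos) {
        return posToWordClassMap.get(pos);
    }

    public static void main(String[] args) {
        // Illustrative tag values only; the real map is supplied by the caller.
        Map<String, String> posToWordClassMap = new HashMap<>();
        posToWordClassMap.put("n1", "n");
        posToWordClassMap.put("vvd", "v");
        System.out.println(wordClassFor(posToWordClassMap, "n1")); // n
        System.out.println(wordClassFor(posToWordClassMap, "xx")); // null
    }
}
```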

Method Detail

pushStyle

public void pushStyle(Style style)
Adds a style and pushes it onto the style stack.

The specified style is added to the current style on the top of the style stack, and the result is pushed onto the style stack.

The indentation level and word styles are cumulative. For example, suppose the current top style is indented 10 pixels and is bold, and a style specifying an indentation of 5 pixels and italic is pushed. The new style is indented 15 pixels and is both bold and italic.

Parameters:
style - Style to combine with the current top style before pushing.
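The cumulative rule described above can be sketched as a stack whose pushed entries are always the combination of the current top style and the new style. This is a minimal sketch. SimpleStyle is a hypothetical stand-in for WordHoard's Style class, modeling only an indentation in pixels and bold/italic flags. Starting the stack with an unindented, plain base style is an assumption of the sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StyleStackSketch {
    // Hypothetical stand-in for WordHoard's Style: an indentation in pixels
    // plus two word-style flags. The real Style class may differ.
    record SimpleStyle(int indent, boolean bold, boolean italic) {
        // Combine this (current top) style with a pushed style:
        // indentation adds up, word styles accumulate.
        SimpleStyle add(SimpleStyle other) {
            return new SimpleStyle(indent + other.indent,
                    bold || other.bold, italic || other.italic);
        }
    }

    private final Deque<SimpleStyle> stack = new ArrayDeque<>();

    StyleStackSketch() {
        stack.push(new SimpleStyle(0, false, false)); // assumed base style
    }

    // Adds the style to the current top style and pushes the result.
    void pushStyle(SimpleStyle style) {
        stack.push(stack.peek().add(style));
    }

    // Restores the previous style.
    void popStyle() {
        stack.pop();
    }

    SimpleStyle top() {
        return stack.peek();
    }

    public static void main(String[] args) {
        StyleStackSketch s = new StyleStackSketch();
        s.pushStyle(new SimpleStyle(10, true, false)); // indented 10, bold
        s.pushStyle(new SimpleStyle(5, false, true));  // indented 5, italic
        System.out.println(s.top()); // SimpleStyle[indent=15, bold=true, italic=true]
        s.popStyle();
        System.out.println(s.top()); // SimpleStyle[indent=10, bold=true, italic=false]
    }
}
```

This reproduces the example in the text: pushing a 5-pixel italic style onto a 10-pixel bold style yields a 15-pixel bold italic style, and popping it restores the 10-pixel bold style. Storing the combined result, rather than the raw pushed style, makes popStyle a simple pop.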

popStyle

public void popStyle()
Pops the style stack.


processW

public void processW(org.w3c.dom.Element el)
Processes a MorphAdorner w element.

MorphAdorner sometimes emits multiple "w" elements with the same id for a single word. This typically happens with words marked up with multiple styles. For WordHoard, we discard all but the first occurrence of each id.

All lemmas are mapped to lower case, so that WordHoard does not end up with multiple lemmas that are really the same and differ only in case.

Parameters:
el - MorphAdorner w element.
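The two rules above, keeping only the first "w" element for each id and lower-casing lemmas, can be sketched with a set of seen ids. This is a minimal sketch that works on plain strings instead of org.w3c.dom.Element. The class name and the processWord helper are hypothetical, and extracting the id and lemma from the element is left out. Using Locale.ROOT for lower-casing is a choice made here to avoid locale-dependent results; the actual implementation may use the default locale.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class WordDedupSketch {
    // Ids of w elements already processed.
    private final Set<String> seenIds = new HashSet<>();

    // Hypothetical helper: returns the lemma to record for a w element,
    // or null if a word with the same id has already been seen.
    String processWord(String id, String lemma) {
        if (!seenIds.add(id)) {
            return null; // duplicate id: keep only the first occurrence
        }
        // Lower-case so that, e.g., "The" and "the" become one lemma.
        return lemma.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        WordDedupSketch d = new WordDedupSketch();
        System.out.println(d.processWord("w1", "The"));  // the
        System.out.println(d.processWord("w1", "The"));  // null
        System.out.println(d.processWord("w2", "King")); // king
    }
}
```

Set.add returns false when the id is already present, so a single call both tests for and records the id.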

processC

public void processC(org.w3c.dom.Element el)
Processes a MorphAdorner c element.

Space characters at the beginning of lines are discarded.

Parameters:
el - MorphAdorner c element.

appendUntaggedWord

public void appendUntaggedWord(java.lang.String str)
Appends an untagged word.

Parameters:
str - Word.

appendPunctuation

public void appendPunctuation(java.lang.String str)
Appends punctuation.

Space characters at the beginning of lines are discarded.

Parameters:
str - Punctuation.
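Both processC and appendPunctuation discard space characters at the beginning of lines. The following minimal sketch shows that rule using a per-line buffer. It assumes that "space characters" means the ASCII space ' ' and that text is accumulated line by line. The real class writes through an XMLWriter instead. The class name and the appendWord helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class LineBufferSketch {
    // Text of the line currently being built.
    private final StringBuilder line = new StringBuilder();
    // Completed lines.
    final List<String> lines = new ArrayList<>();

    // Appends punctuation, dropping space characters at the start of a line.
    void appendPunctuation(String str) {
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c == ' ' && line.length() == 0) {
                continue; // leading space: discard
            }
            line.append(c);
        }
    }

    // Hypothetical stand-in for appending a word's text.
    void appendWord(String str) {
        line.append(str);
    }

    // Ends the current line and starts a new, empty one.
    void lineBreak() {
        lines.add(line.toString());
        line.setLength(0);
    }

    public static void main(String[] args) {
        LineBufferSketch b = new LineBufferSketch();
        b.appendPunctuation("  "); // dropped: the line is still empty
        b.appendWord("Hello");
        b.appendPunctuation(", "); // kept: the line already has text
        b.appendWord("world");
        b.lineBreak();
        System.out.println(b.lines.get(0)); // Hello, world
    }
}
```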

lineBreak

public void lineBreak()
Generates a line break.


parBreak

public void parBreak()
Generates a paragraph break.


untaggedLine

public void untaggedLine(java.lang.String str)
Generates an untagged line.

Parameters:
str - Text for line.

getNumBadContractions

public static int getNumBadContractions()
Gets the number of bad contractions.

Returns:
Number of bad contractions.

getNumWords

public static int getNumWords()
Gets the number of words generated.

Returns:
Number of words generated.
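Because getNumWords and getNumBadContractions are static, the counts they return belong to the class, not to a single LineGenerator. In the usual Java pattern this means they are running totals across every instance created in the program. This minimal sketch shows that pattern for the word count. The class name and the countWord method are hypothetical, and the sketch does not attempt to model what counts as a bad contraction.

```java
public class WordCounterSketch {
    // Static, so the count is shared by every instance of the class.
    private static int numWords = 0;

    // Hypothetical per-instance method that records one generated word.
    void countWord() {
        numWords++;
    }

    static int getNumWords() {
        return numWords;
    }

    public static void main(String[] args) {
        WordCounterSketch a = new WordCounterSketch();
        WordCounterSketch b = new WordCounterSketch();
        a.countWord();
        a.countWord();
        b.countWord();
        System.out.println(getNumWords()); // 3
    }
}
```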