java.lang.Object
  extended by edu.northwestern.at.wordhoard.tools.cm.LineGenerator
public class LineGenerator
Generates WordHoard tagged lines (paragraphs).
| Constructor Summary | |
|---|---|
| LineGenerator(XMLWriter out, java.util.Map posToWordClassMap) | Creates a new line generator. |
| Method Summary | | |
|---|---|---|
| void | appendPunctuation(java.lang.String str) | Appends punctuation. |
| void | appendUntaggedWord(java.lang.String str) | Appends an untagged word. |
| static int | getNumBadContractions() | Gets the number of bad contractions. |
| static int | getNumWords() | Gets the number of words generated. |
| void | lineBreak() | Generates a line break. |
| void | parBreak() | Generates a paragraph break. |
| void | popStyle() | Pops the style stack. |
| void | processC(org.w3c.dom.Element el) | Processes a MorphAdorner c element. |
| void | processW(org.w3c.dom.Element el) | Processes a MorphAdorner w element. |
| void | pushStyle(Style style) | Adds a style and pushes it onto the style stack. |
| void | untaggedLine(java.lang.String str) | Generates an untagged line. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public LineGenerator(XMLWriter out,
                     java.util.Map posToWordClassMap)
Creates a new line generator.
Parameters:
out - WordHoard XML output file writer.
posToWordClassMap - Map from part-of-speech (pos) tags to word class tags.

| Method Detail |
|---|
public void pushStyle(Style style)
The specified style is added to the current style on the top of the style stack, and the result is pushed onto the style stack.
The indentation level and word styles are cumulative. For example, suppose the current top style is indented 10 pixels and bold, and a style specifying a 5-pixel indentation and italics is pushed. The new style is then indented 15 pixels and is both bold and italic.
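The cumulative rule above can be sketched as a stack whose top always holds the combined style. This is a minimal sketch, not WordHoard's implementation: SketchStyle is a hypothetical stand-in for the real Style class that models only an indentation in pixels plus bold and italic flags, and the neutral base style at the bottom of the stack is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for WordHoard's Style: only indentation and two word styles.
final class SketchStyle {
    final int indent;       // indentation in pixels
    final boolean bold;
    final boolean italic;

    SketchStyle(int indent, boolean bold, boolean italic) {
        this.indent = indent;
        this.bold = bold;
        this.italic = italic;
    }

    // Combines this (current top) style with an added style:
    // indentations add up and word styles accumulate.
    SketchStyle plus(SketchStyle added) {
        return new SketchStyle(indent + added.indent, bold || added.bold, italic || added.italic);
    }
}

class StyleStackSketch {
    private final Deque<SketchStyle> stack = new ArrayDeque<>();

    StyleStackSketch() {
        // Assumed neutral base style, so there is always a top to combine with.
        stack.push(new SketchStyle(0, false, false));
    }

    // Adds style to the current top style and pushes the combined result.
    void pushStyle(SketchStyle style) {
        stack.push(stack.peek().plus(style));
    }

    // Pops the style stack, restoring the previous combined style.
    void popStyle() {
        if (stack.size() == 1) {
            throw new IllegalStateException("cannot pop the base style");
        }
        stack.pop();
    }

    SketchStyle current() {
        return stack.peek();
    }

    public static void main(String[] args) {
        StyleStackSketch styles = new StyleStackSketch();
        styles.pushStyle(new SketchStyle(10, true, false)); // indented 10 px, bold
        styles.pushStyle(new SketchStyle(5, false, true));  // indented 5 px, italic
        SketchStyle top = styles.current();
        System.out.println(top.indent + " " + top.bold + " " + top.italic); // 15 true true
        styles.popStyle();
        top = styles.current();
        System.out.println(top.indent + " " + top.bold + " " + top.italic); // 10 true false
    }
}
```

Storing the already-combined style on the stack makes popStyle a plain pop: the previous cumulative style is restored without recomputing anything.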
Parameters:
style - Style to combine with the current top style and push.

public void popStyle()
Pops the style stack.
public void processW(org.w3c.dom.Element el)
Processes a MorphAdorner w element.
MorphAdorner sometimes emits multiple "w" elements with the same id for a single word. This typically happens when a word is marked up with multiple styles. For WordHoard, we discard all but the first occurrence of a word with a given id.
All lemmas are mapped to lower case, so that WordHoard does not end up with multiple lemmas that are really the same and differ only in case.
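The two rules above (keep only the first w element per id, lower-case every lemma) can be sketched with the standard org.w3c.dom API. This is an illustration under stated assumptions, not the actual processW: the attribute names xml:id and lem, the WordFilterSketch class, and its lemmasOf helper are all hypothetical, and elements without an id are assumed never to be duplicates.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WordFilterSketch {
    // Ids of w elements already seen; later elements with the same id are skipped.
    private final Set<String> seenIds = new HashSet<>();
    // Lower-cased lemmas of the words kept, in document order.
    final List<String> lemmas = new ArrayList<>();

    // Keeps only the first w element for each id and lower-cases its lemma.
    void processW(Element el) {
        String id = el.getAttribute("xml:id");       // assumed id attribute
        if (!id.isEmpty() && !seenIds.add(id)) {
            return;                                  // repeated id: discard
        }
        String lemma = el.getAttribute("lem");       // assumed lemma attribute
        lemmas.add(lemma.toLowerCase(Locale.ROOT));
    }

    // Parses a small XML fragment and runs every w element through processW.
    static List<String> lemmasOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList ws = doc.getElementsByTagName("w");
            WordFilterSketch filter = new WordFilterSketch();
            for (int i = 0; i < ws.getLength(); i++) {
                filter.processW((Element) ws.item(i));
            }
            return filter.lemmas;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<l>"
                + "<w xml:id='w1' lem='Love'>Love</w>"
                + "<w xml:id='w1' lem='Love'>Love</w>"  // same word, emitted again for a second style
                + "<w xml:id='w2' lem='love'>love</w>"
                + "</l>";
        System.out.println(lemmasOf(xml)); // [love, love]
    }
}
```

The repeated w1 element is dropped, and the lemmas "Love" and "love" collapse to the same lower-case form. Locale.ROOT keeps the lower-casing independent of the default locale.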
Parameters:
el - MorphAdorner w element.

public void processC(org.w3c.dom.Element el)
Processes a MorphAdorner c element.
Space characters at the beginning of lines are discarded.
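The rule that space characters at the start of a line are discarded (stated here for processC and again for appendPunctuation) can be sketched with a simple line buffer. This is a hypothetical illustration: LineBufferSketch, its append method, and the use of an empty buffer to mean "beginning of line" are assumptions, not WordHoard code.

```java
class LineBufferSketch {
    // Text of the line currently being built.
    private final StringBuilder line = new StringBuilder();

    // Appends text, dropping leading space characters while the line is still empty.
    void append(String str) {
        int start = 0;
        if (line.length() == 0) {
            while (start < str.length() && str.charAt(start) == ' ') {
                start++;
            }
        }
        line.append(str, start, str.length());
    }

    // Ends the current line and returns its text.
    String lineBreak() {
        String text = line.toString();
        line.setLength(0);
        return text;
    }

    public static void main(String[] args) {
        LineBufferSketch buf = new LineBufferSketch();
        buf.append("  ");      // only spaces at line start: dropped
        buf.append("Hello");
        buf.append(", ");      // punctuation and space inside the line: kept
        buf.append("world");
        System.out.println("[" + buf.lineBreak() + "]"); // [Hello, world]
    }
}
```

Only spaces at the very start of a line are removed; spacing between words and punctuation later in the line is left untouched.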
Parameters:
el - MorphAdorner c element.

public void appendUntaggedWord(java.lang.String str)
Appends an untagged word.
Parameters:
str - Word.

public void appendPunctuation(java.lang.String str)
Appends punctuation.
Space characters at the beginning of lines are discarded.
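Parameters:
str - Punctuation.

public void lineBreak()
Generates a line break.

public void parBreak()
Generates a paragraph break.

public void untaggedLine(java.lang.String str)
Generates an untagged line.
Parameters:
str - Text for line.

public static int getNumBadContractions()
Gets the number of bad contractions.

public static int getNumWords()
Gets the number of words generated.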