edu.northwestern.at.wordhoard.tools.filters
Class Filter01

java.lang.Object
  extended by edu.northwestern.at.wordhoard.tools.filters.Filter01

public class Filter01
extends java.lang.Object

Applies fixers.

Runs the fixers, pretty-prints the output, eliminates unused elements and attributes, eliminates punctuation elements, and fixes title page text.

Usage:

Filter01 in out corpus-tag work-tag [fixers]

in = Path to TEI XML input file for a work.

out = Path to TEI XML output file for a work.

corpus-tag = Corpus tag (e.g., "sha" for Shakespeare).

work-tag = Work tag (e.g., "ham" for Hamlet).

fixers = Optional comma-separated list of XML "fixers". These are the unqualified names of classes in the package "edu.northwestern.at.wordhoard.tools.fixers". The fixers are executed in turn to fix problems and irregularities in the parsed XML DOM tree and transform it into normalized form.
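
As an illustration of how the fixers argument maps to class names, the following is a minimal sketch, not the tool's actual code. The class FixerNames, its method qualify, and the fixer names "FixOne" and "FixTwo" are hypothetical; only the package name comes from the usage text. Trimming white space around each name is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of WordHoard: turns the optional
// comma-separated fixers argument into fully qualified class names.
final class FixerNames {

    // Package named in the usage text.
    static final String FIXER_PACKAGE =
        "edu.northwestern.at.wordhoard.tools.fixers";

    // Returns the qualified names in the order given, so the fixers
    // could later be run "in turn". A null argument means no fixers.
    static List<String> qualify(String fixersArg) {
        List<String> names = new ArrayList<>();
        if (fixersArg == null) {
            return names;
        }
        for (String name : fixersArg.split(",")) {
            String trimmed = name.trim(); // assumption: spaces are tolerated
            if (!trimmed.isEmpty()) {
                names.add(FIXER_PACKAGE + "." + trimmed);
            }
        }
        return names;
    }
}
```

Each qualified name could then be loaded by reflection (for example with Class.forName), although how Filter01 instantiates and calls its fixers is not stated here.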

A report containing detailed messages from the fixers is written to stdout.

The following transformations are performed:

  1. The specified fixers are run, if any.
  2. The output is pretty-printed. Tab characters are used to indent lines. Elements which contain text descendants (e.g., "title", "head", and "l") are output on a single line.
  3. Unused elements and attributes are eliminated. This includes "title" elements with a "type" attribute other than "subordinate".
  4. The "c", "seg", and "gap" punctuation elements are removed and replaced by just the punctuation text they contain.
  5. Title page text in responsibility and publication statements is cleaned up. All runs of white space are replaced by a single space character, including embedded new line characters. ") )" is replaced by ")". "Larry D Benson" is replaced by "Larry D. Benson".
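
The title page cleanup in step 5 can be sketched as three string replacements. This is a minimal sketch under stated assumptions, not the tool's actual code: the class TitlePageText and its method clean are hypothetical, and the order in which the replacements are applied is assumed.

```java
// Hypothetical helper, not part of WordHoard: applies the three
// title page text fixes described in step 5 to a single string.
final class TitlePageText {

    static String clean(String text) {
        return text
            // Collapse every run of white space, including embedded
            // newlines, into one space character.
            .replaceAll("\\s+", " ")
            // Replace the doubled closing parenthesis ") )" by ")".
            .replace(") )", ")")
            // Add the missing period after the middle initial.
            .replace("Larry D Benson", "Larry D. Benson");
    }
}
```

Collapsing white space first means the two literal replacements still match when the original text is broken across lines, which is why that order is assumed here.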


Method Summary
static void main(java.lang.String[] args)
          The main program.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

main

public static void main(java.lang.String[] args)
The main program.

Parameters:
args - Command line arguments.
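
The usage line in the class description implies four required arguments followed by one optional argument. The following is a minimal sketch of that layout, not the actual body of main: the class Filter01Args, its fields, and its method parse are hypothetical, as is the choice to reject a wrong argument count with an exception.

```java
// Hypothetical holder for the command line arguments described in
// the usage text: in out corpus-tag work-tag [fixers].
final class Filter01Args {

    final String in;        // path to the TEI XML input file
    final String out;       // path to the TEI XML output file
    final String corpusTag; // e.g. "sha" for Shakespeare
    final String workTag;   // e.g. "ham" for Hamlet
    final String fixers;    // comma-separated fixer names, or null

    private Filter01Args(String[] args) {
        in = args[0];
        out = args[1];
        corpusTag = args[2];
        workTag = args[3];
        fixers = args.length == 5 ? args[4] : null;
    }

    // Assumption: any count other than 4 or 5 is treated as an error.
    static Filter01Args parse(String[] args) {
        if (args.length < 4 || args.length > 5) {
            throw new IllegalArgumentException(
                "Usage: Filter01 in out corpus-tag work-tag [fixers]");
        }
        return new Filter01Args(args);
    }
}
```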