edu.northwestern.at.utils.corpuslinguistics
Class FileTokenizer

java.lang.Object
  extended by edu.northwestern.at.utils.corpuslinguistics.FileTokenizer

public class FileTokenizer
extends java.lang.Object

Tokenizes a text file, returning one token at a time.


Field Summary
protected  java.io.BufferedReader input
           
protected  java.lang.String nextToken
           
protected  Pretokenizer pretokenizer
           
protected  java.util.StringTokenizer tokenizer
           
 
Constructor Summary
FileTokenizer(java.io.File file)
          Create a file tokenizer for the given file.
FileTokenizer(java.io.File file, java.lang.String encoding, Pretokenizer pretokenizer)
          Create a file tokenizer for the given file, text encoding, and pretokenizer.
FileTokenizer(java.lang.String fileName)
          Create a file tokenizer for the named file.
FileTokenizer(java.lang.String fileName, java.lang.String encoding)
          Create a file tokenizer for the named file and text encoding.
 
Method Summary
 void close()
          Close input file once tokenization is complete.
 java.lang.String getNextToken()
          Get the next token.
 boolean hasMoreTokens()
          Check if more tokens are available.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

input

protected java.io.BufferedReader input

tokenizer

protected java.util.StringTokenizer tokenizer

nextToken

protected java.lang.String nextToken

pretokenizer

protected Pretokenizer pretokenizer

Constructor Detail

FileTokenizer

public FileTokenizer(java.io.File file,
                     java.lang.String encoding,
                     Pretokenizer pretokenizer)
              throws java.io.IOException
Create a file tokenizer for the given file, text encoding, and pretokenizer.

Parameters:
file - Input file.
encoding - Input file text encoding (e.g., "utf-8").
pretokenizer - The pretokenizer for each input line. DefaultPretokenizer is used if null.
Throws:
java.io.IOException - if input file can't be read.
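
The encoding parameter names a character set such as "utf-8". This page does not show how the constructor opens the file, so the following is only a sketch under that caveat: a hypothetical helper that opens a file as a BufferedReader in a named encoding. EncodedReaderSketch, open, and demo are illustrative names and are not part of FileTokenizer.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper, not part of FileTokenizer.
class EncodedReaderSketch {

    // Opens the file as a buffered reader that decodes bytes with the
    // named encoding. An unknown encoding name surfaces as an
    // UnsupportedEncodingException, which is an IOException.
    static BufferedReader open(File file, String encoding) throws IOException {
        FileInputStream in = new FileInputStream(file);
        try {
            return new BufferedReader(new InputStreamReader(in, encoding));
        } catch (IOException e) {
            in.close(); // do not leak the stream if the encoding is rejected
            throw e;
        }
    }

    // Writes a small UTF-8 file, reads its first line back, and returns it.
    static String demo() {
        try {
            File file = File.createTempFile("tokens", ".txt");
            file.deleteOnExit();
            Files.write(file.toPath(),
                "caf\u00e9 na\u00efve\n".getBytes(StandardCharsets.UTF_8));
            try (BufferedReader reader = open(file, "utf-8")) {
                return reader.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Passing the wrong encoding name for a file usually does not raise an error; it silently produces garbled characters, which is why the three-argument constructor is the safer choice for non-ASCII text.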

FileTokenizer

public FileTokenizer(java.io.File file)
              throws java.io.IOException
Create a file tokenizer for the given file.

Parameters:
file - Input file.
Throws:
java.io.IOException - if input file can't be read.

FileTokenizer

public FileTokenizer(java.lang.String fileName,
                     java.lang.String encoding)
              throws java.io.IOException
Create a file tokenizer for the named file and text encoding.

Parameters:
fileName - Name of the input file.
encoding - Input file text encoding (e.g., "utf-8").
Throws:
java.io.IOException - if input file can't be read.

FileTokenizer

public FileTokenizer(java.lang.String fileName)
              throws java.io.IOException
Create a file tokenizer for the named file.

Parameters:
fileName - Name of the input file.
Throws:
java.io.IOException - if input file can't be read.

Method Detail

getNextToken

public java.lang.String getNextToken()
Get the next token.

Returns:
Next available token, or null at the end of the file.

hasMoreTokens

public boolean hasMoreTokens()
Check if more tokens are available.

Returns:
true if more tokens are available, false if not.
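
The input, tokenizer, and nextToken fields suggest a one-token lookahead over lines of text, although this page does not show the implementation. The sketch below is a hypothetical stand-in, not the library class: it reads from any Reader, splits on whitespace with StringTokenizer, and omits the pretokenizer step entirely. LookaheadTokenizerSketch and tokenize are illustrative names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical stand-in that mirrors the documented method names only.
class LookaheadTokenizerSketch {
    private final BufferedReader input;
    private StringTokenizer tokenizer = new StringTokenizer("");
    private String nextToken; // null once the input is exhausted

    LookaheadTokenizerSketch(Reader reader) {
        input = new BufferedReader(reader);
        advance(); // preload the first token
    }

    // Moves nextToken forward, reading new lines (and skipping blank
    // ones) until a token is found or the input ends.
    private void advance() {
        try {
            while (!tokenizer.hasMoreTokens()) {
                String line = input.readLine();
                if (line == null) {
                    nextToken = null;
                    return;
                }
                tokenizer = new StringTokenizer(line);
            }
            nextToken = tokenizer.nextToken();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean hasMoreTokens() {
        return nextToken != null;
    }

    // Returns the buffered token, or null at the end of the input.
    String getNextToken() {
        String result = nextToken;
        if (result != null) {
            advance();
        }
        return result;
    }

    // Convenience wrapper: collects every token from a string.
    static List<String> tokenize(String text) {
        LookaheadTokenizerSketch t =
            new LookaheadTokenizerSketch(new StringReader(text));
        List<String> tokens = new ArrayList<>();
        while (t.hasMoreTokens()) {
            tokens.add(t.getNextToken());
        }
        return tokens;
    }
}
```

Because getNextToken() is documented to return null at the end of the file, a caller can loop on either hasMoreTokens() or a null check; the loop in tokenize uses the former.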

close

public void close()
           throws java.io.IOException
Close input file once tokenization is complete.

Throws:
java.io.IOException - if the input file can't be closed.
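
This page lists no implemented interfaces for the class, so the sketch below assumes try-with-resources is not available and calls close() from a finally block instead. StubTokenizer is a hypothetical stand-in that copies only the documented getNextToken() and close() signatures; CloseInFinallySketch and readAll are illustrative names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class CloseInFinallySketch {

    // Hypothetical stand-in, not the library class.
    static class StubTokenizer {
        private final Iterator<String> tokens;
        boolean closed = false;

        StubTokenizer(String... tokens) {
            this.tokens = Arrays.asList(tokens).iterator();
        }

        // Returns the next token, or null when none remain.
        String getNextToken() {
            return tokens.hasNext() ? tokens.next() : null;
        }

        void close() throws IOException {
            closed = true;
        }
    }

    // Collects tokens until null is returned. The finally block runs
    // close() whether the loop finishes normally or throws.
    static List<String> readAll(StubTokenizer tokenizer) {
        List<String> result = new ArrayList<>();
        try {
            try {
                String token;
                while ((token = tokenizer.getNextToken()) != null) {
                    result.add(token);
                }
            } finally {
                tokenizer.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

Wrapping the IOException in UncheckedIOException is a choice made for this sketch; callers that can handle the failure may prefer to let the checked exception propagate.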