edu.northwestern.at.wordhoard.model.text
Class CharsetUtils

java.lang.Object
  extended by edu.northwestern.at.wordhoard.model.text.CharsetUtils

public class CharsetUtils
extends java.lang.Object

Character set utilities.


Method Summary
static java.lang.String getBadBetaSeq()
          Gets the bad beta code sequence.
static java.text.Collator getCollator(byte charset, int strength)
          Gets a collator.
static java.lang.String translateBetaToUni(java.lang.String str)
          Translates a beta code string to Unicode.
static java.lang.String translateToInsensitive(java.lang.String str)
          Translates a string to a case- and diacritic-insensitive version.
static java.lang.String translateTonosToOxia(java.lang.String str)
          Translates tonos accents to oxia accents in a string.
static java.lang.String translateUniToBeta(java.lang.String str)
          Translates a Unicode string to beta code.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

translateBetaToUni

public static java.lang.String translateBetaToUni(java.lang.String str)
Translates a beta code string to Unicode.

Parameters:
str - Beta code string.
Returns:
Unicode string.

getBadBetaSeq

public static java.lang.String getBadBetaSeq()
Gets the bad beta code sequence.

Returns:
The bad (unknown) beta code sequence from the most recent call to translateBetaToUni, or null if there was none.
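
The calling pattern for translateBetaToUni and getBadBetaSeq can be illustrated with a minimal sketch. It is not the WordHoard implementation: the class BetaCodeSketch and its methods are hypothetical, it handles only unaccented lowercase letters (no diacritics, capitals, or final sigma), and it assumes the recorded bad sequence is reset on every call and holds only the first unknown character.

```java
// Hypothetical sketch; not the WordHoard implementation.
class BetaCodeSketch {
    // Standard beta code letters and their Greek counterparts, in the same order
    // (alpha through omega, medial sigma only).
    private static final String BETA = "abgdezhqiklmncoprstufxyw";
    private static final String GREEK =
        "\u03B1\u03B2\u03B3\u03B4\u03B5\u03B6\u03B7\u03B8\u03B9\u03BA\u03BB\u03BC"
        + "\u03BD\u03BE\u03BF\u03C0\u03C1\u03C3\u03C4\u03C5\u03C6\u03C7\u03C8\u03C9";

    // First unknown sequence seen by the most recent call, or null if none (assumed behavior).
    private static String badSeq = null;

    static String toUnicode(String beta) {
        badSeq = null; // assumption: each call starts with a clean state
        StringBuilder out = new StringBuilder(beta.length());
        for (int i = 0; i < beta.length(); i++) {
            char c = beta.charAt(i);
            int idx = BETA.indexOf(c);
            if (idx >= 0) {
                out.append(GREEK.charAt(idx));
            } else if (c == ' ') {
                out.append(c); // spaces pass through unchanged
            } else {
                if (badSeq == null) {
                    badSeq = String.valueOf(c); // remember the first unknown character
                }
                out.append(c); // copy the unknown character unchanged
            }
        }
        return out.toString();
    }

    static String getBadSeq() {
        return badSeq;
    }

    public static void main(String[] args) {
        System.out.println(toUnicode("qea"));   // Greek letters theta, epsilon, alpha
        System.out.println(getBadSeq());        // null: nothing unknown
        toUnicode("qe#a");
        System.out.println(getBadSeq());        // the unknown character "#"
    }
}
```

Because the bad sequence is held in static state in this sketch, a caller would need to read it immediately after the translation call it refers to.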

translateUniToBeta

public static java.lang.String translateUniToBeta(java.lang.String str)
Translates a Unicode string to beta code.

Parameters:
str - Unicode string.
Returns:
Beta code string.

getCollator

public static java.text.Collator getCollator(byte charset,
                                             int strength)
Gets a collator.

The character sets are:

The collation strengths are:

Parameters:
charset - Character set.
strength - Strength.
Returns:
Collator.
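
The effect of the strength argument can be illustrated with the standard java.text.Collator class that getCollator returns. The sketch below does not call getCollator, because the charset values are not listed here. Instead it obtains an English collator directly, and the class CollatorStrengthSketch is hypothetical.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical sketch of how collation strength changes comparison results.
class CollatorStrengthSketch {
    // Compares two strings with an English collator at the given strength
    // (one of the Collator.PRIMARY, SECONDARY, TERTIARY, IDENTICAL constants).
    static int compareAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(strength);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        // PRIMARY ignores case differences, so the strings compare as equal.
        System.out.println(compareAt(Collator.PRIMARY, "abc", "ABC") == 0);
        // TERTIARY takes case into account, so the strings differ.
        System.out.println(compareAt(Collator.TERTIARY, "abc", "ABC") != 0);
    }
}
```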

translateToInsensitive

public static java.lang.String translateToInsensitive(java.lang.String str)
Translates a string to a case- and diacritic-insensitive version.

All diacritical marks are removed and all letters are mapped to lowercase.

Parameters:
str - String.
Returns:
Insensitive string.
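
One common way to get this behavior in plain Java is to decompose the string, drop the combining marks, and then lowercase the result. The sketch below takes that approach as an assumption. It is not necessarily how translateToInsensitive is implemented, and the class InsensitiveSketch is hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical sketch; the actual WordHoard method may work differently.
class InsensitiveSketch {
    static String toInsensitive(String str) {
        // NFD splits each accented letter into a base letter plus combining marks.
        String decomposed = Normalizer.normalize(str, Normalizer.Form.NFD);
        // Remove every combining mark (Unicode general category M).
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        // Map the remaining letters to lowercase, independent of the default locale.
        return stripped.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(toInsensitive("Caf\u00E9")); // prints "cafe"
    }
}
```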

translateTonosToOxia

public static java.lang.String translateTonosToOxia(java.lang.String str)
Translates tonos accents to oxia accents in a string.

In the Early Greek Epic text and lemma spellings, we use oxia accents on lowercase vowels, taken from the Greek Extended Unicode range. The tonos accents in the Greek and Coptic range are nearly indistinguishable from them visually, and users may type them instead. For example, the Mac OS X Polytonic Greek input method produces tonos accents.

To prevent confusion, we convert tonos accents to oxia accents in every string a user types before searching for it in the WordHoard database.

Parameters:
str - String with tonos accents.
Returns:
String with tonos accents translated to oxia accents.
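
The conversion described above can be sketched as a character-for-character substitution. This is a minimal illustration under stated assumptions, not the WordHoard source: the class TonosToOxiaSketch is hypothetical, and it covers only the seven lowercase vowels with tonos (U+03AC to U+03AF and U+03CC to U+03CE), mapping them to their oxia counterparts in the Greek Extended range.

```java
// Hypothetical sketch; not the WordHoard implementation.
class TonosToOxiaSketch {
    // Lowercase vowels with tonos (Greek and Coptic range):
    // alpha, epsilon, eta, iota, omicron, upsilon, omega.
    private static final String TONOS =
        "\u03AC\u03AD\u03AE\u03AF\u03CC\u03CD\u03CE";
    // The same vowels with oxia (Greek Extended range), in the same order.
    private static final String OXIA =
        "\u1F71\u1F73\u1F75\u1F77\u1F79\u1F7B\u1F7D";

    static String convert(String str) {
        StringBuilder out = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            int idx = TONOS.indexOf(c);
            // Replace a tonos vowel with its oxia counterpart; keep everything else.
            out.append(idx >= 0 ? OXIA.charAt(idx) : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String typed = "\u03BB\u03CC\u03B3\u03BF\u03C2"; // "logos" typed with omicron tonos
        String searchable = convert(typed);
        System.out.println(searchable.charAt(1) == '\u1F79'); // prints "true"
    }
}
```

Note that Unicode normalization (for example NFC) maps the oxia characters back to their tonos equivalents, so a string converted this way should not be normalized afterwards if the oxia forms need to be preserved.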