Class ICUTransformFilter

java.lang.Object
  org.apache.lucene.util.AttributeSource
    org.apache.lucene.analysis.TokenStream
      org.apache.lucene.analysis.TokenFilter
        org.apache.lucene.analysis.icu.ICUTransformFilter
All Implemented Interfaces:
Closeable, AutoCloseable, org.apache.lucene.util.Unwrappable<org.apache.lucene.analysis.TokenStream>

public final class ICUTransformFilter extends org.apache.lucene.analysis.TokenFilter
A TokenFilter that transforms text with ICU.

ICU provides text-transformation functionality via its Transliteration API. Although script conversion is its most common use, a Transliterator can perform a more general class of tasks: the Transliterator API specifies only that a segment of the input text is replaced by new text. The particulars of each conversion are determined entirely by subclasses of Transliterator.
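Because the API only promises that a segment of text is replaced, a Transliterator need not change scripts at all. The sketch below, assuming ICU4J (com.ibm.icu:icu4j) is on the classpath, builds a custom transliterator from ICU rule syntax with Transliterator.createFromRules. The ID "Hyphen-Space", the rule, and the class name are hypothetical and chosen only for illustration.

```java
import com.ibm.icu.text.Transliterator;

// Hypothetical example class; assumes ICU4J (com.ibm.icu:icu4j) is on the classpath.
public class CustomTransliteratorSketch {

    // Hypothetical ID and rule: replace each hyphen with a space.
    // In ICU rule syntax, quoted characters are literals and ';' ends a rule.
    static final Transliterator HYPHEN_TO_SPACE =
            Transliterator.createFromRules("Hyphen-Space", "'-' > ' ' ;", Transliterator.FORWARD);

    // transliterate(String) returns a new string with every matching segment replaced.
    static String apply(String text) {
        return HYPHEN_TO_SPACE.transliterate(text);
    }

    public static void main(String[] args) {
        System.out.println(apply("e-mail"));
        System.out.println(apply("state-of-the-art"));
    }
}
```

A rule-based transliterator like this is still a Transliterator, so it can be passed to the ICUTransformFilter constructor in the same way as a built-in transform.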

Some useful transformations for search are built-in:

  • Conversion from Traditional to Simplified Chinese characters
  • Conversion from Hiragana to Katakana
  • Conversion from Fullwidth to Halfwidth forms
  • Script conversions, for example Serbian Cyrillic to Latin
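The transforms listed above are selected by ICU transliterator IDs. A minimal sketch, assuming ICU4J is on the classpath, that applies each one directly to a sample string. The class name and samples are illustrative, and "Cyrillic-Latin" is a generic Cyrillic transform used here as a stand-in for the Serbian case; whether a Serbian-specific ID is available depends on the ICU version in use.

```java
import com.ibm.icu.text.Transliterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class; assumes ICU4J (com.ibm.icu:icu4j) is on the classpath.
public class BuiltInTransformsSketch {

    // Looks up a built-in transform by its ICU ID and applies it to the whole string.
    static String apply(String id, String text) {
        return Transliterator.getInstance(id).transliterate(text);
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output follows the list above.
        Map<String, String> samples = new LinkedHashMap<>();
        samples.put("Traditional-Simplified", "漢語");
        samples.put("Hiragana-Katakana", "ひらがな");
        samples.put("Fullwidth-Halfwidth", "ＡＢＣ１２３");
        samples.put("Cyrillic-Latin", "Београд"); // stand-in for Serbian Cyrillic to Latin
        samples.forEach((id, text) ->
                System.out.println(id + ": " + text + " -> " + apply(id, text)));
    }
}
```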

Example usage:

stream = new ICUTransformFilter(stream, Transliterator.getInstance("Traditional-Simplified"));
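The one-line example above assumes an existing TokenStream named stream. A fuller sketch of the same chain follows, assuming lucene-core, lucene-analysis-common (lucene-analyzers-common in older releases, needed here for WhitespaceTokenizer), lucene-analysis-icu and ICU4J are on the classpath; the class and method names are hypothetical. It follows the usual TokenStream consumer workflow: reset(), repeated incrementToken(), end(), then close().

```java
import com.ibm.icu.text.Transliterator;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.icu.ICUTransformFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical example class. Assumes lucene-core, lucene-analysis-common,
// lucene-analysis-icu and ICU4J are on the classpath.
public class TransformFilterSketch {

    // Tokenizes on whitespace, transforms each token, and returns the resulting terms.
    static List<String> transformTokens(String text, String transliteratorId) {
        Tokenizer tokenizer = new WhitespaceTokenizer();
        tokenizer.setReader(new StringReader(text));
        List<String> terms = new ArrayList<>();
        // Closing the filter also closes the wrapped tokenizer.
        try (TokenStream stream =
                new ICUTransformFilter(tokenizer, Transliterator.getInstance(transliteratorId))) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();                     // required before the first incrementToken()
            while (stream.incrementToken()) {   // one call per token
                terms.add(term.toString());
            }
            stream.end();                       // end-of-stream bookkeeping
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(transformTokens("漢語 中文", "Traditional-Simplified"));
    }
}
```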

For more details, see the ICU User Guide.
  • Nested Class Summary

    Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource

    org.apache.lucene.util.AttributeSource.State
  • Field Summary

    Fields inherited from class org.apache.lucene.analysis.TokenFilter

    input

    Fields inherited from class org.apache.lucene.analysis.TokenStream

    DEFAULT_TOKEN_ATTRIBUTE_FACTORY
  • Constructor Summary

    Constructors
    Constructor | Description
    ICUTransformFilter(org.apache.lucene.analysis.TokenStream input, com.ibm.icu.text.Transliterator transform) | Create a new ICUTransformFilter that transforms text on the given stream.
  • Method Summary

    Modifier and Type | Method           | Description
    boolean           | incrementToken() |

    Methods inherited from class org.apache.lucene.analysis.TokenFilter

    close, end, reset, unwrap

    Methods inherited from class org.apache.lucene.util.AttributeSource

    addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString

    Methods inherited from class java.lang.Object

    clone, finalize, getClass, notify, notifyAll, wait, wait, wait
  • Constructor Details

    • ICUTransformFilter

      public ICUTransformFilter(org.apache.lucene.analysis.TokenStream input, com.ibm.icu.text.Transliterator transform)
      Create a new ICUTransformFilter that transforms text on the given stream.
      Parameters:
      input - TokenStream to filter.
      transform - Transliterator to transform the text.
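In an index, the filter is usually wired into an Analyzer so every field value passes through the same chain. A minimal sketch, assuming lucene-core, lucene-analysis-common, lucene-analysis-icu and ICU4J on the classpath: the hypothetical KatakanaAnalyzer supplies a WhitespaceTokenizer as input and a Hiragana-Katakana Transliterator as transform. The Transliterator is created inside createComponents so the sketch makes no assumption about sharing one instance across threads.

```java
import com.ibm.icu.text.Transliterator;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.icu.ICUTransformFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical analyzer: whitespace tokenization, then Hiragana-to-Katakana conversion.
public class KatakanaAnalyzer extends Analyzer {

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer source = new WhitespaceTokenizer();                               // "input"
        Transliterator transform = Transliterator.getInstance("Hiragana-Katakana"); // "transform"
        TokenStream result = new ICUTransformFilter(source, transform);
        return new TokenStreamComponents(source, result);
    }

    // Runs text through the analyzer and collects the resulting terms.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // "body" is a hypothetical field name; this analyzer ignores it.
        try (Analyzer analyzer = new KatakanaAnalyzer();
             TokenStream stream = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();
            while (stream.incrementToken()) {
                terms.add(term.toString());
            }
            stream.end();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("ひらがな カタカナ"));
    }
}
```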
  • Method Details

    • incrementToken

      public boolean incrementToken() throws IOException
      Specified by:
      incrementToken in class org.apache.lucene.analysis.TokenStream
      Throws:
      IOException
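Each call to incrementToken() advances the wrapped stream and, when a token is available, rewrites its term text with the Transliterator; it returns false once the wrapped stream is exhausted. This sketch assumes the filter rewrites only the term text, so offsets should still point into the original input even when the transformed term differs, and it reports both so that assumption can be checked. It assumes lucene-core, lucene-analysis-common, lucene-analysis-icu and ICU4J on the classpath; the class and method names are hypothetical.

```java
import com.ibm.icu.text.Transliterator;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.icu.ICUTransformFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

// Hypothetical helper: reports each transformed term as "term:start-end".
public class TransformOffsetsSketch {

    static List<String> termsWithOffsets(String text, String transliteratorId) {
        Tokenizer tokenizer = new WhitespaceTokenizer();
        tokenizer.setReader(new StringReader(text));
        List<String> out = new ArrayList<>();
        try (TokenStream stream =
                new ICUTransformFilter(tokenizer, Transliterator.getInstance(transliteratorId))) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            OffsetAttribute offset = stream.addAttribute(OffsetAttribute.class);
            stream.reset();
            // incrementToken() returns false once the tokenizer has no more tokens.
            while (stream.incrementToken()) {
                out.add(term + ":" + offset.startOffset() + "-" + offset.endOffset());
            }
            stream.end();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(termsWithOffsets("ＡＢＣ ｄｅｆ", "Fullwidth-Halfwidth"));
    }
}
```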