Class ICUFoldingFilter

java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.TokenFilter
org.apache.lucene.analysis.icu.ICUNormalizer2Filter
org.apache.lucene.analysis.icu.ICUFoldingFilter
All Implemented Interfaces:
Closeable, AutoCloseable, org.apache.lucene.util.Unwrappable<org.apache.lucene.analysis.TokenStream>

public final class ICUFoldingFilter extends ICUNormalizer2Filter
A TokenFilter that applies search term folding to Unicode text, using the foldings defined in UTR#30 Character Foldings.

This filter applies the following foldings from the report to Unicode text:

  • Accent removal
  • Case folding
  • Canonical duplicates folding
  • Dashes folding
  • Diacritic removal (including stroke, hook, descender)
  • Greek letterforms folding
  • Han Radical folding
  • Hebrew Alternates folding
  • Jamo folding
  • Letterforms folding
  • Math symbol folding
  • Multigraph Expansions: All
  • Native digit folding
  • No-break folding
  • Overline folding
  • Positional forms folding
  • Small forms folding
  • Space folding
  • Spacing Accents folding
  • Subscript folding
  • Superscript folding
  • Suzhou Numeral folding
  • Symbol folding
  • Underline folding
  • Vertical forms folding
  • Width folding

Additionally, Default Ignorables are removed, and text is normalized to NFKC. All foldings, case folding, and normalization mappings are applied recursively to ensure a fully folded and normalized result.
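
To give a feel for what a few of these foldings do, here is a minimal sketch that uses only the JDK. It is an approximation for illustration and not the UTR#30 data that this filter applies: the class name FoldingSketch and the method fold are hypothetical, java.text.Normalizer stands in for ICU, and lowercasing stands in for true case folding.

```java
import java.text.Normalizer;
import java.util.Locale;

public final class FoldingSketch {
  /** Rough stand-in for a few foldings; NOT the UTR#30 mappings used by ICUFoldingFilter. */
  static String fold(String text) {
    // NFKC handles compatibility forms such as fullwidth letters and ligatures.
    String s = Normalizer.normalize(text, Normalizer.Form.NFKC);
    // Lowercasing only approximates case folding (for example, it leaves "ß" unchanged).
    s = s.toLowerCase(Locale.ROOT);
    // Decompose, then drop combining marks to approximate accent removal.
    s = Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    // Recompose so the result is in NFKC again.
    return Normalizer.normalize(s, Normalizer.Form.NFKC);
  }
}
```

Under these assumptions, fold("Résumé") yields "resume". The real filter covers far more cases (dashes, native digits, Default Ignorables, and so on) through its ICU normalizer.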

A normalizer with additional settings, such as a filter that excludes certain characters from normalization, can be passed to the constructor.
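
One way to build such a normalizer is to wrap NORMALIZER in ICU4J's FilteredNormalizer2, which normalizes only the text covered by a UnicodeSet and leaves everything else unchanged. The sketch below assumes ICU4J and the Lucene ICU analysis module are on the classpath; the class name FilteredFoldingExample, its method names, and the choice of Swedish letters to protect are hypothetical.

```java
import com.ibm.icu.text.FilteredNormalizer2;
import com.ibm.icu.text.Normalizer2;
import com.ibm.icu.text.UnicodeSet;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.icu.ICUFoldingFilter;

public final class FilteredFoldingExample {
  /** Builds a folding normalizer that skips the listed Swedish letters. */
  static Normalizer2 swedishAwareNormalizer() {
    // The set names the characters that ARE folded, so the letters to
    // protect are excluded with a negated set.
    UnicodeSet foldable = new UnicodeSet("[^åäöÅÄÖ]");
    foldable.freeze(); // make the set immutable and safe to share
    return new FilteredNormalizer2(ICUFoldingFilter.NORMALIZER, foldable);
  }

  /** Wraps an existing token stream with the filtered folding. */
  static TokenStream wrap(TokenStream input) {
    return new ICUFoldingFilter(input, swedishAwareNormalizer());
  }
}
```

Note that the protected characters are left untouched entirely, so an uppercase Å in the input is not lowercased either.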

  • Nested Class Summary

    Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource

    org.apache.lucene.util.AttributeSource.State
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final com.ibm.icu.text.Normalizer2
    NORMALIZER
    A normalizer that applies search term folding to Unicode text, using the foldings from UTR#30 Character Foldings.

    Fields inherited from class org.apache.lucene.analysis.TokenFilter

    input

    Fields inherited from class org.apache.lucene.analysis.TokenStream

    DEFAULT_TOKEN_ATTRIBUTE_FACTORY
  • Constructor Summary

    Constructors
    Constructor
    Description
    ICUFoldingFilter(org.apache.lucene.analysis.TokenStream input)
    Creates a new ICUFoldingFilter on the specified input.
    ICUFoldingFilter(org.apache.lucene.analysis.TokenStream input, com.ibm.icu.text.Normalizer2 normalizer)
    Creates a new ICUFoldingFilter on the specified input with the specified normalizer.
  • Method Summary

    Methods inherited from class org.apache.lucene.analysis.icu.ICUNormalizer2Filter

    incrementToken

    Methods inherited from class org.apache.lucene.analysis.TokenFilter

    close, end, reset, unwrap

    Methods inherited from class org.apache.lucene.util.AttributeSource

    addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString

    Methods inherited from class java.lang.Object

    clone, finalize, getClass, notify, notifyAll, wait, wait, wait
  • Field Details

    • NORMALIZER

      public static final com.ibm.icu.text.Normalizer2 NORMALIZER
      A normalizer that applies search term folding to Unicode text, using the foldings from UTR#30 Character Foldings.
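
      Because NORMALIZER is an ordinary Normalizer2, it can also be applied to a string directly, outside any token stream. A minimal sketch, assuming ICU4J and the Lucene ICU analysis module are on the classpath (the class name NormalizerFieldExample and the method fold are hypothetical):

```java
import com.ibm.icu.text.Normalizer2;
import org.apache.lucene.analysis.icu.ICUFoldingFilter;

public final class NormalizerFieldExample {
  /** Folds a whole string with the same normalizer the filter uses by default. */
  static String fold(CharSequence text) {
    Normalizer2 folding = ICUFoldingFilter.NORMALIZER;
    return folding.normalize(text);
  }
}
```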
  • Constructor Details

    • ICUFoldingFilter

      public ICUFoldingFilter(org.apache.lucene.analysis.TokenStream input)
      Creates a new ICUFoldingFilter on the specified input.
    • ICUFoldingFilter

      public ICUFoldingFilter(org.apache.lucene.analysis.TokenStream input, com.ibm.icu.text.Normalizer2 normalizer)
      Creates a new ICUFoldingFilter on the specified input with the specified normalizer.
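
The single-argument constructor is typically used by wrapping a tokenizer and then consuming the stream through the usual reset, incrementToken, end, close sequence. The following is a sketch only: it assumes the Lucene analysis-common module (for WhitespaceTokenizer) is available alongside the ICU module, and the class name FoldingUsageExample and the method foldTokens are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.icu.ICUFoldingFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public final class FoldingUsageExample {
  /** Splits text on whitespace and returns each token after folding. */
  static List<String> foldTokens(String text) throws IOException {
    Tokenizer tokenizer = new WhitespaceTokenizer();
    tokenizer.setReader(new StringReader(text));
    List<String> terms = new ArrayList<>();
    // Closing the filter also closes the wrapped tokenizer.
    try (TokenStream stream = new ICUFoldingFilter(tokenizer)) {
      CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
      stream.reset(); // required before the first incrementToken()
      while (stream.incrementToken()) {
        terms.add(term.toString());
      }
      stream.end();
    }
    return terms;
  }
}
```

To keep indexed and searched terms comparable, the same filter chain would normally be applied at both index time and query time.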