Class PolishAnalyzer

java.lang.Object
org.apache.lucene.analysis.Analyzer
org.apache.lucene.analysis.StopwordAnalyzerBase
org.apache.lucene.analysis.pl.PolishAnalyzer
All Implemented Interfaces:
Closeable, AutoCloseable

public final class PolishAnalyzer extends org.apache.lucene.analysis.StopwordAnalyzerBase
Analyzer for Polish.
Since:
3.1
  • Nested Class Summary

    Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer

    org.apache.lucene.analysis.Analyzer.ReuseStrategy, org.apache.lucene.analysis.Analyzer.TokenStreamComponents
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final String DEFAULT_STEMMER_FILE
    File containing default Polish stemmer table.
    static final String DEFAULT_STOPWORD_FILE
    File containing default Polish stopwords.

    Fields inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase

    stopwords

    Fields inherited from class org.apache.lucene.analysis.Analyzer

    GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY
  • Constructor Summary

    Constructors
    Constructor
    Description
    PolishAnalyzer()
    Builds an analyzer with the default stop words: DEFAULT_STOPWORD_FILE.
    PolishAnalyzer(org.apache.lucene.analysis.CharArraySet stopwords)
    Builds an analyzer with the given stop words.
    PolishAnalyzer(org.apache.lucene.analysis.CharArraySet stopwords, org.apache.lucene.analysis.CharArraySet stemExclusionSet)
    Builds an analyzer with the given stop words and stem exclusion set.
  • Method Summary

    Modifier and Type
    Method
    Description
    protected org.apache.lucene.analysis.Analyzer.TokenStreamComponents
    createComponents(String fieldName)
    Creates an Analyzer.TokenStreamComponents which tokenizes all the text in the provided Reader.
    static org.apache.lucene.analysis.CharArraySet
    getDefaultStopSet()
    Returns an unmodifiable instance of the default stop words set.
    static Trie
    getDefaultTable()
    Returns an unmodifiable instance of the default stemmer table.
    protected org.apache.lucene.analysis.TokenStream
    normalize(String fieldName, org.apache.lucene.analysis.TokenStream in)

    Methods inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase

    getStopwordSet, loadStopwordSet, loadStopwordSet, loadStopwordSet

    Methods inherited from class org.apache.lucene.analysis.Analyzer

    attributeFactory, close, getOffsetGap, getPositionIncrementGap, getReuseStrategy, initReader, initReaderForNormalization, normalize, tokenStream, tokenStream

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • DEFAULT_STOPWORD_FILE

      public static final String DEFAULT_STOPWORD_FILE
      File containing default Polish stopwords.
    • DEFAULT_STEMMER_FILE

      public static final String DEFAULT_STEMMER_FILE
      File containing default Polish stemmer table.
  • Constructor Details

    • PolishAnalyzer

      public PolishAnalyzer()
      Builds an analyzer with the default stop words: DEFAULT_STOPWORD_FILE.
    • PolishAnalyzer

      public PolishAnalyzer(org.apache.lucene.analysis.CharArraySet stopwords)
      Builds an analyzer with the given stop words.
      Parameters:
      stopwords - a stopword set
    • PolishAnalyzer

      public PolishAnalyzer(org.apache.lucene.analysis.CharArraySet stopwords, org.apache.lucene.analysis.CharArraySet stemExclusionSet)
      Builds an analyzer with the given stop words. If a non-empty stem exclusion set is provided, this analyzer adds a SetKeywordMarkerFilter before stemming.
      Parameters:
      stopwords - a stopword set
      stemExclusionSet - a set of terms not to be stemmed
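A minimal sketch of the three constructors, assuming the Lucene core and Stempel analysis modules are on the classpath. The class name PolishAnalyzerConstructors, the helper buildAnalyzers, the sample stop words ("oraz", "lub") and the exclusion term ("lucene") are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;

import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.pl.PolishAnalyzer;

public class PolishAnalyzerConstructors {

  // Hypothetical helper: builds one analyzer per documented constructor.
  static PolishAnalyzer[] buildAnalyzers() {
    // 1. Default stop words loaded from DEFAULT_STOPWORD_FILE, no stem exclusions.
    PolishAnalyzer defaults = new PolishAnalyzer();

    // 2. Caller-supplied stop words (ignoreCase = true); the words are illustrative only.
    CharArraySet stopwords = new CharArraySet(Arrays.asList("oraz", "lub"), true);
    PolishAnalyzer customStops = new PolishAnalyzer(stopwords);

    // 3. Same stop words plus a non-empty stem exclusion set, so a
    //    SetKeywordMarkerFilter runs before stemming and "lucene" is not stemmed.
    CharArraySet noStem = new CharArraySet(Arrays.asList("lucene"), true);
    PolishAnalyzer withExclusions = new PolishAnalyzer(stopwords, noStem);

    return new PolishAnalyzer[] {defaults, customStops, withExclusions};
  }

  public static void main(String[] args) {
    for (PolishAnalyzer analyzer : buildAnalyzers()) {
      // getStopwordSet() is inherited from StopwordAnalyzerBase.
      System.out.println(analyzer.getStopwordSet().size());
      analyzer.close(); // Analyzer implements Closeable
    }
  }
}
```

Because LowerCaseFilter runs before SetKeywordMarkerFilter in this analyzer's chain, exclusion terms are compared against lowercased tokens; listing them in lowercase, or using an ignoreCase set as above, avoids surprises.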
  • Method Details

    • getDefaultStopSet

      public static org.apache.lucene.analysis.CharArraySet getDefaultStopSet()
      Returns an unmodifiable instance of the default stop words set.
      Returns:
      default stop words set.
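Since the default set is unmodifiable, one way to add extra stop words is to copy it first. A minimal sketch; the class ExtendedPolishStopwords, its helper extendedStopSet, and the extra words "np" and "itd" are hypothetical.

```java
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.pl.PolishAnalyzer;

public class ExtendedPolishStopwords {

  // Copies the shared default set (never modified in place) and adds caller-chosen terms.
  static CharArraySet extendedStopSet(String... extra) {
    CharArraySet stops = CharArraySet.copy(PolishAnalyzer.getDefaultStopSet());
    for (String word : extra) {
      stops.add(word);
    }
    return stops;
  }

  public static void main(String[] args) {
    CharArraySet stops = extendedStopSet("np", "itd");
    try (PolishAnalyzer analyzer = new PolishAnalyzer(stops)) {
      System.out.println("stop words in use: " + analyzer.getStopwordSet().size());
    }
  }
}
```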
    • getDefaultTable

      public static Trie getDefaultTable()
      Returns an unmodifiable instance of the default stemmer table.
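The table can also drive a StempelStemmer directly, outside an analyzer. A minimal sketch, assuming the Stempel module's org.apache.lucene.analysis.stempel.StempelStemmer class (its stem method returns null when no stem can be produced). The class DefaultTableStemming, its helper stem, and the sample words are hypothetical. No stemmed output is shown because it depends on the bundled table.

```java
import org.apache.lucene.analysis.pl.PolishAnalyzer;
import org.apache.lucene.analysis.stempel.StempelStemmer;

public class DefaultTableStemming {

  // Stems one word; falls back to the input when the table yields no stem.
  static String stem(StempelStemmer stemmer, String word) {
    StringBuilder result = stemmer.stem(word);
    return result == null ? word : result.toString();
  }

  public static void main(String[] args) {
    // getDefaultTable() returns the shared Trie loaded from DEFAULT_STEMMER_FILE.
    // StempelStemmer reuses an internal buffer, so use one instance per thread.
    StempelStemmer stemmer = new StempelStemmer(PolishAnalyzer.getDefaultTable());
    for (String word : new String[] {"książkami", "domów"}) {
      System.out.println(word + " -> " + stem(stemmer, word));
    }
  }
}
```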
    • createComponents

      protected org.apache.lucene.analysis.Analyzer.TokenStreamComponents createComponents(String fieldName)
      Creates an Analyzer.TokenStreamComponents which tokenizes all the text in the provided Reader.
      Specified by:
      createComponents in class org.apache.lucene.analysis.Analyzer
      Returns:
      An Analyzer.TokenStreamComponents built from a StandardTokenizer filtered with LowerCaseFilter, StopFilter, SetKeywordMarkerFilter (if a stem exclusion set is provided), and StempelFilter.
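createComponents is protected; callers reach this chain through the inherited tokenStream method. A minimal sketch of the standard consume loop (reset, incrementToken, end, close), assuming the Lucene core and Stempel modules are on the classpath. The class PolishTokens, its helper analyze, the field name "body", and the sample sentence are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.pl.PolishAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class PolishTokens {

  // Collects the terms emitted by the analyzer's chain for one piece of text.
  static List<String> analyze(PolishAnalyzer analyzer, String field, String text)
      throws IOException {
    List<String> terms = new ArrayList<>();
    try (TokenStream ts = analyzer.tokenStream(field, text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset(); // required before the first incrementToken()
      while (ts.incrementToken()) {
        terms.add(term.toString());
      }
      ts.end(); // records end-of-stream state such as the final offset
    } // close() releases the stream so the analyzer can reuse its components
    return terms;
  }

  public static void main(String[] args) throws IOException {
    try (PolishAnalyzer analyzer = new PolishAnalyzer()) {
      System.out.println(analyze(analyzer, "body", "Studenci czytają książki w bibliotece."));
    }
  }
}
```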
    • normalize

      protected org.apache.lucene.analysis.TokenStream normalize(String fieldName, org.apache.lucene.analysis.TokenStream in)
      Overrides:
      normalize in class org.apache.lucene.analysis.Analyzer
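This override is applied through the inherited public method Analyzer.normalize(String fieldName, String text), which runs a single term (for example wildcard or prefix query text) through normalization only, without tokenizing or stemming, and returns it as a BytesRef. A minimal sketch, assuming (as in recent Lucene releases) that PolishAnalyzer's normalization lowercases the term. The class PolishNormalize and its helper normalize are hypothetical.

```java
import org.apache.lucene.analysis.pl.PolishAnalyzer;
import org.apache.lucene.util.BytesRef;

public class PolishNormalize {

  // Normalizes one term via the public Analyzer.normalize(String, String) entry point.
  static String normalize(PolishAnalyzer analyzer, String field, String term) {
    BytesRef bytes = analyzer.normalize(field, term);
    return bytes.utf8ToString(); // normalized term is returned as UTF-8 bytes
  }

  public static void main(String[] args) {
    try (PolishAnalyzer analyzer = new PolishAnalyzer()) {
      System.out.println(normalize(analyzer, "body", "Źródło*"));
    }
  }
}
```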