Class OpenNLPLemmatizerFilter

java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.TokenFilter
org.apache.lucene.analysis.opennlp.OpenNLPLemmatizerFilter
All Implemented Interfaces:
Closeable, AutoCloseable, org.apache.lucene.util.Unwrappable<org.apache.lucene.analysis.TokenStream>

public class OpenNLPLemmatizerFilter extends org.apache.lucene.analysis.TokenFilter
Runs OpenNLP dictionary-based and/or MaxEnt lemmatizers.

Both a dictionary-based lemmatizer and a MaxEnt lemmatizer are supported, via the "dictionary" and "lemmatizerModel" parameters, respectively. If both are configured, the dictionary-based lemmatizer is tried first. The MaxEnt lemmatizer is then consulted only for out-of-vocabulary tokens, meaning tokens that the dictionary does not cover.
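The lookup order described above can be illustrated with a small, dependency-free sketch. It is not the Lucene or OpenNLP API. The class name `LemmaFallbackSketch`, the dictionary modeled as a `Map` keyed by `"word\tpos"`, and the MaxEnt model modeled as a `BiFunction` that returns `null` when it has no answer are all hypothetical stand-ins. Returning the original token when neither source yields a lemma is also an assumption made for illustration.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.BiFunction;

/**
 * Hypothetical sketch of "dictionary first, MaxEnt fallback" lemmatization.
 * The dictionary is a Map keyed by "word\tpos"; the MaxEnt model is a
 * stand-in function (word, pos) -> lemma, or null if it has no answer.
 */
final class LemmaFallbackSketch {
  private final Map<String, String> dictionary;            // null if not configured
  private final BiFunction<String, String, String> maxEnt; // null if not configured

  LemmaFallbackSketch(Map<String, String> dictionary,
                      BiFunction<String, String, String> maxEnt) {
    if (dictionary == null && maxEnt == null) {
      throw new IllegalArgumentException("configure a dictionary and/or a MaxEnt model");
    }
    this.dictionary = dictionary;
    this.maxEnt = maxEnt;
  }

  String lemmatize(String word, String pos) {
    // 1. Try the dictionary first.
    if (dictionary != null) {
      String lemma = dictionary.get(word + "\t" + pos);
      if (lemma != null) {
        return lemma;
      }
    }
    // 2. Consult the MaxEnt stand-in only for out-of-vocabulary tokens.
    if (maxEnt != null) {
      String lemma = maxEnt.apply(word, pos);
      if (lemma != null) {
        return lemma;
      }
    }
    // 3. Assumption: keep the original token when no lemma is found.
    return word;
  }

  public static void main(String[] args) {
    Map<String, String> dict = Map.of("ran\tVBD", "run", "mice\tNNS", "mouse");
    // Toy stand-in for a MaxEnt model: strips a trailing "s" from plural nouns.
    BiFunction<String, String, String> toyModel =
        (w, p) -> p.equals("NNS") && w.endsWith("s") ? w.substring(0, w.length() - 1) : null;
    LemmaFallbackSketch lemmatizer = new LemmaFallbackSketch(dict, toyModel);

    String[] words = {"mice", "ran", "dogs", "quickly"};
    String[] tags = {"NNS", "VBD", "NNS", "RB"};
    String[] lemmas = new String[words.length];
    for (int i = 0; i < words.length; i++) {
      lemmas[i] = lemmatizer.lemmatize(words[i], tags[i]);
    }
    System.out.println(Arrays.toString(lemmas)); // [mouse, run, dog, quickly]
  }
}
```

In this sketch "mice" and "ran" are resolved by the dictionary, "dogs" falls through to the toy model, and "quickly" is covered by neither, so it is passed through unchanged.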

The dictionary file must be encoded as UTF-8, with one entry per line, in the form: word[tab]lemma[tab]part-of-speech.
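A minimal loader for a file in this format might look like the following. It is a standalone illustration, not how Lucene or OpenNLP reads the file internally. The class name `LemmaDictionaryLoader`, keying entries by `"word\tpos"`, skipping blank lines, and rejecting lines without exactly three tab-separated fields are all assumptions chosen for the sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical loader for a word[tab]lemma[tab]part-of-speech dictionary.
 * Entries are keyed by "word\tpos" so the same surface form can map to
 * different lemmas under different part-of-speech tags.
 */
final class LemmaDictionaryLoader {

  static Map<String, String> load(Path file) throws IOException {
    // The dictionary must be UTF-8, so decode it explicitly instead of
    // relying on the platform default charset.
    try (Reader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
      return parse(reader);
    }
  }

  static Map<String, String> parse(Reader source) throws IOException {
    Map<String, String> entries = new HashMap<>();
    BufferedReader in = new BufferedReader(source);
    String line;
    int lineNo = 0;
    while ((line = in.readLine()) != null) {
      lineNo++;
      if (line.isBlank()) {
        continue; // assumption: blank lines are tolerated and skipped
      }
      String[] fields = line.split("\t");
      if (fields.length != 3) {
        throw new IOException("line " + lineNo + ": expected word<TAB>lemma<TAB>pos");
      }
      // Key: word + POS; value: lemma.
      entries.put(fields[0] + "\t" + fields[2], fields[1]);
    }
    return entries;
  }

  public static void main(String[] args) throws IOException {
    String sample = "ran\trun\tVBD\nmice\tmouse\tNNS\n";
    Map<String, String> dict = parse(new StringReader(sample));
    System.out.println(dict.get("mice\tNNS")); // mouse
  }
}
```

Keying on both the word and its tag reflects the three-column format: the part-of-speech column lets one surface form carry different lemmas depending on how it is tagged.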

  • Nested Class Summary

    Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource

    org.apache.lucene.util.AttributeSource.State
  • Field Summary

    Fields inherited from class org.apache.lucene.analysis.TokenFilter

    input

    Fields inherited from class org.apache.lucene.analysis.TokenStream

    DEFAULT_TOKEN_ATTRIBUTE_FACTORY
  • Constructor Summary

    Constructors
    Constructor    Description
    OpenNLPLemmatizerFilter(org.apache.lucene.analysis.TokenStream input, NLPLemmatizerOp lemmatizerOp)
  • Method Summary

    Modifier and Type    Method              Description
    final boolean        incrementToken()
    void                 reset()


    Methods inherited from class org.apache.lucene.analysis.TokenFilter

    close, end, unwrap

    Methods inherited from class org.apache.lucene.util.AttributeSource

    addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString

    Methods inherited from class java.lang.Object

    clone, finalize, getClass, notify, notifyAll, wait, wait, wait
  • Constructor Details

    • OpenNLPLemmatizerFilter

      public OpenNLPLemmatizerFilter(org.apache.lucene.analysis.TokenStream input, NLPLemmatizerOp lemmatizerOp)
  • Method Details

    • incrementToken

      public final boolean incrementToken() throws IOException
      Specified by:
      incrementToken in class org.apache.lucene.analysis.TokenStream
      Throws:
      IOException
    • reset

      public void reset() throws IOException
      Overrides:
      reset in class org.apache.lucene.analysis.TokenFilter
      Throws:
      IOException