Class OpenNLPLemmatizerFilter
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.TokenFilter
org.apache.lucene.analysis.opennlp.OpenNLPLemmatizerFilter
- All Implemented Interfaces:
Closeable, AutoCloseable, org.apache.lucene.util.Unwrappable<org.apache.lucene.analysis.TokenStream>
public class OpenNLPLemmatizerFilter
extends org.apache.lucene.analysis.TokenFilter
Runs OpenNLP dictionary-based and/or MaxEnt lemmatizers.
Both a dictionary-based lemmatizer and a MaxEnt lemmatizer are supported, via the "dictionary" and "lemmatizerModel" params, respectively. If both are configured, the dictionary-based lemmatizer is tried first, and then the MaxEnt lemmatizer is consulted for out-of-vocabulary tokens.
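The lookup order described above can be illustrated with a small, self-contained sketch. It uses neither Lucene nor OpenNLP: the `LemmaSource` interface, the `lemmatize` helper, and the toy suffix-stripping "model" are hypothetical names invented for illustration, and returning the original token when neither source yields a lemma is an assumption of this sketch, not documented behavior of the filter.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class LemmaFallbackSketch {

    /** Hypothetical lookup abstraction; not part of Lucene or OpenNLP. */
    interface LemmaSource {
        Optional<String> lemma(String token, String posTag);
    }

    /**
     * Tries the dictionary first, then consults the model only for tokens
     * the dictionary does not cover. Either source may be null (not configured).
     */
    static String lemmatize(String token, String posTag, LemmaSource dictionary, LemmaSource model) {
        if (dictionary != null) {
            Optional<String> hit = dictionary.lemma(token, posTag);
            if (hit.isPresent()) {
                return hit.get();
            }
        }
        if (model != null) {
            Optional<String> guess = model.lemma(token, posTag);
            if (guess.isPresent()) {
                return guess.get();
            }
        }
        // Assumption of this sketch: keep the token unchanged when no lemma is found.
        return token;
    }

    public static void main(String[] args) {
        Map<String, String> entries = new HashMap<>();
        entries.put("walked\tVBD", "walk");

        // Dictionary stand-in: exact match on token plus part-of-speech tag.
        LemmaSource dictionary = (t, p) -> Optional.ofNullable(entries.get(t + "\t" + p));
        // Model stand-in: a toy rule that strips a trailing "s"; a real MaxEnt model is statistical.
        LemmaSource model = (t, p) ->
                t.endsWith("s") ? Optional.of(t.substring(0, t.length() - 1)) : Optional.<String>empty();

        System.out.println(lemmatize("walked", "VBD", dictionary, model));  // dictionary hit
        System.out.println(lemmatize("gadgets", "NNS", dictionary, model)); // out of vocabulary, model answers
        System.out.println(lemmatize("quickly", "RB", dictionary, model));  // neither source answers
    }
}
```

Passing `null` for either source models configuring only one of the two params.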
The dictionary file must be encoded as UTF-8, with one entry per line, in the form
word[tab]lemma[tab]part-of-speech
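As a sanity check on a dictionary file before handing it to the filter, the line format can be validated with plain Java. The sketch below is illustrative only: the class `LemmaDictionaryFormatSketch`, its `Entry` type, and the `parse` method are hypothetical, the column order follows the description above, and skipping blank lines is an assumption of this sketch rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LemmaDictionaryFormatSketch {

    /** One dictionary line, in the column order given above: word, lemma, part-of-speech. */
    static final class Entry {
        final String word;
        final String lemma;
        final String posTag;

        Entry(String word, String lemma, String posTag) {
            this.word = word;
            this.lemma = lemma;
            this.posTag = posTag;
        }
    }

    /** Parses lines of the form word[tab]lemma[tab]part-of-speech, rejecting malformed ones. */
    static List<Entry> parse(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        int lineNumber = 0;
        for (String line : lines) {
            lineNumber++;
            if (line.isEmpty()) {
                continue; // assumption of this sketch: blank lines are skipped
            }
            String[] cols = line.split("\t", -1); // -1 keeps trailing empty fields
            if (cols.length != 3) {
                throw new IllegalArgumentException(
                        "line " + lineNumber + ": expected 3 tab-separated fields, found " + cols.length);
            }
            entries.add(new Entry(cols[0], cols[1], cols[2]));
        }
        return entries;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("walked\twalk\tVBD", "mice\tmouse\tNNS");
        for (Entry e : parse(sample)) {
            System.out.println(e.word + " -> " + e.lemma + " (" + e.posTag + ")");
        }
    }
}
```

For a file on disk, the lines could be read with `java.nio.file.Files.readAllLines(path, StandardCharsets.UTF_8)`, which matches the UTF-8 requirement stated above.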
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
org.apache.lucene.util.AttributeSource.State
Field Summary
Fields inherited from class org.apache.lucene.analysis.TokenFilter
input
Fields inherited from class org.apache.lucene.analysis.TokenStream
DEFAULT_TOKEN_ATTRIBUTE_FACTORY
Constructor Summary
Constructors
OpenNLPLemmatizerFilter(org.apache.lucene.analysis.TokenStream input, NLPLemmatizerOp lemmatizerOp)
Method Summary
Methods inherited from class org.apache.lucene.analysis.TokenFilter
close, end, unwrap
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
Constructor Details
OpenNLPLemmatizerFilter
public OpenNLPLemmatizerFilter(org.apache.lucene.analysis.TokenStream input, NLPLemmatizerOp lemmatizerOp)
Method Details
incrementToken
Specified by:
incrementToken in class org.apache.lucene.analysis.TokenStream
Throws:
IOException
reset
Overrides:
reset in class org.apache.lucene.analysis.TokenFilter
Throws:
IOException