Class OpenNLPTokenizer

java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.Tokenizer
org.apache.lucene.analysis.util.SegmentingTokenizerBase
org.apache.lucene.analysis.opennlp.OpenNLPTokenizer
All Implemented Interfaces:
Closeable, AutoCloseable

public final class OpenNLPTokenizer extends org.apache.lucene.analysis.util.SegmentingTokenizerBase
Runs the OpenNLP SentenceDetector and Tokenizer over the input. The index of each sentence is stored in SentenceAttribute.
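To make the idea of a per-token sentence index concrete, here is a minimal standard-library sketch. It is not the Lucene implementation: `java.text.BreakIterator` stands in for the OpenNLP sentence and tokenizer models, `SentenceTokens` and `IndexedToken` are hypothetical names, and the zero-based numbering is an assumption made for the illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: BreakIterator replaces the OpenNLP models, and these
// class names are hypothetical, not part of Lucene.
final class SentenceTokens {

    /** A token plus the index (zero-based here, by assumption) of its sentence. */
    static final class IndexedToken {
        final String term;
        final int sentenceIndex;

        IndexedToken(String term, int sentenceIndex) {
            this.term = term;
            this.sentenceIndex = sentenceIndex;
        }
    }

    /** Splits text into sentences, then words, tagging each word with its sentence index. */
    static List<IndexedToken> tokenize(String text) {
        List<IndexedToken> tokens = new ArrayList<>();
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ROOT);
        sentences.setText(text);

        int sentenceIndex = 0;
        int start = sentences.first();
        for (int end = sentences.next(); end != BreakIterator.DONE; start = end, end = sentences.next()) {
            String sentence = text.substring(start, end);
            BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
            words.setText(sentence);

            int wordStart = words.first();
            for (int wordEnd = words.next(); wordEnd != BreakIterator.DONE; wordStart = wordEnd, wordEnd = words.next()) {
                String term = sentence.substring(wordStart, wordEnd);
                // Keep only spans that begin with a letter or digit; skip spaces and punctuation.
                if (Character.isLetterOrDigit(term.codePointAt(0))) {
                    tokens.add(new IndexedToken(term, sentenceIndex));
                }
            }
            sentenceIndex++;
        }
        return tokens;
    }
}
```

In the real class the sentence index would be read from SentenceAttribute on the token stream rather than from a field on a token object.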
  • Nested Class Summary

    Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource

    org.apache.lucene.util.AttributeSource.State
  • Field Summary

    Fields inherited from class org.apache.lucene.analysis.util.SegmentingTokenizerBase

    buffer, BUFFERMAX, offset

    Fields inherited from class org.apache.lucene.analysis.Tokenizer

    input

    Fields inherited from class org.apache.lucene.analysis.TokenStream

    DEFAULT_TOKEN_ATTRIBUTE_FACTORY
  • Constructor Summary

    Constructors
    Constructor
    Description
    OpenNLPTokenizer(org.apache.lucene.util.AttributeFactory factory, NLPSentenceDetectorOp sentenceOp, NLPTokenizerOp tokenizerOp)
     
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    close()
     
    protected boolean
    incrementWord()
     
    void
    reset()
     
    protected void
    setNextSentence(int sentenceStart, int sentenceEnd)
     

    Methods inherited from class org.apache.lucene.analysis.util.SegmentingTokenizerBase

    end, incrementToken, isSafeEnd

    Methods inherited from class org.apache.lucene.analysis.Tokenizer

    correctOffset, setReader, setReaderTestPoint

    Methods inherited from class org.apache.lucene.util.AttributeSource

    addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString

    Methods inherited from class java.lang.Object

    clone, finalize, getClass, notify, notifyAll, wait, wait, wait
  • Constructor Details

  • Method Details

    • close

      public void close() throws IOException
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Overrides:
      close in class org.apache.lucene.analysis.Tokenizer
      Throws:
      IOException
    • setNextSentence

      protected void setNextSentence(int sentenceStart, int sentenceEnd)
      Specified by:
      setNextSentence in class org.apache.lucene.analysis.util.SegmentingTokenizerBase
    • incrementWord

      protected boolean incrementWord()
      Specified by:
      incrementWord in class org.apache.lucene.analysis.util.SegmentingTokenizerBase
    • reset

      public void reset() throws IOException
      Overrides:
      reset in class org.apache.lucene.analysis.util.SegmentingTokenizerBase
      Throws:
      IOException
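
The two hooks documented above, setNextSentence and incrementWord, suggest a sentence-then-word contract: the base class announces each sentence span, and the subclass emits that sentence's words until none remain. The following is a rough standard-library sketch of that control flow only. `SegmentingSketch`, `WhitespaceWords`, and `run()` are hypothetical names, `java.text.BreakIterator` replaces the OpenNLP sentence detector, whitespace splitting replaces the OpenNLP tokenizer, and the push-style driver loop is a simplification (a Lucene token stream is consumed one token at a time through incrementToken).

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the contract, not Lucene code.
abstract class SegmentingSketch {
    protected final char[] buffer;

    SegmentingSketch(String text) {
        this.buffer = text.toCharArray();
    }

    /** Called once per sentence with offsets into buffer (end is exclusive). */
    protected abstract void setNextSentence(int sentenceStart, int sentenceEnd);

    /** Emits the next word of the current sentence; returns false when it is exhausted. */
    protected abstract boolean incrementWord();

    /** Simplified driver: announce each sentence, then drain its words. */
    final void run() {
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ROOT);
        sentences.setText(new String(buffer));
        int start = sentences.first();
        for (int end = sentences.next(); end != BreakIterator.DONE; start = end, end = sentences.next()) {
            setNextSentence(start, end);
            while (incrementWord()) {
                // each successful call has emitted exactly one word
            }
        }
    }
}

// Example subclass: whitespace-delimited words, each tagged with its sentence number.
final class WhitespaceWords extends SegmentingSketch {
    final List<String> emitted = new ArrayList<>();
    private int pos;
    private int end;
    private int sentenceIndex = -1;

    WhitespaceWords(String text) {
        super(text);
    }

    @Override
    protected void setNextSentence(int sentenceStart, int sentenceEnd) {
        pos = sentenceStart;
        end = sentenceEnd;
        sentenceIndex++;
    }

    @Override
    protected boolean incrementWord() {
        while (pos < end && Character.isWhitespace(buffer[pos])) {
            pos++; // skip leading whitespace
        }
        if (pos >= end) {
            return false; // sentence exhausted
        }
        int wordStart = pos;
        while (pos < end && !Character.isWhitespace(buffer[pos])) {
            pos++;
        }
        emitted.add(sentenceIndex + ":" + new String(buffer, wordStart, pos - wordStart));
        return true;
    }
}
```

Keeping per-sentence state (`pos`, `end`, `sentenceIndex`) in the subclass and resetting it in setNextSentence mirrors why the hook receives explicit start and end offsets: each sentence can be tokenized independently of the ones before it.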