Class HMMChineseTokenizerFactory

java.lang.Object
org.apache.lucene.analysis.AbstractAnalysisFactory
org.apache.lucene.analysis.TokenizerFactory
org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory

public final class HMMChineseTokenizerFactory extends org.apache.lucene.analysis.TokenizerFactory
Factory for HMMChineseTokenizer

Note: the tokenizer created by this class currently emits tokens for punctuation. To remove them, either add a WordDelimiterFilter after the tokenizer (with concatenate off), or use a StopFilterFactory with the SmartChinese stoplist via: words="org/apache/lucene/analysis/cn/smart/stopwords.txt"
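
The stoplist option in the note above might be wired up as in the following sketch. It assumes the lucene-analysis-common and lucene-analysis-smartcn jars are both on the classpath (not the module path), so that CustomAnalyzer can resolve the "hmmChinese" and "stop" SPI names and load the stoplist resource. The class name HmmChineseStopExample, the field name "body", and the sample sentence are illustrative only.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical example class; not part of Lucene.
public class HmmChineseStopExample {

  // Builds a chain of the HMM Chinese tokenizer followed by a stop filter
  // that reads the SmartChinese stoplist named in the note above.
  static Analyzer buildAnalyzer() throws IOException {
    return CustomAnalyzer.builder()
        .withTokenizer("hmmChinese")
        .addTokenFilter("stop", "words", "org/apache/lucene/analysis/cn/smart/stopwords.txt")
        .build();
  }

  // Runs the analyzer over the text and collects the emitted terms.
  static List<String> tokenize(Analyzer analyzer, String text) throws IOException {
    List<String> terms = new ArrayList<>();
    try (TokenStream stream = analyzer.tokenStream("body", text)) {
      CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
      stream.reset();
      while (stream.incrementToken()) {
        terms.add(term.toString());
      }
      stream.end();
    }
    return terms;
  }

  public static void main(String[] args) throws IOException {
    try (Analyzer analyzer = buildAnalyzer()) {
      // Prints the segmented words; punctuation tokens should be filtered out.
      System.out.println(tokenize(analyzer, "我购买了道具和服装。"));
    }
  }
}
```

Running the same sentence through the tokenizer without the stop filter is a quick way to see the punctuation tokens that the note warns about.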

Since:
4.10.0
WARNING: This API is experimental and might change in incompatible ways in the next release.
SPI name (lookup is case-insensitive: for example, a factory named 'htmlStrip' can also be looked up as 'htmlstrip'):
"hmmChinese"
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final String
    SPI name

    Fields inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory

    LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
  • Constructor Summary

    Constructors
    Constructor
    Description
    HMMChineseTokenizerFactory()
    Default ctor for compatibility with SPI
    HMMChineseTokenizerFactory(Map<String,String> args)
    Creates a new HMMChineseTokenizerFactory
  • Method Summary

    Modifier and Type
    Method
    Description
    org.apache.lucene.analysis.Tokenizer
    create(org.apache.lucene.util.AttributeFactory factory)
     

    Methods inherited from class org.apache.lucene.analysis.TokenizerFactory

    availableTokenizers, create, findSPIName, forName, lookupClass, reloadTokenizers

    Methods inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory

    defaultCtorException, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

  • Constructor Details

    • HMMChineseTokenizerFactory

      public HMMChineseTokenizerFactory(Map<String,String> args)
      Creates a new HMMChineseTokenizerFactory
    • HMMChineseTokenizerFactory

      public HMMChineseTokenizerFactory()
      Default ctor for compatibility with SPI
  • Method Details

    • create

      public org.apache.lucene.analysis.Tokenizer create(org.apache.lucene.util.AttributeFactory factory)
      Specified by:
      create in class org.apache.lucene.analysis.TokenizerFactory
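
      A minimal usage sketch, assuming lucene-core and lucene-analysis-smartcn are on the classpath: the factory is looked up by its SPI name, and create(AttributeFactory) is called with the default token attribute factory. The class name HmmChineseFactoryExample and the sample sentence are illustrative only. The argument map is a mutable HashMap because the factory base class removes the entries it consumes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.TokenizerFactory;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical example class; not part of Lucene.
public class HmmChineseFactoryExample {

  // Looks up the factory by SPI name, creates a tokenizer, and collects its terms.
  static List<String> tokenize(String text) throws IOException {
    // No arguments are passed here; unknown keys cause an IllegalArgumentException.
    TokenizerFactory factory = TokenizerFactory.forName("hmmChinese", new HashMap<>());

    List<String> terms = new ArrayList<>();
    try (Tokenizer tokenizer = factory.create(TokenStream.DEFAULT_TOKEN_ATTRIBUTE_FACTORY)) {
      tokenizer.setReader(new StringReader(text));
      CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
      tokenizer.reset();
      while (tokenizer.incrementToken()) {
        terms.add(term.toString());
      }
      tokenizer.end();
    }
    return terms;
  }

  public static void main(String[] args) throws IOException {
    // Without a stop filter, the output is expected to include punctuation tokens.
    System.out.println(tokenize("我购买了道具和服装。"));
  }
}
```

      The inherited no-argument create() uses the same default attribute factory, so passing it explicitly here only serves to show the overload documented above.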