Class KNearestNeighborClassifier

java.lang.Object
org.apache.lucene.classification.KNearestNeighborClassifier
All Implemented Interfaces:
Classifier<org.apache.lucene.util.BytesRef>
Direct Known Subclasses:
KNearestNeighborDocumentClassifier

public class KNearestNeighborClassifier extends Object implements Classifier<org.apache.lucene.util.BytesRef>
A k-Nearest Neighbor classifier (see http://en.wikipedia.org/wiki/K-nearest_neighbors) based on MoreLikeThis.
WARNING: This API is experimental and might change in incompatible ways in the next release.
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    protected final String
    classFieldName
    the name of the field used as the output text
    protected final org.apache.lucene.search.IndexSearcher
    indexSearcher
    an IndexSearcher used to perform queries
    protected final int
    k
    the number of docs to compare in order to find the nearest neighbor to the input text
    protected final org.apache.lucene.queries.mlt.MoreLikeThis
    mlt
    a MoreLikeThis instance used to perform MLT queries
    protected final org.apache.lucene.search.Query
    query
    a Query used to filter the documents that should be used from this classifier's underlying LeafReader
    protected final String[]
    textFieldNames
    the names of the fields used as the input text
  • Constructor Summary

    Constructors
    Constructor
    Description
    KNearestNeighborClassifier(org.apache.lucene.index.IndexReader indexReader, org.apache.lucene.search.similarities.Similarity similarity, org.apache.lucene.analysis.Analyzer analyzer, org.apache.lucene.search.Query query, int k, int minDocsFreq, int minTermFreq, String classFieldName, String... textFieldNames)
  • Method Summary

    Modifier and Type
    Method
    Description
    ClassificationResult<org.apache.lucene.util.BytesRef>
    assignClass(String text)
    Assign a class (with score) to the given text String.
    protected List<ClassificationResult<org.apache.lucene.util.BytesRef>>
    buildListFromTopDocs(org.apache.lucene.search.TopDocs topDocs)
    Build a list of classification results from search results.
    protected ClassificationResult<org.apache.lucene.util.BytesRef>
    classifyFromTopDocs(org.apache.lucene.search.TopDocs knnResults)
    TODO
    List<ClassificationResult<org.apache.lucene.util.BytesRef>>
    getClasses(String text)
    Get all the classes (sorted by score, descending) assigned to the given text String.
    List<ClassificationResult<org.apache.lucene.util.BytesRef>>
    getClasses(String text, int max)
    Get the first max classes (sorted by score, descending) assigned to the given text String.
    String
    toString()

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Field Details

    • mlt

      protected final org.apache.lucene.queries.mlt.MoreLikeThis mlt
      a MoreLikeThis instance used to perform MLT queries
    • textFieldNames

      protected final String[] textFieldNames
      the names of the fields used as the input text
    • classFieldName

      protected final String classFieldName
      the name of the field used as the output text
    • indexSearcher

      protected final org.apache.lucene.search.IndexSearcher indexSearcher
      an IndexSearcher used to perform queries
    • k

      protected final int k
      the number of docs to compare in order to find the nearest neighbor to the input text
    • query

      protected final org.apache.lucene.search.Query query
      a Query used to filter the documents that should be used from this classifier's underlying LeafReader
  • Constructor Details

    • KNearestNeighborClassifier

      public KNearestNeighborClassifier(org.apache.lucene.index.IndexReader indexReader, org.apache.lucene.search.similarities.Similarity similarity, org.apache.lucene.analysis.Analyzer analyzer, org.apache.lucene.search.Query query, int k, int minDocsFreq, int minTermFreq, String classFieldName, String... textFieldNames) throws IOException
      Parameters:
      indexReader - the reader on the index to be used for classification
      similarity - the Similarity to be used by the underlying IndexSearcher, or null to use the default (BM25Similarity)
      analyzer - an Analyzer used to analyze unseen text
      query - a Query used to optionally filter the docs used for training the classifier, or null if all the indexed docs should be used
      k - the number of docs to select from the MLT results to find the nearest neighbor
      minDocsFreq - the MoreLikeThis.minDocFreq parameter
      minTermFreq - the MoreLikeThis.minTermFreq parameter
      classFieldName - the name of the field used as the output for the classifier
      textFieldNames - the names of the fields used as the inputs for the classifier; a name can carry a boost, e.g. title^10
      Throws:
      IOException
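The title^10 notation accepted in textFieldNames pairs a field name with a boost factor. As a rough illustration of how such a string can be split, the sketch below uses only the JDK. BoostedField and parse are hypothetical names introduced here and are not part of the Lucene API, and the default boost of 1.0 for names without a caret is an assumption of this sketch.

```java
/** Hypothetical helper, not part of Lucene: splits "title^10" into a name and a boost. */
final class BoostedField {
    final String name;
    final float boost;

    private BoostedField(String name, float boost) {
        this.name = name;
        this.boost = boost;
    }

    /** Parses "field" or "field^boost"; a missing boost is assumed to mean 1.0. */
    static BoostedField parse(String spec) {
        int caret = spec.lastIndexOf('^');
        if (caret < 0) {
            return new BoostedField(spec, 1.0f);
        }
        String name = spec.substring(0, caret);
        float boost = Float.parseFloat(spec.substring(caret + 1));
        return new BoostedField(name, boost);
    }

    public static void main(String[] args) {
        // Print the parsed name and boost for a boosted and a plain field name.
        for (String spec : new String[] {"title^10", "body"}) {
            BoostedField f = parse(spec);
            System.out.println(f.name + " -> " + f.boost);
        }
    }
}
```

With a helper of this kind, a caller could check field specifications before passing them to the constructor; a malformed boost such as title^x would surface as a NumberFormatException from Float.parseFloat.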
  • Method Details

    • assignClass

      public ClassificationResult<org.apache.lucene.util.BytesRef> assignClass(String text) throws IOException
      Description copied from interface: Classifier
      Assign a class (with score) to the given text String.
      Specified by:
      assignClass in interface Classifier<org.apache.lucene.util.BytesRef>
      Parameters:
      text - a String containing text to be classified
      Returns:
      a ClassificationResult holding the assigned class of type T and its score
      Throws:
      IOException - If there is a low-level I/O error.
    • classifyFromTopDocs

      protected ClassificationResult<org.apache.lucene.util.BytesRef> classifyFromTopDocs(org.apache.lucene.search.TopDocs knnResults) throws IOException
      TODO
      Throws:
      IOException
    • getClasses

      public List<ClassificationResult<org.apache.lucene.util.BytesRef>> getClasses(String text) throws IOException
      Description copied from interface: Classifier
      Get all the classes (sorted by score, descending) assigned to the given text String.
      Specified by:
      getClasses in interface Classifier<org.apache.lucene.util.BytesRef>
      Parameters:
      text - a String containing text to be classified
      Returns:
      the whole list of ClassificationResult objects (classes and scores), or null if the classifier can't make lists
      Throws:
      IOException - If there is a low-level I/O error.
    • getClasses

      public List<ClassificationResult<org.apache.lucene.util.BytesRef>> getClasses(String text, int max) throws IOException
      Description copied from interface: Classifier
      Get the first max classes (sorted by score, descending) assigned to the given text String.
      Specified by:
      getClasses in interface Classifier<org.apache.lucene.util.BytesRef>
      Parameters:
      text - a String containing text to be classified
      max - the maximum number of elements in the returned list
      Returns:
      the list of ClassificationResult objects (classes and scores), truncated to at most max elements, or null if the classifier can't make lists
      Throws:
      IOException - If there is a low-level I/O error.
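The "sorted by score, descending" and "first max" behaviour described for getClasses can be pictured with a small JDK-only sketch. ScoredLabel and TopClasses are hypothetical stand-ins introduced for this illustration; they are not the Lucene ClassificationResult type, and the sketch does not claim to mirror how the classifier orders or truncates its results internally.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a (class, score) pair; not the Lucene ClassificationResult type. */
final class ScoredLabel {
    final String label;
    final double score;

    ScoredLabel(String label, double score) {
        this.label = label;
        this.score = score;
    }
}

final class TopClasses {
    /** Returns at most max results, highest score first, without modifying the input list. */
    static List<ScoredLabel> firstMax(List<ScoredLabel> results, int max) {
        List<ScoredLabel> sorted = new ArrayList<>(results);
        sorted.sort(Comparator.comparingDouble((ScoredLabel r) -> r.score).reversed());
        int limit = Math.min(Math.max(max, 0), sorted.size());
        return new ArrayList<>(sorted.subList(0, limit));
    }

    public static void main(String[] args) {
        List<ScoredLabel> results = new ArrayList<>();
        results.add(new ScoredLabel("sports", 0.2));
        results.add(new ScoredLabel("politics", 0.5));
        results.add(new ScoredLabel("tech", 0.3));
        // Print the two best-scoring labels, best first.
        for (ScoredLabel r : firstMax(results, 2)) {
            System.out.println(r.label + " " + r.score);
        }
    }
}
```

Copying the list before sorting keeps the caller's list untouched, and clamping max to the list size avoids an IndexOutOfBoundsException when fewer classes are available than requested.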
    • buildListFromTopDocs

      protected List<ClassificationResult<org.apache.lucene.util.BytesRef>> buildListFromTopDocs(org.apache.lucene.search.TopDocs topDocs) throws IOException
      Build a list of classification results from search results.
      Parameters:
      topDocs - the search results as a TopDocs object
      Returns:
      a List of ClassificationResult, one for each existing class
      Throws:
      IOException - if it's not possible to get the stored value of class field
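Turning a set of nearest neighbors into one result per class is, at its simplest, a voting step. The JDK-only sketch below shows plain majority voting, where each class receives the fraction of neighbors that carry it. NeighborVote and voteShares are hypothetical names, and this is a simplification rather than the Lucene implementation: it ignores the neighbors' search scores, which the real method may take into account.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of k-NN voting; not the Lucene implementation. */
final class NeighborVote {
    /**
     * Maps each class label seen among the neighbors to the fraction of neighbors carrying it.
     * neighborLabels holds the class label of each retrieved neighbor, in rank order.
     */
    static Map<String, Double> voteShares(List<String> neighborLabels) {
        // Count how many neighbors carry each label, keeping first-seen order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String label : neighborLabels) {
            counts.merge(label, 1, Integer::sum);
        }
        // Convert counts into shares of the total number of neighbors.
        Map<String, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            shares.put(e.getKey(), e.getValue() / (double) neighborLabels.size());
        }
        return shares;
    }

    public static void main(String[] args) {
        System.out.println(voteShares(Arrays.asList("tech", "sports", "tech", "tech")));
    }
}
```

Each entry of the returned map plays the role of one classification result: the key is the class and the value is its score. An empty neighbor list yields an empty map.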
    • toString

      public String toString()
      Overrides:
      toString in class Object