Interface DocumentClassifier<T>

All Known Implementing Classes:
KNearestNeighborDocumentClassifier, SimpleNaiveBayesDocumentClassifier

public interface DocumentClassifier<T>
A classifier (see http://en.wikipedia.org/wiki/Classifier_(mathematics)) that assigns classes of type T to Documents.
WARNING: This API is experimental and might change in incompatible ways in the next release.
  • Method Summary

    Modifier and Type
    Method
    Description
    ClassificationResult<T>
    assignClass(org.apache.lucene.document.Document document)
    Assign a class (with score) to the given Document.
    List<ClassificationResult<T>>
    getClasses(org.apache.lucene.document.Document document)
    Get all the classes (sorted by score, descending) assigned to the given Document.
    List<ClassificationResult<T>>
    getClasses(org.apache.lucene.document.Document document, int max)
    Get the first max classes (sorted by score, descending) assigned to the given Document.
  • Method Details

    • assignClass

      ClassificationResult<T> assignClass(org.apache.lucene.document.Document document) throws IOException
      Assign a class (with score) to the given Document.
      Parameters:
      document - a Document to be classified. Fields are considered features for the classification.
      Returns:
      a ClassificationResult holding the assigned class of type T and its score
      Throws:
      IOException - If there is a low-level I/O error.
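
A minimal sketch of how a caller might use the result of assignClass, accepting the class only when its score clears a threshold. To keep the example self-contained, ResultStub and AssignStub are hypothetical stand-ins that mirror only the shape of ClassificationResult and of assignClass, and a Map of field name to field value plays the role of a Document. The threshold, the fallback value, and the null check are assumptions for illustration, not documented behavior.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;

// Hypothetical stand-in for ClassificationResult<T>: an assigned class plus a score.
final class ResultStub<T> {
    private final T assignedClass;
    private final double score;

    ResultStub(T assignedClass, double score) {
        this.assignedClass = assignedClass;
        this.score = score;
    }

    T getAssignedClass() { return assignedClass; }
    double getScore() { return score; }
}

// Hypothetical stand-in with the same shape as assignClass; the Map stands in
// for a Document whose fields are the classification features.
interface AssignStub<T> {
    ResultStub<T> assignClass(Map<String, String> document) throws IOException;
}

final class AssignClassSketch {
    // Returns the assigned class, or the fallback when the score is below minScore.
    static <T> T classifyOrFallback(AssignStub<T> classifier,
                                    Map<String, String> document,
                                    double minScore,
                                    T fallback) {
        try {
            ResultStub<T> result = classifier.assignClass(document);
            // Defensive null check: an assumption, the documentation above
            // does not say whether assignClass can return null.
            if (result == null || result.getScore() < minScore) {
                return fallback;
            }
            return result.getAssignedClass();
        } catch (IOException e) {
            // Rethrow the low-level I/O error as an unchecked exception.
            throw new UncheckedIOException(e);
        }
    }
}
```

Wrapping the IOException is one possible design choice; callers that already handle checked exceptions may prefer to let it propagate.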
    • getClasses

      List<ClassificationResult<T>> getClasses(org.apache.lucene.document.Document document) throws IOException
      Get all the classes (sorted by score, descending) assigned to the given Document.
      Parameters:
      document - a Document to be classified. Fields are considered features for the classification.
      Returns:
      the full list of ClassificationResult objects (classes and scores), or null if the classifier cannot produce a list.
      Throws:
      IOException - If there is a low-level I/O error.
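
Because getClasses is documented to return null when the classifier cannot produce a list, callers may want to normalize the result before iterating over it. The helper below is a hypothetical sketch, not part of the API; it works on any List, so it makes no assumptions about the element type.

```java
import java.util.Collections;
import java.util.List;

final class NullSafeClasses {
    // Maps the documented null return value to an empty list so that callers
    // can iterate without a separate null check.
    static <R> List<R> orEmpty(List<R> results) {
        return results == null ? Collections.emptyList() : results;
    }
}
```

With this helper, a loop over the results simply does nothing when the classifier cannot produce a list.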
    • getClasses

      List<ClassificationResult<T>> getClasses(org.apache.lucene.document.Document document, int max) throws IOException
      Get the first max classes (sorted by score, descending) assigned to the given Document.
      Parameters:
      document - a Document to be classified. Fields are considered features for the classification.
      max - the maximum number of elements in the returned list
      Returns:
      the list of ClassificationResult objects (classes and scores), truncated to at most max elements. Returns null if the classifier cannot produce a list.
      Throws:
      IOException - If there is a low-level I/O error.
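
The contract "sorted by score, descending, cut to max elements" can be illustrated with a small sketch. ScoredStub is a hypothetical stand-in for ClassificationResult, and topClasses is an illustrative helper, not how the known implementing classes are written. How a negative max or a max larger than the list is treated is an assumption here; the documentation above does not specify it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ClassificationResult<T>: an assigned class plus a score.
final class ScoredStub<T> {
    private final T assignedClass;
    private final double score;

    ScoredStub(T assignedClass, double score) {
        this.assignedClass = assignedClass;
        this.score = score;
    }

    T getAssignedClass() { return assignedClass; }
    double getScore() { return score; }
}

final class TopClassesSketch {
    // Returns at most max results, highest score first; passes null through
    // to mirror the documented "cannot produce a list" case.
    static <T> List<ScoredStub<T>> topClasses(List<ScoredStub<T>> all, int max) {
        if (all == null) {
            return null;
        }
        // Copy first so the caller's list is not reordered.
        List<ScoredStub<T>> sorted = new ArrayList<>(all);
        sorted.sort((a, b) -> Double.compare(b.getScore(), a.getScore()));
        // Assumption: a negative max yields an empty list and a max beyond
        // the list size yields the whole list.
        int end = Math.min(Math.max(max, 0), sorted.size());
        return new ArrayList<>(sorted.subList(0, end));
    }
}
```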