it.unimi.di.mg4j.search.score
Class TfIdfScorer

java.lang.Object
  extended by it.unimi.di.mg4j.search.score.AbstractScorer
      extended by it.unimi.di.mg4j.search.score.AbstractWeightedScorer
          extended by it.unimi.di.mg4j.search.score.TfIdfScorer
All Implemented Interfaces:
DelegatingScorer, Scorer, FlyweightPrototype<Scorer>

public class TfIdfScorer
extends AbstractWeightedScorer
implements DelegatingScorer

A scorer that implements the TF/IDF ranking formula.

The formula exists in several variants that differ in small details. Here, the weight assigned to a term that appears in f documents out of a collection of N documents, with respect to a document of length l in which the term appears c times, is

log(N / f) * c / l.

This class uses a CounterCollectionVisitor and related classes to take into consideration only terms that are actually involved in the current document.
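
As an illustration of the formula above, the following is a minimal standalone sketch and not the code of this class. The class name TfIdfWeightSketch, the method name weight, and the argument checks are hypothetical, and the natural logarithm is assumed because the description does not state a base.

```java
/** Hypothetical helper illustrating the weight log(N / f) * c / l; not part of MG4J. */
public final class TfIdfWeightSketch {
    private TfIdfWeightSketch() {}

    /**
     * @param N number of documents in the collection.
     * @param f number of documents in which the term appears.
     * @param c number of occurrences of the term in the current document.
     * @param l length of the current document.
     * @return log(N / f) * c / l, using the natural logarithm (an assumption).
     */
    public static double weight(long N, long f, int c, int l) {
        if (N <= 0 || f <= 0 || f > N || c < 0 || l <= 0) {
            throw new IllegalArgumentException("Expected 0 < f <= N, c >= 0 and l > 0");
        }
        // Cast before dividing so that N / f is not truncated to an integer.
        return Math.log((double) N / f) * c / l;
    }

    public static void main(String[] args) {
        // A term found in 10 of 1000 documents, occurring 3 times in a document of length 100.
        System.out.println(weight(1000, 10, 3, 100));
        // A term found in every document carries no weight, because log(1) = 0.
        System.out.println(weight(50, 50, 4, 20));
    }
}
```

The second call shows the usual consequence of the inverse document frequency factor: a term that appears in every document contributes nothing to the score.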

Author:
Sebastiano Vigna

Field Summary
 
Fields inherited from class it.unimi.di.mg4j.search.score.AbstractWeightedScorer
index2Weight
 
Fields inherited from class it.unimi.di.mg4j.search.score.AbstractScorer
documentIterator, indexIterator
 
Constructor Summary
TfIdfScorer()
           
 
Method Summary
 TfIdfScorer copy()
           
 double score()
          Computes a score by calling Scorer.score(Index) for each index in the current document iterator, and adding the weighted results.
 double score(Index index)
          Returns a score for the current document of the last document iterator given to Scorer.wrap(DocumentIterator), but considering only a given index (optional operation).
 boolean usesIntervals()
          Whether this scorer uses intervals.
 void wrap(DocumentIterator d)
          Wraps the given document iterator.
 
Methods inherited from class it.unimi.di.mg4j.search.score.AbstractWeightedScorer
getWeights, setWeights
 
Methods inherited from class it.unimi.di.mg4j.search.score.AbstractScorer
nextDocument
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface it.unimi.di.mg4j.search.score.Scorer
getWeights, nextDocument, setWeights
 

Constructor Detail

TfIdfScorer

public TfIdfScorer()
Method Detail

copy

public TfIdfScorer copy()
Specified by:
copy in interface DelegatingScorer
Specified by:
copy in interface Scorer
Specified by:
copy in interface FlyweightPrototype<Scorer>

score

public double score()
             throws IOException
Description copied from class: AbstractWeightedScorer
Computes a score by calling Scorer.score(Index) for each index in the current document iterator, and adding the weighted results.

Specified by:
score in interface Scorer
Overrides:
score in class AbstractWeightedScorer
Returns:
the combined weighted score.
Throws:
IOException
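
The weighted combination described above can be sketched as follows. This is only an illustration under stated assumptions: the class WeightedSumSketch and its combine method are hypothetical, indices are represented by plain strings rather than Index objects, and treating an index with no recorded weight as having weight zero is an assumption, not documented behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of "score each index, then add the weighted results"; not part of MG4J. */
public final class WeightedSumSketch {
    private WeightedSumSketch() {}

    /**
     * @param perIndexScore the score obtained for each index (keyed by a hypothetical index name).
     * @param indexWeight the weight assigned to each index.
     * @return the sum over all indices of weight * score.
     */
    public static double combine(Map<String, Double> perIndexScore, Map<String, Double> indexWeight) {
        double total = 0;
        for (Map.Entry<String, Double> e : perIndexScore.entrySet()) {
            // Assumption: an index without a recorded weight contributes nothing.
            double weight = indexWeight.getOrDefault(e.getKey(), 0.0);
            total += weight * e.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("title", 0.2);
        scores.put("text", 0.1);

        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put("title", 2.0);
        weights.put("text", 1.0);

        // Combines 2.0 * 0.2 for "title" with 1.0 * 0.1 for "text".
        System.out.println(combine(scores, weights));
    }
}
```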

score

public double score(Index index)
Description copied from interface: Scorer
Returns a score for the current document of the last document iterator given to Scorer.wrap(DocumentIterator), but considering only a given index (optional operation).

Specified by:
score in interface Scorer
Parameters:
index - the only index to be considered.
Returns:
the score.

wrap

public void wrap(DocumentIterator d)
          throws IOException
Description copied from class: AbstractScorer
Wraps the given document iterator.

This method records the provided iterator internally.

Specified by:
wrap in interface Scorer
Overrides:
wrap in class AbstractWeightedScorer
Parameters:
d - the document iterator that will be used in subsequent calls to Scorer.score() and Scorer.score(Index).
Throws:
IOException

usesIntervals

public boolean usesIntervals()
Description copied from interface: Scorer
Whether this scorer uses intervals.

This method is essential when aggregating scorers, because if several scorers need intervals, a CachingDocumentIterator will be necessary.

Specified by:
usesIntervals in interface Scorer
Returns:
true if this scorer uses intervals.