java.lang.Object
- it.unimi.di.big.mg4j.search.score.AbstractScorer
- - it.unimi.di.big.mg4j.search.score.AbstractWeightedScorer
  - - it.unimi.di.big.mg4j.search.score.BM25Scorer

All Implemented Interfaces:

DelegatingScorer, Scorer, FlyweightPrototype<Scorer>
```
public class BM25Scorer
extends AbstractWeightedScorer
implements DelegatingScorer
```
A scorer that implements the BM25 ranking scheme.
BM25 is the name of a ranking scheme for text derived from the probabilistic model. The essential feature of the scheme is that of assigning to each term appearing in a given document a weight depending both on the count (the number of occurrences of the term in the document), on the frequency (the number of the documents in which the term appears) and on the document length (in words). It was devised in the early nineties, and it provides a significant improvement over the classical TF/IDF scheme. Karen Spärck Jones, Steve Walker and Stephen E. Robertson give a full account of BM25 and of the probabilistic model in “A probabilistic model of information retrieval: development and comparative experiments”, Inf. Process. Management 36(6):779−840, 2000.
There are a number of incarnations with small variations of the formula itself. Here, the weight assigned to a term which appears in f documents out of a collection of N documents w.r.t. to a document of length l in which the term appears c times is
log( (N − f + 1/2) / (f + 1/2) ) ( k₁ + 1 ) c ⁄ ( c + k₁ ((1 − b) + bl / L) ),
where L is the average document length, and k₁ and b are parameters that default to DEFAULT_K1 and DEFAULT_B: these values were chosen following the suggestions given in “Efficiency vs. effectiveness in Terabyte-scale information retrieval”, by Stefan Büttcher and Charles L. A. Clarke, in Proceedings of the 14th Text REtrieval Conference (TREC 2005). Gaithersburg, USA, November 2005. The logarithmic part (a.k.a. idf (inverse document-frequency) part) is actually maximised with EPSILON_SCORE, so it is never negative (the net effect being that terms appearing in more than half of the documents have almost no weight).
Evaluation

This class has two modes of evaluation, generic and flat. The generic evaluator uses an internal visitor building on CounterSetupVisitor and related classes (by means of DocumentIterator.acceptOnTruePaths(it.unimi.di.big.mg4j.search.visitor.DocumentIteratorVisitor)) to take into consideration only terms that are actually involved in query semantics for the current document. The flat evaluator simulates the behaviour of the generic evaluator on a special subset of queries, that is, queries that are formed by an index iterator or a composite document iterator whose underlying queries are all index iterators, by means of a simple loop. This is significantly faster than the generic evaluator (as there is no recursive visit) either if document iterator is a subclass of AbstractIntersectionDocumentIterator, or if it is a subclass of AbstractUnionDocumentIterator and the disjuncts are not too many (less than MAX_FLAT_DISJUNCTS).

Author:

Mauro Mereu, Sebastiano Vigna

Field Summary

Fields
Modifier and Type	Field	Description
`static boolean`	`DEBUG`
`static double`	`DEFAULT_B`	The default value used for the parameter `b`.
`static double`	`DEFAULT_K1`	The default value used for the parameter `k`₁.
`static double`	`EPSILON_SCORE`	The value of the document-frequency part for terms appearing in more than half of the documents.
`static Logger`	`LOGGER`
`static int`	`MAX_FLAT_DISJUNCTS`	Disjunctive queries on index iterators are handled using the flat evaluator only if they contain less than this number of disjuncts.

Fields inherited from class it.unimi.di.big.mg4j.search.score.AbstractWeightedScorer
index2Weight

Fields inherited from class it.unimi.di.big.mg4j.search.score.AbstractScorer
documentIterator, indexIterator

Constructor Summary

Constructors
Constructor	Description
`BM25Scorer()`	Creates a BM25 scorer using `DEFAULT_K1` and `DEFAULT_B` as parameters.
`BM25Scorer(double k1, double b)`	Creates a BM25 scorer using specified `k`₁ and `b` parameters.
`BM25Scorer(String k1, String b)`	Creates a BM25 scorer using specified `k`₁ and `b` parameters specified by strings.

Method Summary

Modifier and Type	Method	Description
`BM25Scorer`	`copy()`
`double`	`score()`	Computes a score by calling `Scorer.score(Index)` for each index in the current document iterator, and adding the weighted results.
`double`	`score(Index index)`	Returns a score for the current document of the last document iterator given to `Scorer.wrap(DocumentIterator)`, but considering only a given index (optional operation).
`boolean`	`usesIntervals()`	Whether this scorer uses intervals.
`void`	`wrap(DocumentIterator d)`	Wraps the given document iterator.

Methods inherited from class it.unimi.di.big.mg4j.search.score.AbstractWeightedScorer
getWeights, setWeights

Methods inherited from class it.unimi.di.big.mg4j.search.score.AbstractScorer
nextDocument

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface it.unimi.di.big.mg4j.search.score.Scorer
getWeights, nextDocument, setWeights

- Field Detail
  - LOGGER
```
public static final Logger LOGGER
```
  - DEBUG
```
public static final boolean DEBUG
```
    See Also:
    
    Constant Field Values
  - DEFAULT_K1
```
public static final double DEFAULT_K1
```
    The default value used for the parameter k₁.
    
    See Also:
    
    Constant Field Values
  - DEFAULT_B
```
public static final double DEFAULT_B
```
    The default value used for the parameter b.
    
    See Also:
    
    Constant Field Values
  - EPSILON_SCORE
```
public static final double EPSILON_SCORE
```
    The value of the document-frequency part for terms appearing in more than half of the documents.
    
    See Also:
    
    Constant Field Values
  - MAX_FLAT_DISJUNCTS
```
public static final int MAX_FLAT_DISJUNCTS
```
    Disjunctive queries on index iterators are handled using the flat evaluator only if they contain less than this number of disjuncts. The generic evaluator is more efficient if there are several disjuncts, as it invokes IndexIterator.count() only on the terms that are part of the front. This value is largely architecture, query, term-distribution, and whatever else dependent.
    
    See Also:
    
    Constant Field Values
- Constructor Detail
  - BM25Scorer
```
public BM25Scorer()
```
    Creates a BM25 scorer using DEFAULT_K1 and DEFAULT_B as parameters.
  - BM25Scorer
```
public BM25Scorer(double k1,
                  double b)
```
    Creates a BM25 scorer using specified k₁ and b parameters.
    
    Parameters:
    
    k1 - the k₁ parameter.
    
    b - the b parameter.
  - BM25Scorer
```
public BM25Scorer(String k1,
                  String b)
```
    Creates a BM25 scorer using specified k₁ and b parameters specified by strings.
    
    Parameters:
    
    k1 - the k₁ parameter.
    
    b - the b parameter.
- Method Detail
  - copy
```
public BM25Scorer copy()
```
    Specified by:
    
    copy in interface FlyweightPrototype<Scorer>
    
    Specified by:
    
    copy in interface Scorer
  - score
```
public double score()
             throws IOException
```
    Description copied from class: AbstractWeightedScorer
    
    Computes a score by calling Scorer.score(Index) for each index in the current document iterator, and adding the weighted results.
    
    Specified by:
    
    score in interface Scorer
    
    Overrides:
    
    score in class AbstractWeightedScorer
    
    Returns:
    
    the combined weighted score.
    
    Throws:
    
    IOException
  - score
```
public double score(Index index)
```
    Description copied from interface: Scorer
    
    Returns a score for the current document of the last document iterator given to Scorer.wrap(DocumentIterator), but considering only a given index (optional operation).
    
    Specified by:
    
    score in interface Scorer
    
    Parameters:
    
    index - the only index to be considered.
    
    Returns:
    
    the score.
  - wrap
```
public void wrap(DocumentIterator d)
          throws IOException
```
    Description copied from class: AbstractScorer
    
    Wraps the given document iterator.
    This method records internally the provided iterator.
    
    Specified by:
    
    wrap in interface Scorer
    
    Overrides:
    
    wrap in class AbstractWeightedScorer
    
    Parameters:
    
    d - the document iterator that will be used in subsequent calls to Scorer.score() and Scorer.score(Index).
    
    Throws:
    
    IOException
  - usesIntervals
```
public boolean usesIntervals()
```
    Description copied from interface: Scorer
    
    Whether this scorer uses intervals.
    This method is essential when aggregating scorers, because if several scores need intervals, a CachingDocumentIterator will be necessary.
    
    Specified by:
    
    usesIntervals in interface Scorer
    
    Returns:
    
    true if this scorer uses intervals.

Class BM25Scorer

Evaluation

Field Summary

Fields inherited from class it.unimi.di.big.mg4j.search.score.AbstractWeightedScorer

Fields inherited from class it.unimi.di.big.mg4j.search.score.AbstractScorer

Constructor Summary

Method Summary

Methods inherited from class it.unimi.di.big.mg4j.search.score.AbstractWeightedScorer

Methods inherited from class it.unimi.di.big.mg4j.search.score.AbstractScorer

Methods inherited from class java.lang.Object

Methods inherited from interface it.unimi.di.big.mg4j.search.score.Scorer

Field Detail

LOGGER

DEBUG

DEFAULT_K1

DEFAULT_B

EPSILON_SCORE

MAX_FLAT_DISJUNCTS

Constructor Detail

BM25Scorer

BM25Scorer

BM25Scorer

Method Detail

copy

score

score

wrap

usesIntervals