java.lang.Object
- it.unimi.di.big.mg4j.search.score.AbstractScorer
- - it.unimi.di.big.mg4j.search.score.AbstractWeightedScorer
  - - it.unimi.di.big.mg4j.search.score.BM25FScorer

All Implemented Interfaces:

DelegatingScorer, Scorer, FlyweightPrototype<Scorer>
```
public class BM25FScorer
extends AbstractWeightedScorer
implements DelegatingScorer
```
A scorer that implements the BM25F ranking scheme.
BM25F is an evolution of BM25 described by Stephen Robertson, Hugo Zaragoza and Michael Taylor in “Simple BM25 extension to multiple weighted fields”, CIKM '04: Proceedings of the thirteenth ACM international Conference on Information and Knowledge Management, pages 42−49, ACM Press, 2004.
The idea behind BM25F is that adding up (albeit with weights) BM25 scores from different fields breaks down the nonlinearity of BM25. Instead, we should work on a virtual document collection: more precisely, we should behave as if all fields were concatenated in a single stream of text. For instance, if weights are integers, the formula behaves as if the text of each field is concatenated as many times as its weight to form a global text, which is then scored using BM25.
Note that, for this to happen, we would need to know the corresponding frequency—that is, for each term, the number of documents in which the term appears in at least one of the fields. This number must be provided at construction time: more precisely, you must specify a StringMap that maps each term appearing in some field to an index into a LongList containing the correct frequencies. These data is accessed only in the preparatory phase, so access can be reasonably slow.
Important: the only source of knowledge about the overall set of indices involved in query resolution is given by calls to AbstractWeightedScorer.setWeights(it.unimi.dsi.fastutil.objects.Reference2DoubleMap). That is, this scorer will assume that all indices appearing in a query are also keys of the weight function passed to AbstractWeightedScorer.setWeights(it.unimi.dsi.fastutil.objects.Reference2DoubleMap). An exception will be raised if these guidelines are not followed.
Computing frequency data

The tool Paste can be used to create the metadata of the virtual collection. To do so, simply run Paste on the indices of all fields over which you want to compute BM25F with the --metadata-only option. The resulting frequency file is what you need to pass to the constructor, and from the term file you can build a StringMap (e.g., using an ImmutableExternalPrefixMap) that will be used to index the frequencies.
Boldi's variant

Providing global frequency data makes it possible to compute the classical BM25F formula. If no frequency data is provided, this class implements Paolo Boldi's variant of BM25. In this case, we multiply the IDF score by the weighted count of each term to form the virtual count that will be passed through BM25's nonlinear function.
Using this scorer

This scorer assigns to each pair index/term reachable by true paths a score that depends on the virtual count of the term, which is the count of the term for the given index multiplied by the weight of the index. To obtain the “classical” BM25F score you must write a query q that contains no index selector and multiplexes it on all indices, e.g., a:q | b:q | c:q. If a term appears only in some specific index/query pair, its score will be computed using a smaller virtual count, obtained just by adding up the values associated with the actually present index/query pairs. Usually, the simplest way to obtain this result is to use a MultiIndexTermExpander, which can be even set from the command-line interface provided by Query.
Correctness

The code in this scorer is verified by unit tests developed jointly with Hugo Zaragoza. This is an important point, as the definition of BM25F contains many subtleties.

Author:

Sebastiano Vigna

See Also:

BM25Scorer

Nested Class Summary

Nested Classes
Modifier and Type Class Description

protected static class BM25FScorer.Visitor

Field Summary

Fields
Modifier and Type	Field	Description
`Reference2DoubleMap<Index>`	`bByIndex`	The parameter `b`; you must provide one value for each index.
`static double`	`DEFAULT_B`	The default value used for the parameter `b`.
`static double`	`DEFAULT_K1`	The default value used for the parameter `k`₁.
`static double`	`EPSILON_SCORE`	The value of the document-frequency part for terms appearing in more than half of the documents.
`double`	`k1`	The parameter `k`₁.

Fields inherited from class it.unimi.di.big.mg4j.search.score.AbstractWeightedScorer
index2Weight

Fields inherited from class it.unimi.di.big.mg4j.search.score.AbstractScorer
documentIterator, indexIterator

Constructor Summary

Constructors
Constructor	Description
`BM25FScorer(double k1, StringMap<? extends CharSequence> termMap, LongBigList frequencies, Object2DoubleMap<String> b)`	Creates a BM25F scorer.
`BM25FScorer(double k1, Reference2DoubleMap<Index> b)`	Creates a BM25F scorer using Boldi's variant (frequencies are not needed).
`BM25FScorer(double k1, Reference2DoubleMap<Index> b, StringMap<? extends CharSequence> termMap, LongBigList frequencies)`	Creates a BM25F scorer.
`BM25FScorer(String... arg)`	Creates a BM25F scorer using parameters specified by strings.

Method Summary

Modifier and Type	Method	Description
`BM25FScorer`	`copy()`
`double`	`score()`	Computes a score by calling `Scorer.score(Index)` for each index in the current document iterator, and adding the weighted results.
`double`	`score(Index index)`	Returns a score for the current document of the last document iterator given to `Scorer.wrap(DocumentIterator)`, but considering only a given index (optional operation).
`boolean`	`usesIntervals()`	Whether this scorer uses intervals.
`void`	`wrap(DocumentIterator d)`	Wraps the given document iterator.

Methods inherited from class it.unimi.di.big.mg4j.search.score.AbstractWeightedScorer
getWeights, setWeights

Methods inherited from class it.unimi.di.big.mg4j.search.score.AbstractScorer
nextDocument

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface it.unimi.di.big.mg4j.search.score.Scorer
getWeights, nextDocument, setWeights

- Field Detail
  - DEFAULT_K1
```
public static final double DEFAULT_K1
```
    The default value used for the parameter k₁.
    
    See Also:
    
    Constant Field Values
  - DEFAULT_B
```
public static final double DEFAULT_B
```
    The default value used for the parameter b.
    
    See Also:
    
    Constant Field Values
  - EPSILON_SCORE
```
public static final double EPSILON_SCORE
```
    The value of the document-frequency part for terms appearing in more than half of the documents.
    
    See Also:
    
    Constant Field Values
  - k1
```
public final double k1
```
    The parameter k₁.
  - bByIndex
```
public final Reference2DoubleMap<Index> bByIndex
```
    The parameter b; you must provide one value for each index.
- Constructor Detail
  - BM25FScorer
```
public BM25FScorer(double k1,
                   Reference2DoubleMap<Index> b,
                   StringMap<? extends CharSequence> termMap,
                   LongBigList frequencies)
```
    Creates a BM25F scorer.
    
    Parameters:
    
    k1 - the k₁ parameter.
    
    b - the b parameter, specified as a map from indices to values.
    
    termMap - a map from terms to positions in frequencies, or null if frequencies is null.
    
    frequencies - the frequencies, or null for Boldi's variant.
  - BM25FScorer
```
public BM25FScorer(double k1,
                   StringMap<? extends CharSequence> termMap,
                   LongBigList frequencies,
                   Object2DoubleMap<String> b)
```
    Creates a BM25F scorer.
    This constructor exists to provide a typed counterpart to the string-based constructor (mainly for documentation purposes).
    
    Parameters:
    
    k1 - the k₁ parameter.
    
    termMap - a map from terms to positions in frequencies, or null if frequencies is null.
    
    frequencies - the frequencies, or null for Boldi's variant.
    
    b - the b parameter, specified as a map from indices to values.
  - BM25FScorer
```
public BM25FScorer(double k1,
                   Reference2DoubleMap<Index> b)
```
    Creates a BM25F scorer using Boldi's variant (frequencies are not needed).
    
    Parameters:
    
    k1 - the k₁ parameter.
    
    b - the b parameter, specified as a map from indices to values.
  - BM25FScorer
```
public BM25FScorer(String... arg)
            throws NumberFormatException,
                   FileNotFoundException,
                   IOException,
                   ClassNotFoundException
```
    Creates a BM25F scorer using parameters specified by strings.
    This constructor has string parameters that correspond to the arguments of BM25FScorer(double, StringMap, LongBigList, Object2DoubleMap). The two middle arguments can be omitted by specifying them as empty. The last argument is represented by a number of assignments index=b, separated by commas (as if they were multiple arguments), which will be compacted into a function representing the values of b.
    
    Throws:
    
    NumberFormatException
    
    FileNotFoundException
    
    IOException
    
    ClassNotFoundException
- Method Detail
  - copy
```
public BM25FScorer copy()
```
    Specified by:
    
    copy in interface FlyweightPrototype<Scorer>
    
    Specified by:
    
    copy in interface Scorer
  - score
```
public double score()
             throws IOException
```
    Description copied from class: AbstractWeightedScorer
    
    Computes a score by calling Scorer.score(Index) for each index in the current document iterator, and adding the weighted results.
    
    Specified by:
    
    score in interface Scorer
    
    Overrides:
    
    score in class AbstractWeightedScorer
    
    Returns:
    
    the combined weighted score.
    
    Throws:
    
    IOException
  - score
```
public double score(Index index)
```
    Description copied from interface: Scorer
    
    Returns a score for the current document of the last document iterator given to Scorer.wrap(DocumentIterator), but considering only a given index (optional operation).
    
    Specified by:
    
    score in interface Scorer
    
    Parameters:
    
    index - the only index to be considered.
    
    Returns:
    
    the score.
  - wrap
```
public void wrap(DocumentIterator d)
          throws IOException
```
    Description copied from class: AbstractScorer
    
    Wraps the given document iterator.
    This method records internally the provided iterator.
    
    Specified by:
    
    wrap in interface Scorer
    
    Overrides:
    
    wrap in class AbstractWeightedScorer
    
    Parameters:
    
    d - the document iterator that will be used in subsequent calls to Scorer.score() and Scorer.score(Index).
    
    Throws:
    
    IOException
  - usesIntervals
```
public boolean usesIntervals()
```
    Description copied from interface: Scorer
    
    Whether this scorer uses intervals.
    This method is essential when aggregating scorers, because if several scores need intervals, a CachingDocumentIterator will be necessary.
    
    Specified by:
    
    usesIntervals in interface Scorer
    
    Returns:
    
    true if this scorer uses intervals.

Class BM25FScorer

Computing frequency data

Boldi's variant

Using this scorer

Correctness

Nested Class Summary

Field Summary

Fields inherited from class it.unimi.di.big.mg4j.search.score.AbstractWeightedScorer

Fields inherited from class it.unimi.di.big.mg4j.search.score.AbstractScorer

Constructor Summary

Method Summary

Methods inherited from class it.unimi.di.big.mg4j.search.score.AbstractWeightedScorer

Methods inherited from class it.unimi.di.big.mg4j.search.score.AbstractScorer

Methods inherited from class java.lang.Object

Methods inherited from interface it.unimi.di.big.mg4j.search.score.Scorer

Field Detail

DEFAULT_K1

DEFAULT_B

EPSILON_SCORE

k1

bByIndex

Constructor Detail

BM25FScorer

BM25FScorer

BM25FScorer

BM25FScorer

Method Detail

copy

score

score

wrap

usesIntervals