it.unimi.di.mg4j.search.score
Class ClarkeCormackScorer

java.lang.Object
  extended by it.unimi.di.mg4j.search.score.AbstractScorer
      extended by it.unimi.di.mg4j.search.score.AbstractWeightedScorer
          extended by it.unimi.di.mg4j.search.score.ClarkeCormackScorer
All Implemented Interfaces:
DelegatingScorer, Scorer, FlyweightPrototype<Scorer>

public class ClarkeCormackScorer
extends AbstractWeightedScorer
implements DelegatingScorer

Computes the Clarke–Cormack score of all interval iterators of a document. This score function is defined in Charles L.A. Clarke and Gordon V. Cormack, “Shortest-Substring Retrieval and Ranking”, ACM Transactions on Information Systems, 18(1):44–78, 2000, at page 65.

The score for each index depends on two parameters: an integer h and a double α. The score is obtained by summing the scores assigned to all intervals returned by the interval iterator under examination. An interval scores 1 if its length is smaller than h; otherwise, its score is h divided by the interval length, raised to the power of α.
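
The per-interval rule above can be sketched in a few lines of Java. This is only an illustration of the formula as described here, not the MG4J implementation: the class name IntervalScoreSketch and the method intervalScore are hypothetical, and the interval length is taken as a plain int argument rather than read from an interval iterator.

```java
public class IntervalScoreSketch {

    /**
     * Score of a single interval, following the textual description:
     * 1 if the length is smaller than h, otherwise (h / length)^alpha.
     * Hypothetical helper, not part of the MG4J API.
     */
    static double intervalScore(int length, int h, double alpha) {
        if (length < h) {
            return 1.0;
        }
        // Cast to double to avoid integer division.
        return Math.pow((double) h / length, alpha);
    }

    public static void main(String[] args) {
        int h = 16;          // the typical value mentioned in this page
        double alpha = 1.0;  // the typical value mentioned in this page
        for (int length : new int[] { 4, 16, 32, 64 }) {
            System.out.println("length=" + length + " score=" + intervalScore(length, h, alpha));
        }
    }
}
```

With α=1 the score of a long interval decays proportionally to the inverse of its length; a smaller α makes the decay gentler.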

Note that the score assigned to each interval is between 0 and 1 (higher scores correspond to better intervals). The score assigned to an interval iterator is thus bounded from above by the number of intervals; an alternative version yields normalized scores (in this case, the resulting value is an average instead of a sum). VignaScorer provides a scorer that yields similar relative ranks but is inherently (almost) normalized.
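
The difference between the plain and the normalized variant can be illustrated with a small, self-contained Java sketch. It is an assumption-laden illustration rather than the library's code: the class AggregateScoreSketch and its score method are hypothetical, interval lengths are passed in as an array instead of being drawn from an interval iterator, and returning 0 for an empty set of intervals is a choice made here, not something this page specifies.

```java
public class AggregateScoreSketch {

    /**
     * Sums the per-interval scores (1 if length < h, else (h / length)^alpha)
     * and, when normalize is true, divides by the number of intervals.
     * Hypothetical helper, not part of the MG4J API.
     */
    static double score(int[] lengths, int h, double alpha, boolean normalize) {
        if (lengths.length == 0) {
            return 0.0; // assumption: no intervals means no score
        }
        double sum = 0.0;
        for (int length : lengths) {
            sum += length < h ? 1.0 : Math.pow((double) h / length, alpha);
        }
        // The sum is at most lengths.length; the average is at most 1.
        return normalize ? sum / lengths.length : sum;
    }

    public static void main(String[] args) {
        int[] lengths = { 8, 32, 64 };
        System.out.println("sum     = " + score(lengths, 16, 1.0, false));
        System.out.println("average = " + score(lengths, 16, 1.0, true));
    }
}
```

Because each term is at most 1, the unnormalized value grows with the number of intervals, while the averaged value stays between 0 and 1 regardless of how many intervals the document contains.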

Typically, one sets h=16 (or a bit larger) and α=1 (or a bit smaller), but the authors report that the method is fairly stable with respect to changes in the parameter values.


Field Summary
 double alpha
          The parameter alpha.
static int DEFAULT_H
          The default value for h.
 int h
          The parameter h.
 boolean normalize
          Whether the result should be normalized (i.e., between 0 and 1).
 
Fields inherited from class it.unimi.di.mg4j.search.score.AbstractWeightedScorer
index2Weight
 
Fields inherited from class it.unimi.di.mg4j.search.score.AbstractScorer
documentIterator, indexIterator
 
Constructor Summary
ClarkeCormackScorer()
          Default constructor, assigning the default values (h=DEFAULT_H, α=1) to the parameters; the resulting scorer is normalized.
ClarkeCormackScorer(int h, double alpha, boolean normalize)
          Creates a Clarke–Cormack scorer.
ClarkeCormackScorer(String h, String alpha, String normalize)
          Creates a Clarke–Cormack scorer from parameters given as strings.
 
Method Summary
 ClarkeCormackScorer copy()
           
 double score(Index index)
          Returns a score for the current document of the last document iterator given to Scorer.wrap(DocumentIterator), but considering only a given index (optional operation).
 String toString()
           
 boolean usesIntervals()
          Returns true.
 
Methods inherited from class it.unimi.di.mg4j.search.score.AbstractWeightedScorer
getWeights, score, setWeights, wrap
 
Methods inherited from class it.unimi.di.mg4j.search.score.AbstractScorer
nextDocument
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface it.unimi.di.mg4j.search.score.Scorer
getWeights, nextDocument, score, setWeights, wrap
 

Field Detail

DEFAULT_H

public static final int DEFAULT_H
The default value for h.

See Also:
Constant Field Values

h

public final int h
The parameter h.


alpha

public final double alpha
The parameter alpha.


normalize

public final boolean normalize
Whether the result should be normalized (i.e., between 0 and 1).

Constructor Detail

ClarkeCormackScorer

public ClarkeCormackScorer(int h,
                           double alpha,
                           boolean normalize)
Creates a Clarke–Cormack scorer.

Parameters:
h - the parameter h.
alpha - the parameter α.
normalize - whether the result should be normalized.

ClarkeCormackScorer

public ClarkeCormackScorer(String h,
                           String alpha,
                           String normalize)
Creates a Clarke–Cormack scorer from parameters given as strings.

Parameters:
h - the parameter h.
alpha - the parameter α.
normalize - whether the result should be normalized.

ClarkeCormackScorer

public ClarkeCormackScorer()
Default constructor, assigning the default values (h=DEFAULT_H, α=1) to the parameters; the resulting scorer is normalized.

Method Detail

copy

public ClarkeCormackScorer copy()
Specified by:
copy in interface DelegatingScorer
Specified by:
copy in interface Scorer
Specified by:
copy in interface FlyweightPrototype<Scorer>

score

public double score(Index index)
             throws IOException
Description copied from interface: Scorer
Returns a score for the current document of the last document iterator given to Scorer.wrap(DocumentIterator), but considering only a given index (optional operation).

Specified by:
score in interface Scorer
Parameters:
index - the only index to be considered.
Returns:
the score.
Throws:
IOException

toString

public String toString()
Overrides:
toString in class Object

usesIntervals

public boolean usesIntervals()
Returns true.

Specified by:
usesIntervals in interface Scorer
Returns:
true.