Class ClarkeCormackScorer

  • All Implemented Interfaces:
    DelegatingScorer, Scorer, FlyweightPrototype<Scorer>

    public class ClarkeCormackScorer
    extends AbstractWeightedScorer
    implements DelegatingScorer
    Computes the Clarke–Cormack score of all interval iterators of a document. This score function is defined in Charles L.A. Clarke and Gordon V. Cormack, “Shortest-Substring Retrieval and Ranking”, ACM Transactions on Information Systems, 18(1):44–78, 2000, at page 65.

    The score for each index depends on two parameters: an integer h and a double α. The score of an interval iterator is obtained by summing the scores assigned to all intervals it returns. The score assigned to an interval of length ℓ is 1 if ℓ < h; otherwise, it is (h/ℓ)^α, that is, h divided by the interval length, raised to the power of α.
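
    The definition above can be sketched in Java as follows. This is a minimal, hypothetical illustration, not the implementation of this class: the class ClarkeCormackSketch and its methods are invented for this example, and intervals are represented only by their lengths (a List<Integer>) instead of by an interval iterator.

```java
import java.util.List;

/** Hypothetical sketch of the Clarke-Cormack interval score; not the library implementation. */
final class ClarkeCormackSketch {

    /** Score of one interval: 1 if its length is smaller than h, else (h / length)^alpha. */
    static double intervalScore(int length, int h, double alpha) {
        if (length < h) return 1.0;
        return Math.pow((double) h / length, alpha);
    }

    /** Sums the interval scores; the list stands in for the intervals of an interval iterator. */
    static double sumScore(List<Integer> intervalLengths, int h, double alpha) {
        double sum = 0;
        for (int length : intervalLengths) sum += intervalScore(length, h, alpha);
        return sum;
    }

    public static void main(String[] args) {
        // Three intervals of lengths 4, 16 and 32, with h = 16 and alpha = 1.
        // 4 < 16 gives 1.0; 16 is not smaller than h, so (16/16)^1 = 1.0; 32 gives (16/32)^1 = 0.5.
        List<Integer> lengths = List.of(4, 16, 32);
        System.out.println(sumScore(lengths, 16, 1.0)); // prints 2.5
    }
}
```

    Note that an interval whose length equals h already falls in the second case, but still scores exactly 1.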

    Note that the score assigned to each interval lies between 0 and 1, with higher scores corresponding to better (shorter) intervals. The score assigned to an interval iterator is thus bounded from above by the number of intervals; an alternative version yields normalized scores by averaging the interval scores instead of summing them. A scorer with similar relative ranks, but inherently (almost) normalized, is provided by VignaScorer.
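
    The difference between the summed and the normalized (averaged) score can be illustrated with a similar sketch. NormalizedClarkeCormackSketch is a hypothetical class; averaging over the number of intervals follows the description above, while returning 0 for an iterator with no intervals is an assumption made only for this sketch.

```java
import java.util.List;

/** Hypothetical sketch contrasting the summed and the averaged (normalized) score. */
final class NormalizedClarkeCormackSketch {

    static double intervalScore(int length, int h, double alpha) {
        return length < h ? 1.0 : Math.pow((double) h / length, alpha);
    }

    /**
     * Sum of the interval scores, or their average when normalize is true.
     * Returning 0 for an empty list is an assumption of this sketch.
     */
    static double score(List<Integer> lengths, int h, double alpha, boolean normalize) {
        if (lengths.isEmpty()) return 0.0;
        double sum = 0;
        for (int length : lengths) sum += intervalScore(length, h, alpha);
        return normalize ? sum / lengths.size() : sum;
    }

    public static void main(String[] args) {
        // Interval scores with h = 16, alpha = 1: 1.0, 1.0, 0.5, 0.25
        List<Integer> lengths = List.of(4, 16, 32, 64);
        double raw = score(lengths, 16, 1.0, false); // 2.75, never more than lengths.size()
        double avg = score(lengths, 16, 1.0, true);  // 2.75 / 4, always in [0, 1]
        System.out.println(raw + " " + avg);         // prints 2.75 0.6875
    }
}
```

    With four intervals the raw score can never exceed 4, whereas the average always lies between 0 and 1.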

    Typically, one sets h=16 (or a bit larger) and α=1 (or a bit smaller), but the authors report that the method is rather stable with respect to changes in the parameter values.
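
    To get a feel for the parameters, the following hypothetical sketch prints the score of intervals of a few lengths under the typical setting (h=16, α=1) and some nearby values. ParameterSweepSketch is invented for this illustration; it only shows per-interval scores and does not reproduce the retrieval experiments on which the authors base their stability claim.

```java
import java.util.Locale;

/** Hypothetical sketch: per-interval scores under a few nearby parameter settings. */
final class ParameterSweepSketch {

    static double intervalScore(int length, int h, double alpha) {
        return length < h ? 1.0 : Math.pow((double) h / length, alpha);
    }

    public static void main(String[] args) {
        int[] lengths = {8, 16, 32, 64, 128};
        int[] hs = {16, 24};          // typical h and a slightly larger value
        double[] alphas = {1.0, 0.8}; // typical alpha and a slightly smaller value
        for (int h : hs) {
            for (double alpha : alphas) {
                StringBuilder row = new StringBuilder("h=" + h + ", alpha=" + alpha + ":");
                for (int length : lengths) {
                    // Locale.ROOT keeps the decimal separator a dot regardless of the default locale.
                    row.append(String.format(Locale.ROOT, " %.3f", intervalScore(length, h, alpha)));
                }
                System.out.println(row);
            }
        }
    }
}
```

    For every setting the interval score is non-increasing in the interval length, so the relative order of single intervals does not depend on h and α; only the size of the penalty for long intervals changes.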

    • Field Detail

      • h

        public final int h
        The parameter h.
      • alpha

        public final double alpha
        The parameter alpha.
      • normalize

        public final boolean normalize
        Whether the result should be normalized (i.e., between 0 and 1).
    • Constructor Detail

      • ClarkeCormackScorer

        public ClarkeCormackScorer(int h,
                                   double alpha,
                                   boolean normalize)
        Creates a Clarke–Cormack scorer.
        Parameters:
        h - the parameter h.
        alpha - the parameter α.
        normalize - whether the result should be normalized.
      • ClarkeCormackScorer

        public ClarkeCormackScorer(String h,
                                   String alpha,
                                   String normalize)
        Creates a Clarke–Cormack scorer.
        Parameters:
        h - the parameter h, as a string.
        alpha - the parameter α, as a string.
        normalize - whether the result should be normalized, as a string.
      • ClarkeCormackScorer

        public ClarkeCormackScorer()
        Default constructor, assigning the default values (h=DEFAULT_H, α=1) to the parameters; the resulting scorer is normalized.