Class IndexCluster

  • All Implemented Interfaces:
    Serializable
    Direct Known Subclasses:
    DocumentalCluster, LexicalCluster

    public abstract class IndexCluster
    extends Index
    An abstract index cluster. An index cluster is an index that transparently exposes a list of local indices as a single global index. A cluster is usually generated by partitioning an index lexically or documentally, but nothing prevents the creation of hand-made clusters.

    Note that, upon creation of an instance, the main index key of all local indices is set to that instance.

    An index cluster is defined by a property file. The only properties common to all index clusters are localindex, which can be specified multiple times (order is relevant) and contains the URIs of the local indices of the cluster, and strategy, which contains the filename of a serialised ClusteringStrategy. The indices will be loaded using Index.getInstance(CharSequence,boolean,boolean), so there is no restriction on the URIs that can be used (e.g., you can cluster a set of remote indices).

    Alternatively, the property strategyclass can be used to specify a class name (the class will be loaded using MG4JClassParser, so you can omit the package if the class is in MG4J). The class must provide a constructor with a signature like that of ChainedLexicalClusteringStrategy(Index[], BloomFilter[]).
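
    As an illustration of the properties described above, the following sketch renders the text of a cluster property file, either with a serialised strategy or with a strategy class. It is only a sketch under stated assumptions: the "key = value" line syntax is assumed, and the class ClusterPropertiesSketch, its methods, and all URIs and filenames are hypothetical names that are not part of MG4J.

```java
import java.util.List;

// Hypothetical helper, not part of MG4J: renders the text of a cluster property file.
final class ClusterPropertiesSketch {

    // Emits one localindex line per local index; the order of the lines is relevant.
    private static StringBuilder localIndices(List<String> localIndexUris) {
        StringBuilder sb = new StringBuilder();
        for (String uri : localIndexUris) sb.append("localindex = ").append(uri).append('\n');
        return sb;
    }

    // Variant using the strategy property (filename of a serialised ClusteringStrategy).
    static String withStrategyFile(List<String> localIndexUris, String strategyFilename) {
        return localIndices(localIndexUris).append("strategy = ").append(strategyFilename).append('\n').toString();
    }

    // Variant using the strategyclass property (a class name; the package can be
    // omitted if the class is in MG4J).
    static String withStrategyClass(List<String> localIndexUris, String className) {
        return localIndices(localIndexUris).append("strategyclass = ").append(className).append('\n').toString();
    }

    public static void main(String[] args) {
        // The URIs and the strategy filename below are placeholders.
        System.out.print(withStrategyFile(List.of("part-0", "part-1"), "cluster.strategy"));
        System.out.println();
        System.out.print(withStrategyClass(List.of("part-0", "part-1"), "ChainedLexicalClusteringStrategy"));
    }
}
```

    Since the order of the localindex entries is relevant, the sketch takes them as a list and emits them in list order.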

    If you plan to use global document sizes (e.g., for BM25 scoring) you will need to load them explicitly using the property Index.UriKeys.SIZES, which must specify a size file for the whole collection. If you are clustering a partitioned index, this is usually the original size file.

    Optionally, an index cluster may provide Bloom filters to avoid needless accesses to local indices that do not contain a term. The filters have the standard extension BLOOM_EXTENSION.
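
    The idea behind the term filters can be sketched as follows. This is a toy illustration, not the cluster's actual code: ToyBloomFilter is a hypothetical stand-in for the real BloomFilter class, and TermFilterSketch.candidates is an invented helper. A filter may report a term that is absent (a false positive), but it never misses a term that was added, so skipping an index whose filter answers "no" is safe.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy Bloom filter, a hypothetical stand-in for the BloomFilter class used by the cluster.
final class ToyBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    ToyBloomFilter(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // Derives the i-th bit position from two base hashes (double hashing).
    private int position(CharSequence term, int i) {
        int h1 = term.toString().hashCode();
        int h2 = Integer.reverse(h1) | 1;
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(CharSequence term) {
        for (int i = 0; i < hashes; i++) bits.set(position(term, i));
    }

    // false: the term is definitely absent; true: the term may be present.
    boolean mayContain(CharSequence term) {
        for (int i = 0; i < hashes; i++) if (!bits.get(position(term, i))) return false;
        return true;
    }
}

final class TermFilterSketch {
    // Returns the positions of the local indices that still have to be accessed for the term.
    // A null filter array means no filtering is available, so every index is a candidate.
    static List<Integer> candidates(ToyBloomFilter[] termFilter, int numberOfLocalIndices, CharSequence term) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < numberOfLocalIndices; i++)
            if (termFilter == null || termFilter[i].mayContain(term)) result.add(i);
        return result;
    }

    public static void main(String[] args) {
        ToyBloomFilter[] filters = { new ToyBloomFilter(1024, 3), new ToyBloomFilter(1024, 3) };
        filters[0].add("alpha"); // only the first local index contains "alpha"
        System.out.println(candidates(filters, 2, "alpha"));
    }
}
```

    The null check mirrors the fact that the filter array is optional: without filters, every local index must be accessed.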

    This class exposes a static factory method that uses the indexclass property to load the appropriate implementing subclass; Bloom filters are loaded automatically.

    See Also:
    Serialized Form
    • Field Detail

      • STRATEGY_DEFAULT_EXTENSION

        public static final String STRATEGY_DEFAULT_EXTENSION
        The default extension of a strategy.
        See Also:
        Constant Field Values
      • BLOOM_EXTENSION

        public static final String BLOOM_EXTENSION
        The default extension for Bloom term filters.
        See Also:
        Constant Field Values
      • localIndex

        protected final Index[] localIndex
        The local indices of this cluster.
      • termFilter

        protected final BloomFilter<Void>[] termFilter
        An array of Bloom filters used to reduce index access, or null.
    • Constructor Detail

      • IndexCluster

        protected IndexCluster(Index[] localIndex,
                               BloomFilter<Void>[] termFilter,
                               int numberOfDocuments,
                               int numberOfTerms,
                               long numberOfPostings,
                               long numberOfOccurrences,
                               int maxCount,
                               Payload payload,
                               boolean hasCounts,
                               boolean hasPositions,
                               TermProcessor termProcessor,
                               String field,
                               IntBigList sizes,
                               Properties properties)