IndexCluster (MG4J 5.1)

it.unimi.di.mg4j.index.cluster
Class IndexCluster

java.lang.Object
  it.unimi.di.mg4j.index.Index
      it.unimi.di.mg4j.index.cluster.IndexCluster

All Implemented Interfaces: Serializable

Direct Known Subclasses: DocumentalCluster, LexicalCluster

public abstract class IndexCluster
extends Index

An abstract index cluster. An index cluster is an index that transparently exposes a list of local indices as a single global index. A cluster is usually generated by partitioning an index lexically or documentally, but nothing prevents the creation of hand-made clusters.

Note that, upon creation of an instance, the main index key of all local indices is set to that instance.

An index cluster is defined by a property file. The only properties common to all index clusters are localindex, which can be specified multiple times (order is relevant) and contains the URIs of the local indices of the cluster, and strategy, which contains the filename of a serialised ClusteringStrategy. The indices will be loaded using Index.getInstance(CharSequence,boolean,boolean), so there is no restriction on the URIs that can be used (e.g., you can cluster a set of remote indices).

Alternatively, the property strategyclass can be used to specify a class name (the class will be loaded using MG4JClassParser, so you can omit the package if the class is in MG4J). The class must provide a constructor with a signature like that of ChainedLexicalClusteringStrategy.ChainedLexicalClusteringStrategy(Index[], BloomFilter[]).
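The layout of such a property file can be sketched as follows. This is only an illustration: the property names localindex and strategy come from the description above, while the class ClusterPropertiesSketch, the method render, the key=value line syntax and the file names used in main are assumptions, not part of MG4J.

```java
import java.util.List;

/** Hypothetical helper, not part of MG4J: renders the text of a cluster property file. */
public final class ClusterPropertiesSketch {
    private ClusterPropertiesSketch() {}

    /**
     * Emits one localindex line per local index, preserving the given order
     * (the order of local indices is relevant), followed by one strategy line.
     * The key=value syntax is an assumption made for this sketch.
     */
    public static String render(List<String> localIndexUris, String strategyFile) {
        StringBuilder sb = new StringBuilder();
        for (String uri : localIndexUris) {
            sb.append("localindex=").append(uri).append('\n');
        }
        sb.append("strategy=").append(strategyFile).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical basenames and strategy filename, chosen only for the example.
        System.out.print(render(List.of("part-0", "part-1"), "example.strategy"));
    }
}
```

To use strategyclass instead, the last line would presumably name a class rather than a serialised file; the sketch does not cover that case.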

If you plan to use global document sizes (e.g., for BM25 scoring) you will need to load them explicitly using the property Index.UriKeys.SIZES, which must specify a size file for the whole collection. If you are clustering a partitioned index, this is usually the original size file.

Optionally, an index cluster may provide Bloom filters to reduce useless access to local indices that do not contain a term. The filters have the standard extension BLOOM_EXTENSION.
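To make the role of the filters concrete, here is a minimal sketch of how one Bloom filter per local index lets a cluster skip local indices that certainly do not contain a term. The class TermFilterSketch, its hash scheme and the method candidates are assumptions written for illustration; they are a toy stand-in and not the BloomFilter class used by MG4J.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy Bloom filter for illustration only; not MG4J's BloomFilter class. */
public final class TermFilterSketch {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    public TermFilterSketch(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    /** Derives the i-th bit position from the term's hash code (a simplistic scheme). */
    private int position(CharSequence term, int i) {
        int h = term.toString().hashCode() * 31 + i * 0x9E3779B9;
        h ^= h >>> 16;
        return Math.floorMod(h, size);
    }

    public void add(CharSequence term) {
        for (int i = 0; i < hashes; i++) bits.set(position(term, i));
    }

    /** May return true for an absent term (false positive), never false for an added one. */
    public boolean mightContain(CharSequence term) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(position(term, i))) return false;
        }
        return true;
    }

    /**
     * Returns the positions of the local indices worth querying for a term.
     * A null filter array means that no filters are available, so every
     * local index must be queried.
     */
    public static List<Integer> candidates(TermFilterSketch[] termFilter, int numberOfLocalIndices, CharSequence term) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < numberOfLocalIndices; i++) {
            if (termFilter == null || termFilter[i].mightContain(term)) result.add(i);
        }
        return result;
    }
}
```

Because a Bloom filter can give false positives but no false negatives, skipping a local index on a negative answer is always safe; a positive answer only means the index has to be checked.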

This class exposes a static factory method that uses the indexclass property to load the appropriate implementing subclass; Bloom filters are loaded automatically.

See Also: Serialized Form

Nested Class Summary
`static class`	`IndexCluster.PropertyKeys` Symbolic names for properties of an `IndexCluster`.

Nested classes/interfaces inherited from class it.unimi.di.mg4j.index.Index
`Index.EmptyIndexIterator, Index.UriKeys`

Field Summary
`static String`	`BLOOM_EXTENSION` The default extension for Bloom term filters.
`protected Index[]`	`localIndex` The local indices of this cluster.
`static String`	`STRATEGY_DEFAULT_EXTENSION` The default extension of a strategy.
`protected BloomFilter[]`	`termFilter` An array of Bloom filters to reduce index access, or `null`.

Fields inherited from class it.unimi.di.mg4j.index.Index
`field, hasCounts, hasPayloads, hasPositions, keyIndex, maxCount, numberOfDocuments, numberOfOccurrences, numberOfPostings, numberOfTerms, payload, prefixMap, properties, singletonSet, sizes, termMap, termProcessor`

Constructor Summary
`protected`	`IndexCluster(Index[] localIndex, BloomFilter[] termFilter, int numberOfDocuments, int numberOfTerms, long numberOfPostings, long numberOfOccurrences, int maxCount, Payload payload, boolean hasCounts, boolean hasPositions, TermProcessor termProcessor, String field, IntList sizes, Properties properties)`

Method Summary
`static Index`	`getInstance(CharSequence basename, boolean randomAccess, boolean documentSizes, EnumMap<Index.UriKeys,String> queryProperties)` Returns a new index cluster.
`void`	`keyIndex(Index newKeyIndex)` Sets the index used as a key to retrieve intervals from iterators generated from this index.

Methods inherited from class it.unimi.di.mg4j.index.Index
`documents, documents, documents, getEmptyIndexIterator, getEmptyIndexIterator, getEmptyIndexIterator, getEmptyIndexIterator, getInstance, getInstance, getInstance, getInstance, getInstance, getReader, getReader, getTermProcessor`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

STRATEGY_DEFAULT_EXTENSION

public static final String STRATEGY_DEFAULT_EXTENSION

The default extension of a strategy.

See Also: Constant Field Values

BLOOM_EXTENSION

public static final String BLOOM_EXTENSION

The default extension for Bloom term filters.

See Also: Constant Field Values

localIndex

protected final Index[] localIndex

The local indices of this cluster.

termFilter

protected final BloomFilter[] termFilter

An array of Bloom filters to reduce index access, or null.

Constructor Detail

IndexCluster

protected IndexCluster(Index[] localIndex,
                       BloomFilter[] termFilter,
                       int numberOfDocuments,
                       int numberOfTerms,
                       long numberOfPostings,
                       long numberOfOccurrences,
                       int maxCount,
                       Payload payload,
                       boolean hasCounts,
                       boolean hasPositions,
                       TermProcessor termProcessor,
                       String field,
                       IntList sizes,
                       Properties properties)

Method Detail

getInstance

public static Index getInstance(CharSequence basename,
                                boolean randomAccess,
                                boolean documentSizes,
                                EnumMap<Index.UriKeys,String> queryProperties)
                         throws ConfigurationException,
                                IOException,
                                ClassNotFoundException,
                                SecurityException,
                                URISyntaxException,
                                InstantiationException,
                                IllegalAccessException,
                                InvocationTargetException,
                                NoSuchMethodException

Returns a new index cluster.

This method uses the LOCALINDEX property to locate the local indices, loads them (passing on randomAccess) and builds a new index cluster using the appropriate implementing subclass.

Note that documentSizes is just passed to the local indices. This can be useful in documental clusters, as it allows local scoring, but it is useless in lexical clusters, as scoring is necessarily centralised. In the latter case, the property Index.UriKeys.SIZES can be used to specify a global sizes file (which usually comes from an original global index).

Parameters:
    basename - the basename.
    randomAccess - whether the index should be accessible randomly.
    documentSizes - if true, document sizes will be loaded (note that sometimes document sizes might be loaded anyway because the compression method for positions requires it).
    queryProperties - a map containing associations between Index.UriKeys and values, or null.

Throws:
    ConfigurationException
    IOException
    ClassNotFoundException
    SecurityException
    URISyntaxException
    InstantiationException
    IllegalAccessException
    InvocationTargetException
    NoSuchMethodException

keyIndex

public void keyIndex(Index newKeyIndex)

Description copied from class: Index

Sets the index used as a key to retrieve intervals from iterators generated from this index.

This setter is a compromise between clarity of design and efficiency. Each index iterator is based on an index, and when that index is passed to DocumentIterator.intervalIterator(Index), intervals corresponding to the positions of the term in the current document are returned. Analogously, DocumentIterator.indices() returns a singleton set containing the index. However, when composing indices into clusters, iterators generated by a local index must often act as if they really belonged to the global index. This method makes it possible to set the index that is used as a key to return intervals, and that is contained in Index.singletonSet.

Note that setting this value will only influence index readers created afterwards.

Overrides: keyIndex in class Index

Parameters:
    newKeyIndex - the new index to be used as a key for interval retrieval.
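
The effect of the key index can be illustrated with a self-contained toy model. All types below (KeyIndexSketch, ToyIndex, ToyReader, ToyCluster) are hypothetical stand-ins written for this sketch and share nothing with the MG4J classes except the behaviour described above: a cluster makes itself the key of its local indices on creation, and a change of key only affects readers created afterwards.

```java
import java.util.Collections;
import java.util.Set;

/** Toy model of the key-index mechanism; none of these types belong to MG4J. */
public final class KeyIndexSketch {
    private KeyIndexSketch() {}

    public static class ToyIndex {
        public final String name;
        /** By default an index is its own key. */
        public ToyIndex keyIndex = this;
        public Set<ToyIndex> singletonSet = Collections.singleton(this);

        public ToyIndex(String name) {
            this.name = name;
        }

        /** Sets the key and keeps the singleton set consistent with it. */
        public void keyIndex(ToyIndex newKeyIndex) {
            keyIndex = newKeyIndex;
            singletonSet = Collections.singleton(newKeyIndex);
        }

        /** A reader captures the key that is current at creation time. */
        public ToyReader getReader() {
            return new ToyReader(keyIndex);
        }
    }

    public static class ToyReader {
        public final ToyIndex key;

        ToyReader(ToyIndex key) {
            this.key = key;
        }
    }

    public static class ToyCluster extends ToyIndex {
        public ToyCluster(String name, ToyIndex... localIndex) {
            super(name);
            // Upon creation, the cluster becomes the key of all its local indices.
            for (ToyIndex local : localIndex) local.keyIndex(this);
        }
    }

    public static void main(String[] args) {
        ToyIndex local = new ToyIndex("local");
        ToyReader before = local.getReader();          // created before clustering
        ToyCluster cluster = new ToyCluster("global", local);
        ToyReader after = local.getReader();           // created after clustering
        System.out.println(before.key.name + " " + after.key.name);
    }
}
```

In this model the reader obtained before the cluster exists still answers with the local index as its key, which mirrors the note that only readers created after the call are influenced.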