Class IndexCluster
- java.lang.Object
- it.unimi.di.big.mg4j.index.Index
- it.unimi.di.big.mg4j.index.cluster.IndexCluster
- All Implemented Interfaces: Serializable
- Direct Known Subclasses: DocumentalCluster, LexicalCluster
public abstract class IndexCluster extends Index
An abstract index cluster. An index cluster is an index that transparently exposes a list of local indices as a single global index. A cluster is usually generated by partitioning an index lexically or documentally, but nothing prevents the creation of hand-made clusters.
Note that, upon creation of an instance, the main index key of all local indices is set to that instance.
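To picture the two usual ways of partitioning, here is a toy model that uses plain maps in place of real indices. Nothing in it is MG4J code: PartitionSketch, its methods, the pivot and cut parameters, and the sample data are all invented for the illustration, and it keeps global document identifiers in the local indices for simplicity, which a real documental partition need not do. It only shows that, under either partitioning, gathering the postings of a term from the local indices gives back the global posting list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PartitionSketch {
    /** A tiny global index: term -> sorted document identifiers (made-up data). */
    static Map<String, List<Integer>> sample() {
        Map<String, List<Integer>> index = new TreeMap<>();
        index.put("apple", Arrays.asList(0, 2));
        index.put("zebra", Arrays.asList(1, 2));
        return index;
    }

    /** Lexical partitioning: every term goes, with all its postings, to exactly one local index. */
    static List<Map<String, List<Integer>>> lexical(Map<String, List<Integer>> index, String pivot) {
        Map<String, List<Integer>> low = new TreeMap<>();
        Map<String, List<Integer>> high = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : index.entrySet())
            (e.getKey().compareTo(pivot) < 0 ? low : high).put(e.getKey(), e.getValue());
        List<Map<String, List<Integer>>> locals = new ArrayList<>();
        locals.add(low);
        locals.add(high);
        return locals;
    }

    /** Documental partitioning: every document goes to exactly one local index. */
    static List<Map<String, List<Integer>>> documental(Map<String, List<Integer>> index, int cut) {
        Map<String, List<Integer>> first = new TreeMap<>();
        Map<String, List<Integer>> second = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : index.entrySet())
            for (int doc : e.getValue())
                (doc < cut ? first : second).computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(doc);
        List<Map<String, List<Integer>>> locals = new ArrayList<>();
        locals.add(first);
        locals.add(second);
        return locals;
    }

    /** The cluster view: postings of a term gathered from all local indices. */
    static List<Integer> postings(List<Map<String, List<Integer>>> locals, String term) {
        List<Integer> result = new ArrayList<>();
        for (Map<String, List<Integer>> local : locals)
            result.addAll(local.getOrDefault(term, Collections.<Integer>emptyList()));
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> index = sample();
        System.out.println(postings(lexical(index, "m"), "apple"));  // [0, 2]
        System.out.println(postings(documental(index, 2), "apple")); // [0, 2]
    }
}
```

In the lexical case only one local index knows about "apple"; in the documental case both do, each for its own documents.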
An index cluster is defined by a property file. The only properties common to all index clusters are localindex, which can be specified multiple times (order is relevant) and contains the URIs of the local indices of the cluster, and strategy, which contains the filename of a serialised ClusteringStrategy. The indices will be loaded using Index.getInstance(CharSequence,boolean,boolean), so there is no restriction on the URIs that can be used (e.g., you can cluster a set of remote indices).
Alternatively, the property strategyclass can be used to specify a class name (the class will be loaded using MG4JClassParser, so you can omit the package if the class is in MG4J). The class must provide a constructor with a signature like that of ChainedLexicalClusteringStrategy(Index[], BloomFilter[]).
If you plan to use global document sizes (e.g., for BM25 scoring) you will need to load them explicitly using the property Index.UriKeys.SIZES, which must specify a size file for the whole collection. If you are clustering a partitioned index, this is usually the original size file.
Optionally, an index cluster may provide Bloom filters to reduce useless access to local indices that do not contain a term. The filters have the standard extension BLOOM_EXTENSION.
This class exposes a static factory method that uses the indexclass property to load the appropriate implementing subclass; Bloom filters are loaded automatically.
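As a rough illustration of the property file described above, the following sketch assembles one as plain text with the standard library only. The key names localindex, strategy and indexclass come from the description; everything else is an assumption made for the example: the helper class ClusterPropertiesSketch, the URIs, the strategy file name, the fully qualified class name used as the indexclass value, and the plain key=value syntax with one localindex line per local index.

```java
import java.util.Arrays;
import java.util.List;

final class ClusterPropertiesSketch {
    /** Builds the text of a cluster property file; all values passed in are hypothetical. */
    public static String build(String indexClass, List<String> localIndexUris, String strategyFile) {
        StringBuilder sb = new StringBuilder();
        sb.append("indexclass=").append(indexClass).append('\n');
        // localindex is repeated once per local index; the order of the lines is relevant.
        for (String uri : localIndexUris) sb.append("localindex=").append(uri).append('\n');
        sb.append("strategy=").append(strategyFile).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical names: two local indices and a serialised strategy file.
        System.out.print(build(
            "it.unimi.di.big.mg4j.index.cluster.DocumentalCluster",
            Arrays.asList("example-text-0", "example-text-1"),
            "example-text.strategy"));
    }
}
```

A cluster that uses strategyclass instead of strategy would replace the last line accordingly; the sketch does not cover that case.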
- See Also: Serialized Form
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description)
- static class IndexCluster.PropertyKeys: Symbolic names for properties of an IndexCluster.
Nested classes/interfaces inherited from class it.unimi.di.big.mg4j.index.Index
- Index.EmptyIndexIterator, Index.UriKeys
Field Summary
Fields (Modifier and Type, Field, Description)
- static String BLOOM_EXTENSION: The default extension for Bloom term filters.
- protected Index[] localIndex: The local indices of this cluster.
- static String STRATEGY_DEFAULT_EXTENSION: The default extension of a strategy.
- protected BloomFilter<Void>[] termFilter: An array of Bloom filters to reduce index access, or null.
Fields inherited from class it.unimi.di.big.mg4j.index.Index
- field, hasCounts, hasPayloads, hasPositions, keyIndex, maxCount, numberOfDocuments, numberOfOccurrences, numberOfPostings, numberOfTerms, payload, prefixMap, properties, singletonSet, sizes, termMap, termProcessor
Constructor Summary
Constructors (Modifier, Constructor, Description)
- protected IndexCluster(Index[] localIndex, BloomFilter<Void>[] termFilter, int numberOfDocuments, int numberOfTerms, long numberOfPostings, long numberOfOccurrences, int maxCount, Payload payload, boolean hasCounts, boolean hasPositions, TermProcessor termProcessor, String field, IntBigList sizes, Properties properties)
Method Summary
Methods (Modifier and Type, Method, Description)
- static Index getInstance(CharSequence basename, boolean randomAccess, boolean documentSizes, EnumMap<Index.UriKeys,String> queryProperties): Returns a new index cluster.
- void keyIndex(Index newKeyIndex): Sets the index used as a key to retrieve intervals from iterators generated from this index.
Methods inherited from class it.unimi.di.big.mg4j.index.Index
- documents, documents, documents, getEmptyIndexIterator, getEmptyIndexIterator, getEmptyIndexIterator, getEmptyIndexIterator, getInstance, getInstance, getInstance, getInstance, getInstance, getReader, getReader, getTermProcessor
Field Detail
-
STRATEGY_DEFAULT_EXTENSION
public static final String STRATEGY_DEFAULT_EXTENSION
The default extension of a strategy.
- See Also: Constant Field Values
-
BLOOM_EXTENSION
public static final String BLOOM_EXTENSION
The default extension for Bloom term filters.
- See Also: Constant Field Values
-
localIndex
protected final Index[] localIndex
The local indices of this cluster.
-
termFilter
protected final BloomFilter<Void>[] termFilter
An array of Bloom filters to reduce index access, or null.
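To make the role of termFilter more concrete, here is a toy model of filter-gated access. It is not MG4J code: TinyBloom is a hypothetical stand-in for the real BloomFilter class, built on java.util.BitSet with two simple hash probes, and candidates is an invented helper. The sketch assumes one filter per local index, and shows that a filter can only rule a local index out (it never gives false negatives), while a null filter array means that every local index has to be consulted.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class TermFilterSketch {
    /** Hypothetical stand-in for a Bloom filter: two hash probes into a bit set. */
    static final class TinyBloom {
        private final BitSet bits;
        private final int size;

        TinyBloom(int size) {
            this.size = size;
            this.bits = new BitSet(size);
        }

        private int h1(CharSequence s) {
            return Math.floorMod(s.toString().hashCode(), size);
        }

        private int h2(CharSequence s) {
            return Math.floorMod(s.toString().hashCode() * 31 + s.length(), size);
        }

        void add(CharSequence term) {
            bits.set(h1(term));
            bits.set(h2(term));
        }

        /** False means "definitely absent"; true means "possibly present". */
        boolean mightContain(CharSequence term) {
            return bits.get(h1(term)) && bits.get(h2(term));
        }
    }

    /** Returns the positions of the local indices worth querying for a term. */
    static List<Integer> candidates(TinyBloom[] termFilter, int numberOfLocalIndices, CharSequence term) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < numberOfLocalIndices; i++) {
            // With no filters (null array), no local index can be skipped.
            if (termFilter == null || termFilter[i].mightContain(term)) result.add(i);
        }
        return result;
    }

    public static void main(String[] args) {
        TinyBloom[] filters = { new TinyBloom(1024), new TinyBloom(1024) };
        filters[0].add("apple"); // only local index 0 contains "apple"
        System.out.println(candidates(filters, 2, "apple")); // [0]: the empty filter rules index 1 out
        System.out.println(candidates(null, 2, "apple"));    // [0, 1]: nothing can be skipped
    }
}
```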
-
-
Constructor Detail
-
IndexCluster
protected IndexCluster(Index[] localIndex, BloomFilter<Void>[] termFilter, int numberOfDocuments, int numberOfTerms, long numberOfPostings, long numberOfOccurrences, int maxCount, Payload payload, boolean hasCounts, boolean hasPositions, TermProcessor termProcessor, String field, IntBigList sizes, Properties properties)
-
-
Method Detail
-
getInstance
public static Index getInstance(CharSequence basename, boolean randomAccess, boolean documentSizes, EnumMap<Index.UriKeys,String> queryProperties) throws org.apache.commons.configuration.ConfigurationException, IOException, ClassNotFoundException, SecurityException, URISyntaxException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException
Returns a new index cluster.
This method uses the LOCALINDEX property to locate the local indices, loads them (passing on randomAccess) and builds a new index cluster using the appropriate implementing subclass.
Note that documentSizes is just passed to the local indices. This can be useful in documental clusters, as it allows local scoring, but it is useless in lexical clusters, as scoring is necessarily centralised. In the latter case, the property Index.UriKeys.SIZES can be used to specify a global sizes file (which usually comes from an original global index).
- Parameters:
basename - the basename.
randomAccess - whether the index should be accessible randomly.
documentSizes - if true, document sizes will be loaded (note that sometimes document sizes might be loaded anyway because the compression method for positions requires it).
queryProperties - a map containing associations between Index.UriKeys and values, or null.
- Throws:
org.apache.commons.configuration.ConfigurationException
IOException
ClassNotFoundException
SecurityException
URISyntaxException
InstantiationException
IllegalAccessException
InvocationTargetException
NoSuchMethodException
-
keyIndex
public void keyIndex(Index newKeyIndex)
Description copied from class: Index
Sets the index used as a key to retrieve intervals from iterators generated from this index.
This setter is a compromise between clarity of design and efficiency. Each index iterator is based on an index, and when that index is passed to DocumentIterator.intervalIterator(Index), intervals corresponding to the positions of the term in the current document are returned. Analogously, DocumentIterator.indices() returns a singleton set containing the index. However, when composing indices into clusters, iterators generated by a local index must often act as if they really belonged to the global index. This method makes it possible to set the index that is used as a key to return intervals, and that is contained in Index.singletonSet.
Note that setting this value will only influence index readers created afterwards.
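The behaviour described above can be pictured with a small toy model. ToyIndex, ToyReader and ToyCluster are hypothetical classes, not MG4J types. They mimic only two points stated in this page: a cluster sets itself as the key index of its local indices upon creation, and a reader captures the key index that is current when it is created, so later changes do not reach it. The propagation of keyIndex to the local indices in the overriding method is an assumption of the sketch.

```java
final class KeyIndexSketch {
    /** Hypothetical stand-in for an index with a settable key index. */
    static class ToyIndex {
        final String name;
        ToyIndex keyIndex = this; // by default an index is its own key

        ToyIndex(String name) {
            this.name = name;
        }

        void keyIndex(ToyIndex newKeyIndex) {
            keyIndex = newKeyIndex;
        }

        ToyReader getReader() {
            return new ToyReader(keyIndex);
        }
    }

    /** A reader remembers the key index that was current when it was created. */
    static final class ToyReader {
        final ToyIndex key;

        ToyReader(ToyIndex key) {
            this.key = key;
        }
    }

    /** A toy cluster: on creation it becomes the key index of its local indices. */
    static final class ToyCluster extends ToyIndex {
        final ToyIndex[] localIndex;

        ToyCluster(String name, ToyIndex... localIndex) {
            super(name);
            this.localIndex = localIndex;
            for (ToyIndex local : localIndex) local.keyIndex(this);
        }

        @Override
        void keyIndex(ToyIndex newKeyIndex) {
            super.keyIndex(newKeyIndex);
            // Assumption of this sketch: the new key is passed on to every local index.
            for (ToyIndex local : localIndex) local.keyIndex(newKeyIndex);
        }
    }

    public static void main(String[] args) {
        ToyIndex local = new ToyIndex("local-0");
        ToyReader before = local.getReader();   // created before clustering
        ToyCluster cluster = new ToyCluster("global", local);
        ToyReader after = local.getReader();    // created after clustering
        System.out.println(before.key.name);    // local-0
        System.out.println(after.key.name);     // global
    }
}
```

The reader obtained before the cluster existed keeps answering for the local index, which is why readers should be created only after the cluster has been set up.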