java.lang.Object
  it.unimi.di.mg4j.index.Index
    it.unimi.di.mg4j.index.cluster.IndexCluster
public abstract class IndexCluster extends Index
An abstract index cluster. An index cluster is an index exposing transparently a list of local indices as a single global index. A cluster usually is generated by partitioning an index lexically or documentally, but nothing prevents the creation of hand-made clusters.
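The two partitioning schemes mentioned above can be illustrated with a toy model. This is only a sketch under stated assumptions: it assumes a documental partition made of contiguous document ranges and a lexical partition made of contiguous term ranges, and the class and method names (PartitioningSketch, toLocal, localIndexForTerm) are hypothetical and not part of MG4J. An actual ClusteringStrategy may map documents and terms differently.

```java
import java.util.List;

/** Toy model of documental and lexical partitioning; not MG4J code. */
final class PartitioningSketch {
    /**
     * Documental partitioning by contiguous ranges (an assumption):
     * cutPoints[i] is the first global document held by local index i.
     * Returns { local index, local document pointer }.
     */
    static int[] toLocal(int[] cutPoints, int globalDocument) {
        int localIndex = 0;
        while (localIndex + 1 < cutPoints.length && cutPoints[localIndex + 1] <= globalDocument) localIndex++;
        return new int[] { localIndex, globalDocument - cutPoints[localIndex] };
    }

    /**
     * Lexical partitioning by term ranges (an assumption):
     * lowerBounds.get(i) is the smallest term held by local index i.
     */
    static int localIndexForTerm(List<String> lowerBounds, String term) {
        int localIndex = 0;
        while (localIndex + 1 < lowerBounds.size() && lowerBounds.get(localIndex + 1).compareTo(term) <= 0) localIndex++;
        return localIndex;
    }

    public static void main(String[] args) {
        int[] where = toLocal(new int[] { 0, 100, 250 }, 120);
        System.out.println("document 120 -> local index " + where[0] + ", local pointer " + where[1]);
        System.out.println("term mango -> local index " + localIndexForTerm(List.of("", "h", "q"), "mango"));
    }
}
```

In the documental case every local index holds complete posting lists for a subset of the documents; in the lexical case every local index holds a subset of the terms for all documents.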
Note that, upon creation of an instance, the main index key of all local indices is set to that instance.
An index cluster is defined by a property file. The only properties common to all index clusters are localindex, which can be specified multiple times (order is relevant) and contains the URIs of the local indices of the cluster, and strategy, which contains the filename of a serialised ClusteringStrategy. The indices will be loaded using Index.getInstance(CharSequence,boolean,boolean), so there is no restriction on the URIs that can be used (e.g., you can cluster a set of remote indices).
Alternatively, the property strategyclass can be used to specify a class name (the class will be loaded using MG4JClassParser, so you can omit the package if the class is in MG4J). The class must provide a constructor with a signature like that of ChainedLexicalClusteringStrategy.ChainedLexicalClusteringStrategy(Index[], BloomFilter[]).
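As an illustration of the properties described above, the following sketch assembles the text of a cluster property file. It assumes the usual key=value property syntax; the local index URIs (part-0, part-1) and the strategy file name are placeholders, and the helper class ClusterPropertiesSketch is hypothetical. Only the key names localindex and strategy come from this documentation.

```java
import java.util.List;

/** Builds the text of a cluster property file; URIs and file names are placeholders. */
final class ClusterPropertiesSketch {
    static String clusterProperties(List<String> localIndexUris, String strategyFile) {
        StringBuilder text = new StringBuilder();
        // One localindex line per local index; the order of the lines is significant.
        for (String uri : localIndexUris) text.append("localindex=").append(uri).append('\n');
        // The strategy property names a serialised ClusteringStrategy.
        text.append("strategy=").append(strategyFile).append('\n');
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.print(clusterProperties(List.of("part-0", "part-1"), "example-strategy-file"));
    }
}
```

A cluster that uses strategyclass instead would replace the strategy line with a line naming the strategy class.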
If you plan to use global document sizes (e.g., for BM25 scoring) you will need to load them explicitly using the property Index.UriKeys.SIZES, which must specify a size file for the whole collection. If you are clustering a partitioned index, this is usually the original size file.
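To see why sizes for the whole collection matter, consider the length normalisation used by BM25, which compares a document's length with the average document length. The sketch below is not MG4J's scorer: the class GlobalSizesSketch and the sample sizes are made up for illustration. It shows only that an average computed over one partition can differ substantially from the global average.

```java
/** Illustrates how partition-local sizes distort BM25 length normalisation. */
final class GlobalSizesSketch {
    static double averageLength(int[] sizes) {
        long total = 0;
        for (int size : sizes) total += size;
        return (double) total / sizes.length;
    }

    /** BM25 length-normalisation factor: 1 - b + b * (length / averageLength). */
    static double lengthNorm(int length, double averageLength, double b) {
        return 1 - b + b * (length / averageLength);
    }

    public static void main(String[] args) {
        int[] globalSizes = { 10, 20, 30, 100 }; // sizes of the whole collection (made-up data)
        int[] partitionSizes = { 10, 20 };       // sizes seen by one documental partition
        double b = 0.75;
        System.out.println("global normalisation: " + lengthNorm(20, averageLength(globalSizes), b));
        System.out.println("local normalisation:  " + lengthNorm(20, averageLength(partitionSizes), b));
    }
}
```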
Optionally, an index cluster may provide Bloom filters to reduce useless access to local indices that do not contain a term. The filters have the standard extension BLOOM_EXTENSION.
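The role of the term filters can be sketched with a toy Bloom filter. This is a minimal illustration, not MG4J's BloomFilter: the classes BloomGateSketch and TinyBloom, the two hash functions and the method candidates are all hypothetical. A filter never reports an added term as absent, so a negative answer lets the cluster skip a local index safely, while a positive answer may occasionally be a false positive.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy model of Bloom-filter gating in front of local indices. */
final class BloomGateSketch {
    /** A tiny Bloom filter with two hash functions; a stand-in, not MG4J's BloomFilter. */
    static final class TinyBloom {
        private final BitSet bits;
        private final int size;

        TinyBloom(int size) {
            this.size = size;
            this.bits = new BitSet(size);
        }

        private int h1(String s) { return Math.floorMod(s.hashCode(), size); }
        private int h2(String s) { return Math.floorMod(s.hashCode() * 31 + s.length(), size); }

        void add(String term) {
            bits.set(h1(term));
            bits.set(h2(term));
        }

        /** False means the term is definitely absent; true means it may be present. */
        boolean mightContain(String term) {
            return bits.get(h1(term)) && bits.get(h2(term));
        }
    }

    /** Returns the local indices that must actually be queried for the term. */
    static List<Integer> candidates(int numberOfLocalIndices, TinyBloom[] termFilter, String term) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < numberOfLocalIndices; i++)
            // A null filter array means no filtering: every local index is queried.
            if (termFilter == null || termFilter[i].mightContain(term)) result.add(i);
        return result;
    }

    public static void main(String[] args) {
        TinyBloom[] termFilter = { new TinyBloom(1024), new TinyBloom(1024) };
        termFilter[0].add("apple");
        termFilter[1].add("zebra");
        System.out.println("indices to query for apple: " + candidates(2, termFilter, "apple"));
    }
}
```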
This class exposes a static factory method that uses the indexclass property to load the appropriate implementing subclass; Bloom filters are loaded automatically.
Nested Class Summary | |
---|---|
static class | IndexCluster.PropertyKeys: Symbolic names for properties of an IndexCluster. |
Nested classes/interfaces inherited from class it.unimi.di.mg4j.index.Index |
---|
Index.EmptyIndexIterator, Index.UriKeys |
Field Summary | |
---|---|
static String | BLOOM_EXTENSION: The default extension for Bloom term filters. |
protected Index[] | localIndex: The local indices of this cluster. |
static String | STRATEGY_DEFAULT_EXTENSION: The default extension of a strategy. |
protected BloomFilter[] | termFilter: An array of Bloom filters to reduce index access, or null. |
Fields inherited from class it.unimi.di.mg4j.index.Index |
---|
field, hasCounts, hasPayloads, hasPositions, keyIndex, maxCount, numberOfDocuments, numberOfOccurrences, numberOfPostings, numberOfTerms, payload, prefixMap, properties, singletonSet, sizes, termMap, termProcessor |
Constructor Summary | |
---|---|
protected | IndexCluster(Index[] localIndex, BloomFilter[] termFilter, int numberOfDocuments, int numberOfTerms, long numberOfPostings, long numberOfOccurrences, int maxCount, Payload payload, boolean hasCounts, boolean hasPositions, TermProcessor termProcessor, String field, IntList sizes, Properties properties) |
Method Summary | |
---|---|
static Index | getInstance(CharSequence basename, boolean randomAccess, boolean documentSizes, EnumMap<Index.UriKeys,String> queryProperties): Returns a new index cluster. |
void | keyIndex(Index newKeyIndex): Sets the index used as a key to retrieve intervals from iterators generated from this index. |
Methods inherited from class it.unimi.di.mg4j.index.Index |
---|
documents, documents, documents, getEmptyIndexIterator, getEmptyIndexIterator, getEmptyIndexIterator, getEmptyIndexIterator, getInstance, getInstance, getInstance, getInstance, getInstance, getReader, getReader, getTermProcessor |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
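public static final String STRATEGY_DEFAULT_EXTENSION
The default extension of a strategy.
public static final String BLOOM_EXTENSION
The default extension for Bloom term filters.
protected final Index[] localIndex
The local indices of this cluster.
protected final BloomFilter[] termFilter
An array of Bloom filters to reduce index access, or null.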
Constructor Detail |
---|
protected IndexCluster(Index[] localIndex, BloomFilter[] termFilter, int numberOfDocuments, int numberOfTerms, long numberOfPostings, long numberOfOccurrences, int maxCount, Payload payload, boolean hasCounts, boolean hasPositions, TermProcessor termProcessor, String field, IntList sizes, Properties properties)
Method Detail |
---|
public static Index getInstance(CharSequence basename, boolean randomAccess, boolean documentSizes, EnumMap<Index.UriKeys,String> queryProperties) throws ConfigurationException, IOException, ClassNotFoundException, SecurityException, URISyntaxException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException
This method uses the LOCALINDEX property to locate the local indices, loads them (passing on randomAccess) and builds a new index cluster using the appropriate implementing subclass.
Note that documentSizes is just passed to the local indices. This can be useful in documental clusters, as it allows local scoring, but it is useless in lexical clusters, as scoring is necessarily centralised. In the latter case, the property Index.UriKeys.SIZES can be used to specify a global sizes file (which usually comes from an original global index).
Parameters:
basename - the basename.
randomAccess - whether the index should be accessible randomly.
documentSizes - if true, document sizes will be loaded (note that sometimes document sizes might be loaded anyway because the compression method for positions requires it).
queryProperties - a map containing associations between Index.UriKeys and values, or null.
Throws:
ConfigurationException
IOException
ClassNotFoundException
SecurityException
URISyntaxException
InstantiationException
IllegalAccessException
InvocationTargetException
NoSuchMethodException
public void keyIndex(Index newKeyIndex)
Description copied from class: Index
This setter is a compromise between clarity of design and efficiency. Each index iterator is based on an index, and when that index is passed to DocumentIterator.intervalIterator(Index), intervals corresponding to the positions of the term in the current document are returned. Analogously, DocumentIterator.indices() returns a singleton set containing the index. However, when composing indices into clusters, iterators generated by a local index must often act as if they really belong to the global index. This method sets the index that is used as a key to return intervals, and that is contained in Index.singletonSet.
Note that setting this value will only influence index readers created afterwards.
Overrides:
keyIndex in class Index
Parameters:
newKeyIndex - the new index to be used as a key for interval retrieval.
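The effect of the key index, including the fact that only readers created afterwards are influenced, can be modelled with a few toy classes. This is a sketch under stated assumptions: KeyIndexSketch, ToyIndex, ToyReader and intervalsFor are hypothetical names and do not reproduce MG4J's Index, IndexReader or DocumentIterator.

```java
/** Toy model of a key index; class and method names are hypothetical, not MG4J's. */
final class KeyIndexSketch {
    static class ToyIndex {
        final String name;
        ToyIndex keyIndex = this; // by default an index is its own key

        ToyIndex(String name) { this.name = name; }

        void keyIndex(ToyIndex newKeyIndex) { keyIndex = newKeyIndex; }

        /** A reader captures the key index that is current when it is created. */
        ToyReader getReader() { return new ToyReader(keyIndex); }
    }

    static class ToyReader {
        private final ToyIndex key;

        ToyReader(ToyIndex key) { this.key = key; }

        /** Answers only when asked with the index it was keyed on. */
        String intervalsFor(ToyIndex index) { return index == key ? "intervals" : "none"; }
    }

    public static void main(String[] args) {
        ToyIndex cluster = new ToyIndex("cluster");
        ToyIndex local = new ToyIndex("local");
        ToyReader before = local.getReader(); // created before the key is changed
        local.keyIndex(cluster);              // the local index now acts on behalf of the cluster
        ToyReader after = local.getReader();  // created after the key is changed
        System.out.println("reader created before, asked for " + cluster.name + ": " + before.intervalsFor(cluster));
        System.out.println("reader created after, asked for " + cluster.name + ": " + after.intervalsFor(cluster));
    }
}
```

In this model the reader created before the call still answers only to the local index, which mirrors the note that existing readers are not affected.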