Package it.unimi.di.big.mg4j.index.cluster
Index partitioning and clustering.
This package contains the classes that provide the infrastructure for index partitioning
and clustering. The tools that actually perform partitioning can be found in it.unimi.di.big.mg4j.tool.
An index cluster is a set of local indices that are viewed as a single index. In a lexical cluster each local index has a disjoint set of terms, but the document pointers contained in each local index refer to the same documents. In a documental cluster each index contains postings referring to a disjoint subset of a collection.
Clustering indices requires mapping term numbers and document pointers back and forth between the global index and local indices. This mapping is provided by documental clustering strategies and lexical clustering strategies.
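As an illustration of the back-and-forth mapping for a documental cluster, the following is a minimal sketch that assumes the collection is split into contiguous segments delimited by strictly increasing cut points. The class ContiguousDocumentalSketch and its method names are hypothetical and are not the interfaces of this package; they only show the kind of arithmetic such a strategy has to perform.

```java
import java.util.Arrays;

/** Hypothetical sketch, not part of MG4J: contiguous documental mapping. */
final class ContiguousDocumentalSketch {
    /** Local index i holds global documents in [cutPoints[i], cutPoints[i + 1]). */
    private final long[] cutPoints;

    ContiguousDocumentalSketch(long... cutPoints) {
        if (cutPoints.length < 2) throw new IllegalArgumentException("At least two cut points are required");
        for (int i = 1; i < cutPoints.length; i++)
            if (cutPoints[i - 1] >= cutPoints[i])
                throw new IllegalArgumentException("Cut points must be strictly increasing");
        this.cutPoints = cutPoints.clone();
    }

    int numberOfLocalIndices() {
        return cutPoints.length - 1;
    }

    /** Global to local, part 1: which local index contains the document. */
    int localIndex(long globalPointer) {
        if (globalPointer < cutPoints[0] || globalPointer >= cutPoints[cutPoints.length - 1])
            throw new IndexOutOfBoundsException("No local index contains document " + globalPointer);
        int pos = Arrays.binarySearch(cutPoints, globalPointer);
        // If not found, pos encodes the insertion point as -(insertionPoint) - 1,
        // and the containing segment is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Global to local, part 2: the document pointer inside that local index. */
    long localPointer(long globalPointer) {
        return globalPointer - cutPoints[localIndex(globalPointer)];
    }

    /** Local to global: the inverse of the two methods above. */
    long globalPointer(int localIndex, long localPointer) {
        return cutPoints[localIndex] + localPointer;
    }

    public static void main(String[] args) {
        ContiguousDocumentalSketch strategy = new ContiguousDocumentalSketch(0, 3, 7, 10);
        long global = 5;
        int index = strategy.localIndex(global);
        long local = strategy.localPointer(global);
        System.out.println("global " + global + " -> local index " + index + ", local pointer " + local);
        System.out.println("and back -> global " + strategy.globalPointer(index, local));
    }
}
```

With cut points 0, 3, 7, 10, global document 5 lies in the second segment (local index 1) as local document 2, and mapping it back yields 5 again.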
Clusters are often generated by partitioning an index (although, for instance, Scan produces a cluster as the output of the indexing process). In this case, a documental partitioning strategy or a lexical partitioning strategy explains how to divide and remap term numbers and document pointers. Of course, the clustering and partitioning strategies must be suitably matched.
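To make the matching requirement concrete, here is a minimal sketch of a lexical partition in which the partitioning direction (global term number to local index and local term number) and the clustering direction (local back to global) are derived from the same assignment, so that they are inverse to each other by construction. The class LexicalPartitionSketch, its methods, and the assignment array are hypothetical illustrations, not this package's API.

```java
/** Hypothetical sketch, not part of MG4J: matched lexical partitioning and clustering tables. */
final class LexicalPartitionSketch {
    private final int[] localIndexOf;     // global term number -> local index
    private final int[] localNumberOf;    // global term number -> local term number
    private final int[][] globalNumberOf; // [local index][local term number] -> global term number

    /** assignment[t] is the local index chosen for global term t. */
    LexicalPartitionSketch(int[] assignment, int numberOfLocalIndices) {
        localIndexOf = assignment.clone();
        localNumberOf = new int[assignment.length];
        int[] size = new int[numberOfLocalIndices];
        // Terms keep their relative order inside each local index.
        for (int t = 0; t < assignment.length; t++) localNumberOf[t] = size[assignment[t]]++;
        globalNumberOf = new int[numberOfLocalIndices][];
        for (int i = 0; i < numberOfLocalIndices; i++) globalNumberOf[i] = new int[size[i]];
        for (int t = 0; t < assignment.length; t++) globalNumberOf[assignment[t]][localNumberOf[t]] = t;
    }

    // Partitioning direction: global -> local.
    int localIndex(int globalNumber) { return localIndexOf[globalNumber]; }
    int localNumber(int globalNumber) { return localNumberOf[globalNumber]; }

    // Clustering direction: local -> global.
    int globalNumber(int localIndex, int localNumber) { return globalNumberOf[localIndex][localNumber]; }

    public static void main(String[] args) {
        // Five global terms split between two local indices.
        LexicalPartitionSketch p = new LexicalPartitionSketch(new int[] { 0, 1, 0, 1, 1 }, 2);
        for (int t = 0; t < 5; t++) {
            int i = p.localIndex(t), n = p.localNumber(t);
            System.out.println("term " + t + " -> index " + i + ", local term " + n
                    + " -> term " + p.globalNumber(i, n));
        }
    }
}
```

If the two directions were built from different assignments, the round trip would no longer return the original term number, which is the kind of mismatch the paragraph above warns against.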
Interface Summary
Interface    Description
ClusteringStrategy    A common ancestor interface for all clustering strategies.
DocumentalClusteringStrategy    A way to associate (quite bidirectionally) local and global document pointers.
DocumentalPartitioningStrategy    A way to associate a document with a local index out of a given set and a local document number in the local index.
LexicalClusteringStrategy    A way to associate a term with a local index out of a given set.
LexicalPartitioningStrategy    A way to associate a term number with a local index out of a given set and a local term number in the local index.
PartitioningStrategy    A common ancestor interface for all partitioning strategies.
Class Summary
Class    Description
AbstractIndexClusterIndexReader    An abstract implementation of an IndexReader for an IndexCluster.
ChainedLexicalClusteringStrategy    A lexical clustering strategy that uses a chain of responsibility to choose the local index: term maps out of a given list are queried until one contains the given term.
ContiguousDocumentalStrategy    A documental partitioning and clustering strategy that partitions an index into contiguous segments.
ContiguousLexicalStrategy    A lexical strategy that partitions terms into contiguous segments.
DocumentalCluster    An abstract class representing a cluster of local indices containing separate sets of documents from the same collection.
DocumentalClusterIndexReader    An index reader for a DocumentalCluster.
DocumentalConcatenatedCluster    A DocumentalCluster that concatenates the postings of its local indices.
DocumentalConcatenatedClusterDocumentIterator    A document iterator concatenating iterators from local indices.
DocumentalConcatenatedClusterIndexIterator    An index iterator concatenating iterators from local indices.
DocumentalMergedCluster    A DocumentalCluster that merges the postings of its local indices.
DocumentalMergedClusterDocumentIterator    A document iterator merging iterators from local indices.
DocumentalMergedClusterIndexIterator    An index iterator merging iterators from local indices.
DocumentalStrategies    Static utility methods for documental strategies.
FrequencyLexicalStrategy    A lexical strategy that creates an index containing a subset of the terms.
IdentityDocumentalStrategy    A documental strategy that maps identically local to global pointers and vice versa.
IndexCluster    An abstract index cluster.
LexicalCluster    A cluster exhibiting local indices referring to the same collection, but containing different sets of terms, as a single index.
LexicalClusterIndexReader    An index reader for a lexical cluster.
LexicalStrategies    Static utility methods for lexical strategies.
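The difference between concatenating and merging postings can be sketched as follows. This is only an illustration under stated assumptions: each local posting list is taken to be a sorted array of document pointers already remapped to global pointers; concatenation is assumed to suffice when every local index covers a range of global pointers lying entirely before the next one's, whereas merging is needed when the ranges interleave. The class PostingCombinationSketch is hypothetical and does not reproduce the iterators listed in this package.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch, not part of MG4J: combining sorted lists of global document pointers. */
final class PostingCombinationSketch {
    /** Appends the lists one after the other; sorted only if their ranges do not interleave. */
    static List<Long> concatenate(List<long[]> lists) {
        List<Long> result = new ArrayList<>();
        for (long[] list : lists)
            for (long pointer : list) result.add(pointer);
        return result;
    }

    /** K-way merge of sorted lists; the result is sorted even if the ranges interleave. */
    static List<Long> merge(List<long[]> lists) {
        // Each queue entry is { pointer, list number, position in that list }.
        PriorityQueue<long[]> queue = new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0]));
        for (int i = 0; i < lists.size(); i++)
            if (lists.get(i).length > 0) queue.add(new long[] { lists.get(i)[0], i, 0 });
        List<Long> result = new ArrayList<>();
        while (!queue.isEmpty()) {
            long[] head = queue.poll();
            result.add(head[0]);
            long[] list = lists.get((int) head[1]);
            int next = (int) head[2] + 1;
            // Replace the consumed entry with the next pointer from the same list, if any.
            if (next < list.length) queue.add(new long[] { list[next], head[1], next });
        }
        return result;
    }

    public static void main(String[] args) {
        List<long[]> contiguous = new ArrayList<>();
        contiguous.add(new long[] { 0, 2 });
        contiguous.add(new long[] { 3, 5 });
        System.out.println("concatenated: " + concatenate(contiguous));

        List<long[]> interleaved = new ArrayList<>();
        interleaved.add(new long[] { 0, 4, 6 });
        interleaved.add(new long[] { 1, 5 });
        interleaved.add(new long[] { 2, 3 });
        System.out.println("merged: " + merge(interleaved));
    }
}
```

Concatenation touches each posting once with no comparisons, while the merge pays a logarithmic cost per posting for the priority queue; that trade-off is one plausible reason to keep both kinds of cluster.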
Enum Summary
Enum    Description
IndexCluster.PropertyKeys    Symbolic names for properties of an IndexCluster.