Index partitioning and clustering.
This package contains the classes that provide the infrastructure for index partitioning
and clustering. The tools that actually perform partitioning can be found in
An index cluster is a set of local indices that are viewed as a single index. In a lexical cluster each local index has a disjoint set of terms, but the document pointers contained in each local index refer to the same documents. In a documental cluster each index contains postings referring to a disjoint subset of a collection.
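The difference between the two kinds of cluster can be illustrated on a toy inverted index. The following is only a sketch under stated assumptions: the class `ClusterKindsSketch`, its methods, and the representation of an index as a sorted map from terms to arrays of document pointers are all hypothetical and are not part of this package.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical toy model: an index is a sorted map from terms to document pointers. */
final class ClusterKindsSketch {
    /** Lexical split: terms before the pivot go to the first local index, the rest to the
     *  second. Posting lists are copied unchanged, so pointers refer to the same documents. */
    static List<Map<String, int[]>> lexicalSplit(SortedMap<String, int[]> index, String pivot) {
        List<Map<String, int[]>> parts = new ArrayList<>();
        parts.add(new TreeMap<>(index.headMap(pivot)));
        parts.add(new TreeMap<>(index.tailMap(pivot)));
        return parts;
    }

    /** Documental split: documents below the cut go to the first local index, the others
     *  to the second, where they are renumbered starting from zero. */
    static List<Map<String, int[]>> documentalSplit(SortedMap<String, int[]> index, int cut) {
        Map<String, int[]> first = new TreeMap<>();
        Map<String, int[]> second = new TreeMap<>();
        for (Map.Entry<String, int[]> e : index.entrySet()) {
            int[] low = Arrays.stream(e.getValue()).filter(d -> d < cut).toArray();
            int[] high = Arrays.stream(e.getValue()).filter(d -> d >= cut).map(d -> d - cut).toArray();
            // A term appears in a local index only if it has postings there.
            if (low.length > 0) first.put(e.getKey(), low);
            if (high.length > 0) second.put(e.getKey(), high);
        }
        List<Map<String, int[]>> parts = new ArrayList<>();
        parts.add(first);
        parts.add(second);
        return parts;
    }
}
```

In the lexical split every term lives in exactly one local index, whereas in the documental split a term may appear in several local indices, each holding postings for a disjoint range of documents.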
Clustering indices requires mapping term numbers and document pointers back and forth between the global index and local indices. This mapping is provided by documental clustering strategies and lexical clustering strategies.
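As an illustration of such a mapping for document pointers, here is a minimal sketch that assumes the collection is split into contiguous blocks. The class `ContiguousPointerMapSketch` and its method names are hypothetical and chosen for readability; they should not be read as the signatures of the strategies in this package.

```java
import java.util.Arrays;

/** Hypothetical sketch: maps document pointers between a global index and contiguous blocks. */
final class ContiguousPointerMapSketch {
    /** cutPoints[k] is the first global pointer of local index k; the last entry is the collection size. */
    private final long[] cutPoints;

    ContiguousPointerMapSketch(long... cutPoints) {
        if (cutPoints.length < 2 || cutPoints[0] != 0)
            throw new IllegalArgumentException("Cut points must start at 0 and include the collection size");
        for (int i = 1; i < cutPoints.length; i++)
            if (cutPoints[i] <= cutPoints[i - 1])
                throw new IllegalArgumentException("Cut points must be strictly increasing");
        this.cutPoints = cutPoints.clone();
    }

    /** Global to local: which local index holds the document. */
    int localIndex(long globalPointer) {
        if (globalPointer < 0 || globalPointer >= cutPoints[cutPoints.length - 1])
            throw new IndexOutOfBoundsException("No such document: " + globalPointer);
        int pos = Arrays.binarySearch(cutPoints, globalPointer);
        // An exact hit is the first document of block pos; otherwise binarySearch returns
        // -(insertionPoint) - 1 and the document lies in block insertionPoint - 1.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Global to local: the pointer of the document inside its local index. */
    long localPointer(long globalPointer) {
        return globalPointer - cutPoints[localIndex(globalPointer)];
    }

    /** Local to global: the inverse of the two methods above. */
    long globalPointer(int localIndex, long localPointer) {
        long size = cutPoints[localIndex + 1] - cutPoints[localIndex];
        if (localPointer < 0 || localPointer >= size)
            throw new IndexOutOfBoundsException("No such local document: " + localPointer);
        return cutPoints[localIndex] + localPointer;
    }
}
```

With cut points 0, 4 and 10, global document 9 is document 5 of local index 1, and mapping that pair back yields 9 again.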
Clusters are often generated by partitioning an index (although, for instance, Scan produces a cluster as the output of the indexing process). In this case, a documental partitioning strategy or a lexical partitioning strategy explains how to divide and remap term numbers and document pointers. Of course, the clustering and partitioning strategies must be suitably matched.
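One way to picture what "suitably matched" means is to derive both directions of a lexical remapping from the same assignment of terms to local indices, so that the clustering direction is by construction the inverse of the partitioning direction. The sketch below does this; the class `LexicalRemapSketch`, its methods, and the explicit assignment array are hypothetical and serve only as an illustration.

```java
/** Hypothetical sketch: a lexical partition and its matching inverse built from one assignment. */
final class LexicalRemapSketch {
    private final int[] localIndex;      // global term number -> local index
    private final int[] localNumber;     // global term number -> term number in that local index
    private final int[][] globalNumber;  // [local index][local term number] -> global term number

    /** assignment[t] is the local index that receives global term t. */
    LexicalRemapSketch(int[] assignment, int numberOfLocalIndices) {
        localIndex = assignment.clone();
        localNumber = new int[assignment.length];
        int[] sizes = new int[numberOfLocalIndices];
        // Local numbers are handed out in increasing global order, so the relative
        // order of terms is preserved inside each local index.
        for (int t = 0; t < assignment.length; t++) localNumber[t] = sizes[assignment[t]]++;
        globalNumber = new int[numberOfLocalIndices][];
        for (int i = 0; i < numberOfLocalIndices; i++) globalNumber[i] = new int[sizes[i]];
        for (int t = 0; t < assignment.length; t++) globalNumber[assignment[t]][localNumber[t]] = t;
    }

    /** Partitioning direction: where a global term goes. */
    int localIndex(int globalTerm) { return localIndex[globalTerm]; }

    /** Partitioning direction: the number of a global term inside its local index. */
    int localNumber(int globalTerm) { return localNumber[globalTerm]; }

    /** Clustering direction: the global number of a local term. */
    int globalNumber(int localIndex, int localTerm) { return globalNumber[localIndex][localTerm]; }
}
```

A partitioning map and a clustering map are matched when `globalNumber(localIndex(t), localNumber(t))` returns `t` for every global term `t`; pairing a clustering map with a partition produced by a different rule would break this round trip.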
Interface Summary

ClusteringStrategy: A common ancestor interface for all clustering strategies.
DocumentalClusteringStrategy: A way to associate (quite bidirectionally) local and global document pointers.
DocumentalPartitioningStrategy: A way to associate a document with a local index out of a given set and a local document number in the local index.
LexicalClusteringStrategy: A way to associate a term with a local index out of a given set.
LexicalPartitioningStrategy: A way to associate a term number with a local index out of a given set and a local term number in the local index.
PartitioningStrategy: A common ancestor interface for all partitioning strategies.
Class Summary

AbstractIndexClusterIndexReader
ChainedLexicalClusteringStrategy: A lexical clustering strategy that uses a chain of responsibility to choose the local index: term maps out of a given list are inquired until one contains the given term.
ContiguousDocumentalStrategy: A documental partitioning and clustering strategy that partitions an index into contiguous segments.
ContiguousLexicalStrategy: A lexical strategy that partitions terms into contiguous segments.
DocumentalCluster: An abstract class representing a cluster of local indices containing separate sets of documents from the same collection.
DocumentalClusterIndexReader: An index reader for a DocumentalCluster that concatenates the postings of its local indices.
DocumentalConcatenatedClusterDocumentIterator: A document iterator concatenating iterators from local indices.
DocumentalConcatenatedClusterIndexIterator: An index iterator concatenating iterators from local indices.
DocumentalMergedCluster: A DocumentalCluster that merges the postings of its local indices.
DocumentalMergedClusterDocumentIterator: A document iterator merging iterators from local indices.
DocumentalMergedClusterIndexIterator: An index iterator merging iterators from local indices.
DocumentalStrategies: Static utility methods for documental strategies.
FrequencyLexicalStrategy: A lexical strategy that creates an index containing a subset of the terms.
IdentityDocumentalStrategy: A documental strategy that maps identically local to global pointers and vice versa.
IndexCluster: An abstract index cluster.
LexicalCluster: A cluster exhibiting local indices referring to the same collection, but containing different sets of terms, as a single index.
LexicalClusterIndexReader: An index reader for a lexical cluster.
LexicalStrategies: Static utility methods for lexical strategies.
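The chain-of-responsibility lookup mentioned in the class summary (term maps inquired in order until one contains the term) can be sketched as follows. This is an illustrative assumption, not the code of ChainedLexicalClusteringStrategy: the class `ChainedTermLookupSketch`, the use of `java.util.Map` as a term map, and the choice of returning -1 for an unknown term are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: picks the local index of a term by asking term maps in order. */
final class ChainedTermLookupSketch {
    /** termMaps.get(i) maps the terms of local index i to their local term numbers. */
    private final List<Map<String, Integer>> termMaps;

    ChainedTermLookupSketch(List<Map<String, Integer>> termMaps) {
        this.termMaps = new ArrayList<>(termMaps);
    }

    /** Returns the first local index whose term map contains the term, or -1 if none does
     *  (the -1 convention is an assumption of this sketch). */
    int localIndex(String term) {
        for (int i = 0; i < termMaps.size(); i++)
            if (termMaps.get(i).containsKey(term)) return i;
        return -1;
    }
}
```

Because the maps are asked in list order, a term that happens to appear in more than one map is attributed to the earliest one; in a proper lexical cluster the term sets are disjoint, so the order affects only lookup cost.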