Package it.unimi.di.mg4j.index.cluster

Index partitioning and clustering.


Interface Summary
ClusteringStrategy A common ancestor interface for all clustering strategies.
DocumentalClusteringStrategy A way to associate (quite bidirectionally) local and global document pointers.
DocumentalPartitioningStrategy A way to associate a document with a local index out of a given set and a local document number in the local index.
LexicalClusteringStrategy A way to associate a term with a local index out of a given set.
LexicalPartitioningStrategy A way to associate a term number with a local index out of a given set and a local term number in the local index.
PartitioningStrategy A common ancestor interface for all partitioning strategies.
 

Class Summary
AbstractIndexClusterIndexReader An abstract implementation of an IndexReader for an IndexCluster.
ChainedLexicalClusteringStrategy A lexical clustering strategy that uses a chain of responsibility to choose the local index: term maps from a given list are queried in turn until one contains the given term.
ContiguousDocumentalStrategy A documental partitioning and clustering strategy that partitions an index into contiguous segments.
ContiguousLexicalStrategy A lexical strategy that partitions terms into contiguous segments.
DocumentalCluster An abstract class representing a cluster of local indices containing separate sets of documents from the same collection.
DocumentalClusterIndexReader An index reader for a DocumentalCluster.
DocumentalConcatenatedCluster A DocumentalCluster that concatenates the postings of its local indices.
DocumentalConcatenatedClusterDocumentIterator A document iterator concatenating iterators from local indices.
DocumentalConcatenatedClusterIndexIterator An index iterator concatenating iterators from local indices.
DocumentalMergedCluster A DocumentalCluster that merges the postings of its local indices.
DocumentalMergedClusterDocumentIterator A document iterator merging iterators from local indices.
DocumentalMergedClusterIndexIterator An index iterator merging iterators from local indices.
DocumentalStrategies Static utility methods for documental strategies.
IdentityDocumentalStrategy A documental strategy that maps local pointers to global pointers, and vice versa, by the identity function.
IndexCluster An abstract index cluster.
LexicalCluster A cluster exhibiting, as a single index, local indices that refer to the same collection but contain different sets of terms.
LexicalClusterIndexReader An index reader for a lexical cluster.
LexicalStrategies Static utility methods for lexical strategies.
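The class summary distinguishes DocumentalConcatenatedCluster, which concatenates the postings of its local indices, from DocumentalMergedCluster, which merges them. The sketch below illustrates why two variants can make sense; it is not MG4J code, and the class PostingCombiner and its methods are hypothetical. It assumes each local posting list has already been translated to global document pointers and is sorted. Concatenation yields a sorted global list only when every pointer of local index i precedes every pointer of local index i + 1 (as with contiguous segments); otherwise a k-way merge is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical illustration (not the MG4J API): two ways of combining the
 * local posting lists of a documental cluster into one increasing list of
 * global document pointers. Each input list is assumed to be sorted and
 * already expressed in global pointers.
 */
final class PostingCombiner {
    private PostingCombiner() {}

    /** Concatenation: correct only if all pointers in list i precede all pointers in list i + 1. */
    static List<Integer> concatenate(List<List<Integer>> localPostings) {
        List<Integer> result = new ArrayList<>();
        for (List<Integer> postings : localPostings) result.addAll(postings);
        return result;
    }

    /** Merging: a k-way merge that works for any assignment of documents to local indices. */
    static List<Integer> merge(List<List<Integer>> localPostings) {
        // Each queue entry is {current pointer, local index, position in that local list}.
        PriorityQueue<int[]> queue = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int i = 0; i < localPostings.size(); i++) {
            if (!localPostings.get(i).isEmpty()) queue.add(new int[] { localPostings.get(i).get(0), i, 0 });
        }
        List<Integer> result = new ArrayList<>();
        while (!queue.isEmpty()) {
            int[] top = queue.poll();
            result.add(top[0]);
            List<Integer> source = localPostings.get(top[1]);
            int next = top[2] + 1;
            if (next < source.size()) queue.add(new int[] { source.get(next), top[1], next });
        }
        return result;
    }
}
```

For example, merging the interleaved lists [0, 4, 8], [1, 5] and [2, 3] gives [0, 1, 2, 3, 4, 5, 8], whereas concatenating them would not be sorted. Since a documental cluster's local indices hold disjoint document sets, the merge never has to deal with duplicate pointers.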
 

Enum Summary
IndexCluster.PropertyKeys Symbolic names for properties of an IndexCluster.
 

Package it.unimi.di.mg4j.index.cluster Description

This package contains the classes that provide the infrastructure for index partitioning and clustering. The tools that actually perform partitioning can be found in it.unimi.di.mg4j.tool.

An index cluster is a set of local indices that are viewed as a single index. In a lexical cluster each local index has a disjoint set of terms, but the document pointers contained in each local index refer to the same documents. In a documental cluster each index contains postings referring to a disjoint subset of a collection.
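In a lexical cluster, answering a query for a term means first finding the local index that holds it; because every local index refers to the same documents, the postings found there need no document-pointer remapping. The following is a minimal sketch, assuming each local index exposes its terms as a plain map from term to local term number. It follows the chain-of-responsibility idea described for ChainedLexicalClusteringStrategy, but the class ChainedTermLookup is hypothetical and not that class's implementation.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of term lookup in a lexical cluster. Each local index
 * has a disjoint set of terms, modelled here as a map from term to local term
 * number; the maps are queried in order until one contains the term.
 */
final class ChainedTermLookup {
    private final List<Map<String, Integer>> localTermMaps;

    ChainedTermLookup(List<Map<String, Integer>> localTermMaps) {
        this.localTermMaps = localTermMaps;
    }

    /** Returns the number of the local index containing the term, or -1 if none does. */
    int localIndex(String term) {
        for (int i = 0; i < localTermMaps.size(); i++) {
            if (localTermMaps.get(i).containsKey(term)) return i;
        }
        return -1;
    }
}
```

With two local indices holding {apple, bear} and {cat}, a lookup for "cat" is routed to local index 1, and its postings can be returned as they are.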

Clustering indices requires mapping term numbers and document pointers back and forth between the global index and the local indices. This mapping is provided by documental clustering strategies and lexical clustering strategies.
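As an illustration of the back-and-forth mapping a documental strategy must provide, here is a minimal sketch for the contiguous case (the idea behind ContiguousDocumentalStrategy). The class ContiguousDocumentMapping, its methods and the cut-point representation are hypothetical: local index k is assumed to hold the global documents in [cutPoints[k], cutPoints[k + 1]), with cutPoints[0] = 0 and strictly increasing cut points.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of a contiguous documental strategy. Local index k holds
 * the global documents in [cutPoints[k], cutPoints[k + 1]); cutPoints[0] is
 * assumed to be 0 and the array strictly increasing.
 */
final class ContiguousDocumentMapping {
    private final int[] cutPoints;

    ContiguousDocumentMapping(int... cutPoints) {
        this.cutPoints = cutPoints.clone();
    }

    /** Local index containing a global document pointer. */
    int localIndex(int globalPointer) {
        if (globalPointer < 0 || globalPointer >= cutPoints[cutPoints.length - 1])
            throw new IndexOutOfBoundsException("pointer " + globalPointer);
        int pos = Arrays.binarySearch(cutPoints, globalPointer);
        // An exact hit starts segment pos; a miss returns -(insertionPoint) - 1,
        // and the segment is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Global pointer to local pointer. */
    int localPointer(int globalPointer) {
        return globalPointer - cutPoints[localIndex(globalPointer)];
    }

    /** Local pointer (in a given local index) back to global pointer. */
    int globalPointer(int localIndex, int localPointer) {
        return cutPoints[localIndex] + localPointer;
    }
}
```

With cut points 0, 100, 250, 400, global document 300 lives in local index 2 as local document 50, and globalPointer(2, 50) returns 300. The binary search keeps the global-to-local direction logarithmic in the number of local indices, while the local-to-global direction is a single addition.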

Clusters are often generated by partitioning an index (although, for instance, Scan produces a cluster directly as the output of the indexing process). In this case, a documental partitioning strategy or a lexical partitioning strategy explains how to divide and remap term numbers and document pointers. Of course, the clustering and partitioning strategies must be suitably matched.
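To make the matching requirement concrete, the sketch below partitions a term list into contiguous segments in the spirit of ContiguousLexicalStrategy; the class ContiguousTermPartition and its cut-point representation are hypothetical. Global term t is assumed to go to the local index k with cutPoints[k] &lt;= t &lt; cutPoints[k + 1], where it becomes local term t - cutPoints[k]. The invariant a matching clustering strategy relies on is that term t ends up at position localNumber(t) of local list localIndex(t); a clustering strategy built with different cut points would route lookups to the wrong local index.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a contiguous lexical partitioning: global term t goes to
 * local index k with cutPoints[k] <= t < cutPoints[k + 1] and becomes local term
 * t - cutPoints[k] there.
 */
final class ContiguousTermPartition {
    private final int[] cutPoints;

    ContiguousTermPartition(int... cutPoints) {
        this.cutPoints = cutPoints.clone();
    }

    /** Local index of a global term number (a linear scan is enough for a sketch). */
    int localIndex(int termNumber) {
        for (int k = 0; k < cutPoints.length - 1; k++)
            if (termNumber >= cutPoints[k] && termNumber < cutPoints[k + 1]) return k;
        throw new IndexOutOfBoundsException("term " + termNumber);
    }

    /** Local term number of a global term number. */
    int localNumber(int termNumber) {
        return termNumber - cutPoints[localIndex(termNumber)];
    }

    /** Splits a term list (position = global term number) into one list per local index. */
    List<List<String>> partition(List<String> terms) {
        List<List<String>> local = new ArrayList<>();
        for (int k = 0; k < cutPoints.length - 1; k++) local.add(new ArrayList<>());
        // Terms are visited in order, so term t lands at position localNumber(t) of its list.
        for (int t = 0; t < terms.size(); t++) local.get(localIndex(t)).add(terms.get(t));
        return local;
    }
}
```

For instance, with cut points 0, 2, 5 the terms a, b, c, d, e are split into [a, b] and [c, d, e], and global term 3 (d) is local term 1 of local index 1.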