Package it.unimi.di.big.mg4j.index.cluster

Index partitioning and clustering.

This package contains the classes that provide the infrastructure for index partitioning and clustering. The tools that actually perform partitioning can be found in it.unimi.di.big.mg4j.tool.

An index cluster is a set of local indices that are viewed as a single index. In a lexical cluster each local index has a disjoint set of terms, but the document pointers contained in each local index refer to the same documents. In a documental cluster each local index contains postings referring to a disjoint subset of the documents in a collection.
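To make the lexical case concrete, here is a minimal in-memory sketch, assuming a toy model in which each local index is just a map from terms to postings lists. The class ToyLexicalCluster and its postings method are hypothetical and are not part of the MG4J API. Because terms are disjoint, at most one local index answers for a given term, and since all local indices refer to the same documents, its postings are returned without any pointer remapping.

```java
import java.util.List;
import java.util.Map;

/** Toy lexical cluster (hypothetical model, not the MG4J API). */
class ToyLexicalCluster {
    // Each local index maps a term to a sorted postings list of document pointers.
    // Terms are assumed disjoint across local indices.
    private final List<Map<String, long[]>> localIndices;

    ToyLexicalCluster(List<Map<String, long[]>> localIndices) {
        this.localIndices = localIndices;
    }

    /** Returns the postings of a term, or an empty list if no local index contains it. */
    long[] postings(String term) {
        for (Map<String, long[]> local : localIndices) {
            long[] p = local.get(term);
            // Terms are disjoint, so the first hit is the only one; in a lexical
            // cluster the document pointers need no remapping.
            if (p != null) return p;
        }
        return new long[0];
    }
}
```

A real lexical clustering strategy would route a term to its local index directly rather than scanning every index; the linear scan only keeps the sketch short.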

Clustering indices requires mapping term numbers and document pointers back and forth between the global index and the local indices. This mapping is provided by documental clustering strategies and lexical clustering strategies.
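As an illustration of the back-and-forth mapping a documental clustering strategy performs, the following sketch assumes that each local index holds a contiguous, non-empty range of global documents delimited by increasing cut points. The class ContiguousDocumentMapping and its method names are hypothetical and only mirror the idea; they are not the MG4J interfaces.

```java
import java.util.Arrays;

/**
 * Hypothetical contiguous documental mapping (not the MG4J implementation):
 * local index i holds the global documents in [cutPoint[i], cutPoint[i + 1]).
 * Cut points are assumed strictly increasing, i.e., no empty local index.
 */
class ContiguousDocumentMapping {
    private final long[] cutPoint; // cutPoint[0] == 0, last entry == number of documents

    ContiguousDocumentMapping(long... cutPoint) {
        this.cutPoint = cutPoint.clone();
    }

    int numberOfLocalIndices() { return cutPoint.length - 1; }

    /** Local index holding the given global document pointer. */
    int localIndex(long globalPointer) {
        if (globalPointer < 0 || globalPointer >= cutPoint[cutPoint.length - 1])
            throw new IllegalArgumentException("pointer out of range: " + globalPointer);
        int pos = Arrays.binarySearch(cutPoint, globalPointer);
        // An exact hit is the first document of index pos; otherwise
        // binarySearch returns -(insertion point) - 1, and the pointer
        // belongs to the index just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Global to local: position of the document inside its local index. */
    long localPointer(long globalPointer) {
        return globalPointer - cutPoint[localIndex(globalPointer)];
    }

    /** Local to global: inverse of (localIndex, localPointer). */
    long globalPointer(int localIndex, long localPointer) {
        return cutPoint[localIndex] + localPointer;
    }

    long numberOfDocuments(int localIndex) {
        return cutPoint[localIndex + 1] - cutPoint[localIndex];
    }
}
```

For example, with cut points 0, 3, 7, 10 the global document 5 lives in local index 1 at local pointer 2, and globalPointer(1, 2) maps it back to 5.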

Clusters are often generated by partitioning an index (although, for instance, Scan produces a cluster as the output of the indexing process). In this case, a documental partitioning strategy or a lexical partitioning strategy explains how to divide the index and remap term numbers and document pointers. Of course, the clustering and partitioning strategies must be suitably matched: the clustering strategy has to map local term numbers and document pointers back to the global values that the partitioning strategy remapped.
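The need for matched strategies can be seen in a small sketch of documental partitioning followed by clustering. It assumes a round-robin assignment (global document g goes to local index g mod k at local pointer g / k), chosen purely for illustration; the class RoundRobinStrategy and its methods are hypothetical and are not an MG4J class. The same object provides both directions, so clustering exactly undoes partitioning.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical round-robin documental strategy used for both partitioning and clustering. */
class RoundRobinStrategy {
    final int k; // number of local indices

    RoundRobinStrategy(int k) { this.k = k; }

    // Partitioning direction: global pointer -> (local index, local pointer).
    int localIndex(long globalPointer) { return (int) (globalPointer % k); }
    long localPointer(long globalPointer) { return globalPointer / k; }

    // Clustering direction: (local index, local pointer) -> global pointer.
    long globalPointer(int localIndex, long localPointer) { return localPointer * k + localIndex; }

    /** Splits a sorted global postings list into k sorted local postings lists. */
    List<List<Long>> partition(long[] globalPostings) {
        List<List<Long>> local = new ArrayList<>();
        for (int i = 0; i < k; i++) local.add(new ArrayList<>());
        for (long g : globalPostings) local.get(localIndex(g)).add(localPointer(g));
        return local;
    }

    /** Remaps local postings to global pointers and returns them as one sorted list. */
    long[] cluster(List<List<Long>> localPostings) {
        int total = 0;
        for (List<Long> l : localPostings) total += l.size();
        long[] global = new long[total];
        int n = 0;
        for (int i = 0; i < localPostings.size(); i++)
            for (long p : localPostings.get(i)) global[n++] = globalPointer(i, p);
        // A real cluster would k-way merge the already sorted local lists;
        // sorting keeps the sketch short.
        Arrays.sort(global);
        return global;
    }
}
```

With k = 3, the postings 0, 2, 4, 5, 9, 11 split into local lists [0, 3], [1] and [0, 1, 3], and cluster restores the original list. Pairing this partitioner with, say, a contiguous clustering strategy would map the local pointers to the wrong global documents, which is the kind of mismatch the text warns about.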