Chapter 4. Clusters & Partitioning

Table of Contents

Documental vs. Lexical
Partitioning vs. Clustering
Creating a Cluster

Documental vs. Lexical

MG4J provides a completely generic way of combining indices into clusters. This feature can be used, for instance, to support incremental indexing, but it is considerably more general. An index can be a composite in the design-pattern sense: it can be built by combining other indices. For instance, you can index two sets of documents separately and then use the two resulting indices as a single index through a concatenation-based cluster index. Alternatively, you can physically combine the indices, obtaining a new index.
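The idea behind a concatenation-based cluster can be illustrated with a minimal sketch of the document-number mapping. This is not the MG4J API: the class ConcatenationSketch and its methods are hypothetical names chosen for illustration, and the sketch assumes that the global numbering simply lists the documents of the first local index, then those of the second, and so on.

```java
/** Hypothetical sketch of document-number mapping in a concatenation-based cluster; not the MG4J API. */
final class ConcatenationSketch {
    /** cutPoints[i] is the first global document number served by local index i; the last entry is the total document count. */
    private final int[] cutPoints;

    /** Builds the mapping from the number of documents in each local index, in concatenation order. */
    ConcatenationSketch(int... localSizes) {
        cutPoints = new int[localSizes.length + 1];
        for (int i = 0; i < localSizes.length; i++) {
            if (localSizes[i] < 0) throw new IllegalArgumentException("Negative size: " + localSizes[i]);
            cutPoints[i + 1] = cutPoints[i] + localSizes[i];
        }
    }

    /** Returns the number of documents in the global index. */
    int numberOfDocuments() {
        return cutPoints[cutPoints.length - 1];
    }

    /** Returns the local index that contains the given global document. */
    int localIndex(int globalDoc) {
        if (globalDoc < 0 || globalDoc >= numberOfDocuments())
            throw new IndexOutOfBoundsException("No such document: " + globalDoc);
        // A linear scan keeps the sketch simple and also skips empty local indices.
        int i = 0;
        while (globalDoc >= cutPoints[i + 1]) i++;
        return i;
    }

    /** Returns the number of the given global document inside its local index. */
    int localDocument(int globalDoc) {
        return globalDoc - cutPoints[localIndex(globalDoc)];
    }
}
```

With two local indices of three and two documents, new ConcatenationSketch(3, 2) maps global document 3 to local index 1, local document 0. A real implementation would also have to route term lookups and merge posting lists; the sketch covers only the numbering.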

More generally, a cluster exhibits a set of local indices as a single global index. Clusters, moreover, can be documental or lexical. In a documental cluster, each document of the global index appears in exactly one local index. In a lexical cluster, each term of the global index appears in exactly one local index. These two types of clusters satisfy different needs: documental clusters, for instance, can be used to keep a set of documents with high static rank in a separate index living on faster storage, whereas lexical clusters can be used to load into memory the inverted lists of the terms that appear most frequently in user queries.
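The difference between the two kinds of cluster shows up in how the posting list of a term is resolved. The following is a minimal sketch under simplifying assumptions, not MG4J code: each local index is modeled as a map from terms to sorted sets of global document numbers, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical sketch contrasting documental and lexical clusters; not the MG4J API. */
final class ClusterKindsSketch {
    /**
     * Documental cluster: documents are split among the local indices, so a term
     * may occur in several of them and its global posting list is the union of
     * the local ones.
     */
    static SortedSet<Integer> documentalPostings(List<Map<String, SortedSet<Integer>>> locals, String term) {
        SortedSet<Integer> result = new TreeSet<Integer>();
        for (Map<String, SortedSet<Integer>> local : locals) {
            SortedSet<Integer> postings = local.get(term);
            if (postings != null) result.addAll(postings);
        }
        return result;
    }

    /**
     * Lexical cluster: terms are split among the local indices, so the whole
     * posting list of a term comes from the single local index that holds it.
     */
    static SortedSet<Integer> lexicalPostings(List<Map<String, SortedSet<Integer>>> locals, String term) {
        for (Map<String, SortedSet<Integer>> local : locals) {
            SortedSet<Integer> postings = local.get(term);
            if (postings != null) return postings;
        }
        return new TreeSet<Integer>(); // Unknown term: empty posting list.
    }
}
```

In the documental case every local index must be consulted for each query term, whereas in the lexical case a query term touches only one local index. This is why a lexical cluster suits keeping the inverted lists of frequently queried terms in memory.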