Combining indices has a counterpart: you can
partition an index into several indices. There are
many reasons to do so: you might want to split an index into several
segments containing different groups of documents so as to distribute the
load across a multiserver system. Or you might want to store in main memory
the posting lists of the terms that appear most often and just map into
memory the rest. MG4J has two tools that make it possible to partition
an index: PartitionLexically and PartitionDocumentally. The first tool creates
several indices containing distinct subsets of words. The second tool
creates indices containing distinct subsets of documents. To make the
process as customizable as possible, both tools accept a
partitioning strategy, that is, an object that
specifies, for each term or document, where it should be stored. There
are ready-to-use strategies, but you can also write your own.
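
To make the idea of a partitioning strategy concrete, here is a minimal,
self-contained sketch of a documental strategy that assigns contiguous ranges
of documents to local indices. The interface and class names below
(DocumentPartitioningSketch, ContiguousPartitioningSketch, PartitioningDemo)
are hypothetical and exist only for illustration; they are not MG4J's own
types, whose actual interfaces are described in the package documentation.

```java
import java.util.Arrays;

// Hypothetical interface, for illustration only (not an MG4J type): it answers,
// for each global document number, where that document should be stored.
interface DocumentPartitioningSketch {
    int numberOfLocalIndices();
    int localIndex(long globalDocument);
    long localDocument(long globalDocument);
}

// Assumed strategy: documents are split into contiguous ranges by cutpoints.
final class ContiguousPartitioningSketch implements DocumentPartitioningSketch {
    // cutPoints[k] is the first global document of local index k;
    // the last entry is the total number of documents.
    private final long[] cutPoints;

    ContiguousPartitioningSketch(long... cutPoints) {
        if (cutPoints.length < 2 || cutPoints[0] != 0)
            throw new IllegalArgumentException("Cutpoints must start at 0 and delimit at least one segment");
        for (int i = 1; i < cutPoints.length; i++)
            if (cutPoints[i] <= cutPoints[i - 1])
                throw new IllegalArgumentException("Cutpoints must be strictly increasing");
        this.cutPoints = cutPoints.clone();
    }

    @Override
    public int numberOfLocalIndices() {
        return cutPoints.length - 1;
    }

    @Override
    public int localIndex(long globalDocument) {
        if (globalDocument < 0 || globalDocument >= cutPoints[cutPoints.length - 1])
            throw new IndexOutOfBoundsException("No such document: " + globalDocument);
        int pos = Arrays.binarySearch(cutPoints, globalDocument);
        // An exact hit means the document opens segment pos; otherwise the
        // segment is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    @Override
    public long localDocument(long globalDocument) {
        // Local numbering restarts from zero inside each segment.
        return globalDocument - cutPoints[localIndex(globalDocument)];
    }
}

final class PartitioningDemo {
    public static void main(String[] args) {
        // Ten documents: 0..3 go to local index 0, 4..9 to local index 1.
        DocumentPartitioningSketch strategy = new ContiguousPartitioningSketch(0, 4, 10);
        for (long d : new long[] { 0, 3, 4, 9 })
            System.out.println(d + " -> index " + strategy.localIndex(d)
                    + ", local document " + strategy.localDocument(d));
    }
}
```

A lexical strategy would follow the same pattern, mapping each term (rather
than each document) to the local index that stores its posting list.
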
Once you have created several indices, you can see them again as a
single index using an index cluster—a type of index
that exposes a number of local indices as a single global index. A
cluster uses a clustering strategy, which is often
associated with a partitioning strategy. Moreover, you can always Merge back
the partitioned indices you created and get back exactly the original index.
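
The role of a clustering strategy can be sketched as the inverse mapping of a
partitioning strategy: given a local index and a document number inside it,
recover the global document number. The class below is a hypothetical,
self-contained illustration that assumes documents were partitioned into
contiguous ranges; its name and methods are not part of MG4J.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not an MG4J type): exposes several local indices,
// each holding a contiguous range of documents, as one global numbering.
final class ContiguousClusterSketch {
    // cutPoints[k] is the first global document stored in local index k;
    // the last entry is the total number of documents.
    private final long[] cutPoints;

    ContiguousClusterSketch(long... cutPoints) {
        this.cutPoints = cutPoints.clone();
    }

    // Maps a document of a local index back to its global number.
    long globalDocument(int localIndex, long localDocument) {
        long global = cutPoints[localIndex] + localDocument;
        if (localDocument < 0 || global >= cutPoints[localIndex + 1])
            throw new IndexOutOfBoundsException(
                    "No document " + localDocument + " in local index " + localIndex);
        return global;
    }

    // Turns per-index result lists into one list of global document numbers.
    // With contiguous ranges, visiting local indices in order keeps the
    // result sorted whenever each local list is sorted.
    List<Long> globalResults(List<List<Long>> localResults) {
        List<Long> merged = new ArrayList<>();
        for (int k = 0; k < localResults.size(); k++)
            for (long local : localResults.get(k))
                merged.add(globalDocument(k, local));
        return merged;
    }
}
```

For instance, with cutpoints 0, 4, 10, the document numbered 5 in the second
local index is seen through the cluster as global document 9.
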
The documentation of the package it.unimi.di.mg4j.index.cluster and its
classes is a good starting point to understand partitioning and clusters.