The opposite of clustering is partitioning.
Partitioning an index means dividing its inverted lists using some
criterion, and, not surprisingly, partitioning can be documental or
lexical. MG4J provides tools that make it possible to partition an index using a
custom strategy specified by a Java class, so it is very easy to process
indices (even large indices) and partition them in several ways (obvious
splitting strategies, such as uniform strategies, are actually
built-in). You should try the
PartitionLexically tool to get an idea of what
can be done, and have a look at the documentation of the
Of course, the suite of combination tools used to combine batches can be used for the opposite process: taking the set of local indices making up a cluster and turning them into a single combined index, which will contain the same data as the original cluster, but in a different format. Clustering, partitioning and combining are thus several facets of the same idea: an index is actually a composite object.
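
To make the relationship between partitioning and combining concrete, here is a minimal, self-contained sketch of a uniform documental strategy applied to a single inverted list. It is a toy model and does not use MG4J's API: the class name UniformPartitionSketch and all of its methods are hypothetical, and an "index" is reduced to one list of document pointers. The sketch assumes that documents are assigned to local indices in contiguous blocks of equal size, which is only one possible reading of "uniform".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model (not MG4J code): uniform documental partitioning of one inverted list. */
final class UniformPartitionSketch {
    private final int numberOfLocalIndices;
    private final long blockSize;

    UniformPartitionSketch(int numberOfLocalIndices, long numberOfDocuments) {
        if (numberOfLocalIndices <= 0 || numberOfDocuments <= 0)
            throw new IllegalArgumentException("Both arguments must be positive");
        this.numberOfLocalIndices = numberOfLocalIndices;
        // Ceiling division, so that every document falls into one of the blocks.
        this.blockSize = (numberOfDocuments + numberOfLocalIndices - 1) / numberOfLocalIndices;
    }

    /** The local index that a global document pointer is assigned to. */
    int localIndex(long globalPointer) {
        return (int) (globalPointer / blockSize);
    }

    /** The pointer of the document inside its local index. */
    long localPointer(long globalPointer) {
        return globalPointer % blockSize;
    }

    /** Inverse mapping: rebuilds the global pointer from a local one. */
    long globalPointer(int localIndex, long localPointer) {
        return localIndex * blockSize + localPointer;
    }

    /** Splits a global inverted list into one list of local pointers per local index. */
    List<List<Long>> partition(List<Long> invertedList) {
        List<List<Long>> local = new ArrayList<>();
        for (int i = 0; i < numberOfLocalIndices; i++) local.add(new ArrayList<>());
        for (long g : invertedList) local.get(localIndex(g)).add(localPointer(g));
        return local;
    }

    /** Merges the local lists back into a single global inverted list. */
    List<Long> combine(List<List<Long>> local) {
        List<Long> result = new ArrayList<>();
        for (int i = 0; i < local.size(); i++)
            for (long p : local.get(i)) result.add(globalPointer(i, p));
        return result;
    }

    public static void main(String[] args) {
        // 10 documents split over 3 local indices: blocks of 4, 4 and 2 documents.
        UniformPartitionSketch strategy = new UniformPartitionSketch(3, 10);
        List<Long> invertedList = Arrays.asList(1L, 4L, 5L, 9L);
        List<List<Long>> parts = strategy.partition(invertedList);
        System.out.println(parts);                   // prints [[1], [0, 1], [1]]
        System.out.println(strategy.combine(parts)); // prints [1, 4, 5, 9]
    }
}
```

Because the mapping from global to local pointers is invertible, combining the local lists recovers exactly the original list. This is the sense in which a partitioned index and a combined index hold the same data in different formats.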