Partitioning vs. Clustering

The opposite of clustering is partitioning. Partitioning an index means dividing its inverted lists according to some criterion, and, not surprisingly, partitioning can be documental (each local index covers a subset of the documents) or lexical (each local index covers a subset of the terms). MG4J provides tools that make it possible to partition an index using a custom strategy specified by a Java class, so it is very easy to process indices (even large ones) and partition them in several ways (obvious splitting strategies, such as uniform ones, are already built in). You should try the PartitionDocumentally and PartitionLexically tools to get an idea of what can be done, and have a look at the documentation of the it.unimi.dsi.mg4j.cluster package.
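To make the difference between the two kinds of partitioning concrete, here is a minimal, self-contained Java sketch. It does not use MG4J's classes: the in-memory index (a sorted map from terms to sorted arrays of document pointers) and all names (PartitionSketch, partitionDocumentally, partitionLexically) are hypothetical, and both strategies are assumed to be uniform splits into contiguous ranges. A documental partition keeps every term but splits each inverted list by document, renumbering pointers locally; a lexical partition keeps each inverted list intact but distributes the terms.

```java
import java.util.*;

/**
 * Hypothetical sketch, not MG4J's API: an index is modeled as a sorted map
 * from terms to sorted arrays of global document pointers.
 */
public class PartitionSketch {

    /** Uniform documental strategy (assumed): document d goes to local index d * k / numDocs. */
    static int localIndexOf(int doc, int numDocs, int k) {
        return (int) ((long) doc * k / numDocs);
    }

    /** First global document of local index i, i.e., ceil(i * numDocs / k). */
    static int firstDoc(int i, int numDocs, int k) {
        return (int) (((long) i * numDocs + k - 1) / k);
    }

    /** Splits every inverted list by document range; pointers become local. */
    static List<SortedMap<String, int[]>> partitionDocumentally(
            SortedMap<String, int[]> index, int numDocs, int k) {
        List<SortedMap<String, List<Integer>>> tmp = new ArrayList<>();
        for (int i = 0; i < k; i++) tmp.add(new TreeMap<>());
        for (Map.Entry<String, int[]> e : index.entrySet()) {
            for (int doc : e.getValue()) {
                int p = localIndexOf(doc, numDocs, k);
                // Subtracting the range start preserves order within the local list.
                tmp.get(p).computeIfAbsent(e.getKey(), t -> new ArrayList<>())
                        .add(doc - firstDoc(p, numDocs, k));
            }
        }
        List<SortedMap<String, int[]>> parts = new ArrayList<>();
        for (SortedMap<String, List<Integer>> m : tmp) {
            SortedMap<String, int[]> part = new TreeMap<>();
            m.forEach((t, l) -> part.put(t, l.stream().mapToInt(Integer::intValue).toArray()));
            parts.add(part);
        }
        return parts;
    }

    /** Assigns contiguous ranges of (sorted) terms to local indices; lists are unchanged. */
    static List<SortedMap<String, int[]>> partitionLexically(SortedMap<String, int[]> index, int k) {
        List<SortedMap<String, int[]>> parts = new ArrayList<>();
        for (int i = 0; i < k; i++) parts.add(new TreeMap<>());
        int j = 0;
        int numTerms = index.size();
        for (Map.Entry<String, int[]> e : index.entrySet()) {
            int p = (int) ((long) j * k / numTerms);
            parts.get(p).put(e.getKey(), e.getValue());
            j++;
        }
        return parts;
    }

    static String render(SortedMap<String, int[]> m) {
        StringBuilder sb = new StringBuilder();
        m.forEach((t, p) -> sb.append(t).append(Arrays.toString(p)).append(' '));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        SortedMap<String, int[]> index = new TreeMap<>();
        index.put("apple", new int[] {0, 3, 5});
        index.put("banana", new int[] {1, 2});
        index.put("cherry", new int[] {4, 5});
        index.put("date", new int[] {0, 1, 2, 3});
        List<SortedMap<String, int[]>> doc = partitionDocumentally(index, 6, 2);
        List<SortedMap<String, int[]>> lex = partitionLexically(index, 2);
        for (int i = 0; i < 2; i++) {
            System.out.println("documental " + i + ": " + render(doc.get(i)));
            System.out.println("lexical " + i + ": " + render(lex.get(i)));
        }
    }
}
```

Note that in the documental case a term may be missing from some local indices (here, "cherry" occurs only in documents 4 and 5), whereas in the lexical case every document pointer stays global.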

Of course, the suite of combination tools used to combine batches can also be used for the opposite process: taking the set of local indices making up a cluster and turning them into a single combined index, which will contain the same data as the original cluster, but in a different format. Clusters, partitioning and combining are thus different facets of the same idea, namely that an index is actually a composite object.
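The inverse operation can be sketched in the same toy model. Again, this is not MG4J's combination code: the class and method names are hypothetical, and local indices are assumed to be in-memory sorted maps from terms to sorted pointer arrays, with each documental local index covering a contiguous range of documents. Recombining a documental cluster amounts to shifting each local pointer by the number of documents in the preceding local indices and concatenating the lists; recombining a lexical cluster is just a union, since its local indices hold disjoint sets of terms.

```java
import java.util.*;

/** Hypothetical sketch, not MG4J's API: recombines the local indices of a cluster. */
public class CombineSketch {

    /** sizes[i] is the number of documents in local index i (assumed contiguous ranges). */
    static SortedMap<String, int[]> combineDocumentally(
            List<SortedMap<String, int[]>> parts, int[] sizes) {
        SortedMap<String, List<Integer>> tmp = new TreeMap<>();
        int offset = 0;
        for (int i = 0; i < parts.size(); i++) {
            for (Map.Entry<String, int[]> e : parts.get(i).entrySet()) {
                List<Integer> list = tmp.computeIfAbsent(e.getKey(), t -> new ArrayList<>());
                // Local pointers become global by adding the sizes of earlier local indices.
                for (int local : e.getValue()) list.add(local + offset);
            }
            offset += sizes[i];
        }
        SortedMap<String, int[]> result = new TreeMap<>();
        tmp.forEach((t, l) -> result.put(t, l.stream().mapToInt(Integer::intValue).toArray()));
        return result;
    }

    /** Lexical local indices hold disjoint term sets, so combining is a plain union. */
    static SortedMap<String, int[]> combineLexically(List<SortedMap<String, int[]>> parts) {
        SortedMap<String, int[]> result = new TreeMap<>();
        for (SortedMap<String, int[]> part : parts) result.putAll(part);
        return result;
    }

    static String render(SortedMap<String, int[]> m) {
        StringBuilder sb = new StringBuilder();
        m.forEach((t, p) -> sb.append(t).append(Arrays.toString(p)).append(' '));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        SortedMap<String, int[]> p0 = new TreeMap<>();
        p0.put("apple", new int[] {0});
        p0.put("banana", new int[] {1, 2});
        p0.put("date", new int[] {0, 1, 2});
        SortedMap<String, int[]> p1 = new TreeMap<>();
        p1.put("apple", new int[] {0, 2});
        p1.put("cherry", new int[] {1, 2});
        p1.put("date", new int[] {0});
        System.out.println(render(combineDocumentally(List.of(p0, p1), new int[] {3, 3})));
    }
}
```

Because the local indices are visited in order and each offset exceeds every pointer of the preceding local indices, the concatenated lists come out sorted without any further merging.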