java.lang.Object it.unimi.di.mg4j.tool.PartitionDocumentally
public class PartitionDocumentally
Partitions an index documentally.
A global index is partitioned documentally by providing a DocumentalPartitioningStrategy
that specifies a destination local index for each document, and a local document pointer. The global index
is scanned, and the postings are partitioned among the local indices using the provided strategy. For instance,
a ContiguousDocumentalStrategy
divides an index into blocks of contiguous documents.
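The mapping performed by a contiguous strategy can be illustrated with a small, self-contained sketch. The class `ContiguousPartitionSketch` and its methods are hypothetical names chosen for this illustration and are not part of MG4J; the sketch only assumes that a strategy maps each global document pointer to a destination local index and a local document pointer, as described above.

```java
import java.util.Arrays;

/** Hypothetical sketch of contiguous documental partitioning; not the MG4J implementation. */
final class ContiguousPartitionSketch {
    /** cutPoint[k] is the first global document of local index k; the last entry is the total number of documents. */
    private final int[] cutPoint;

    ContiguousPartitionSketch(int... cutPoint) {
        if (cutPoint.length < 2) throw new IllegalArgumentException("At least two cut points are needed");
        for (int i = 1; i < cutPoint.length; i++)
            if (cutPoint[i] <= cutPoint[i - 1]) throw new IllegalArgumentException("Cut points must be strictly increasing");
        this.cutPoint = cutPoint.clone();
    }

    /** Number of local indices (blocks of contiguous documents). */
    int numberOfLocalIndices() {
        return cutPoint.length - 1;
    }

    /** Destination local index of a global document pointer. */
    int localIndex(int globalPointer) {
        if (globalPointer < cutPoint[0] || globalPointer >= cutPoint[cutPoint.length - 1])
            throw new IndexOutOfBoundsException("No block contains document " + globalPointer);
        int pos = Arrays.binarySearch(cutPoint, globalPointer);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the block is the one before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Document pointer inside the destination local index. */
    int localPointer(int globalPointer) {
        return globalPointer - cutPoint[localIndex(globalPointer)];
    }

    /** Inverse mapping: from a local index and local pointer back to the global pointer. */
    int globalPointer(int localIndex, int localPointer) {
        return cutPoint[localIndex] + localPointer;
    }

    public static void main(String[] args) {
        // Ten documents split into two blocks: [0..4) and [4..10).
        ContiguousPartitionSketch strategy = new ContiguousPartitionSketch(0, 4, 10);
        for (int d = 0; d < 10; d++)
            System.out.println("global " + d + " -> index " + strategy.localIndex(d) + ", local " + strategy.localPointer(d));
    }
}
```

Because the blocks are contiguous, the mapping preserves document order inside each local index, which is what makes concatenating the local indices possible.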
Since each local index contains a (proper) subset of the original set of documents, it contains in general a (proper)
subset of the terms in the global index. Thus, the local term numbers and the global term numbers will not in general coincide.
As a result, when a set of local indices is accessed transparently as a single index using a DocumentalCluster,
a call to Index.documents(int) will throw an UnsupportedOperationException,
because there is no way to map the global term numbers to local term numbers.
On the other hand, a call to Index.documents(CharSequence) will be passed to each local index to
build a global iterator. To speed up this phase for not-so-frequent terms, when partitioning an index you can request
the construction of Bloom filters, which are used to avoid
querying local indices that do not contain a term. The precision of the filters is settable.
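The role of the Bloom filters can be sketched as follows. This is a minimal illustration and not the filter MG4J actually uses: the class `TermBloomFilterSketch`, the hash functions, and the interpretation of the precision as the number of hash functions (giving a false-positive rate of roughly 2^-precision when the filter is sized accordingly) are all assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical Bloom filter over the terms of one local index; not the MG4J implementation. */
final class TermBloomFilterSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    /** Assumption: precision is the number of hash functions, for a false-positive rate near 2^-precision. */
    TermBloomFilterSketch(int expectedTerms, int precision) {
        if (expectedTerms < 1 || precision < 1) throw new IllegalArgumentException("Arguments must be positive");
        this.numHashes = precision;
        // With k hash functions, about k / ln 2 bits per element are needed.
        this.numBits = Math.max(64, (int) Math.ceil((double) expectedTerms * precision / Math.log(2)));
        this.bits = new BitSet(numBits);
    }

    /** 32-bit FNV-1a hash of the UTF-8 bytes of a string. */
    private static int fnv1a(String s) {
        int h = 0x811C9DC5;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= b & 0xFF;
            h *= 0x01000193;
        }
        return h;
    }

    /** i-th bit position for a term, derived from two base hashes (double hashing). */
    private int position(CharSequence term, int i) {
        String s = term.toString();
        int h1 = s.hashCode();
        int h2 = fnv1a(s) | 1; // forced odd so that it is never zero
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(CharSequence term) {
        for (int i = 0; i < numHashes; i++) bits.set(position(term, i));
    }

    /** False means the term is certainly absent; true means it may be present. */
    boolean mightContain(CharSequence term) {
        for (int i = 0; i < numHashes; i++)
            if (!bits.get(position(term, i))) return false;
        return true;
    }

    /** Local indices that must still be queried for a term; the others can be skipped. */
    static List<Integer> candidateIndices(TermBloomFilterSketch[] filters, CharSequence term) {
        List<Integer> result = new ArrayList<>();
        for (int k = 0; k < filters.length; k++)
            if (filters[k].mightContain(term)) result.add(k);
        return result;
    }
}
```

A filter never reports a term it contains as absent, so skipping the local indices it rules out does not lose postings; a false positive only costs one unnecessary query.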
The property file will use a DocumentalMergedCluster unless you provide
a ContiguousDocumentalStrategy, in which case a DocumentalConcatenatedCluster
will be used instead. Note that there might be other cases in which the latter is appropriate;
in such cases you can edit the property file manually.
Important: this class just partitions the index. No auxiliary files (most notably, term maps
or prefix maps) will be generated. Please refer to a StringMap implementation
(e.g., ShiftAddXorSignedStringMap or ImmutableExternalPrefixMap).
Warning: variable quanta are not supported by this class, as it is impossible to predict accurately
the number of bits used for positions when partitioning documentally. If you want to use variable quanta, use a
simple interleaved index without skips as an intermediate step, and pass it through Combine.
Partitioning the file containing document sizes is a tricky issue. For the time being this class
implements a very simple policy: if DocumentalPartitioningStrategy.numberOfDocuments(int) returns the number of
documents of the global index, the size file for a local index is generated by replacing all sizes of documents not
belonging to the index with a zero. Otherwise, the file is generated by appending in order the sizes of the documents
belonging to the index. This simple strategy works well with contiguous splitting and with splittings that do not
change the document numbers (e.g., the inverse operation of a Merge). However, more complex splittings might give rise
to inconsistent size files.
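The two cases of the size-file policy can be sketched on in-memory arrays. This is only an illustration of the rule stated above: the class `SizePartitionSketch`, the method `localSizes`, and the use of an `IntUnaryOperator` in place of a real DocumentalPartitioningStrategy are assumptions made for the example, and actual size files are not plain int arrays.

```java
import java.util.function.IntUnaryOperator;
import java.util.stream.IntStream;

/** Hypothetical sketch of the size-file policy; not the MG4J implementation. */
final class SizePartitionSketch {
    /**
     * Computes the sizes for local index k.
     *
     * @param globalSizes sizes of all documents of the global index
     * @param localIndexOf maps a global document pointer to its destination local index
     * @param k the local index being generated
     * @param localNumberOfDocuments number of documents the strategy reports for local index k
     */
    static int[] localSizes(int[] globalSizes, IntUnaryOperator localIndexOf, int k, int localNumberOfDocuments) {
        if (localNumberOfDocuments == globalSizes.length) {
            // Document numbers are unchanged: keep every position, zero out documents of other indices.
            int[] result = new int[globalSizes.length];
            for (int d = 0; d < globalSizes.length; d++)
                if (localIndexOf.applyAsInt(d) == k) result[d] = globalSizes[d];
            return result;
        }
        // Otherwise append, in order, the sizes of the documents belonging to index k.
        return IntStream.range(0, globalSizes.length)
                .filter(d -> localIndexOf.applyAsInt(d) == k)
                .map(d -> globalSizes[d])
                .toArray();
    }
}
```

For instance, with global sizes {5, 7, 3, 9}, a contiguous split after the second document gives {5, 7} and {3, 9}, whereas a split by parity that keeps the global document numbers gives {5, 0, 3, 0} and {0, 7, 0, 9}.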
See also PartitionLexically: the same comments apply.
Field Summary | |
---|---|
static int | DEFAULT_BUFFER_SIZE: The default buffer size for all involved indices. |
Constructor Summary | |
---|---|
PartitionDocumentally(String inputBasename, String outputBasename, DocumentalPartitioningStrategy strategy, String strategyFilename, int bloomFilterPrecision, int bufferSize, Map<CompressionFlags.Component,CompressionFlags.Coding> writerFlags, Combine.IndexType indexType, boolean skips, int quantum, int height, int skipBufferOrCacheSize, long logInterval) | |
Method Summary | |
---|---|
static void | main(String[] arg) |
void | run() |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final int DEFAULT_BUFFER_SIZE
Constructor Detail |
---|
public PartitionDocumentally(String inputBasename, String outputBasename, DocumentalPartitioningStrategy strategy, String strategyFilename, int bloomFilterPrecision, int bufferSize, Map<CompressionFlags.Component,CompressionFlags.Coding> writerFlags, Combine.IndexType indexType, boolean skips, int quantum, int height, int skipBufferOrCacheSize, long logInterval) throws ConfigurationException, IOException, ClassNotFoundException, SecurityException, InstantiationException, IllegalAccessException, URISyntaxException, InvocationTargetException, NoSuchMethodException
Throws:
ConfigurationException
IOException
ClassNotFoundException
SecurityException
InstantiationException
IllegalAccessException
URISyntaxException
InvocationTargetException
NoSuchMethodException
Method Detail |
---|
public void run() throws Exception
Throws:
Exception
public static void main(String[] arg) throws ConfigurationException, IOException, URISyntaxException, ClassNotFoundException, Exception
Throws:
ConfigurationException
IOException
URISyntaxException
ClassNotFoundException
Exception