it.unimi.di.mg4j.tool
Class Concatenate

java.lang.Object
  extended by it.unimi.di.mg4j.tool.Combine
      extended by it.unimi.di.mg4j.tool.Concatenate

public final class Concatenate
extends Combine

Concatenates several indices.

This implementation of Combine concatenates the involved indices. Document 0 of the first index is document 0 of the final collection. Document 0 of the second index becomes document n0, where n0 is the number of documents in the first index. Document 0 of the third index becomes document n0 + n1, and so on. The resulting index is exactly what you would obtain by indexing the concatenation of the document sequences from which the input indices were built.
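
The renumbering described above amounts to a prefix sum over the per-index document counts. The following is a minimal sketch, not MG4J code. The class ConcatenationOffsets and its methods are hypothetical names used only for illustration, and the input is assumed to be a plain array of document counts.

```java
/** Hypothetical helper illustrating concatenation renumbering; not part of the MG4J API. */
final class ConcatenationOffsets {

    /** Returns offset[i] = total number of documents in input indices 0..i-1 (so offset[0] = 0). */
    static int[] offsets(int[] numberOfDocuments) {
        int[] offset = new int[numberOfDocuments.length];
        for (int i = 1; i < numberOfDocuments.length; i++) {
            offset[i] = offset[i - 1] + numberOfDocuments[i - 1];
        }
        return offset;
    }

    /** Maps document localDoc of input index i to its number in the concatenated index. */
    static int globalDocument(int[] offset, int i, int localDoc) {
        return offset[i] + localDoc;
    }
}
```

For three indices with 3, 5 and 2 documents, the offsets are 0, 3 and 8, so document 1 of the third index becomes document 9 of the concatenated index.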

Note that this class can also be used with a single index, which makes it easy to recompress an index with different compression flags.

Since:
1.0
Author:
Sebastiano Vigna

Nested Class Summary
 
Nested classes/interfaces inherited from class it.unimi.di.mg4j.tool.Combine
Combine.GammaCodedIntIterator, Combine.IndexType
 
Field Summary
 
Fields inherited from class it.unimi.di.mg4j.tool.Combine
additionalProperties, bufferSize, DEFAULT_BUFFER_SIZE, frequency, hasCounts, hasPayloads, hasPositions, haveSumsMaxPos, index, indexIterator, indexReader, indexWriter, inputBasename, ioFactory, maxCount, metadataOnly, needsSizes, numberOfDocuments, numberOfOccurrences, numIndices, outputBasename, p, positionArray, predictedLengthNumBits, predictedSize, quasiSuccinctIndexWriter, size, sumsMaxPos, termQueue, usedIndex, variableQuantumIndexWriter
 
Constructor Summary
Concatenate(IOFactory ioFactory, String outputBasename, String[] inputBasename, boolean metadataOnly, int bufferSize, Map<CompressionFlags.Component,CompressionFlags.Coding> writerFlags, Combine.IndexType indexType, boolean skips, int quantum, int height, int skipBufferOrCacheSize, long logInterval)
          Concatenates several indices into one.
Concatenate(IOFactory ioFactory, String outputBasename, String[] inputBasename, IntList delete, boolean metadataOnly, int bufferSize, Map<CompressionFlags.Component,CompressionFlags.Coding> writerFlags, Combine.IndexType indexType, boolean skips, int quantum, int height, int skipBufferOrCacheSize, long logInterval)
          Concatenates several indices into one.
 
Method Summary
protected  int combine(int numUsedIndices, long occurrency)
          Combines several indices.
protected  int combineNumberOfDocuments()
          Combines the number of documents.
protected  int combineSizes(OutputBitStream sizesOutputBitStream)
          Combines size lists.
static void main(String[] arg)
           
 
Methods inherited from class it.unimi.di.mg4j.tool.Combine
main, run, sizes
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Concatenate

public Concatenate(IOFactory ioFactory,
                   String outputBasename,
                   String[] inputBasename,
                   boolean metadataOnly,
                   int bufferSize,
                   Map<CompressionFlags.Component,CompressionFlags.Coding> writerFlags,
                   Combine.IndexType indexType,
                   boolean skips,
                   int quantum,
                   int height,
                   int skipBufferOrCacheSize,
                   long logInterval)
            throws IOException,
                   ConfigurationException,
                   URISyntaxException,
                   ClassNotFoundException,
                   SecurityException,
                   InstantiationException,
                   IllegalAccessException,
                   InvocationTargetException,
                   NoSuchMethodException
Concatenates several indices into one.

Parameters:
ioFactory - the factory that will be used to perform I/O.
outputBasename - the basename of the combined index.
inputBasename - the basenames of the input indices.
metadataOnly - if true, we save only metadata (term list, frequencies, global counts).
bufferSize - the buffer size for index readers.
writerFlags - the flags for the index writer.
indexType - the type of the index to build.
skips - whether to insert skips in case the index is interleaved (i.e., indexType is Combine.IndexType.INTERLEAVED).
quantum - the quantum of skipping structures; if negative, a percentage of space for variable-quantum indices (irrelevant if skips is false).
height - the height of skipping towers (irrelevant if skips is false).
skipBufferOrCacheSize - the size of the buffer used to temporarily hold inverted lists during the construction of skipping structures, or the size of the bit cache used when building a quasi-succinct index.
logInterval - how often we log.
Throws:
IOException
ConfigurationException
URISyntaxException
ClassNotFoundException
SecurityException
InstantiationException
IllegalAccessException
InvocationTargetException
NoSuchMethodException

Concatenate

public Concatenate(IOFactory ioFactory,
                   String outputBasename,
                   String[] inputBasename,
                   IntList delete,
                   boolean metadataOnly,
                   int bufferSize,
                   Map<CompressionFlags.Component,CompressionFlags.Coding> writerFlags,
                   Combine.IndexType indexType,
                   boolean skips,
                   int quantum,
                   int height,
                   int skipBufferOrCacheSize,
                   long logInterval)
            throws IOException,
                   ConfigurationException,
                   URISyntaxException,
                   ClassNotFoundException,
                   SecurityException,
                   InstantiationException,
                   IllegalAccessException,
                   InvocationTargetException,
                   NoSuchMethodException
Concatenates several indices into one.

Parameters:
ioFactory - the factory that will be used to perform I/O.
outputBasename - the basename of the combined index.
inputBasename - the basenames of the input indices.
delete - a monotonically increasing list of integers representing documents that will be deleted from the output index, or null.
metadataOnly - if true, we save only metadata (term list, frequencies, global counts).
bufferSize - the buffer size for index readers.
writerFlags - the flags for the index writer.
indexType - the type of the index to build.
skips - whether to insert skips in case the index is interleaved (i.e., indexType is Combine.IndexType.INTERLEAVED).
quantum - the quantum of skipping structures; if negative, a percentage of space for variable-quantum indices (irrelevant if skips is false).
height - the height of skipping towers (irrelevant if skips is false).
skipBufferOrCacheSize - the size of the buffer used to temporarily hold inverted lists during the construction of skipping structures, or the size of the bit cache used when building a quasi-succinct index.
logInterval - how often we log.
Throws:
IOException
ConfigurationException
URISyntaxException
ClassNotFoundException
SecurityException
InstantiationException
IllegalAccessException
InvocationTargetException
NoSuchMethodException
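
The delete parameter of the second constructor removes documents from the output, so the surviving documents must be renumbered to close the gaps. The following is a minimal sketch of one way to do this, not the MG4J implementation. It makes several assumptions. The deleted numbers are taken to refer to the concatenated numbering before deletion. A plain sorted int[] stands in for the IntList. The class DeletionRemap is a hypothetical name.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not MG4J API): renumbers documents given a strictly
 * increasing array of deleted document numbers. Assumes the deleted numbers
 * refer to the concatenated numbering before deletion.
 */
final class DeletionRemap {

    /** Returns the new number of doc, or -1 if doc is deleted. */
    static int remap(int[] deleted, int doc) {
        int pos = Arrays.binarySearch(deleted, doc);
        if (pos >= 0) return -1; // doc itself is in the deletion list
        // For a missing key, binarySearch returns -(insertionPoint) - 1, and the
        // insertion point equals the number of deleted documents smaller than doc.
        int deletedBefore = -pos - 1;
        return doc - deletedBefore;
    }
}
```

With deleted documents {1, 4}, documents 0, 2, 3 and 5 become 0, 1, 2 and 3, while documents 1 and 4 are dropped. Binary search keeps each lookup logarithmic. A merge-like scan over the deletion list would work equally well when documents are visited in increasing order.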

Method Detail

combineNumberOfDocuments

protected int combineNumberOfDocuments()
Description copied from class: Combine
Combines the number of documents.

Specified by:
combineNumberOfDocuments in class Combine
Returns:
the number of documents of the combined index.
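
For a concatenation, the combined number of documents is presumably the sum of the input counts. When a deletion list is supplied, it would be reduced by the number of deleted documents. The following is a minimal sketch under the assumption that every deleted number is distinct and falls within the concatenated range. ConcatenatedDocumentCount is a hypothetical name, and this is not the actual method body.

```java
/** Hypothetical sketch of the document count of a concatenation; not MG4J code. */
final class ConcatenatedDocumentCount {

    /**
     * Sums the document counts of the input indices and subtracts the number of
     * deleted documents (assumed distinct and within range).
     */
    static int numberOfDocuments(int[] numberOfDocuments, int numberOfDeleted) {
        int total = 0;
        for (int n : numberOfDocuments) total += n;
        return total - numberOfDeleted;
    }
}
```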

combineSizes

protected int combineSizes(OutputBitStream sizesOutputBitStream)
                    throws IOException
Description copied from class: Combine
Combines size lists.

Specified by:
combineSizes in class Combine
Returns:
the maximum size of a document in the combined index.
Throws:
IOException
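
Because documents keep their relative order, combining size lists for a concatenation reduces to appending the per-index lists in index order while tracking the maximum. The real method writes to an OutputBitStream. This sketch writes into a plain int array instead and ignores deletions. SizeConcatenation is a hypothetical name.

```java
/** Hypothetical sketch of concatenating per-index document-size lists; not MG4J code. */
final class SizeConcatenation {

    /**
     * Appends the size lists of the input indices, in index order, into out
     * (which must have room for all of them) and returns the maximum size seen.
     */
    static int combineSizes(int[][] sizes, int[] out) {
        int maxSize = 0;
        int pos = 0;
        for (int[] perIndex : sizes) {
            for (int s : perIndex) {
                out[pos++] = s;
                if (s > maxSize) maxSize = s;
            }
        }
        return maxSize;
    }
}
```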

combine

protected int combine(int numUsedIndices,
                      long occurrency)
               throws IOException
Description copied from class: Combine
Combines several indices.

When this method is called, the first numUsedIndices entries of Combine.usedIndex contain, in increasing order, the indices containing inverted lists for the current term. Implementations of this method must combine these inverted lists and return the total frequency.

Specified by:
combine in class Combine
Parameters:
numUsedIndices - the number of valid entries in Combine.usedIndex.
occurrency - the occurrency of the term (used only when building Combine.IndexType.QUASI_SUCCINCT indices).
Returns:
the total frequency.
Throws:
IOException
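
In a concatenation, the inverted lists of the used indices can simply be appended in order, with each document pointer shifted by the offset of its index. The following is a minimal sketch, not the MG4J implementation. It handles document pointers only, with no counts, positions or payloads. The array-based inputs and the name PostingConcatenation are assumptions made for illustration.

```java
/**
 * Hypothetical sketch (not MG4J code) of the combine step of a concatenation:
 * posting lists of the used indices are appended in index order, with each
 * document pointer shifted by the offset of its index.
 */
final class PostingConcatenation {

    /**
     * postings[i] is the sorted document list of the current term in index i;
     * offset[i] is the number of documents in indices 0..i-1; the first
     * numUsedIndices entries of usedIndex list, in increasing order, the indices
     * containing the term. Writes the combined list into out and returns the
     * total frequency (number of documents written).
     */
    static int combine(int[][] postings, int[] offset, int[] usedIndex, int numUsedIndices, int[] out) {
        int frequency = 0;
        for (int k = 0; k < numUsedIndices; k++) {
            int i = usedIndex[k];
            for (int doc : postings[i]) out[frequency++] = offset[i] + doc;
        }
        return frequency;
    }
}
```

Since usedIndex is increasing and every shifted pointer from index i is smaller than offset[i + 1], the output is already sorted. No merge is needed, which is what distinguishes concatenation from merging indices over overlapping document sets.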

main

public static void main(String[] arg)
                 throws ConfigurationException,
                        SecurityException,
                        com.martiansoftware.jsap.JSAPException,
                        IOException,
                        URISyntaxException,
                        ClassNotFoundException,
                        InstantiationException,
                        IllegalAccessException,
                        InvocationTargetException,
                        NoSuchMethodException,
                        IllegalArgumentException
Throws:
ConfigurationException
SecurityException
com.martiansoftware.jsap.JSAPException
IOException
URISyntaxException
ClassNotFoundException
InstantiationException
IllegalAccessException
InvocationTargetException
NoSuchMethodException
IllegalArgumentException