Class Concatenate
- java.lang.Object
-
- it.unimi.di.big.mg4j.tool.Combine
-
- it.unimi.di.big.mg4j.tool.Concatenate
-
public final class Concatenate extends Combine
Concatenates several indices.
This implementation of Combine concatenates the involved indices: document 0 of the first index is document 0 of the final collection, document 0 of the second index is numbered with the number of documents in the first index, and so on. The resulting index is exactly what you would obtain by concatenating the document sequences at the origin of each index.
Note that this class can also be used with a single index, which makes it easy to recompress an index using different compression flags.
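The renumbering rule described above can be illustrated with a small, self-contained sketch. It does not use MG4J and is not the library's implementation; the class name ConcatenationNumbering and its methods are hypothetical and exist only for illustration. The sketch assumes that the number of documents of each input index is known in advance.

```java
public class ConcatenationNumbering {
    /**
     * Returns, for each input index, the global number given to its document 0.
     * (Hypothetical helper; not part of MG4J.)
     */
    public static long[] offsets(long[] numberOfDocuments) {
        long[] offset = new long[numberOfDocuments.length];
        long sum = 0;
        for (int i = 0; i < numberOfDocuments.length; i++) {
            offset[i] = sum; // documents of index i start after all earlier documents
            sum += numberOfDocuments[i];
        }
        return offset;
    }

    /** Maps a document number local to one input index to its number in the concatenation. */
    public static long globalDocument(long[] numberOfDocuments, int index, long localDocument) {
        if (localDocument < 0 || localDocument >= numberOfDocuments[index]) {
            throw new IllegalArgumentException("No document " + localDocument + " in index " + index);
        }
        return offsets(numberOfDocuments)[index] + localDocument;
    }

    public static void main(String[] args) {
        long[] docs = { 3, 2, 4 }; // three input indices with 3, 2 and 4 documents
        System.out.println(globalDocument(docs, 0, 0)); // 0
        System.out.println(globalDocument(docs, 1, 0)); // 3
        System.out.println(globalDocument(docs, 2, 1)); // 6
    }
}
```

With a single input index every offset is zero, so document numbers are unchanged; this is consistent with the single-index recompression use mentioned above.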
- Since:
- 1.0
- Author:
- Sebastiano Vigna
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class it.unimi.di.big.mg4j.tool.Combine
Combine.GammaCodedIntIterator, Combine.IndexType
-
-
Field Summary
-
Fields inherited from class it.unimi.di.big.mg4j.tool.Combine
additionalProperties, bufferSize, DEFAULT_BUFFER_SIZE, frequency, hasCounts, hasPayloads, hasPositions, haveSumsMaxPos, index, indexIterator, indexReader, indexWriter, inputBasename, ioFactory, maxCount, metadataOnly, needsSizes, numberOfDocuments, numberOfOccurrences, numIndices, outputBasename, p, positionArray, predictedLengthNumBits, predictedSize, quasiSuccinctIndexWriter, size, sumsMaxPos, termQueue, usedIndex, variableQuantumIndexWriter
-
-
Constructor Summary
Constructors
Constructor Description
Concatenate(IOFactory ioFactory, String outputBasename, String[] inputBasename, boolean metadataOnly, int bufferSize, Map<CompressionFlags.Component,CompressionFlags.Coding> writerFlags, Combine.IndexType indexType, boolean skips, int quantum, int height, int skipBufferOrCacheSize, long logInterval)
Concatenates several indices into one.
Concatenate(IOFactory ioFactory, String outputBasename, String[] inputBasename, IntList delete, boolean metadataOnly, int bufferSize, Map<CompressionFlags.Component,CompressionFlags.Coding> writerFlags, Combine.IndexType indexType, boolean skips, int quantum, int height, int skipBufferOrCacheSize, long logInterval)
Concatenates several indices into one.
-
Method Summary
Modifier and Type Method Description
protected long combine(int numUsedIndices, long occurrency)
Combines several indices.
protected long combineNumberOfDocuments()
Combines the number of documents.
protected int combineSizes(OutputBitStream sizesOutputBitStream)
Combines size lists.
static void main(String[] arg)
-
-
-
Constructor Detail
-
Concatenate
public Concatenate(IOFactory ioFactory, String outputBasename, String[] inputBasename, boolean metadataOnly, int bufferSize, Map<CompressionFlags.Component,CompressionFlags.Coding> writerFlags, Combine.IndexType indexType, boolean skips, int quantum, int height, int skipBufferOrCacheSize, long logInterval) throws IOException, org.apache.commons.configuration.ConfigurationException, URISyntaxException, ClassNotFoundException, SecurityException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException
Concatenates several indices into one.
- Parameters:
ioFactory - the factory that will be used to perform I/O.
outputBasename - the basename of the combined index.
inputBasename - the basenames of the input indices.
metadataOnly - if true, we save only metadata (term list, frequencies, global counts).
bufferSize - the buffer size for index readers.
writerFlags - the flags for the index writer.
indexType - the type of the index to build.
skips - whether to insert skips in case interleaved is true.
quantum - the quantum of skipping structures; if negative, a percentage of space for variable-quantum indices (irrelevant if skips is false).
height - the height of skipping towers (irrelevant if skips is false).
skipBufferOrCacheSize - the size of the buffer used to hold temporarily inverted lists during the skipping structure construction, or the size of the bit cache used when building a quasi-succinct index.
logInterval - how often we log.
- Throws:
IOException
org.apache.commons.configuration.ConfigurationException
URISyntaxException
ClassNotFoundException
SecurityException
InstantiationException
IllegalAccessException
InvocationTargetException
NoSuchMethodException
-
Concatenate
public Concatenate(IOFactory ioFactory, String outputBasename, String[] inputBasename, IntList delete, boolean metadataOnly, int bufferSize, Map<CompressionFlags.Component,CompressionFlags.Coding> writerFlags, Combine.IndexType indexType, boolean skips, int quantum, int height, int skipBufferOrCacheSize, long logInterval) throws IOException, org.apache.commons.configuration.ConfigurationException, URISyntaxException, ClassNotFoundException, SecurityException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException
Concatenates several indices into one.
- Parameters:
ioFactory - the factory that will be used to perform I/O.
outputBasename - the basename of the combined index.
inputBasename - the basenames of the input indices.
delete - a monotonically increasing list of integers representing documents that will be deleted from the output index, or null.
metadataOnly - if true, we save only metadata (term list, frequencies, global counts).
bufferSize - the buffer size for index readers.
writerFlags - the flags for the index writer.
indexType - the type of the index to build.
skips - whether to insert skips in case interleaved is true.
quantum - the quantum of skipping structures; if negative, a percentage of space for variable-quantum indices (irrelevant if skips is false).
height - the height of skipping towers (irrelevant if skips is false).
skipBufferOrCacheSize - the size of the buffer used to hold temporarily inverted lists during the skipping structure construction, or the size of the bit cache used when building a quasi-succinct index.
logInterval - how often we log.
- Throws:
IOException
org.apache.commons.configuration.ConfigurationException
URISyntaxException
ClassNotFoundException
SecurityException
InstantiationException
IllegalAccessException
InvocationTargetException
NoSuchMethodException
-
-
Method Detail
-
combineNumberOfDocuments
protected long combineNumberOfDocuments()
Description copied from class: Combine
Combines the number of documents.
- Specified by:
combineNumberOfDocuments in class Combine
- Returns:
- the number of documents of the combined index.
-
combineSizes
protected int combineSizes(OutputBitStream sizesOutputBitStream) throws IOException
Description copied from class: Combine
Combines size lists.
- Specified by:
combineSizes in class Combine
- Returns:
- the maximum size of a document in the combined index.
- Throws:
IOException
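As a rough illustration of what combining size lists means for a concatenation, the following sketch appends the per-index size lists in input order and returns the largest size seen. It is an assumption-laden simplification, not MG4J code: the class SizeConcatenationSketch is hypothetical, and a plain List<Integer> stands in for the OutputBitStream that the real method writes to.

```java
import java.util.ArrayList;
import java.util.List;

public class SizeConcatenationSketch {
    /**
     * Appends all document sizes, index by index, to output and returns the maximum size.
     * (Hypothetical helper; the output list replaces the bit stream used by the real method.)
     */
    public static int concatenateSizes(int[][] sizesPerIndex, List<Integer> output) {
        int maxSize = 0;
        for (int[] sizes : sizesPerIndex) {
            for (int size : sizes) {
                output.add(size); // sizes keep the order of the concatenated documents
                maxSize = Math.max(maxSize, size);
            }
        }
        return maxSize;
    }

    public static void main(String[] args) {
        int[][] sizes = { { 10, 7, 3 }, { 12, 5 } }; // two input indices
        List<Integer> combined = new ArrayList<>();
        int max = concatenateSizes(sizes, combined);
        System.out.println(combined + " max=" + max); // [10, 7, 3, 12, 5] max=12
    }
}
```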
-
combine
protected long combine(int numUsedIndices, long occurrency) throws IOException
Description copied from class: Combine
Combines several indices.
When this method is called, exactly numUsedIndices entries of Combine.usedIndex contain, in increasing order, the indices containing inverted lists for the current term. Implementations of this method must combine the inverted lists and return the total frequency.
- Specified by:
combine in class Combine
- Parameters:
numUsedIndices - the number of valid entries in Combine.usedIndex.
occurrency - the occurrency of the term (used only when building Combine.IndexType.QUASI_SUCCINCT indices).
- Returns:
- the total frequency.
- Throws:
IOException
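The contract above (visit the first numUsedIndices entries of usedIndex, combine their inverted lists, return the total frequency) can be sketched for the concatenation case as follows. This is a simplified model under stated assumptions, not the MG4J implementation: the class PostingConcatenationSketch is hypothetical, inverted lists are plain arrays of local document numbers, and counts, positions and payloads are ignored.

```java
import java.util.ArrayList;
import java.util.List;

public class PostingConcatenationSketch {
    /**
     * Concatenates the inverted lists of one term and returns its total frequency.
     * postings[i] holds the local document numbers of the term in input index i
     * (it may be null when index i does not contain the term).
     * (Hypothetical helper; not part of MG4J.)
     */
    public static long concatenate(long[][] postings, long[] numberOfDocuments,
            int[] usedIndex, int numUsedIndices, List<Long> output) {
        // First global document number of each input index.
        long[] offset = new long[numberOfDocuments.length];
        for (int i = 1; i < offset.length; i++) {
            offset[i] = offset[i - 1] + numberOfDocuments[i - 1];
        }
        long totalFrequency = 0;
        // Only the first numUsedIndices entries of usedIndex are valid.
        for (int k = 0; k < numUsedIndices; k++) {
            int i = usedIndex[k];
            for (long localDocument : postings[i]) {
                output.add(offset[i] + localDocument); // shift into the global numbering
                totalFrequency++;
            }
        }
        return totalFrequency;
    }

    public static void main(String[] args) {
        long[] docs = { 3, 2, 4 };
        long[][] postings = { { 0, 2 }, null, { 1, 3 } }; // the term is absent from index 1
        int[] usedIndex = { 0, 2, 0 };                    // last entry is stale and ignored
        List<Long> combined = new ArrayList<>();
        long frequency = concatenate(postings, docs, usedIndex, 2, combined);
        System.out.println(combined + " frequency=" + frequency); // [0, 2, 6, 8] frequency=4
    }
}
```

Because usedIndex is in increasing order and each input list is itself increasing, appending the shifted lists one after another keeps the combined list sorted without any merging step.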
-
main
public static void main(String[] arg) throws org.apache.commons.configuration.ConfigurationException, SecurityException, com.martiansoftware.jsap.JSAPException, IOException, URISyntaxException, ClassNotFoundException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException
- Throws:
org.apache.commons.configuration.ConfigurationException
SecurityException
com.martiansoftware.jsap.JSAPException
IOException
URISyntaxException
ClassNotFoundException
InstantiationException
IllegalAccessException
InvocationTargetException
NoSuchMethodException
-
-