Class QuasiSuccinctIndexWriter
- java.lang.Object
-
- it.unimi.di.big.mg4j.index.QuasiSuccinctIndexWriter
-
- All Implemented Interfaces:
IndexWriter
public class QuasiSuccinctIndexWriter extends Object implements IndexWriter
An index writer for quasi-succinct indices.- Author:
- Sebastiano Vigna
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description protected static class
QuasiSuccinctIndexWriter.Accumulator
protected static class
QuasiSuccinctIndexWriter.LongWordCache
static class
QuasiSuccinctIndexWriter.LongWordOutputBitStream
-
Field Summary
Fields Modifier and Type Field Description static int
DEFAULT_CACHE_SIZE
The default size of the bit cache.
-
Constructor Summary
Constructors Constructor Description QuasiSuccinctIndexWriter(IOFactory ioFactory, CharSequence basename, long numberOfDocuments, int log2Quantum, int cacheSize, Map<CompressionFlags.Component,CompressionFlags.Coding> flags, ByteOrder byteOrder)
Creates a new index writer, with the specified basename.
-
Method Summary
Modifier and Type Method Description void
close()
Closes this index writer, completing the index creation process and releasing all resources.static int
lowerBits(long length, long upperBound, boolean strict)
Returns the number of lower bits for the Elias–Fano encoding of a list of given length, upper bound and strictness.OutputBitStream
newDocumentRecord()
Starts a new document record.long
newInvertedList()
Starts a new inverted list.void
newInvertedList(long frequency, long occurrency, long sumMaxPos)
Starts a new inverted list.static long
numberOfPointers(long length, long upperBound, int log2Quantum, boolean strict, boolean indexZeroes)
Returns the number of forward or skip pointers to the Elias–Fano encoding of a list of given length, upper bound and strictness.static int
pointerSize(long length, long upperBound, boolean strict, boolean indexZeroes)
Returns the size in bits of forward or skip pointers to the Elias–Fano encoding of a list of given length, upper bound and strictness.void
printStats(PrintStream stats)
Writes to the given print stream statistical information about the index just built.Properties
properties()
Returns properties of the index generated by this index writer.void
writeDocumentPointer(OutputBitStream out, long pointer)
Writes a document pointer.void
writeDocumentPositions(OutputBitStream unused, int[] position, int offset, int count, int docSize)
Writes the positions of the occurrences of the current term in the current document to the givenOutputBitStream
.void
writeFrequency(long frequency)
Writes the frequency.void
writePayload(OutputBitStream unused, Payload payload)
Writes the payload for the current document.void
writePositionCount(OutputBitStream unused, int count)
Writes the count of the occurrences of the current term in the current document to the givenOutputBitStream
.long
writtenBits()
Returns the overall number of bits written onto the underlying stream(s).
-
-
-
Field Detail
-
DEFAULT_CACHE_SIZE
public static final int DEFAULT_CACHE_SIZE
The default size of the bit cache.- See Also:
- Constant Field Values
-
-
Constructor Detail
-
QuasiSuccinctIndexWriter
public QuasiSuccinctIndexWriter(IOFactory ioFactory, CharSequence basename, long numberOfDocuments, int log2Quantum, int cacheSize, Map<CompressionFlags.Component,CompressionFlags.Coding> flags, ByteOrder byteOrder) throws IOException
Creates a new index writer, with the specified basename.- Parameters:
ioFactory
- the factory that will be used to perform I/O.basename
- the basename.numberOfDocuments
- the number of documents in the collection to be indexed.log2Quantum
- the logarithm of the quantum.cacheSize
- the size in byte of the bit caches.byteOrder
- the byte order of the index (ifnull
,ByteOrder.nativeOrder()
).- Throws:
IOException
-
-
Method Detail
-
lowerBits
public static int lowerBits(long length, long upperBound, boolean strict)
Returns the number of lower bits for the Elias–Fano encoding of a list of given length, upper bound and strictness.- Parameters:
length
- the number of elements of the list.upperBound
- an upper bound for the elements of the list.strict
- if true, the elements of the list are strictly increasing, and the returned number of bits is for the strict representation (e.g., storing the k-th element decreased by k).- Returns:
- the number of bits for the Elias–Fano encoding of a list with the specified parameters.
-
pointerSize
public static int pointerSize(long length, long upperBound, boolean strict, boolean indexZeroes)
Returns the size in bits of forward or skip pointers to the Elias–Fano encoding of a list of given length, upper bound and strictness.- Parameters:
length
- the number of elements of the list.upperBound
- an upper bound for the elements of the list.strict
- if true, the elements of the list are strictly increasing, and the returned number of bits is for the strict representation (e.g., storing the k-th element decreased by k).indexZeroes
- if true, the number of bits for skip pointers is returned; otherwise, the number of bits for forward pointers is returned.- Returns:
- the size of bits of forward or skip pointers the Elias–Fano encoding of a list with the specified parameters.
-
numberOfPointers
public static long numberOfPointers(long length, long upperBound, int log2Quantum, boolean strict, boolean indexZeroes)
Returns the number of forward or skip pointers to the Elias–Fano encoding of a list of given length, upper bound and strictness.- Parameters:
length
- the number of elements of the list.upperBound
- an upper bound for the elements of the list.log2Quantum
- the logarithm of the quantum size.strict
- if true, the elements of the list are strictly increasing, and the returned number of bits is for the strict representation (e.g., storing the k-th element decreased by k).indexZeroes
- if true, an upper bound on the number of skip pointers is returned; otherwise, the (exact) number of forward pointers is returned.- Returns:
- an upper bound on the number of skip pointers or the (exact) number of forward pointers.
-
newInvertedList
public void newInvertedList(long frequency, long occurrency, long sumMaxPos) throws IOException
Starts a new inverted list. The previous inverted list, if any, is actually written to the underlying bit stream.This method provides additional information which is necessary to build the posting list. The information can be omitted if only part of the index is being written (e.g., no positions or even no counts and positions).
- Parameters:
frequency
- the frequency of the inverted list.occurrency
- the occurrency of the inverted list (use -1 if you are not writing counts).sumMaxPos
- the sum of the maximum position in each document (unused if positions are not indexed).- Throws:
IllegalStateException
- if too few records were written for the previous inverted list.IOException
- See Also:
IndexWriter.newInvertedList()
-
newInvertedList
public long newInvertedList() throws IOException
Description copied from interface:IndexWriter
Starts a new inverted list. The previous inverted list, if any, is actually written to the underlying bit stream.- Specified by:
newInvertedList
in interfaceIndexWriter
- Returns:
- the position (in bits) of the underlying bit stream where the new inverted list starts.
- Throws:
IOException
-
writeFrequency
public void writeFrequency(long frequency)
Description copied from interface:IndexWriter
Writes the frequency.- Specified by:
writeFrequency
in interfaceIndexWriter
- Parameters:
frequency
- the (positive) number of document records that this inverted list will contain.
-
newDocumentRecord
public OutputBitStream newDocumentRecord() throws IOException
Description copied from interface:IndexWriter
Starts a new document record.This method must be called exactly exactly f times, where f is the frequency specified with
IndexWriter.writeFrequency(long)
.- Specified by:
newDocumentRecord
in interfaceIndexWriter
- Returns:
- the output bit stream where the next document record data should be written, if necessary, or
null
, ifIndexWriter.writeDocumentPointer(OutputBitStream, long)
ignores its first argument. - Throws:
IOException
-
writeDocumentPointer
public void writeDocumentPointer(OutputBitStream out, long pointer) throws IOException
Description copied from interface:IndexWriter
Writes a document pointer.This method must be called immediately after
IndexWriter.newDocumentRecord()
.- Specified by:
writeDocumentPointer
in interfaceIndexWriter
- Parameters:
out
- the output bit stream where the pointer will be written.pointer
- the document pointer.- Throws:
IOException
-
writePayload
public void writePayload(OutputBitStream unused, Payload payload) throws IOException
Description copied from interface:IndexWriter
Writes the payload for the current document.This method must be called immediately after
IndexWriter.writeDocumentPointer(OutputBitStream, long)
.- Specified by:
writePayload
in interfaceIndexWriter
- Parameters:
unused
- the output bit stream where the payload will be written.payload
- the payload.- Throws:
IOException
-
writePositionCount
public void writePositionCount(OutputBitStream unused, int count) throws IOException
Description copied from interface:IndexWriter
Writes the count of the occurrences of the current term in the current document to the givenOutputBitStream
.- Specified by:
writePositionCount
in interfaceIndexWriter
- Parameters:
unused
- the output stream where the occurrences should be written.count
- the count.- Throws:
IOException
-
writeDocumentPositions
public void writeDocumentPositions(OutputBitStream unused, int[] position, int offset, int count, int docSize) throws IOException
Description copied from interface:IndexWriter
Writes the positions of the occurrences of the current term in the current document to the givenOutputBitStream
.- Specified by:
writeDocumentPositions
in interfaceIndexWriter
- Parameters:
unused
- the output stream where the occurrences should be written.position
- the position vector (a sequence of strictly increasing natural numbers).offset
- the first valid entry inposition
.count
- the number of valid entries inposition
starting fromoffset
.docSize
- the size of the current document (only for Golomb and interpolative coding; you can safely pass -1 otherwise).- Throws:
IOException
-
writtenBits
public long writtenBits()
Description copied from interface:IndexWriter
Returns the overall number of bits written onto the underlying stream(s).- Specified by:
writtenBits
in interfaceIndexWriter
- Returns:
- the number of bits written, according to the variables keeping statistical records.
-
properties
public Properties properties()
Description copied from interface:IndexWriter
Returns properties of the index generated by this index writer.This method should only be called after
IndexWriter.close()
. It returns a new property object containing values for (whenever appropriate)Index.PropertyKeys.DOCUMENTS
,Index.PropertyKeys.TERMS
,Index.PropertyKeys.POSTINGS
,Index.PropertyKeys.MAXCOUNT
,Index.PropertyKeys.INDEXCLASS
,Index.PropertyKeys.CODING
,Index.PropertyKeys.PAYLOADCLASS
,BitStreamIndex.PropertyKeys.SKIPQUANTUM
, andBitStreamIndex.PropertyKeys.SKIPHEIGHT
.- Specified by:
properties
in interfaceIndexWriter
- Returns:
- properties a new set of properties for the just created index.
-
close
public void close() throws IOException
Description copied from interface:IndexWriter
Closes this index writer, completing the index creation process and releasing all resources.- Specified by:
close
in interfaceIndexWriter
- Throws:
IOException
-
printStats
public void printStats(PrintStream stats)
Description copied from interface:IndexWriter
Writes to the given print stream statistical information about the index just built. This method must be called afterIndexWriter.close()
.- Specified by:
printStats
in interfaceIndexWriter
- Parameters:
stats
- a print stream where statistical information will be written.
-
-