Class QuasiSuccinctIndex
- java.lang.Object
  - it.unimi.di.big.mg4j.index.Index
    - it.unimi.di.big.mg4j.index.QuasiSuccinctIndex
- All Implemented Interfaces:
Serializable
public class QuasiSuccinctIndex extends Index
A quasi-succinct index does not use gap compression to represent its various components, but rather the Elias–Fano representation of monotone sequences. The index was described in detail by Sebastiano Vigna in the paper “Quasi-Succinct Indices”, Proceedings of the 6th ACM International Conference on Web Search and Data Mining (WSDM '13), pages 83–92, ACM, 2013. It is smaller than a γ/δ-code gap-compressed index and significantly faster at computing conjunctive, phrasal or proximity operators, as it provides constant-time access, on average, to every piece of information in the index.
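The Elias–Fano representation mentioned above can be illustrated with a minimal, self-contained sketch. It is not MG4J's implementation: the class name EliasFanoSketch is hypothetical, and the sketch assumes a nondecreasing sequence of nonnegative values, each strictly smaller than a known bound (universe). Each of the n values is split into l = max(0, ⌊log2(universe / n)⌋) low bits, stored verbatim, and a high part, stored in unary by setting bit (value >>> l) + i of a second bit array.

```java
/**
 * Minimal Elias–Fano sketch (hypothetical class, not MG4J's implementation).
 * Assumes values are nondecreasing, nonnegative and strictly smaller than universe.
 */
public final class EliasFanoSketch {
    private final int n;         // number of elements
    private final int l;         // low bits per element: max(0, floor(log2(universe / n)))
    private final long[] lower;  // n * l low bits, packed into 64-bit longwords
    private final long[] upper;  // high parts in unary: bit (value >>> l) + i is set

    public EliasFanoSketch(long[] values, long universe) {
        n = values.length;
        long ratio = n == 0 ? 0 : universe / n;
        l = ratio <= 1 ? 0 : 63 - Long.numberOfLeadingZeros(ratio);
        lower = new long[(int) (((long) n * l + 63) / 64) + 1];
        long upperBits = n + (universe >>> l) + 1;
        upper = new long[(int) ((upperBits + 63) / 64)];
        for (int i = 0; i < n; i++) {
            long v = values[i];
            setBits(lower, (long) i * l, l, v & ((1L << l) - 1));
            long pos = (v >>> l) + i; // strictly increasing because values are nondecreasing
            upper[(int) (pos >>> 6)] |= 1L << (pos & 63);
        }
    }

    public int size() { return n; }

    /** Returns the i-th value: high part via select on upper, low part read from lower. */
    public long get(int i) {
        if (i < 0 || i >= n) throw new IndexOutOfBoundsException(Integer.toString(i));
        long remaining = i;
        for (int w = 0; w < upper.length; w++) { // linear-scan select (no auxiliary index)
            int ones = Long.bitCount(upper[w]);
            if (remaining < ones) {
                long word = upper[w];
                for (long r = remaining; r > 0; r--) word &= word - 1; // drop the lowest set bits
                long pos = ((long) w << 6) + Long.numberOfTrailingZeros(word);
                long high = pos - i;
                return (high << l) | getBits(lower, (long) i * l, l);
            }
            remaining -= ones;
        }
        throw new IllegalStateException("corrupted upper bits");
    }

    private static void setBits(long[] a, long start, int width, long value) {
        if (width == 0) return;
        int word = (int) (start >>> 6), bit = (int) (start & 63);
        a[word] |= value << bit;
        if (bit + width > 64) a[word + 1] |= value >>> (64 - bit); // field spans two longwords
    }

    private static long getBits(long[] a, long start, int width) {
        if (width == 0) return 0;
        int word = (int) (start >>> 6), bit = (int) (start & 63);
        long result = a[word] >>> bit;
        if (bit + width > 64) result |= a[word + 1] << (64 - bit); // field spans two longwords
        return result & ((1L << width) - 1);
    }

    public static void main(String[] args) {
        long[] values = {3, 4, 7, 13, 14, 15, 21, 43};
        EliasFanoSketch ef = new EliasFanoSketch(values, 44);
        for (int i = 0; i < ef.size(); i++) System.out.print(ef.get(i) + " ");
        System.out.println();
    }
}
```

The linear scan in get makes access cost proportional to the length of the upper bit array. Achieving constant average-time access, as claimed for quasi-succinct indices, requires auxiliary sampled positions of the one bits; this sketch deliberately omits them.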
In a quasi-succinct index, pointers, counts and positions are stored in three separate files, each with its own offset file. The files do not contain a byte-oriented bit stream; rather, they are arrays of 64-bit longwords with a specified byte order (by default the native one, for performance reasons). The longwords are either loaded into memory as a LongBigArrayBigList or mapped using a ByteBufferLongBigList. Bit k of a file is bit k mod 64 of the longword of index ⌊k / 64⌋.
Note that the methods providing pointers, counts and positions to index readers use reflection to detect whether the LongBigList storing a component is a ByteBufferLongBigList; if so, they return a copy.
- Author:
- Sebastiano Vigna
- See Also:
QuasiSuccinctIndexReader, QuasiSuccinctIndexWriter, Serialized Form
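The bit-addressing rule stated in the class description (bit k of a file is bit k mod 64 of the longword of index ⌊k / 64⌋) can be sketched with plain java.nio, independently of MG4J. The class BitAddressing and the 16-byte file contents are hypothetical. The sketch fixes the byte order explicitly rather than using the native one, so that its behavior does not depend on the platform, and it shows that the same bytes read with a different byte order yield different longwords, which is presumably why the byte order of the files must be specified.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.LongBuffer;

public final class BitAddressing {
    /** Returns bit k of a file viewed as 64-bit longwords: bit (k mod 64) of longword floor(k / 64). */
    static boolean bit(LongBuffer words, long k) {
        long word = words.get((int) (k >>> 6)); // longword of index floor(k / 64)
        return ((word >>> (k & 63)) & 1) != 0;  // bit k mod 64 of that longword
    }

    public static void main(String[] args) {
        // Hypothetical 16-byte "file" holding two longwords.
        byte[] file = new byte[16];
        file[0] = 0b0000_0101; // least significant byte of longword 0 in little-endian order
        file[8] = 1;           // least significant byte of longword 1 in little-endian order

        LongBuffer little = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN).asLongBuffer();
        System.out.println(bit(little, 0));  // true
        System.out.println(bit(little, 1));  // false
        System.out.println(bit(little, 2));  // true
        System.out.println(bit(little, 64)); // true

        // Same bytes, big-endian: byte 0 becomes the most significant byte of longword 0.
        LongBuffer big = ByteBuffer.wrap(file).order(ByteOrder.BIG_ENDIAN).asLongBuffer();
        System.out.println(bit(big, 0));     // false
        System.out.println(bit(big, 56));    // true
    }
}
```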
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | QuasiSuccinctIndex.PropertyKeys | Symbolic names for additional properties of a QuasiSuccinctIndex.
Nested classes/interfaces inherited from class it.unimi.di.big.mg4j.index.Index
Index.EmptyIndexIterator, Index.UriKeys
Field Summary
Fields
Modifier and Type | Field | Description
protected LongBigList | countsOffsets | The list of offsets into counts.
static int | DEFAULT_QUANTUM | The default quantum.
int | log2Quantum | The logarithm of the skipping quantum.
protected LongBigList | pointersOffsets | The list of offsets into pointers.
protected LongBigList | positionsOffsets | The list of offsets into positions.
Fields inherited from class it.unimi.di.big.mg4j.index.Index
field, hasCounts, hasPayloads, hasPositions, keyIndex, maxCount, numberOfDocuments, numberOfOccurrences, numberOfPostings, numberOfTerms, payload, prefixMap, properties, singletonSet, sizes, termMap, termProcessor
Constructor Summary
Constructors
Modifier | Constructor | Description
protected | QuasiSuccinctIndex(LongBigList index, LongBigList counts, LongBigList positions, long numberOfDocuments, long numberOfTerms, long numberOfPostings, long numberOfOccurrences, int maxCount, Payload payload, int log2Quantum, boolean hasCounts, boolean hasPositions, TermProcessor termProcessor, String field, Properties properties, StringMap<? extends CharSequence> termMap, PrefixMap<? extends CharSequence> prefixMap, IntBigList sizes, LongBigList indexOffsets, LongBigList countsOffsets, LongBigList positionsOffsets) |
-
Method Summary
Modifier and Type | Method | Description
protected LongBigList | getCountsList() |
protected LongBigList | getPointersList() |
protected LongBigList | getPositionsList() |
IndexReader | getReader(int bufferSize) | Creates and returns a new IndexReader based on this index.
String | toString() |
-
Methods inherited from class it.unimi.di.big.mg4j.index.Index
documents, documents, documents, getEmptyIndexIterator, getEmptyIndexIterator, getEmptyIndexIterator, getEmptyIndexIterator, getInstance, getInstance, getInstance, getInstance, getInstance, getReader, getTermProcessor, keyIndex
Field Detail
-
DEFAULT_QUANTUM
public static final int DEFAULT_QUANTUM
The default quantum.
- See Also:
- Constant Field Values
-
pointersOffsets
protected final LongBigList pointersOffsets
The list of offsets into pointers.
-
countsOffsets
protected final LongBigList countsOffsets
The list of offsets into counts.
-
positionsOffsets
protected final LongBigList positionsOffsets
The list of offsets into positions.
-
log2Quantum
public final int log2Quantum
The logarithm of the skipping quantum.
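Because this field stores the logarithm of the skipping quantum rather than the quantum itself, the quantum is presumably a power of two, 2^log2Quantum, so that locating the sample preceding a given element needs only a shift and a mask instead of a division. The sketch below assumes a hypothetical layout with one sample every 2^log2Quantum elements; the class QuantumSketch, its methods and the value log2Quantum = 8 are illustrative assumptions and do not reflect the actual value of DEFAULT_QUANTUM, which this page does not state.

```java
public final class QuantumSketch {
    /** Index of the last sample at or before element i, assuming one sample every 2^log2Quantum elements. */
    static long sampleIndex(long i, int log2Quantum) {
        return i >>> log2Quantum; // same as i / 2^log2Quantum for i >= 0
    }

    /** Elements still to scan linearly after jumping to that sample. */
    static long residualScan(long i, int log2Quantum) {
        return i & ((1L << log2Quantum) - 1); // same as i % 2^log2Quantum for i >= 0
    }

    public static void main(String[] args) {
        int log2Quantum = 8;              // hypothetical value: quantum = 256
        long quantum = 1L << log2Quantum;
        long i = 1000;
        System.out.println(quantum);                      // 256
        System.out.println(sampleIndex(i, log2Quantum));  // 3 (sample covers element 768)
        System.out.println(residualScan(i, log2Quantum)); // 232
    }
}
```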
Constructor Detail
-
QuasiSuccinctIndex
protected QuasiSuccinctIndex(LongBigList index, LongBigList counts, LongBigList positions, long numberOfDocuments, long numberOfTerms, long numberOfPostings, long numberOfOccurrences, int maxCount, Payload payload, int log2Quantum, boolean hasCounts, boolean hasPositions, TermProcessor termProcessor, String field, Properties properties, StringMap<? extends CharSequence> termMap, PrefixMap<? extends CharSequence> prefixMap, IntBigList sizes, LongBigList indexOffsets, LongBigList countsOffsets, LongBigList positionsOffsets)
Method Detail
-
getReader
public IndexReader getReader(int bufferSize) throws IOException
Description copied from class: Index
Creates and returns a new IndexReader based on this index. After that, you can use the reader to read this index.
- Specified by:
getReader in class Index
- Parameters:
bufferSize - the size of the buffer to be used when accessing the reader, or -1 for a default buffer size.
- Returns:
a new IndexReader to read this index.
- Throws:
IOException
-
getPointersList
protected LongBigList getPointersList()
-
getCountsList
protected LongBigList getCountsList()
-
getPositionsList
protected LongBigList getPositionsList()
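The class description states that the methods providing components to index readers (presumably getPointersList(), getCountsList() and getPositionsList(), which carry no description here) return a copy when the component is a ByteBufferLongBigList. A minimal sketch of that copy-on-access pattern follows. It is not MG4J code: LongList, ArrayLongList and BufferLongList are hypothetical stand-ins for fastutil's LongBigList, LongBigArrayBigList and MG4J's ByteBufferLongBigList, and an instanceof check stands in for the reflection mentioned in the description. One plausible motivation is that java.nio buffers are documented as not safe for use by multiple concurrent threads, so giving each reader its own duplicate() view avoids sharing buffer state, while heap-backed lists can simply be shared.

```java
import java.nio.LongBuffer;

public final class CopyOnAccessSketch {
    /** Hypothetical minimal list interface standing in for LongBigList. */
    interface LongList { long get(long index); }

    /** Heap-backed list: shared as is. */
    static final class ArrayLongList implements LongList {
        private final long[] a;
        ArrayLongList(long[] a) { this.a = a; }
        public long get(long index) { return a[(int) index]; }
    }

    /** Buffer-backed list (stand-in for a mapped list): callers receive their own view. */
    static final class BufferLongList implements LongList {
        final LongBuffer buffer;
        BufferLongList(LongBuffer buffer) { this.buffer = buffer; }
        public long get(long index) { return buffer.get((int) index); }
        /** Shares the content but has independent position, limit and mark. */
        BufferLongList copy() { return new BufferLongList(buffer.duplicate()); }
    }

    private final LongList pointers;
    CopyOnAccessSketch(LongList pointers) { this.pointers = pointers; }

    /** Mirrors the documented behavior: copy buffer-backed lists, share everything else. */
    LongList getPointersList() {
        return pointers instanceof BufferLongList ? ((BufferLongList) pointers).copy() : pointers;
    }

    public static void main(String[] args) {
        CopyOnAccessSketch heap = new CopyOnAccessSketch(new ArrayLongList(new long[] {1, 2, 3}));
        System.out.println(heap.getPointersList() == heap.getPointersList()); // true: same instance

        CopyOnAccessSketch mapped =
            new CopyOnAccessSketch(new BufferLongList(LongBuffer.wrap(new long[] {1, 2, 3})));
        LongList a = mapped.getPointersList(), b = mapped.getPointersList();
        System.out.println(a == b);               // false: distinct views
        System.out.println(a.get(2) == b.get(2)); // true: same content
    }
}
```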