java.lang.Object
  it.unimi.di.mg4j.index.Index
    it.unimi.di.mg4j.index.BitStreamIndex
public abstract class BitStreamIndex
A bitstream-based index. Instances of this class contain additional index data related to compression, such as the codes used for each part of the index.
Implementing subclasses must provide access to the index bitstream at both the byte and the bit level. A bitstream-based index usually exposes the offset list.
The standard readers associated with an instance of this class are of type BitStreamIndexReader.
Nonetheless, it is possible to automatically generate sources for wired classes that
work only for a particular set of codings and flags. The wired classes will be fetched
automagically by reflection, if available. Please read the section about performance in the MG4J manual.
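The idea of fetching a specialized class by reflection and falling back to a generic one can be sketched as follows. This is only an illustration of the general technique, not MG4J's actual lookup code: the class `ReaderLookupSketch`, the nested `GenericReader`, and the `findConstructor` method are hypothetical names introduced for the example.

```java
import java.lang.reflect.Constructor;

final class ReaderLookupSketch {
    /** Hypothetical stand-in for a general-purpose reader. */
    public static class GenericReader {
        public GenericReader(int bufferSize) {
            // A real reader would allocate its buffer here.
        }
    }

    /**
     * Looks up a public constructor taking a single int in the class with the
     * given name; if the class or the constructor is missing, falls back to
     * the constructor of the generic reader.
     */
    public static Constructor<?> findConstructor(String specializedName) {
        try {
            return Class.forName(specializedName).getConstructor(int.class);
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            try {
                return GenericReader.class.getConstructor(int.class);
            } catch (NoSuchMethodException impossible) {
                // GenericReader declares the constructor above, so this cannot happen.
                throw new IllegalStateException(impossible);
            }
        }
    }
}
```

With a lookup of this kind, callers never need to know whether a specialized class exists: they always receive a usable constructor.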
Nested Class Summary | |
---|---|
static class | BitStreamIndex.PropertyKeys: Symbolic names for additional properties of a BitStreamIndex. |
Nested classes/interfaces inherited from class it.unimi.di.mg4j.index.Index |
---|
Index.EmptyIndexIterator, Index.UriKeys |
Field Summary | |
---|---|
int | bufferSize: The size of the buffer used to read the bit stream. |
CompressionFlags.Coding | countCoding: The coding for counts. |
static int | DEFAULT_BUFFER_SIZE: The default buffer size. |
static int | DEFAULT_FIXED_QUANTUM: The default fixed quantum (each 64 postings). |
static int | DEFAULT_HEIGHT: The default height (fairly low, due to memory consumption). |
static int | DEFAULT_QUANTUM: The default variable quantum (1% of index size). |
static int | FIXED_POINT_BITS: Fixed number of fractional binary digits used in fixed-point computation of Golomb moduli. |
static long | FIXED_POINT_MULTIPLIER: 1L << FIXED_POINT_BITS. |
CompressionFlags.Coding | frequencyCoding: The coding for frequencies. |
int | height: The parameter h (the maximum height of a skip tower), or -1 if this index has no skips. |
LongList | offsets: The offset of each term, if offsets were loaded or specified at creation time, or null. |
CompressionFlags.Coding | pointerCoding: The coding for pointers. |
CompressionFlags.Coding | positionCoding: The coding for positions. |
int | quantum: The quantum, or -1 if this index has no skips, or 0 if this is a BitStreamHPIndex and quanta are variable. |
Constructor<? extends IndexReader> | readerConstructor: The constructor that will be used to create new index readers. |
Fields inherited from class it.unimi.di.mg4j.index.Index |
---|
field, hasCounts, hasPayloads, hasPositions, keyIndex, maxCount, numberOfDocuments, numberOfOccurrences, numberOfPostings, numberOfTerms, payload, prefixMap, properties, singletonSet, sizes, termMap, termProcessor |
Constructor Summary | |
---|---|
BitStreamIndex(int numberOfDocuments, int numberOfTerms, long numberOfPostings, long numberOfOccurrences, int maxCount, Payload payload, CompressionFlags.Coding frequencyCoding, CompressionFlags.Coding pointerCoding, CompressionFlags.Coding countCoding, CompressionFlags.Coding positionCoding, int quantum, int height, int bufferSize, TermProcessor termProcessor, String field, Properties properties, StringMap<? extends CharSequence> termMap, PrefixMap<? extends CharSequence> prefixMap, IntList sizes, LongList offsets) |
Method Summary | |
---|---|
protected static String | featureName(CompressionFlags.Coding coding) |
static int | gaussianGolombModulus(long quantumSigma, int shift): Computes the Gaussian Golomb modulus for a given standard deviation and shift using fixed-point arithmetic. |
protected Constructor<? extends IndexReader> | getConstructor() |
abstract InputBitStream | getInputBitStream(int bufferSize): Returns an input bit stream over the index. |
abstract InputStream | getInputStream(): Returns an input stream over the index. |
IndexReader | getReader(int bufferSize): Creates and returns a new IndexReader based on this index. |
static int | golombModulus(int p, int q): Computes the Golomb modulus for a given fraction using fixed-point arithmetic and a precomputed table for small values. |
static long | quantumSigma(int frequency, int numberOfDocuments, int quantum): Computes the standard deviation associated with a given quantum and document frequency. |
String | toString() |
Methods inherited from class it.unimi.di.mg4j.index.Index |
---|
documents, documents, documents, getEmptyIndexIterator, getEmptyIndexIterator, getEmptyIndexIterator, getEmptyIndexIterator, getInstance, getInstance, getInstance, getInstance, getInstance, getReader, getTermProcessor, keyIndex |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
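public static final int DEFAULT_HEIGHT
The default height (fairly low, due to memory consumption).
public static final int DEFAULT_QUANTUM
The default variable quantum (1% of index size).
public static final int DEFAULT_FIXED_QUANTUM
The default fixed quantum (each 64 postings).
public static final int DEFAULT_BUFFER_SIZE
The default buffer size.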
public final CompressionFlags.Coding frequencyCoding
The coding for frequencies. See CompressionFlags.
public final CompressionFlags.Coding pointerCoding
The coding for pointers. See CompressionFlags.
public final CompressionFlags.Coding countCoding
The coding for counts. See CompressionFlags.
public final CompressionFlags.Coding positionCoding
The coding for positions. See CompressionFlags.
public final LongList offsets
The offset of each term, if offsets were loaded or specified at creation time, or null.
public final int height
The parameter h (the maximum height of a skip tower), or -1 if this index has no skips.
public final int quantum
The quantum, or -1 if this index has no skips, or 0 if this is a BitStreamHPIndex and quanta are variable.
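public final int bufferSize
The size of the buffer used to read the bit stream.
public transient Constructor<? extends IndexReader> readerConstructor
The constructor that will be used to create new index readers.
public static final int FIXED_POINT_BITS
Fixed number of fractional binary digits used in fixed-point computation of Golomb moduli.
public static final long FIXED_POINT_MULTIPLIER
1L << FIXED_POINT_BITS.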
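To illustrate what a fixed number of fractional binary digits means in practice, here is a minimal fixed-point sketch. The value 16 used for the bit count is an assumption made for the example only, since this page does not state the actual value of FIXED_POINT_BITS; the class `FixedPointSketch` and its helper methods are likewise hypothetical.

```java
final class FixedPointSketch {
    // Assumed value, for illustration only; the real constant may differ.
    public static final int FIXED_POINT_BITS = 16;
    public static final long FIXED_POINT_MULTIPLIER = 1L << FIXED_POINT_BITS;

    /** Returns the fixed-point representation of p/q, truncated toward zero. */
    public static long toFixed(int p, int q) {
        return ((long) p << FIXED_POINT_BITS) / q;
    }

    /** Multiplies two fixed-point values, returning a fixed-point result. */
    public static long multiply(long a, long b) {
        return (a * b) >> FIXED_POINT_BITS;
    }

    /** Converts a fixed-point value back to a double. */
    public static double toDouble(long fixed) {
        return (double) fixed / FIXED_POINT_MULTIPLIER;
    }
}
```

In this representation the integer FIXED_POINT_MULTIPLIER stands for 1.0, so a fraction such as 1/4 is stored as a quarter of that integer and products need a single right shift to stay in scale.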
Constructor Detail |
---|
public BitStreamIndex(int numberOfDocuments, int numberOfTerms, long numberOfPostings, long numberOfOccurrences, int maxCount, Payload payload, CompressionFlags.Coding frequencyCoding, CompressionFlags.Coding pointerCoding, CompressionFlags.Coding countCoding, CompressionFlags.Coding positionCoding, int quantum, int height, int bufferSize, TermProcessor termProcessor, String field, Properties properties, StringMap<? extends CharSequence> termMap, PrefixMap<? extends CharSequence> prefixMap, IntList sizes, LongList offsets)
Method Detail |
---|
protected Constructor<? extends IndexReader> getConstructor()
protected static String featureName(CompressionFlags.Coding coding)
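public abstract InputBitStream getInputBitStream(int bufferSize) throws IOException
Returns an input bit stream over the index.
Parameters:
bufferSize - a suggested buffer size.
Throws: IOException
public abstract InputStream getInputStream() throws IOException
Returns an input stream over the index.
Throws: IOException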
public IndexReader getReader(int bufferSize) throws IOException
Description copied from class: Index
Creates and returns a new IndexReader based on this index. After that, you can use the reader to read this index.
getReader in class Index
Parameters:
bufferSize - the size of the buffer to be used when accessing the reader, or -1 for a default buffer size.
Returns: a new IndexReader to read this index.
Throws: IOException
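public static int golombModulus(int p, int q)
Computes the Golomb modulus for a given fraction using fixed-point arithmetic and a precomputed table for small values.
This method returns ⌈ -log( 2 - p/q ) / log( 1 - p/q ) ⌉, but the computation is orders of magnitude quicker.
Parameters:
p - the numerator.
q - the denominator (larger than or equal to p).
Returns: the Golomb modulus for p/q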
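For comparison, the classical closed form for the Golomb modulus of a source with probability x = p/q, namely ⌈ -log( 2 - x ) / log( 1 - x ) ⌉, can be evaluated directly in floating point. The sketch below is a slow reference computation, not the fixed-point and table-based method described here; the class name `GolombModulusReference`, the argument check, and the choice to return 1 when p equals q are assumptions made for the example.

```java
final class GolombModulusReference {
    /**
     * Evaluates ceil( -log( 2 - p/q ) / log( 1 - p/q ) ) in floating point.
     * Requires 0 < p <= q; returns 1 when p == q (an assumption of this sketch,
     * chosen because the formula degenerates there).
     */
    public static int golombModulus(int p, int q) {
        if (p <= 0 || p > q) throw new IllegalArgumentException("need 0 < p <= q");
        if (p == q) return 1;
        final double x = (double) p / q;
        final double modulus = Math.ceil(-Math.log(2 - x) / Math.log(1 - x));
        // A Golomb modulus is always at least 1.
        return Math.max(1, (int) modulus);
    }
}
```

For example, a fraction of 1/100 gives a modulus of 69, in line with the usual rule of thumb of roughly 0.69 times the mean gap.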
public static int gaussianGolombModulus(long quantumSigma, int shift)
Computes the Gaussian Golomb modulus for a given standard deviation and shift using fixed-point arithmetic.
The Golomb modulus for (positive and negative) integers normally distributed with standard deviation σ can be computed using the formula ⌈ 2 sqrt( 2 / π ) ln(2) σ ⌉.
The resulting Golomb modulus is nearly optimal for coding such integers after they have been passed through Fast.int2nat(int). Note, however, that Golomb coding is not optimal for a normal distribution.
This function is used to compute the correct Golomb modulus for skip towers.
Parameters:
quantumSigma - the standard deviation of a quantum as returned by quantumSigma(int, int, int).
shift - a shift parameter.
Returns: the Gaussian Golomb modulus for the standard deviation obtained by multiplying quantumSigma by the square root of 2^shift -1.
public static long quantumSigma(int frequency, int numberOfDocuments, int quantum)
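Computes the standard deviation associated with a given quantum and document frequency.
Parameters:
frequency - the document frequency.
numberOfDocuments - the overall number of documents.
quantum - the quantum.
Returns: Math.sqrt( quantum * ( 1 - p ) ) / p, where p is the relative frequency.
public String toString()
Overrides: toString in class Object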
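As a floating-point cross-check of the two formulas documented for quantumSigma and gaussianGolombModulus, the sketch below evaluates Math.sqrt( quantum * ( 1 - p ) ) / p and ⌈ 2 sqrt( 2 / π ) ln(2) σ ⌉ directly. It deliberately ignores the fixed-point representation and the shift parameter, so it is not a substitute for the methods of this class; the class name `GaussianGolombReference`, the double-based signatures, and the lower bound of 1 on the modulus are assumptions made for the example.

```java
final class GaussianGolombReference {
    /** Returns sqrt( quantum * ( 1 - p ) ) / p, where p = frequency / numberOfDocuments. */
    public static double quantumSigma(int frequency, int numberOfDocuments, int quantum) {
        final double p = (double) frequency / numberOfDocuments;
        return Math.sqrt(quantum * (1 - p)) / p;
    }

    /** Returns ceil( 2 sqrt( 2 / pi ) ln(2) sigma ), but never less than 1. */
    public static int gaussianGolombModulus(double sigma) {
        final double modulus = Math.ceil(2 * Math.sqrt(2 / Math.PI) * Math.log(2) * sigma);
        return Math.max(1, (int) modulus);
    }
}
```

With a document frequency of 1000 out of 10000 documents and a quantum of 64, the standard deviation is about 75.89 and the resulting modulus is 84.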