java.lang.Object
- it.unimi.di.big.mg4j.index.Index
- - it.unimi.di.big.mg4j.index.BitStreamIndex

All Implemented Interfaces:

Serializable

Direct Known Subclasses:

BitStreamHPIndex, FileIndex, InMemoryIndex, MemoryMappedIndex
```
public abstract class BitStreamIndex
extends Index
```
A bitstream-based index. Instances of this class contains additional index data related to compression, such as the codes used for each part of the index.
Implementing subclasses must provide access to the index bitstream both at byte and bit level. A bitstream-based index usually exposes the offset list.
Wired implementations

The standard readers associated with an instance of this class are of type BitStreamIndexReader. Nonetheless, it is possible to generate automatically sources for wired classes that work only for a particular set of codings and flags. The wired classes will be fetched automagically by reflection, if available. Please read the section about performance in the MG4J manual.

Since:

1.1

Author:

Sebastiano Vigna

See Also:

Serialized Form

Nested Class Summary

Nested Classes
Modifier and Type Class Description

static class BitStreamIndex.PropertyKeys
Symbolic names for additional properties of a BitStreamIndex.
- Nested classes/interfaces inherited from class it.unimi.di.big.mg4j.index.Index
  Index.EmptyIndexIterator, Index.UriKeys

Field Summary

Fields
Modifier and Type	Field	Description
`int`	`bufferSize`	The size of the buffer used to read the bit stream.
`CompressionFlags.Coding`	`countCoding`	The coding for counts.
`static int`	`DEFAULT_BUFFER_SIZE`	The default buffer size.
`static int`	`DEFAULT_FIXED_QUANTUM`	The default fixed quantum (each 64 postings).
`static int`	`DEFAULT_HEIGHT`	The default height (fairly low, due to memory consumption).
`static int`	`DEFAULT_QUANTUM`	The default variable quantum (1% of index size).
`static int`	`FIXED_POINT_BITS`	Fixed number of fractional binary digits used in fixed-point computation of Golomb moduli.
`static long`	`FIXED_POINT_MULTIPLIER`	`1L << FIXED_POINT_BITS`.
`CompressionFlags.Coding`	`frequencyCoding`	The coding for frequencies.
`int`	`height`	The parameter `h` (the maximum height of a skip tower), or -1 if this index has no skips.
`LongBigList`	`offsets`	The offset of each term, if offsets were loaded or specified at creation time, or `null`.
`CompressionFlags.Coding`	`pointerCoding`	The coding for pointers.
`CompressionFlags.Coding`	`positionCoding`	The coding for positions.
`int`	`quantum`	The quantum, or -1 if this index has no skips, or 0 if this is a `BitStreamHPIndex` and quanta are variable.
`Constructor<? extends IndexReader>`	`readerConstructor`	The constructor that will be used to create new index readers.

Fields inherited from class it.unimi.di.big.mg4j.index.Index
field, hasCounts, hasPayloads, hasPositions, keyIndex, maxCount, numberOfDocuments, numberOfOccurrences, numberOfPostings, numberOfTerms, payload, prefixMap, properties, singletonSet, sizes, termMap, termProcessor

Constructor Summary

Constructors
Constructor	Description
BitStreamIndex(long numberOfDocuments, long numberOfTerms, long numberOfPostings, long numberOfOccurrences, int maxCount, Payload payload, CompressionFlags.Coding frequencyCoding, CompressionFlags.Coding pointerCoding, CompressionFlags.Coding countCoding, CompressionFlags.Coding positionCoding, int quantum, int height, int bufferSize, TermProcessor termProcessor, String field, Properties properties, StringMap<? extends CharSequence> termMap, PrefixMap<? extends CharSequence> prefixMap, IntBigList sizes, LongBigList offsets)

Method Summary

Modifier and Type	Method	Description
`protected static String`	`featureName(CompressionFlags.Coding coding)`
`static int`	`gaussianGolombModulus(long quantumSigma, int shift)`	Computes the Gaussian Golomb modulus for a given standard deviation and shift using fixed-point arithmetic.
`protected Constructor<? extends IndexReader>`	`getConstructor()`
`abstract InputBitStream`	`getInputBitStream(int bufferSize)`	Returns an input bit stream over the index.
`abstract InputStream`	`getInputStream()`	Returns an input stream over the index.
`IndexReader`	`getReader(int bufferSize)`	Creates and returns a new `IndexReader` based on this index.
`static int`	`golombModulus(long p, long q)`	Computes the Golomb modulus for a given fraction using fixed-point arithmetic and a precomputed table for small values.
`static long`	`quantumSigma(long frequency, long numberOfDocuments, long quantum)`	Computes the standard deviation associated with a given quantum and document frequency.
`String`	`toString()`

Methods inherited from class it.unimi.di.big.mg4j.index.Index
documents, documents, documents, getEmptyIndexIterator, getEmptyIndexIterator, getEmptyIndexIterator, getEmptyIndexIterator, getInstance, getInstance, getInstance, getInstance, getInstance, getReader, getTermProcessor, keyIndex

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

- Field Detail
  - DEFAULT_HEIGHT
```
public static final int DEFAULT_HEIGHT
```
    The default height (fairly low, due to memory consumption).
    
    See Also:
    
    Constant Field Values
  - DEFAULT_QUANTUM
```
public static final int DEFAULT_QUANTUM
```
    The default variable quantum (1% of index size).
    
    See Also:
    
    Constant Field Values
  - DEFAULT_FIXED_QUANTUM
```
public static final int DEFAULT_FIXED_QUANTUM
```
    The default fixed quantum (each 64 postings).
    
    See Also:
    
    Constant Field Values
  - DEFAULT_BUFFER_SIZE
```
public static final int DEFAULT_BUFFER_SIZE
```
    The default buffer size.
    
    See Also:
    
    Constant Field Values
  - frequencyCoding
```
public final CompressionFlags.Coding frequencyCoding
```
    The coding for frequencies. See CompressionFlags.
  - pointerCoding
```
public final CompressionFlags.Coding pointerCoding
```
    The coding for pointers. See CompressionFlags.
  - countCoding
```
public final CompressionFlags.Coding countCoding
```
    The coding for counts. See CompressionFlags.
  - positionCoding
```
public final CompressionFlags.Coding positionCoding
```
    The coding for positions. See CompressionFlags.
  - offsets
```
public final LongBigList offsets
```
    The offset of each term, if offsets were loaded or specified at creation time, or null.
  - height
```
public final int height
```
    The parameter h (the maximum height of a skip tower), or -1 if this index has no skips.
  - quantum
```
public final int quantum
```
    The quantum, or -1 if this index has no skips, or 0 if this is a BitStreamHPIndex and quanta are variable.
  - bufferSize
```
public final int bufferSize
```
    The size of the buffer used to read the bit stream.
  - readerConstructor
```
public transient Constructor<? extends IndexReader> readerConstructor
```
    The constructor that will be used to create new index readers.
  - FIXED_POINT_BITS
```
public static final int FIXED_POINT_BITS
```
    Fixed number of fractional binary digits used in fixed-point computation of Golomb moduli.
    
    See Also:
    
    Constant Field Values
  - FIXED_POINT_MULTIPLIER
```
public static final long FIXED_POINT_MULTIPLIER
```
    1L << FIXED_POINT_BITS.
    
    See Also:
    
    Constant Field Values
- Constructor Detail
  - BitStreamIndex
```
public BitStreamIndex(long numberOfDocuments,
                      long numberOfTerms,
                      long numberOfPostings,
                      long numberOfOccurrences,
                      int maxCount,
                      Payload payload,
                      CompressionFlags.Coding frequencyCoding,
                      CompressionFlags.Coding pointerCoding,
                      CompressionFlags.Coding countCoding,
                      CompressionFlags.Coding positionCoding,
                      int quantum,
                      int height,
                      int bufferSize,
                      TermProcessor termProcessor,
                      String field,
                      Properties properties,
                      StringMap<? extends CharSequence> termMap,
                      PrefixMap<? extends CharSequence> prefixMap,
                      IntBigList sizes,
                      LongBigList offsets)
```
- Method Detail
  - getConstructor
```
protected Constructor<? extends IndexReader> getConstructor()
```
  - featureName
```
protected static String featureName(CompressionFlags.Coding coding)
```
  - getInputBitStream
```
public abstract InputBitStream getInputBitStream(int bufferSize)
                                          throws IOException
```
    Returns an input bit stream over the index.
    
    Parameters:
    
    bufferSize - a suggested buffer size.
    
    Returns:
    
    an input bit stream over the index.
    
    Throws:
    
    IOException
  - getInputStream
```
public abstract InputStream getInputStream()
                                    throws IOException
```
    Returns an input stream over the index.
    
    Returns:
    
    an input stream over the index.
    
    Throws:
    
    IOException
  - getReader
```
public IndexReader getReader(int bufferSize)
                      throws IOException
```
    Description copied from class: Index
    
    Creates and returns a new IndexReader based on this index. After that, you can use the reader to read this index.
    
    Specified by:
    
    getReader in class Index
    
    Parameters:
    
    bufferSize - the size of the buffer to be used accessing the reader, or -1 for a default buffer size.
    
    Returns:
    
    a new IndexReader to read this index.
    
    Throws:
    
    IOException
  - golombModulus
```
public static int golombModulus(long p,
                                long q)
```
    Computes the Golomb modulus for a given fraction using fixed-point arithmetic and a precomputed table for small values. This gives results that are extremely close to ⌈ log( 2 - p/q ) / log( 1 - p/q ) ⌉, but the computation is orders of magnitude quicker.
    
    Parameters:
    
    p - the numerator.
    
    q - the denominator (larger than or equal to p).
    
    Returns:
    
    the Golomb modulus for p/q.
  - gaussianGolombModulus
```
public static int gaussianGolombModulus(long quantumSigma,
                                        int shift)
```
    Computes the Gaussian Golomb modulus for a given standard deviation and shift using fixed-point arithmetic.
    The Golomb modulus for (positive and negative) integers normally distributed with standard deviation σ can be computed using the formula ⌈ 2 sqrt( 2 / π ) ln(2) σ ⌉.
    The resulting Golomb modulus is near to optimal for coding such integers after they have been passed through Fast.int2nat(int). Note, however, that Golomb coding is not optimal for a normal distribution.
    This function is used to compute the correct Golomb modulus for skip towers.
    
    Parameters:
    
    quantumSigma - the standard deviation of a quantum as returned by quantumSigma(long, long, long).
    
    shift - a shift parameter.
    
    Returns:
    
    the Golomb modulus for the standard deviation obtained multiplying quantumSigma by the square root of 2^shift-1.
  - quantumSigma
```
public static long quantumSigma(long frequency,
                                long numberOfDocuments,
                                long quantum)
```
    Computes the standard deviation associated with a given quantum and document frequency.
    
    Parameters:
    
    frequency - the document frequency.
    
    numberOfDocuments - the overall number of documents.
    
    quantum - the quantum.
    
    Returns:
    
    a long representing in fixed-point arithmetic the value Math.sqrt( quantum * ( 1 - p ) ) / p, where p is the relative frequency.
  - toString
```
public String toString()
```
    Overrides:
    
    toString in class Object

Class BitStreamIndex

Wired implementations

Nested Class Summary

Nested classes/interfaces inherited from class it.unimi.di.big.mg4j.index.Index

Field Summary

Fields inherited from class it.unimi.di.big.mg4j.index.Index

Constructor Summary

Method Summary

Methods inherited from class it.unimi.di.big.mg4j.index.Index

Methods inherited from class java.lang.Object

Field Detail

DEFAULT_HEIGHT

DEFAULT_QUANTUM

DEFAULT_FIXED_QUANTUM

DEFAULT_BUFFER_SIZE

frequencyCoding

pointerCoding

countCoding

positionCoding

offsets

height

quantum

bufferSize

readerConstructor

FIXED_POINT_BITS

FIXED_POINT_MULTIPLIER

Constructor Detail

BitStreamIndex

Method Detail

getConstructor

featureName

getInputBitStream

getInputStream

getReader

golombModulus

gaussianGolombModulus

quantumSigma

toString