it.unimi.di.mg4j.io
Class ByteArrayPostingList

java.lang.Object
  extended by it.unimi.di.mg4j.io.ByteArrayPostingList
All Implemented Interfaces:
Closeable, Flushable

public class ByteArrayPostingList
extends Object
implements Flushable, Closeable

Lightweight posting accumulator with format similar to that generated by BitStreamIndexWriter.

This class is essentially a dirty trick: it borrows some code and precomputed tables from OutputBitStream and exposes two simple methods (setDocumentPointer(int) and addPosition(int)) with obvious semantics. The resulting posting list is compressed exactly as a BitStreamIndexWriter would compress it (which, again, means duplicating some logic found therein). As a result, once all calls have been made and close() has been invoked, the internal buffer can be written directly to a bit stream to build an index (but see stripPointers(OutputBitStream, long)).

Scan uses an instance of this class for each indexed term. Instances can be differential, in which case they assume setDocumentPointer(int) will be called with increasing values and store gaps rather than document pointers. A completeness level can be used to set whether an instance of this class should store positions or counts.
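
The differential mode described above can be pictured with a minimal sketch. This is not MG4J code: the names GapSketch, toGaps and fromGaps are hypothetical, and the convention of measuring the first gap from zero is an assumption made for illustration. The actual bit-level encoding used by this class is not shown.

```java
import java.util.Arrays;

final class GapSketch {
    /** Converts strictly increasing document pointers into gaps (first gap measured from 0; an assumption). */
    static int[] toGaps(int[] pointers) {
        int[] gaps = new int[pointers.length];
        int previous = 0;
        for (int i = 0; i < pointers.length; i++) {
            if (i > 0 && pointers[i] <= previous) {
                throw new IllegalArgumentException("Pointers must be increasing");
            }
            gaps[i] = pointers[i] - previous;
            previous = pointers[i];
        }
        return gaps;
    }

    /** Rebuilds the document pointers by summing the gaps. */
    static int[] fromGaps(int[] gaps) {
        int[] pointers = new int[gaps.length];
        int current = 0;
        for (int i = 0; i < gaps.length; i++) {
            current += gaps[i];
            pointers[i] = current;
        }
        return pointers;
    }

    public static void main(String[] args) {
        int[] gaps = toGaps(new int[] {3, 7, 8, 20});
        System.out.println(Arrays.toString(gaps));           // [3, 4, 1, 12]
        System.out.println(Arrays.toString(fromGaps(gaps))); // [3, 7, 8, 20]
    }
}
```

Gaps are typically smaller than the pointers they replace, which is why a differential list requires increasing arguments to setDocumentPointer(int).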

Since:
1.2
Author:
Sebastiano Vigna

Field Summary
 byte[] buffer
          The internal buffer.
 int frequency
          The current frequency (number of calls to setDocumentPointer(int)).
 int maxCount
          The maximum count ever seen.
 long occurrency
          The current occurrency (total number of occurrences).
 boolean outOfMemoryError
          If true, this list experienced an OutOfMemoryError during some buffer reallocation.
 long posNumBits
          The number of bits used for positions.
 long sumMaxPos
          The current sum of maximum positions.
 
Constructor Summary
ByteArrayPostingList(byte[] a, boolean differential, Scan.Completeness completeness)
          Creates a new posting list wrapping a given byte array.
 
Method Summary
 void addPosition(int pos)
          Adds a new position for the current document pointer.
 int align()
          Flushes the internal bit buffer to the byte buffer.
 void close()
          Calls flush() and then releases resources allocated by this byte-array posting list, keeping just the internal buffer.
 void flush()
          Flushes the positions cached internally.
 void setDocumentPointer(int pointer)
          Sets the current document pointer.
 void stripPointers(OutputBitStream obs, long bitLength)
          Writes the given number of bits of the internal buffer to the provided output bit stream, stripping all document pointers.
 long writtenBits()
          Returns the number of bits written by this posting list.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

buffer

public byte[] buffer
The internal buffer.


frequency

public int frequency
The current frequency (number of calls to setDocumentPointer(int)).


occurrency

public long occurrency
The current occurrency (total number of occurrences).


sumMaxPos

public long sumMaxPos
The current sum of maximum positions.


posNumBits

public long posNumBits
The number of bits used for positions.


maxCount

public int maxCount
The maximum count ever seen.


outOfMemoryError

public boolean outOfMemoryError
If true, this list experienced an OutOfMemoryError during some buffer reallocation.

Constructor Detail

ByteArrayPostingList

public ByteArrayPostingList(byte[] a,
                            boolean differential,
                            Scan.Completeness completeness)
Creates a new posting list wrapping a given byte array.

Parameters:
a - the byte array to wrap.
differential - whether this posting list should be differential (i.e., whether it should store document pointers as gaps).
completeness - the completeness level, which sets whether this instance stores positions or counts.

Method Detail

align

public int align()
Flushes the internal bit buffer to the byte buffer.

Returns:
the number of bits written.

flush

public void flush()
Flushes the positions cached internally.

Specified by:
flush in interface Flushable

setDocumentPointer

public void setDocumentPointer(int pointer)
Sets the current document pointer.

If the document pointer has changed since the last call, the positions currently stored are flushed and the new pointer is written to the stream.

Parameters:
pointer - a document pointer.

addPosition

public void addPosition(int pos)
Adds a new position for the current document pointer.

It is mandatory that successive calls to this method for the same document pointer have increasing arguments.

Parameters:
pos - a position.
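
The interplay between setDocumentPointer(int) and addPosition(int) can be pictured with a toy accumulator. This is a sketch under stated assumptions, not the class's implementation: ToyPostingAccumulator is a hypothetical name, postings are kept as readable strings rather than compressed bits, and the points at which frequency, occurrency and maxCount are updated are assumptions chosen for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

final class ToyPostingAccumulator {
    /** Flushed postings in readable form, e.g. "0:[2, 5]" (the real class writes compressed bits). */
    final List<String> postings = new ArrayList<>();
    int frequency;    // number of distinct pointers set (assumption: repeats are not counted)
    long occurrency;  // total number of positions flushed so far
    int maxCount;     // largest number of positions seen for a single document

    private int currentPointer = -1;
    private final List<Integer> cached = new ArrayList<>();

    void setDocumentPointer(int pointer) {
        if (pointer == currentPointer) return; // unchanged pointer: nothing to do
        flush();                               // emit the positions of the previous document
        currentPointer = pointer;
        frequency++;
    }

    void addPosition(int pos) {
        if (currentPointer == -1) throw new IllegalStateException("No document pointer set");
        if (!cached.isEmpty() && pos <= cached.get(cached.size() - 1)) {
            throw new IllegalArgumentException("Positions must be increasing");
        }
        cached.add(pos);
    }

    void flush() {
        if (cached.isEmpty()) return; // in this toy, a pointer without positions is not recorded
        postings.add(currentPointer + ":" + cached);
        occurrency += cached.size();
        maxCount = Math.max(maxCount, cached.size());
        cached.clear();
    }

    public static void main(String[] args) {
        ToyPostingAccumulator acc = new ToyPostingAccumulator();
        acc.setDocumentPointer(0);
        acc.addPosition(2);
        acc.addPosition(5);
        acc.setDocumentPointer(3); // flushes the positions cached for document 0
        acc.addPosition(1);
        acc.flush();
        System.out.println(acc.postings); // [0:[2, 5], 3:[1]]
        System.out.println(acc.frequency + " " + acc.occurrency + " " + acc.maxCount); // 2 3 2
    }
}
```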

writtenBits

public long writtenBits()
Returns the number of bits written by this posting list.

Returns:
the number of bits written by this posting list.

stripPointers

public void stripPointers(OutputBitStream obs,
                          long bitLength)
                   throws IOException
Writes the given number of bits of the internal buffer to the provided output bit stream, stripping all document pointers.

This method is a horrible kluge solving the problem of terms appearing in all documents: BitStreamIndexWriter would not write pointers in this case, but we do not know whether we will need pointers or not while we are filling the internal buffer. Thus, for those (hopefully few) terms appearing in all documents this method can be used to dump the internal buffer stripping all pointers.

Note that the valid number of bits should be retrieved using writtenBits() after a flush(). Then, a call to align() will dump to the buffer the bits still floating in the bit buffer; at that point this method can be called safely.

Parameters:
obs - an output bit stream.
bitLength - the number of bits to be scanned.
Throws:
IOException
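
The call order prescribed in the note above (flush(), then writtenBits(), then align(), then this method) can be sketched as follows. To keep the example self-contained, it is written against a hypothetical interface, PostingListLike, that mirrors only the four methods involved and drops the OutputBitStream argument; Recorder is a stub that merely logs the calls, and the value 42 it returns is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;

final class StripOrderSketch {
    /** Hypothetical stand-in for the documented methods; the real stripPointers also takes an OutputBitStream. */
    interface PostingListLike {
        void flush();
        long writtenBits();
        int align();
        void stripPointers(long bitLength);
    }

    /** Performs the calls in the order the note prescribes. */
    static void dumpWithoutPointers(PostingListLike list) {
        list.flush();                        // write out the positions still cached
        long bitLength = list.writtenBits(); // the valid number of bits, read after the flush
        list.align();                        // move the bits still in the bit buffer to the byte buffer
        list.stripPointers(bitLength);       // now the dump can be performed safely
    }

    /** Stub that only records which methods were called, in order. */
    static final class Recorder implements PostingListLike {
        final List<String> calls = new ArrayList<>();
        public void flush() { calls.add("flush"); }
        public long writtenBits() { calls.add("writtenBits"); return 42; }
        public int align() { calls.add("align"); return 0; }
        public void stripPointers(long bitLength) { calls.add("stripPointers(" + bitLength + ")"); }
    }

    public static void main(String[] args) {
        Recorder recorder = new Recorder();
        dumpWithoutPointers(recorder);
        System.out.println(recorder.calls); // [flush, writtenBits, align, stripPointers(42)]
    }
}
```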

close

public void close()
Calls flush() and then releases resources allocated by this byte-array posting list, keeping just the internal buffer.

Specified by:
close in interface Closeable