Class ByteArrayPostingList
java.lang.Object
    it.unimi.di.big.mg4j.io.ByteArrayPostingList
All Implemented Interfaces:
Closeable, Flushable, AutoCloseable
public class ByteArrayPostingList extends Object implements Flushable, Closeable
Lightweight posting accumulator with a format similar to that generated by BitStreamIndexWriter.

This class is essentially a dirty trick: it borrows some code and precomputed tables from OutputBitStream and exposes two simple methods (setDocumentPointer(long) and addPosition(int)) with obvious semantics. The resulting posting list is compressed exactly as a BitStreamIndexWriter would compress it (again by duplicating some logic found therein). As a result, after completing the calls and calling close(), the internal buffer can be written directly to a bit stream to build an index (but see stripPointers(OutputBitStream, long)).

Scan uses an instance of this class for each indexed term. Instances can be differential, in which case they assume that setDocumentPointer(long) will be called with increasing values, and they store gaps rather than document pointers. A completeness level can be used to set whether an instance of this class should store positions or counts.

Since:
1.2
Author:
Sebastiano Vigna
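The differential mode can be pictured with a small, uncompressed sketch. The class GapSketch below is hypothetical (it is not part of mg4j and writes no bits); it only records which number would be stored for each call to setDocumentPointer. Storing each pointer minus the previous pointer minus one, with the previous pointer initialized to -1, is an assumed convention chosen here so that consecutive documents yield a gap of 0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, uncompressed illustration of what a differential instance stores.
// Not the mg4j implementation: real instances write these values in compressed form.
final class GapSketch {
    private final boolean differential;
    private final List<Long> stored = new ArrayList<>();
    private long last = -1; // previous pointer; -1 means "no document seen yet"

    GapSketch(boolean differential) {
        this.differential = differential;
    }

    void setDocumentPointer(long pointer) {
        if (differential) {
            // Differential instances assume increasing pointers, so gaps are non-negative.
            if (pointer <= last) {
                throw new IllegalArgumentException("pointers must increase: " + pointer);
            }
            stored.add(pointer - last - 1); // assumed convention: gap minus one
        } else {
            stored.add(pointer); // plain mode: the pointer itself
        }
        last = pointer;
    }

    List<Long> stored() {
        return stored;
    }
}
```

With pointers 2, 5 and 6, a differential instance of this sketch stores 2, 2 and 0, while a non-differential one stores 2, 5 and 6. Small gaps are what make the subsequent variable-length coding effective.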
Field Summary
byte[] buffer
    The internal buffer.
long frequency
    The current frequency (number of calls to setDocumentPointer(long)).
int maxCount
    The maximum count ever seen.
long occurrency
    The current occurrency.
boolean outOfMemoryError
    If true, this list experienced an OutOfMemoryError during some buffer reallocation.
long posNumBits
    The number of bits used for positions.
long sumMaxPos
    The current sum of maximum positions.
Constructor Summary
ByteArrayPostingList(byte[] a, boolean differential, Scan.Completeness completeness)
    Creates a new posting list wrapping a given byte array.
Method Summary
void addPosition(int pos)
    Adds a new position for the current document pointer.
int align()
    Flushes the internal bit buffer to the byte buffer.
void close()
    Calls flush() and then releases resources allocated by this byte-array posting list, keeping just the internal buffer.
void flush()
    Flushes the positions cached internally.
void setDocumentPointer(long pointer)
    Sets the current document pointer.
void stripPointers(OutputBitStream obs, long bitLength)
    Writes the given number of bits of the internal buffer to the provided output bit stream, stripping all document pointers.
int writeLong(long x, int len)
    Writes a fixed number of bits from a long.
long writtenBits()
    Returns the number of bits written by this posting list.
Field Detail

buffer
public byte[] buffer
The internal buffer.

frequency
public long frequency
The current frequency (number of calls to setDocumentPointer(long)).

occurrency
public long occurrency
The current occurrency.

sumMaxPos
public long sumMaxPos
The current sum of maximum positions.

posNumBits
public long posNumBits
The number of bits used for positions.

maxCount
public int maxCount
The maximum count ever seen.

outOfMemoryError
public boolean outOfMemoryError
If true, this list experienced an OutOfMemoryError during some buffer reallocation.
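The statistics fields are only briefly described above. As a hedged illustration, the hypothetical class PostingStats recomputes them from plain arrays of positions, under our own reading of the field names: frequency as the number of documents, occurrency as the total number of positions, maxCount as the largest per-document count, and sumMaxPos as the sum of each document's largest position. posNumBits depends on the actual coding and is left out.

```java
// Hypothetical recomputation of the statistics named in the field summary, assuming:
// frequency = number of documents, occurrency = total number of positions,
// maxCount = largest per-document count, sumMaxPos = sum of each document's largest position.
final class PostingStats {
    long frequency;
    long occurrency;
    long sumMaxPos;
    int maxCount;

    // positionsPerDocument[d] holds the increasing positions of the term in the d-th document.
    static PostingStats of(int[][] positionsPerDocument) {
        PostingStats s = new PostingStats();
        for (int[] positions : positionsPerDocument) {
            s.frequency++;
            s.occurrency += positions.length;
            s.maxCount = Math.max(s.maxCount, positions.length);
            if (positions.length > 0) {
                // Positions are increasing, so the last one is the maximum.
                s.sumMaxPos += positions[positions.length - 1];
            }
        }
        return s;
    }
}
```

For documents with positions {0, 4, 9}, {2} and {1, 3}, this reading gives a frequency of 3, an occurrency of 6, a maxCount of 3 and a sumMaxPos of 14.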
Constructor Detail

ByteArrayPostingList
public ByteArrayPostingList(byte[] a, boolean differential, Scan.Completeness completeness)
Creates a new posting list wrapping a given byte array.
Parameters:
a - the byte array to wrap.
differential - whether this stream should be differential (i.e., whether it should store document pointers as gaps).
completeness - the completeness level, which sets whether this posting list stores positions or counts.
Method Detail

align
public int align()
Flushes the internal bit buffer to the byte buffer.
Returns:
the number of bits written.

writeLong
public int writeLong(long x, int len)
Writes a fixed number of bits from a long.
Parameters:
x - a long.
len - a bit length; this many lower bits of the first argument will be written (most significant bit first).
Returns:
the number of bits written (len).
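The contract of writeLong(long, int) (the len low bits of x, most significant bit first) and of align() can be sketched with a self-contained writer. BitWriterSketch is hypothetical and writes one bit at a time, unlike the table-driven code this class borrows from OutputBitStream. Its align() returns the number of zero padding bits, as OutputBitStream.align() does; that ByteArrayPostingList.align() counts the same thing is an assumption.

```java
import java.io.ByteArrayOutputStream;

// Minimal MSB-first bit writer sketching the documented semantics of writeLong(long, int)
// and align(). Class and member names are hypothetical, not taken from mg4j.
final class BitWriterSketch {
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private int current = 0; // bits accumulated for the byte being built
    private int free = 8;    // free bit slots left in current

    int writeLong(long x, int len) {
        if (len < 0 || len > 64) throw new IllegalArgumentException("len must be in [0, 64]");
        for (int i = len - 1; i >= 0; i--) { // most significant of the len low bits first
            current = (current << 1) | (int) ((x >>> i) & 1L);
            if (--free == 0) { // byte complete: emit it and start a new one
                bytes.write(current);
                current = 0;
                free = 8;
            }
        }
        return len;
    }

    // Pads the partial byte with zeros and emits it; returns the number of padding bits.
    int align() {
        if (free == 8) return 0; // already byte-aligned
        int padding = free;
        bytes.write(current << free);
        current = 0;
        free = 8;
        return padding;
    }

    byte[] toByteArray() {
        return bytes.toByteArray();
    }
}
```

Writing 5 in 3 bits and aligning yields the single byte 0xA0 (binary 10100000) with 5 padding bits; writing 0xABCD in 16 bits afterwards yields the bytes 0xAB and 0xCD and needs no padding.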

flush
public void flush()
Flushes the positions cached internally.

setDocumentPointer
public void setDocumentPointer(long pointer)
Sets the current document pointer.

If the document pointer has changed since the last call, the positions currently stored are flushed and the new pointer is written to the stream.
Parameters:
pointer - a document pointer.
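The flushing rule described above can be traced with a hypothetical stand-in that logs what would be written instead of emitting bits. FlushOnChangeSketch is not mg4j code; in particular, treating a repeated call with the same pointer as a no-op is our reading of "if the document pointer has changed since the last call".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the documented protocol: positions for the current document are
// cached and flushed only when setDocumentPointer receives a different pointer (or when
// flush() is called explicitly). The output is a readable log, not mg4j's bit format.
final class FlushOnChangeSketch {
    final List<String> written = new ArrayList<>();
    private final List<Integer> pending = new ArrayList<>();
    private long current = -1; // -1: no document yet

    void setDocumentPointer(long pointer) {
        if (pointer == current) return; // same document: keep caching positions
        flush();                        // emit the positions cached for the previous document
        written.add("doc " + pointer);  // then write the new pointer
        current = pointer;
    }

    void addPosition(int pos) {
        pending.add(pos);
    }

    void flush() {
        if (pending.isEmpty()) return;
        written.add("count " + pending.size() + " positions " + pending);
        pending.clear();
    }
}
```

Calling setDocumentPointer(4), adding positions 1 and 5, calling setDocumentPointer(4) again, adding 9, and then moving to document 10 produces a single entry of three positions for document 4, because the repeated pointer does not trigger a flush.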

addPosition
public void addPosition(int pos)
Adds a new position for the current document pointer.

It is mandatory that successive calls to this method for the same document pointer have increasing arguments.
Parameters:
pos - a position.
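The requirement of increasing positions is what makes gap coding possible: within a document, each position can be stored as its distance from the previous one. The helper below is a hypothetical sketch, assuming non-negative, strictly increasing positions and the same "gap minus one" convention used for pointers in the earlier sketch; the coding actually used by this class may differ.

```java
// Hypothetical sketch: strictly increasing, non-negative positions turned into
// non-negative gaps (and back). The "gap minus one" convention is an assumption.
final class PositionGaps {
    static int[] toGaps(int[] positions) {
        int[] gaps = new int[positions.length];
        int prev = -1; // so that the first gap equals the first position
        for (int i = 0; i < positions.length; i++) {
            if (positions[i] <= prev) {
                throw new IllegalArgumentException("positions must increase at index " + i);
            }
            gaps[i] = positions[i] - prev - 1;
            prev = positions[i];
        }
        return gaps;
    }

    static int[] fromGaps(int[] gaps) {
        int[] positions = new int[gaps.length];
        int prev = -1;
        for (int i = 0; i < gaps.length; i++) {
            prev += gaps[i] + 1; // undo the "minus one"
            positions[i] = prev;
        }
        return positions;
    }
}
```

For positions 3, 4 and 10 the sketch stores 3, 0 and 5, and fromGaps recovers the original positions; a repeated or decreasing position is rejected.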

writtenBits
public long writtenBits()
Returns the number of bits written by this posting list.
Returns:
the number of bits written by this posting list.

stripPointers
public void stripPointers(OutputBitStream obs, long bitLength) throws IOException
Writes the given number of bits of the internal buffer to the provided output bit stream, stripping all document pointers.

This method is a horrible kluge solving the problem of terms appearing in all documents: BitStreamIndexWriter would not write pointers in this case, but we do not know whether we will need pointers or not while we are filling the internal buffer. Thus, for those (hopefully few) terms appearing in all documents, this method can be used to dump the internal buffer stripping all pointers.

Note that the valid number of bits should be retrieved using writtenBits() after a flush(). Then, a call to align() will dump to the buffer the bits still floating in the bit buffer; at that point this method can be called safely.
Parameters:
obs - an output bit stream.
bitLength - the number of bits to be scanned.
Throws:
IOException
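Why pointers can be dropped for a term occurring in every document can be shown without any bit-level detail: if the frequency equals the number of documents, the pointers must be 0, 1, ..., n - 1, so each one is implied by the index of its entry. StripPointersSketch and its Posting record are hypothetical, operate on decoded postings rather than on the internal buffer, and require Java 16 or later for the record.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration (not mg4j code) of why pointers are redundant for a term occurring in every
// document: with frequency == numberOfDocuments and valid increasing pointers in
// [0, numberOfDocuments), the pointers are exactly 0, 1, ..., n - 1.
final class StripPointersSketch {
    record Posting(long pointer, int count) {}

    static List<Integer> strip(List<Posting> postings, long numberOfDocuments) {
        if (postings.size() != numberOfDocuments) {
            throw new IllegalArgumentException("pointers are redundant only if the term occurs in every document");
        }
        List<Integer> counts = new ArrayList<>();
        for (Posting p : postings) counts.add(p.count()); // keep everything except the pointer
        return counts;
    }

    static List<Posting> restore(List<Integer> counts) {
        List<Posting> postings = new ArrayList<>();
        for (int i = 0; i < counts.size(); i++) {
            postings.add(new Posting(i, counts.get(i))); // the pointer is the entry index
        }
        return postings;
    }
}
```

A reader that knows the term's frequency equals the number of documents can rebuild the stripped pointers this way, which is why they need not be stored.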

close
public void close()
Calls flush() and then releases resources allocated by this byte-array posting list, keeping just the internal buffer.
Specified by:
close in interface AutoCloseable
Specified by:
close in interface Closeable
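Since the class implements Closeable, an instance can be managed with try-with-resources. The stand-in below is hypothetical and only mirrors the documented contract of close(): pending data is flushed, auxiliary state is released, and the buffer remains readable afterwards, which is what allows the internal buffer to be written out after closing.

```java
import java.io.Closeable;
import java.io.Flushable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the documented close() contract: flush pending data,
// drop auxiliary state, and keep only the buffer, which stays readable after close().
final class CloseContractSketch implements Flushable, Closeable {
    final List<Integer> buffer = new ArrayList<>();    // survives close()
    private List<Integer> pending = new ArrayList<>(); // auxiliary cache, released on close()

    void add(int value) {
        if (pending == null) throw new IllegalStateException("closed");
        pending.add(value);
    }

    @Override
    public void flush() {
        if (pending == null) return; // already closed: nothing cached
        buffer.addAll(pending);
        pending.clear();
    }

    @Override
    public void close() {
        flush();        // as documented: flush first
        pending = null; // then release everything but the buffer; a second close() is a no-op
    }
}
```

After the try-with-resources block ends, the buffer still holds every value added, while further additions are rejected.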