java.lang.Object it.unimi.di.mg4j.io.ByteArrayPostingList
public class ByteArrayPostingList
extends Object
implements Flushable, Closeable
Lightweight posting accumulator with a format similar to that generated by BitStreamIndexWriter.
This class is essentially a dirty trick: it borrows some code and precomputed tables from OutputBitStream and exposes two simple methods (setDocumentPointer(int) and addPosition(int)) with obvious semantics. The resulting posting list is compressed exactly as a BitStreamIndexWriter would compress it (again by duplicating some of its logic). As a result, after completing the calls and calling close(), the internal buffer can be written directly to a bit stream to build an index (but see stripPointers(OutputBitStream, long)).
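To make "compressed like a BitStreamIndexWriter" concrete, here is a minimal sketch of Elias gamma coding, one of the instantaneous codes used for bit-stream indices. It is only an illustration under assumptions: the actual codes depend on how the index writer is configured, the class and method names are hypothetical, and bits are shown as a string of '0'/'1' characters for readability rather than packed into a byte array.

```java
/**
 * Hypothetical sketch of Elias gamma coding. Assumption: gamma stands in for
 * whatever code the index writer is configured to use.
 */
class GammaSketch {

    /** Returns the gamma code of x (x >= 1) as a string of '0'/'1' characters. */
    static String gamma(int x) {
        if (x < 1) throw new IllegalArgumentException("gamma codes require x >= 1");
        String binary = Integer.toBinaryString(x);   // most significant bit first, no leading zeroes
        int n = binary.length() - 1;                 // number of bits after the leading 1
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append('0');  // unary prefix of n zeroes...
        sb.append(binary);                           // ...terminated by the leading 1 of the binary digits
        return sb.toString();
    }

    /** A common convention: a natural number x >= 0 is coded as gamma(x + 1). */
    static String gammaNatural(int x) {
        return gamma(x + 1);
    }

    public static void main(String[] args) {
        // A few hypothetical natural numbers, e.g. a gap followed by some counts/position gaps.
        int[] values = {0, 2, 3, 0};
        StringBuilder posting = new StringBuilder();
        for (int v : values) posting.append(gammaNatural(v));
        System.out.println(posting + " (" + posting.length() + " bits)"); // prints "1011001001 (10 bits)"
    }
}
```

Small values get short codes, which is why storing gaps instead of raw pointers pays off.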
Scan uses an instance of this class for each indexed term. Instances can be differential, in which case they assume that setDocumentPointer(int) will be called with increasing values, and they store gaps rather than document pointers. A completeness level determines whether an instance of this class stores positions or counts.
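The differential mode can be illustrated with a short sketch that turns increasing document pointers into gaps and back. The gap definition is an assumption: it uses p - prev - 1 with prev = -1 before the first document, a convention that makes every gap a natural number; the exact offset used by this class is not stated here, and the class name is hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of gap (differential) storage of document pointers.
 * Assumption: gap = p - prev - 1, with prev = -1 before the first pointer.
 */
class GapSketch {

    static int[] toGaps(int[] pointers) {
        int[] gaps = new int[pointers.length];
        int prev = -1;
        for (int i = 0; i < pointers.length; i++) {
            if (pointers[i] <= prev)
                throw new IllegalArgumentException("pointers must be strictly increasing");
            gaps[i] = pointers[i] - prev - 1;
            prev = pointers[i];
        }
        return gaps;
    }

    static int[] fromGaps(int[] gaps) {
        int[] pointers = new int[gaps.length];
        int prev = -1;
        for (int i = 0; i < gaps.length; i++) {
            pointers[i] = prev + gaps[i] + 1; // invert the gap definition
            prev = pointers[i];
        }
        return pointers;
    }

    public static void main(String[] args) {
        int[] pointers = {3, 4, 10, 11};
        int[] gaps = toGaps(pointers);
        System.out.println(Arrays.toString(gaps));           // prints "[3, 0, 5, 0]"
        System.out.println(Arrays.toString(fromGaps(gaps))); // prints "[3, 4, 10, 11]"
    }
}
```

This is also why a differential instance needs increasing pointers: a non-increasing pointer would produce a negative gap.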
Field Summary
Type | Field and description |
---|---|
byte[] | buffer: The internal buffer. |
int | frequency: The current frequency (number of calls to setDocumentPointer(int)). |
int | maxCount: The maximum count ever seen. |
long | occurrency: The current occurrency. |
boolean | outOfMemoryError: If true, this list experienced an OutOfMemoryError during some buffer reallocation. |
long | posNumBits: The number of bits used for positions. |
long | sumMaxPos: The current sum of maximum positions. |
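The statistics fields can be pictured with the following sketch, which updates them once per document. The semantics are assumptions inferred from the field names, not taken from the MG4J source: frequency counts documents, occurrency counts positions, maxCount is the largest per-document count, and sumMaxPos adds up the largest position of each document. posNumBits is left out because it depends on the coding in use. The class and method names are hypothetical.

```java
/**
 * Hypothetical sketch of how the summary statistics might evolve.
 * Field semantics are inferred from their names (an assumption).
 */
class PostingStatsSketch {
    int frequency;   // documents seen
    long occurrency; // positions seen over all documents
    int maxCount;    // largest number of positions in a single document
    long sumMaxPos;  // sum over documents of the largest position

    /** Records one document given its increasing positions. */
    void addDocument(int[] positions) {
        frequency++;
        occurrency += positions.length;
        maxCount = Math.max(maxCount, positions.length);
        // Positions are increasing, so the last one is the maximum.
        if (positions.length > 0) sumMaxPos += positions[positions.length - 1];
    }

    public static void main(String[] args) {
        PostingStatsSketch s = new PostingStatsSketch();
        s.addDocument(new int[] {2, 7});
        s.addDocument(new int[] {0, 1, 5});
        s.addDocument(new int[] {4});
        System.out.println(s.frequency + " " + s.occurrency + " " + s.maxCount + " " + s.sumMaxPos);
        // prints "3 6 3 16"
    }
}
```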
Constructor Summary
Constructor | Description |
---|---|
ByteArrayPostingList(byte[] a, boolean differential, Scan.Completeness completeness) | Creates a new posting list wrapping a given byte array. |
Method Summary
Return type | Method and description |
---|---|
void | addPosition(int pos): Adds a new position for the current document pointer. |
int | align(): Flushes the internal bit buffer to the byte buffer. |
void | close(): Calls flush() and then releases resources allocated by this byte-array posting list, keeping just the internal buffer. |
void | flush(): Flushes the positions cached internally. |
void | setDocumentPointer(int pointer): Sets the current document pointer. |
void | stripPointers(OutputBitStream obs, long bitLength): Writes the given number of bits of the internal buffer to the provided output bit stream, stripping all document pointers. |
long | writtenBits(): Returns the number of bits written by this posting list. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
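public byte[] buffer
The internal buffer.

public int frequency
The current frequency (number of calls to setDocumentPointer(int)).

public long occurrency
The current occurrency.

public long sumMaxPos
The current sum of maximum positions.

public long posNumBits
The number of bits used for positions.

public int maxCount
The maximum count ever seen.

public boolean outOfMemoryError
If true, this list experienced an OutOfMemoryError during some buffer reallocation.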
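A flag like outOfMemoryError suggests that a failed reallocation is recorded rather than propagated. The sketch below shows one way to do that with a growable byte array; the doubling policy, the ensureCapacity method and the class name are hypothetical, not this class's API.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of a growable buffer that records an OutOfMemoryError
 * instead of throwing it. Growth policy and names are assumptions.
 */
class GrowableBufferSketch {
    byte[] buffer;
    boolean outOfMemoryError;

    GrowableBufferSketch(byte[] initial) {
        buffer = initial;
    }

    /** Ensures room for at least minLength bytes; returns false if reallocation failed. */
    boolean ensureCapacity(int minLength) {
        if (minLength <= buffer.length) return true;
        // Double the length (computed in long to avoid overflow), but never below minLength.
        long doubled = 2L * buffer.length;
        int newLength = (int) Math.max(minLength, Math.min(doubled, Integer.MAX_VALUE - 8));
        try {
            buffer = Arrays.copyOf(buffer, newLength); // keeps the existing contents
            return true;
        } catch (OutOfMemoryError e) {
            outOfMemoryError = true; // remember the failure; the caller decides how to react
            return false;
        }
    }

    public static void main(String[] args) {
        GrowableBufferSketch b = new GrowableBufferSketch(new byte[4]);
        System.out.println(b.ensureCapacity(10) + " " + b.buffer.length + " " + b.outOfMemoryError);
        // prints "true 10 false"
    }
}
```

Catching OutOfMemoryError is usually discouraged; it is tolerable here only because the failing allocation is a single large array whose failure leaves the old buffer intact.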
Constructor Detail |
---|
public ByteArrayPostingList(byte[] a, boolean differential, Scan.Completeness completeness)
Parameters:
a - the byte array to wrap.
differential - whether this stream should be differential (i.e., whether it should store document pointers as gaps).
completeness - the completeness level, which determines whether positions or counts are stored.
Method Detail |
---|
public int align()
Flushes the internal bit buffer to the byte buffer.
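The interplay between a bit buffer, a byte buffer, align() and writtenBits() can be sketched as follows. This is an illustration under assumptions: bits accumulate in an int and are moved out a byte at a time, align() pads the last partial byte with zeroes, and writtenBits() counts only real bits, not padding. The class name and the meaning of align()'s return value here (number of padding bits) are hypothetical; the documentation above does not say what the real method returns.

```java
import java.io.ByteArrayOutputStream;

/**
 * Hypothetical sketch of a bit buffer flushed into a byte buffer.
 * align() pads to a byte boundary and returns the number of padding bits (assumption).
 */
class BitBufferSketch {
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private int bitBuffer; // pending bits, right-aligned
    private int pending;   // number of pending bits (0..7)
    private long written;  // real bits written, excluding padding

    void writeBit(int bit) {
        bitBuffer = (bitBuffer << 1) | (bit & 1);
        pending++;
        written++;
        if (pending == 8) {          // a full byte: move it to the byte buffer
            bytes.write(bitBuffer);
            bitBuffer = 0;
            pending = 0;
        }
    }

    /** Pads the pending bits with zeroes up to a byte boundary and moves them out. */
    int align() {
        if (pending == 0) return 0;
        int padding = 8 - pending;
        bytes.write(bitBuffer << padding);
        bitBuffer = 0;
        pending = 0;
        return padding;
    }

    long writtenBits() { return written; }

    byte[] toByteArray() { return bytes.toByteArray(); }

    public static void main(String[] args) {
        BitBufferSketch w = new BitBufferSketch();
        for (char c : "1011001001".toCharArray()) w.writeBit(c - '0');
        int padding = w.align();
        byte[] out = w.toByteArray();
        System.out.printf("%d bits, %d padding, %d bytes%n", w.writtenBits(), padding, out.length);
        // prints "10 bits, 6 padding, 2 bytes"
    }
}
```

Reading writtenBits() before padding is what lets a consumer later copy exactly the valid bits and ignore the trailing zeroes.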
public void flush()
Flushes the positions cached internally.
Specified by: flush in interface Flushable
public void setDocumentPointer(int pointer)
Sets the current document pointer. If the document pointer has changed since the last call, the positions currently stored are flushed and the new pointer is written to the stream.
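The "flush on pointer change" behaviour described above can be sketched with a symbolic stream of integers instead of coded bits. The layout (pointer, then count, then positions for each document) is an assumption made for illustration; the real layout depends on the completeness level and on the index format. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of setDocumentPointer/addPosition/flush semantics on a
 * symbolic stream. Assumed layout per document: pointer, count, positions.
 */
class PointerSketch {
    final List<Integer> stream = new ArrayList<>();
    private final List<Integer> cachedPositions = new ArrayList<>();
    private int current = -1; // -1 means no document yet

    void setDocumentPointer(int pointer) {
        if (pointer == current) return; // same document: keep caching positions
        flush();                        // write count and positions of the previous document
        stream.add(pointer);            // then write the new pointer
        current = pointer;
    }

    void addPosition(int pos) {
        cachedPositions.add(pos);       // cached until the pointer changes or flush() is called
    }

    void flush() {
        if (cachedPositions.isEmpty()) return;
        stream.add(cachedPositions.size());
        stream.addAll(cachedPositions);
        cachedPositions.clear();
    }

    public static void main(String[] args) {
        PointerSketch p = new PointerSketch();
        p.setDocumentPointer(2); p.addPosition(1); p.addPosition(4);
        p.setDocumentPointer(2); p.addPosition(9); // same pointer: nothing is flushed
        p.setDocumentPointer(5); p.addPosition(0);
        p.flush();
        System.out.println(p.stream); // prints "[2, 3, 1, 4, 9, 5, 1, 0]"
    }
}
```

Caching positions until the pointer changes is what allows the count to be written before the positions it describes.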
Parameters:
pointer - a document pointer.

public void addPosition(int pos)
Adds a new position for the current document pointer.
It is mandatory that successive calls to this method for the same document pointer have increasing arguments.
Parameters:
pos - a position.

public long writtenBits()
Returns the number of bits written by this posting list.
public void stripPointers(OutputBitStream obs, long bitLength) throws IOException
This method is a horrible kluge solving the problem of terms appearing in all documents: BitStreamIndexWriter would not write pointers in this case, but we do not know whether we will need pointers while we are filling the internal buffer. Thus, for those (hopefully few) terms appearing in all documents, this method can be used to dump the internal buffer stripping all pointers.

Note that the valid number of bits should be retrieved using writtenBits() after a flush(). Then, a call to align() will dump to the buffer the bits still floating in the bit buffer; at that point this method can be called safely.
Parameters:
obs - an output bit stream.
bitLength - the number of bits to be scanned.
Throws:
IOException
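Why pointers can be dropped for a term occurring in every document is easy to see on a symbolic stream: the pointers are then exactly 0, 1, ..., N-1 and carry no information. The sketch below copies counts and positions while skipping pointers; it assumes the same hypothetical layout (pointer, count, positions per document) and absolute pointers, whereas the real method works on coded bits and writes to an OutputBitStream. In a differential stream using gaps of the form p - prev - 1, every gap would be 0 for such a term.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of pointer stripping on a symbolic stream.
 * Assumed layout per document: pointer, count, positions (absolute pointers).
 */
class StripSketch {

    static List<Integer> stripPointers(int[] stream) {
        List<Integer> out = new ArrayList<>();
        int i = 0, expected = 0;
        while (i < stream.length) {
            int pointer = stream[i++];
            // For a term in every document, the k-th pointer must be k.
            if (pointer != expected++)
                throw new IllegalStateException("term does not occur in every document");
            int count = stream[i++];
            out.add(count);                                        // keep the count...
            for (int j = 0; j < count; j++) out.add(stream[i++]); // ...and the positions
        }
        return out;
    }

    public static void main(String[] args) {
        // Documents 0, 1 and 2, each containing the term.
        int[] stream = {0, 1, 5, 1, 2, 0, 3, 2, 1, 7};
        System.out.println(stripPointers(stream)); // prints "[1, 5, 2, 0, 3, 1, 7]"
    }
}
```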
public void close()
Calls flush() and then releases resources allocated by this byte-array posting list, keeping just the internal buffer.
Specified by: close in interface Closeable