public class SemiExternalOffsetBigList extends AbstractLongBigList
This class is a semi-external big list of longs that MG4J uses by default for accessing term offsets.
When the number of terms in the index grows, storing each offset as a long in an array can consume hundreds of megabytes of memory, and most of this memory is wasted, as it is occupied by offsets of hapax legomena (terms occurring just once in the collection). Instead, this class accesses offsets in their compressed form, and provides entry points for random access to each offset. At construction time, entry points are computed with a certain step, which is the number of offsets accessible from each entry point, or, equivalently, the maximum number of offsets that must be read to access a given offset.
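The entry-point idea described above can be illustrated with a minimal, self-contained sketch. It is not the MG4J implementation: the class name `SteppedOffsetSketch` is hypothetical, it stores deltas with a simple variable-byte code instead of the γ code MG4J reads from a bit stream, and it assumes that each entry point records both a position in the compressed data and the offset value reached just before that position.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch: delta-compressed offsets with an entry point every {@code step} items. */
final class SteppedOffsetSketch {
    private final byte[] data;       // deltas in variable-byte code (a stand-in for γ coding)
    private final int[] entryPos;    // byte position of the first delta of each block
    private final long[] entryBase;  // value of the offset preceding each block (0 for the first)
    private final int step;
    private final int size;

    /** Offsets must be nonnegative and nondecreasing, as offsets into an index file are. */
    SteppedOffsetSketch(long[] offsets, int step) {
        this.step = step;
        this.size = offsets.length;
        int blocks = (size + step - 1) / step;
        entryPos = new int[blocks];
        entryBase = new long[blocks];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long prev = 0;
        for (int i = 0; i < size; i++) {
            if (i % step == 0) {     // record an entry point at the start of each block
                entryPos[i / step] = out.size();
                entryBase[i / step] = prev;
            }
            long delta = offsets[i] - prev;
            if (delta < 0) throw new IllegalArgumentException("offsets must be nondecreasing");
            while (delta >= 0x80) {  // 7 payload bits per byte, high bit marks continuation
                out.write((int) (delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write((int) delta);
            prev = offsets[i];
        }
        data = out.toByteArray();
    }

    /** Decodes at most {@code step} deltas, starting from the nearest preceding entry point. */
    long getLong(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException(String.valueOf(index));
        int block = index / step;
        int pos = entryPos[block];
        long value = entryBase[block];
        for (int i = block * step; i <= index; i++) {
            long delta = 0;
            int shift = 0;
            int b;
            do {
                b = data[pos++] & 0xFF;
                delta |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            value += delta;
        }
        return value;
    }

    public static void main(String[] args) {
        long[] offsets = {0, 5, 5, 300, 100_000, 100_001};
        SteppedOffsetSketch list = new SteppedOffsetSketch(offsets, 4);
        // Index 5 lies in the second block, so only two deltas are decoded.
        System.out.println(list.getLong(5));
    }
}
```

A smaller step means more entry points (more memory) but fewer deltas to decode per access; a larger step trades the other way.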
This class uses a small (CACHE_MAX_SIZE entries) map to keep track of the most recently used indices, so as to answer queries to those indices more quickly.
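A bounded map of recently used indices can be sketched as follows. This is only an illustration under stated assumptions: the class name `RecentOffsetCache`, the `slowLookup` function and the `slowLookups` counter are hypothetical, and the sketch uses an access-ordered `java.util.LinkedHashMap`, which may differ from the map type this class actually uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongUnaryOperator;

/** Hypothetical sketch: a small least-recently-used cache in front of a slow lookup. */
final class RecentOffsetCache {
    private final LongUnaryOperator slowLookup;   // stands in for decoding from compressed data
    private final LinkedHashMap<Long, Long> cache;
    int slowLookups;                              // counts cache misses, for demonstration only

    RecentOffsetCache(final int maxSize, LongUnaryOperator slowLookup) {
        this.slowLookup = slowLookup;
        // accessOrder = true makes iteration order run from least to most recently used
        this.cache = new LinkedHashMap<Long, Long>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, Long> eldest) {
                return size() > maxSize;          // evict once the bound is exceeded
            }
        };
    }

    long getLong(long index) {
        Long cached = cache.get(index);
        if (cached != null) return cached;        // hit: no decoding needed
        slowLookups++;
        long value = slowLookup.applyAsLong(index);
        cache.put(index, value);
        return value;
    }

    public static void main(String[] args) {
        RecentOffsetCache c = new RecentOffsetCache(2, i -> i * 10);
        c.getLong(1);
        c.getLong(2);
        c.getLong(1);                             // served from the cache
        System.out.println(c.slowLookups);
    }
}
```

Because the map mutates its internal order on every read, even `getLong` changes shared state, which is one reason a cache of this kind is not safe for concurrent use without external synchronisation.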
Warning: This class is not thread safe, and needs to be synchronised to be used in a multithreaded environment.
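One way to follow this warning is to route every access through a single lock. The sketch below is a generic illustration, not part of MG4J: `SynchronizedLookup` and its `demo` method are hypothetical names, and a plain `LongUnaryOperator` stands in for the list's `getLong` method.

```java
import java.util.function.LongUnaryOperator;

/** Hypothetical sketch: serialises all calls to a lookup that is not thread safe. */
final class SynchronizedLookup implements LongUnaryOperator {
    private final LongUnaryOperator delegate;
    private final Object lock = new Object();

    SynchronizedLookup(LongUnaryOperator delegate) {
        this.delegate = delegate;
    }

    @Override
    public long applyAsLong(long index) {
        synchronized (lock) {                 // only one thread at a time reaches the delegate
            return delegate.applyAsLong(index);
        }
    }

    /** Runs several threads against a stateful, unsafe delegate and returns its call count. */
    static int demo(int threads, int callsPerThread) {
        int[] calls = {0};                    // unsynchronised state, like an internal cache
        LongUnaryOperator unsafe = i -> { calls[0]++; return i; };
        LongUnaryOperator safe = new SynchronizedLookup(unsafe);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < callsPerThread; i++) safe.applyAsLong(i);
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return calls[0];
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 1000));
    }
}
```

A single lock is the simplest option; giving each thread its own list instance over its own stream is an alternative when lock contention matters.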
|Modifier and Type|Field and Description|
|static int|CACHE_MAX_SIZE: The maximum number of entries in the cache map.|
|Constructor and Description|
|SemiExternalOffsetBigList(InputBitStream offsetRawData, int offsetStep, long numOffsets): Creates a new semi-external list.|
|Modifier and Type||Method and Description|
add, add, add, addAll, addAll, addAll, addAll, addAll, addAll, addAll, addElements, addElements, compareTo, contains, ensureIndex, ensureRestrictedIndex, equals, get, getElements, getLong, hashCode, indexOf, indexOf, iterator, lastIndexOf, lastIndexOf, listIterator, listIterator, listIterator, peek, peekLong, pop, popLong, push, push, rem, remove, remove, removeElements, removeLong, removeLong, set, set, set, size, size, size, subList, top, topLong, toString
add, contains, containsAll, containsAll, isEmpty, longIterator, rem, remove, removeAll, removeAll, retainAll, retainAll, toArray, toArray, toArray, toLongArray, toLongArray
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
containsAll, longIterator, removeAll, retainAll, toArray, toArray, toLongArray, toLongArray
add, clear, contains, containsAll, isEmpty, parallelStream, remove, removeAll, removeIf, retainAll, spliterator, stream, toArray
public static final int CACHE_MAX_SIZE
public SemiExternalOffsetBigList(InputBitStream offsetRawData, int offsetStep, long numOffsets) throws IOException
offsetRawData - a bit stream containing the offsets in compressed form (γ-encoded deltas).
offsetStep - the step used to build random-access entry points.
numOffsets - the overall number of offsets (i.e., the number of terms).