it.unimi.di.mg4j.util
Class SemiExternalOffsetList
java.lang.Object
java.util.AbstractCollection<Long>
it.unimi.dsi.fastutil.longs.AbstractLongCollection
it.unimi.dsi.fastutil.longs.AbstractLongList
it.unimi.di.mg4j.util.SemiExternalOffsetList
- All Implemented Interfaces:
- LongCollection, LongIterable, LongList, LongStack, Stack<Long>, Comparable<List<? extends Long>>, Iterable<Long>, Collection<Long>, List<Long>
public class SemiExternalOffsetList
- extends AbstractLongList
Provides semi-external random access to offsets of an index.
This class is a semi-external LongList that MG4J uses by default for accessing term offsets.
When the number of terms in the index grows, storing each offset as a long in an
array can consume hundreds of megabytes of memory, and most of this memory is wasted,
as it is occupied by offsets of hapax legomena (terms occurring just once in the
collection). Instead, this class accesses offsets in their
compressed form, and provides entry points for random access to each offset. At construction
time, entry points are computed with a certain step, which is the number of offsets
accessible from each entry point, or, equivalently, the maximum number of offsets that
must be read to access a given offset.
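The entry-point scheme described above can be illustrated with a minimal, self-contained sketch. It is not the MG4J implementation: the class name SampledOffsets is hypothetical, a byte-oriented variable-length code stands in for the compressed bit stream, and offsets are assumed to be non-decreasing. It only shows how recording one entry point every step offsets bounds the number of deltas decoded per access.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch: delta-compressed offsets with sampled entry points. */
final class SampledOffsets {
    private final byte[] data;      // deltas in a variable-length byte code (stand-in for the bit stream)
    private final int step;         // number of offsets reachable from each entry point
    private final int size;         // overall number of offsets
    private final int[] entryPos;   // position in data of the first delta of each block
    private final long[] entryBase; // value of the offset preceding each block (0 for the first)

    SampledOffsets(long[] offsets, int step) {
        if (step <= 0) throw new IllegalArgumentException("step must be positive");
        this.step = step;
        this.size = offsets.length;
        int blocks = (size + step - 1) / step;
        entryPos = new int[blocks];
        entryBase = new long[blocks];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long prev = 0;
        for (int i = 0; i < size; i++) {
            if (i % step == 0) { // record an entry point at the start of each block
                entryPos[i / step] = out.size();
                entryBase[i / step] = prev;
            }
            long delta = offsets[i] - prev; // assumed non-negative
            while (delta >= 0x80) {
                out.write((int) (delta & 0x7F) | 0x80); // 7 payload bits, high bit = "more follows"
                delta >>>= 7;
            }
            out.write((int) delta);
            prev = offsets[i];
        }
        data = out.toByteArray();
    }

    int size() {
        return size;
    }

    /** Decodes at most step deltas, starting from the nearest preceding entry point. */
    long getLong(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index " + index);
        int block = index / step;
        int pos = entryPos[block];
        long value = entryBase[block];
        for (int i = block * step; i <= index; i++) {
            long delta = 0;
            int shift = 0;
            int b;
            do {
                b = data[pos++] & 0xFF;
                delta |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            value += delta;
        }
        return value;
    }
}
```

In this sketch a smaller step means more entry points held in memory and fewer deltas decoded per access; a larger step trades access time for memory.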
This class uses a small map (at most CACHE_MAX_SIZE entries) to keep track of the most
recently used indices, so that queries to those indices can be answered more quickly.
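One way to picture a bounded map of recently used indices is a small least-recently-used cache in front of a slower lookup. The sketch below is an assumption about the general technique, not the class's actual code: CachedLookup, slowLookup and the misses counter are hypothetical names, and the value 4 for the size bound is chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntToLongFunction;

/** Hypothetical sketch: a bounded most-recently-used cache in front of a slow lookup. */
final class CachedLookup {
    static final int CACHE_MAX_SIZE = 4; // illustrative value only

    private final IntToLongFunction slowLookup; // e.g. decoding from an entry point
    int misses;                                 // number of times slowLookup was invoked

    // Access-ordered map: the eldest entry is the least recently used one.
    private final Map<Integer, Long> cache =
        new LinkedHashMap<Integer, Long>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Long> eldest) {
                return size() > CACHE_MAX_SIZE;
            }
        };

    CachedLookup(IntToLongFunction slowLookup) {
        this.slowLookup = slowLookup;
    }

    long getLong(int index) {
        Long cached = cache.get(index);
        if (cached != null) return cached; // recently used index: no decoding needed
        misses++;
        long value = slowLookup.applyAsLong(index);
        cache.put(index, value);           // may evict the least recently used entry
        return value;
    }
}
```

Note that a cache like this mutates shared state on every read, which is one reason a class built this way would not be thread safe.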
Warning: This class is not thread safe, and needs to be synchronised to be used in a
multithreaded environment.
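A minimal sketch of external synchronisation follows. It assumes the list is reached through a plain index-to-offset function; SynchronizedLookup is a hypothetical wrapper, not part of MG4J, and it serialises all accesses on a single monitor.

```java
import java.util.function.IntToLongFunction;

/** Hypothetical sketch: serialises every access to a non-thread-safe lookup. */
final class SynchronizedLookup implements IntToLongFunction {
    private final IntToLongFunction delegate; // e.g. list::getLong on a shared instance

    SynchronizedLookup(IntToLongFunction delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized long applyAsLong(int index) {
        // Only one thread at a time may touch the delegate and its internal cache.
        return delegate.applyAsLong(index);
    }
}
```

The wrapper protects the delegate only if every thread goes through it. fastutil also ships synchronized wrappers in LongLists, which may be the more direct option when the full LongList interface is needed.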
- Author:
- Fabien Campagne, Sebastiano Vigna
Field Summary |
static int |
CACHE_MAX_SIZE
The maximum number of entries in the cache map. |
Methods inherited from class it.unimi.dsi.fastutil.longs.AbstractLongList |
add, add, add, addAll, addAll, addAll, addAll, addAll, addAll, addElements, addElements, compareTo, contains, ensureIndex, ensureRestrictedIndex, equals, get, getElements, hashCode, indexOf, indexOf, iterator, lastIndexOf, lastIndexOf, listIterator, listIterator, longListIterator, longListIterator, longSubList, peek, peekLong, pop, popLong, push, push, rem, remove, remove, removeElements, removeLong, set, set, size, subList, top, topLong, toString |
Methods inherited from class it.unimi.dsi.fastutil.longs.AbstractLongCollection |
add, contains, containsAll, containsAll, isEmpty, longIterator, rem, removeAll, removeAll, retainAll, retainAll, toArray, toArray, toArray, toLongArray, toLongArray |
Methods inherited from interface it.unimi.dsi.fastutil.Stack |
isEmpty |
CACHE_MAX_SIZE
public static final int CACHE_MAX_SIZE
- The maximum number of entries in the cache map.
- See Also:
- Constant Field Values
SemiExternalOffsetList
public SemiExternalOffsetList(InputBitStream offsetRawData,
int offsetStep,
int numOffsets)
throws IOException
- Creates a new semi-external list.
- Parameters:
offsetRawData - a bit stream containing the offsets in compressed form (γ-encoded deltas).
offsetStep - the step used to build random-access entry points.
numOffsets - the overall number of offsets (i.e., the number of terms).
- Throws:
IOException
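To make "γ-encoded deltas" concrete, here is a small sketch of Elias γ coding applied to the gaps between consecutive offsets. It is an illustration under stated assumptions: GammaSketch is a hypothetical class backed by a java.util.BitSet rather than an InputBitStream, and it encodes x + 1 so that a zero gap is representable, which is one common convention and may differ in detail from the bit layout MG4J expects.

```java
import java.util.BitSet;

/** Hypothetical sketch: Elias gamma coding of non-negative integers into a BitSet. */
final class GammaSketch {
    private final BitSet bits = new BitSet();
    private int writePos;
    private int readPos;

    /** Writes x >= 0 as the gamma code of x + 1: msb zeros, a one, then the msb low bits. */
    void writeGamma(long x) {
        if (x < 0 || x == Long.MAX_VALUE) throw new IllegalArgumentException("out of range: " + x);
        long v = x + 1;
        int msb = 63 - Long.numberOfLeadingZeros(v); // position of the highest set bit of v
        writePos += msb;                             // unary length prefix: msb zeros (bits default to 0)
        bits.set(writePos++);                        // the terminating one, also the leading bit of v
        for (int i = msb - 1; i >= 0; i--) {
            if (((v >>> i) & 1) != 0) bits.set(writePos);
            writePos++;
        }
    }

    /** Reads back the next value; the caller must not read more values than were written. */
    long readGamma() {
        int msb = 0;
        while (!bits.get(readPos++)) msb++;          // count zeros up to the terminating one
        long v = 1;
        for (int i = 0; i < msb; i++) {
            v = (v << 1) | (bits.get(readPos++) ? 1 : 0);
        }
        return v - 1;
    }
}
```

Writing each offset as its difference from the previous one and summing the decoded values on the way back recovers the original offsets; small gaps take only a few bits each.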
getLong
public final long getLong(int index)
size
public int size()
- Specified by:
- size in interface Collection<Long>
- Specified by:
- size in interface List<Long>
- Specified by:
- size in class AbstractCollection<Long>