Class SemiExternalOffsetBigList
- java.lang.Object
- java.util.AbstractCollection<Long>
- it.unimi.dsi.fastutil.longs.AbstractLongCollection
- it.unimi.dsi.fastutil.longs.AbstractLongBigList
- it.unimi.di.big.mg4j.util.SemiExternalOffsetBigList
- All Implemented Interfaces:
BigList<Long>, LongBigList, LongCollection, LongIterable, LongStack, Size64, Stack<Long>, Comparable<BigList<? extends Long>>, Iterable<Long>, Collection<Long>
public class SemiExternalOffsetBigList extends AbstractLongBigList
Provides semi-external random access to offsets of an index.
This class is a semi-external LongList that MG4J uses as default for accessing term offsets.
When the number of terms in the index grows, storing each offset as a long in an array can consume hundreds of megabytes of memory, and most of this memory is wasted, as it is occupied by offsets of hapax legomena (terms occurring just once in the collection). Instead, this class accesses offsets in their compressed form and provides entry points for random access to each offset. At construction time, entry points are computed with a certain step, which is the number of offsets accessible from each entry point or, equivalently, the maximum number of offsets that must be read to access a given offset.
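The entry-point scheme described above can be illustrated with a minimal, self-contained sketch. It is not the MG4J implementation: the class name `SampledOffsets`, the use of `java.util.BitSet` as a stand-in for the compressed bit stream, and the textbook Elias γ code applied to delta + 1 are all assumptions made for illustration, and the bit layout is not claimed to match MG4J's. The sketch also assumes the offsets are nondecreasing.

```java
import java.util.BitSet;

// Hypothetical illustration only: not the MG4J class and not its bit format.
final class SampledOffsets {
    private final BitSet bits = new BitSet(); // stand-in for the compressed stream
    private final int[] entryBit;    // bit position where each block of `step` deltas starts
    private final long[] entryBase;  // value of the offset preceding each block (0 for the first)
    private final int step;
    private final int size;
    private int writePos = 0;

    // Assumes offsets are nondecreasing, so every delta is >= 0.
    SampledOffsets(long[] offsets, int step) {
        this.step = step;
        this.size = offsets.length;
        int blocks = (offsets.length + step - 1) / step;
        entryBit = new int[blocks];
        entryBase = new long[blocks];
        long prev = 0;
        for (int i = 0; i < offsets.length; i++) {
            if (i % step == 0) { // record an entry point every `step` offsets
                entryBit[i / step] = writePos;
                entryBase[i / step] = prev;
            }
            writeGamma(offsets[i] - prev + 1); // +1 because gamma codes start at 1
            prev = offsets[i];
        }
    }

    // Elias gamma code of x >= 1: msb(x) zero bits, then x in msb(x) + 1 bits.
    private void writeGamma(long x) {
        int msb = 63 - Long.numberOfLeadingZeros(x);
        writePos += msb; // unset BitSet bits already read as zeros
        for (int b = msb; b >= 0; b--) {
            if (((x >>> b) & 1L) != 0) bits.set(writePos);
            writePos++;
        }
    }

    // Decodes at most `step` deltas, starting from the nearest entry point.
    long get(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index: " + index);
        int block = index / step;
        int pos = entryBit[block];
        long value = entryBase[block];
        for (int k = index % step; k >= 0; k--) {
            int zeros = 0;
            while (!bits.get(pos)) { zeros++; pos++; } // unary part
            long x = 0;
            for (int b = 0; b <= zeros; b++) x = (x << 1) | (bits.get(pos++) ? 1L : 0L);
            value += x - 1; // undo the +1 applied when encoding
        }
        return value;
    }
}
```

For example, `new SampledOffsets(offsets, 3).get(4)` would jump to the second entry point and decode two deltas. A larger step keeps fewer entry points in memory but requires more decoding per access.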
This class uses a small (CACHE_MAX_SIZE entries) map to keep track of the most recently used indices, so as to answer queries to those indices more quickly.
Warning: This class is not thread safe, and needs to be synchronised to be used in a multithreaded environment.
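As a rough illustration of a bounded most-recently-used map combined with external synchronisation, here is a minimal sketch. It is an assumption about how such a cache could look, not a description of this class's internals: `CachedLookup`, its `misses` counter, and the `slowLookup` function are hypothetical names, and the sketch uses `java.util.LinkedHashMap` in access order rather than whatever map the class actually uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongUnaryOperator;

// Hypothetical wrapper: caches the results of a slow index -> offset lookup.
final class CachedLookup {
    private final LongUnaryOperator slowLookup;
    private final LinkedHashMap<Long, Long> cache;
    int misses = 0; // exposed only so the behaviour can be observed

    CachedLookup(int maxSize, LongUnaryOperator slowLookup) {
        this.slowLookup = slowLookup;
        // accessOrder = true makes iteration order least-recently-used first.
        this.cache = new LinkedHashMap<Long, Long>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, Long> eldest) {
                return size() > maxSize; // evict once the bound is exceeded
            }
        };
    }

    // synchronized because even a read reorders an access-ordered LinkedHashMap.
    synchronized long get(long index) {
        Long cached = cache.get(index);
        if (cached != null) return cached;
        misses++;
        long value = slowLookup.applyAsLong(index);
        cache.put(index, value);
        return value;
    }
}
```

A caller sharing a single list between threads could take the same approach and guard every access with one lock; giving each thread its own instance is an alternative that avoids contention.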
- Author:
- Fabien Campagne, Sebastiano Vigna
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class it.unimi.dsi.fastutil.longs.AbstractLongBigList
AbstractLongBigList.LongSubList
-
-
Field Summary
Fields
Modifier and Type | Field | Description
static int | CACHE_MAX_SIZE | The maximum number of entries in the cache map.
-
Constructor Summary
Constructors
Constructor | Description
SemiExternalOffsetBigList(InputBitStream offsetRawData, int offsetStep, long numOffsets) | Creates a new semi-external list.
-
Method Summary
-
Methods inherited from class it.unimi.dsi.fastutil.longs.AbstractLongBigList
add, add, add, addAll, addAll, addAll, addAll, addAll, addAll, addElements, addElements, clear, compareTo, contains, ensureIndex, ensureRestrictedIndex, equals, get, getElements, hashCode, indexOf, indexOf, iterator, lastIndexOf, lastIndexOf, listIterator, listIterator, peek, peekLong, pop, popLong, push, push, rem, remove, removeElements, removeLong, set, set, size, size, subList, top, topLong, toString
-
Methods inherited from class it.unimi.dsi.fastutil.longs.AbstractLongCollection
add, contains, containsAll, remove, removeAll, retainAll, toArray, toLongArray, toLongArray
-
Methods inherited from class java.util.AbstractCollection
containsAll, isEmpty, removeAll, retainAll, toArray, toArray
-
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface java.util.Collection
containsAll, isEmpty, parallelStream, removeAll, retainAll, spliterator, stream, toArray, toArray, toArray
-
Methods inherited from interface it.unimi.dsi.fastutil.longs.LongCollection
add, contains, containsAll, remove, removeAll, removeIf, removeIf, retainAll, toArray, toLongArray, toLongArray
-
Methods inherited from interface it.unimi.dsi.fastutil.longs.LongIterable
forEach, forEach
-
Field Detail
-
CACHE_MAX_SIZE
public static final int CACHE_MAX_SIZE
The maximum number of entries in the cache map.
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
SemiExternalOffsetBigList
public SemiExternalOffsetBigList(InputBitStream offsetRawData, int offsetStep, long numOffsets) throws IOException
Creates a new semi-external list.
- Parameters:
offsetRawData - a bit stream containing the offsets in compressed form (γ-encoded deltas).
offsetStep - the step used to build random-access entry points.
numOffsets - the overall number of offsets (i.e., the number of terms).
- Throws:
IOException
-
-