it.unimi.di.mg4j.util
Class SemiExternalOffsetList

java.lang.Object
  extended by java.util.AbstractCollection<Long>
      extended by it.unimi.dsi.fastutil.longs.AbstractLongCollection
          extended by it.unimi.dsi.fastutil.longs.AbstractLongList
              extended by it.unimi.di.mg4j.util.SemiExternalOffsetList
All Implemented Interfaces:
LongCollection, LongIterable, LongList, LongStack, Stack<Long>, Comparable<List<? extends Long>>, Iterable<Long>, Collection<Long>, List<Long>

public class SemiExternalOffsetList
extends AbstractLongList

Provides semi-external random access to offsets of an index.

This class is a semi-external LongList that MG4J uses by default for accessing term offsets.

When the number of terms in the index grows, storing each offset as a long in an array can consume hundreds of megabytes of memory, and most of this memory is wasted, as it is occupied by offsets of hapax legomena (terms occurring just once in the collection). Instead, this class accesses offsets in their compressed form and provides entry points for random access to each offset. At construction time, entry points are computed with a certain step, which is the number of offsets accessible from each entry point or, equivalently, the maximum number of offsets that must be read to access a given offset.
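The entry-point idea can be illustrated with a minimal, self-contained sketch. It is not the MG4J implementation: the class name SteppedOffsetSketch and all of its members are hypothetical, the compressed data lives in a byte array rather than an InputBitStream, and gaps are stored with a simple variable-byte code instead of γ coding. It assumes the offsets are nondecreasing, keeps one absolute value per step in memory, and decodes fewer than step gaps per access.

```java
import java.io.ByteArrayOutputStream;

/** Illustrative sketch only; names and encoding are assumptions, not MG4J code. */
final class SteppedOffsetSketch {
    private final byte[] data;       // gaps between consecutive offsets, variable-byte coded
    private final int[] entryPos;    // position in data of the first gap after each entry point
    private final long[] entryValue; // absolute offset kept in memory for each entry point
    private final int step;
    private final int size;

    SteppedOffsetSketch(long[] offsets, int step) {
        if (step <= 0) throw new IllegalArgumentException("step must be positive");
        this.step = step;
        this.size = offsets.length;
        int numEntries = (size + step - 1) / step;
        entryPos = new int[numEntries];
        entryValue = new long[numEntries];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < size; i++) {
            if (i % step == 0) {
                // Entry point: remember the absolute offset and where the next gap starts.
                entryValue[i / step] = offsets[i];
                entryPos[i / step] = out.size();
            } else {
                writeGap(out, offsets[i] - offsets[i - 1]);
            }
        }
        data = out.toByteArray();
    }

    int size() {
        return size;
    }

    long getLong(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index " + index);
        int entry = index / step;
        long value = entryValue[entry];
        int pos = entryPos[entry];
        // Decode the gaps between the entry point and the requested index (fewer than step).
        for (int i = entry * step; i < index; i++) {
            long gap = 0;
            int shift = 0;
            int b;
            do {
                b = data[pos++];
                gap |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            value += gap;
        }
        return value;
    }

    /** Writes a nonnegative gap, seven bits per byte, high bit set on all but the last byte. */
    private static void writeGap(ByteArrayOutputStream out, long gap) {
        while (gap >= 0x80) {
            out.write((int) (gap & 0x7F) | 0x80);
            gap >>>= 7;
        }
        out.write((int) gap);
    }

    public static void main(String[] args) {
        long[] offsets = {0, 5, 5, 130, 20000, 20001, 5000000000L};
        SteppedOffsetSketch list = new SteppedOffsetSketch(offsets, 3);
        for (int i = 0; i < list.size(); i++) System.out.println(list.getLong(i));
    }
}
```

A larger step means fewer entry points held in memory but more gaps decoded per access; a smaller step trades memory for speed.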

This class uses a small map (of CACHE_MAX_SIZE entries) to keep track of the most recently used indices, so as to answer queries for those indices more quickly.
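One common way to build such a bounded map of recently used indices is an access-ordered LinkedHashMap that evicts its eldest entry. The sketch below is an assumption about how a cache of this kind might look, not a description of this class's internals: the name RecentOffsetCacheSketch and the eviction policy are hypothetical, and the size passed to the constructor merely stands in for CACHE_MAX_SIZE, whose actual value is not stated here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only: a bounded map from index to offset with LRU eviction. */
final class RecentOffsetCacheSketch extends LinkedHashMap<Integer, Long> {
    private static final long serialVersionUID = 1L;
    private final int maxSize; // hypothetical stand-in for CACHE_MAX_SIZE

    RecentOffsetCacheSketch(int maxSize) {
        super(16, 0.75f, true); // true = iterate in access order, least recently used first
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Integer, Long> eldest) {
        // Called after each insertion; drop the least recently used entry when over capacity.
        return size() > maxSize;
    }

    public static void main(String[] args) {
        RecentOffsetCacheSketch cache = new RecentOffsetCacheSketch(2);
        cache.put(1, 100L);
        cache.put(2, 200L);
        cache.get(1);       // index 1 becomes the most recently used
        cache.put(3, 300L); // evicts index 2
        System.out.println(cache.keySet());
    }
}
```

A lookup would consult the cache first and fall back to decoding from the nearest entry point only on a miss, storing the result for later queries.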

Warning: This class is not thread safe; access must be synchronised externally when it is used in a multithreaded environment.
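External synchronisation can be as simple as guarding every access with a single lock. The following sketch shows the pattern under stated assumptions: SynchronizedLookupSketch is a hypothetical wrapper, and the IntToLongFunction delegate merely stands in for a call such as getLong(int) on a list that is not thread safe.

```java
import java.util.function.IntToLongFunction;

/** Illustrative sketch only: serialises all lookups on one lock. */
final class SynchronizedLookupSketch {
    private final IntToLongFunction delegate; // stands in for a non-thread-safe getLong(int)
    private final Object lock = new Object();

    SynchronizedLookupSketch(IntToLongFunction delegate) {
        this.delegate = delegate;
    }

    long getLong(int index) {
        // Only one thread at a time may touch the delegate and its internal state.
        synchronized (lock) {
            return delegate.applyAsLong(index);
        }
    }

    public static void main(String[] args) {
        long[] offsets = {0, 5, 130};
        SynchronizedLookupSketch safe = new SynchronizedLookupSketch(i -> offsets[i]);
        System.out.println(safe.getLong(2));
    }
}
```

A single lock serialises all readers, so threads that query heavily may be better served by giving each thread its own list instance over its own stream.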

Author:
Fabien Campagne, Sebastiano Vigna

Nested Class Summary
 
Nested classes/interfaces inherited from class it.unimi.dsi.fastutil.longs.AbstractLongList
AbstractLongList.LongSubList
 
Field Summary
static int CACHE_MAX_SIZE
          The maximum number of entries in the cache map.
 
Constructor Summary
SemiExternalOffsetList(InputBitStream offsetRawData, int offsetStep, int numOffsets)
          Creates a new semi-external list.
 
Method Summary
 long getLong(int index)
           
 int size()
           
 
Methods inherited from class it.unimi.dsi.fastutil.longs.AbstractLongList
add, add, add, addAll, addAll, addAll, addAll, addAll, addAll, addElements, addElements, compareTo, contains, ensureIndex, ensureRestrictedIndex, equals, get, getElements, hashCode, indexOf, indexOf, iterator, lastIndexOf, lastIndexOf, listIterator, listIterator, longListIterator, longListIterator, longSubList, peek, peekLong, pop, popLong, push, push, rem, remove, remove, removeElements, removeLong, set, set, size, subList, top, topLong, toString
 
Methods inherited from class it.unimi.dsi.fastutil.longs.AbstractLongCollection
add, contains, containsAll, containsAll, isEmpty, longIterator, rem, removeAll, removeAll, retainAll, retainAll, toArray, toArray, toArray, toLongArray, toLongArray
 
Methods inherited from class java.util.AbstractCollection
clear
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface java.util.List
add, clear, contains, containsAll, isEmpty, removeAll, retainAll, toArray, toArray
 
Methods inherited from interface it.unimi.dsi.fastutil.longs.LongCollection
containsAll, longIterator, removeAll, retainAll, toArray, toArray, toLongArray, toLongArray
 
Methods inherited from interface it.unimi.dsi.fastutil.Stack
isEmpty
 

Field Detail

CACHE_MAX_SIZE

public static final int CACHE_MAX_SIZE
The maximum number of entries in the cache map.

See Also:
Constant Field Values

Constructor Detail

SemiExternalOffsetList

public SemiExternalOffsetList(InputBitStream offsetRawData,
                              int offsetStep,
                              int numOffsets)
                       throws IOException
Creates a new semi-external list.

Parameters:
offsetRawData - a bit stream containing the offsets in compressed form (γ-encoded deltas).
offsetStep - the step used to build random-access entry points.
numOffsets - the overall number of offsets (i.e., the number of terms).
Throws:
IOException

Method Detail

getLong

public final long getLong(int index)

size

public int size()
Specified by:
size in interface Collection<Long>
Specified by:
size in interface List<Long>
Specified by:
size in class AbstractCollection<Long>