Class QuasiSuccinctIndex

  • All Implemented Interfaces:
    Serializable

    public class QuasiSuccinctIndex
    extends Index
    A quasi-succinct index.

    A quasi-succinct index does not use gap-compression to represent its various components, but rather the Elias–Fano representation of monotone sequences. The index was described in detail by Sebastiano Vigna in the paper “Quasi-Succinct Indices”, Proceedings of the 6th ACM International Conference on Web Search and Data Mining, WSDM'13, pages 83−92. ACM, 2013. It is smaller than a γ/δ-code gap-compressed index, and significantly faster when computing conjunctive, phrasal or proximity operators, as it provides constant-time access on average to every piece of information in the index.

    In a quasi-succinct index pointers, counters and positions are represented in three different files, each of which has its own offset file. The file do not contain a byte-oriented bitstream representation, but rather arrays of 64-bit longwords with specified byte order (by default the native one for performance reasons). The longwords are either loaded in memory as a LongBigArrayBigList or mapped using a ByteBufferLongBigList. The bit k of a file is the bit k mod 64 of the longword of index ⌊k / 64⌋.

    Note that the methods providing pointers, counts and positions to index readers use reflection to detect whether the LongBigList storing a component is a ByteBufferLongBigList, and in that case they return a copy.

    Author:
    Sebastiano Vigna
    See Also:
    QuasiSuccinctIndexReader, QuasiSuccinctIndexWriter, Serialized Form