Class SkipBitStreamIndexWriter

  • All Implemented Interfaces:
    IndexWriter, VariableQuantumIndexWriter

    public class SkipBitStreamIndexWriter
    extends BitStreamIndexWriter
    implements VariableQuantumIndexWriter
    Writes a bitstream-based interleaved index with skips.

    These indices are managed by MG4J mainly for historical reasons, as quasi-succinct indices are better in every respect.

    An interleaved inverted index with skips makes it possible to skip ahead quickly while reading inverted lists. More specifically, when reading the inverted list for a given term, one may want to skip all document records whose document pointer is smaller than a given integer. In a plain inverted index this is impossible: one would have to read all document records sequentially.
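
    The idea of skipping can be illustrated with a deliberately simplified sketch. It is not the structure this class writes (which uses compressed, multi-level skip towers embedded in the bitstream); it assumes an uncompressed, sorted array of document pointers and a single-level skip table that samples one pointer per block. The class name SkipSketch and its methods are hypothetical and are not part of MG4J.

```java
/** Simplified, single-level illustration of skipping; NOT MG4J's on-disk format. */
final class SkipSketch {

    /** Samples the first pointer of every block of {@code quantum} pointers (quantum must be positive). */
    static int[] buildSkips(int[] pointers, int quantum) {
        int blocks = (pointers.length + quantum - 1) / quantum;
        int[] skips = new int[blocks];
        for (int b = 0; b < blocks; b++) {
            skips[b] = pointers[b * quantum];
        }
        return skips;
    }

    /**
     * Returns the index of the first pointer greater than or equal to {@code target},
     * or {@code pointers.length} if there is none.
     */
    static int skipTo(int[] pointers, int[] skips, int quantum, int target) {
        int block = 0;
        // Jump over whole blocks while the next block still starts at or before the target.
        while (block + 1 < skips.length && skips[block + 1] <= target) {
            block++;
        }
        // Scan sequentially only inside the block that may contain the target.
        int i = block * quantum;
        while (i < pointers.length && pointers[i] < target) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        int[] pointers = {1, 4, 7, 10, 15, 21, 30, 42, 57};
        int quantum = 4; // hypothetical block size chosen for the example
        int[] skips = buildSkips(pointers, quantum);
        int i = skipTo(pointers, skips, quantum, 22);
        System.out.println(pointers[i]); // prints 30
    }
}
```

    In this sketch only one block is scanned sequentially; the remaining blocks are passed over by consulting the sampled pointers alone.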

    The skipping structure used by this class is new, and has been described by Paolo Boldi and Sebastiano Vigna in “Compressed perfect embedded skip lists for quick inverted-index lookups”, Proc. SPIRE 2005, volume 3772 of Lecture Notes in Computer Science, pages 25–28. Springer, 2005.

    Author:
    Paolo Boldi, Sebastiano Vigna
    • Field Detail

      • DEFAULT_TEMP_BUFFER_SIZE

        public static final int DEFAULT_TEMP_BUFFER_SIZE
        The size of the buffer for the temporary file used to build an inverted list. Inverted lists shorter than this number of bytes will be directly rebuilt from the buffer, and never flushed to disk.
        See Also:
        Constant Field Values
      • bitsForVariableQuanta

        public long bitsForVariableQuanta
        The number of bits written for variable quanta.
      • bitsForQuantumBitLengths

        public long bitsForQuantumBitLengths
        The number of bits written for quantum bit lengths.
      • bitsForEntryBitLengths

        public long bitsForEntryBitLengths
        The number of bits written for entry bit lengths.
      • numberOfBlocks

        public long numberOfBlocks
        The number of written blocks.
      • prevEntryBitLength

        public int prevEntryBitLength
        An estimate of the number of bits occupied per tower entry in the last written cache, or -1 if no cache has been written for the current inverted list.
      • prevQuantumBitLength

        public int prevQuantumBitLength
        An estimate of the number of bits occupied per quantum in the last written cache, or -1 if no cache has been written for the current inverted list.
    • Constructor Detail

      • SkipBitStreamIndexWriter

        public SkipBitStreamIndexWriter​(IOFactory ioFactory,
                                        CharSequence basename,
                                        long numberOfDocuments,
                                        boolean writeOffsets,
                                        int tempBufferSize,
                                        Map<CompressionFlags.Component,​CompressionFlags.Coding> flags,
                                        int quantum,
                                        int height)
                                 throws IOException
        Creates a new skip index writer with the specified basename. The index will be written to a file whose name is the basename followed by .index. If writeOffsets is true, an offsets file will also be produced, with the basename followed by .offsets.
        Parameters:
        ioFactory - the factory that will be used to perform I/O.
        basename - the basename.
        numberOfDocuments - the number of documents in the collection to be indexed.
        writeOffsets - if true, the offset file will also be produced.
        tempBufferSize - the size in bytes of the internal temporary buffer (inverted lists shorter than this size will never be flushed to disk).
        flags - a flag map setting the coding techniques to be used (see CompressionFlags).
        quantum - the quantum; it must be zero, or a power of two; if it is zero, a variable-quantum index is assumed.
        height - the maximum height of a skip tower; the cache will contain at most 2^h document records, where h is this height.
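
        The constraints on quantum and height stated above can be checked before calling the constructor. The following is a minimal sketch: the class SkipParameterCheck and its methods are hypothetical helpers, not part of MG4J, and the upper bound on height is an assumption made here only so that 2^h fits in a long.

```java
/** Hypothetical pre-flight checks for the quantum and height arguments; not part of MG4J. */
final class SkipParameterCheck {

    /** A quantum is acceptable if it is zero (variable quantum) or a positive power of two. */
    static boolean isValidQuantum(int quantum) {
        return quantum == 0 || (quantum > 0 && (quantum & (quantum - 1)) == 0);
    }

    /** Upper bound on the number of document records in the cache: 2^height. */
    static long maxCacheRecords(int height) {
        // Assumption of this sketch: restrict height so that the shift cannot overflow a long.
        if (height < 0 || height > 62) {
            throw new IllegalArgumentException("height out of range for this sketch: " + height);
        }
        return 1L << height;
    }

    public static void main(String[] args) {
        System.out.println(isValidQuantum(0));   // zero selects a variable-quantum index
        System.out.println(isValidQuantum(32));  // a power of two
        System.out.println(isValidQuantum(24));  // not a power of two
        System.out.println(maxCacheRecords(8));  // 2^8 document records at most
    }
}
```

        The bit trick quantum & (quantum - 1) clears the lowest set bit, so the result is zero exactly when a positive quantum has a single bit set.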