Class BitStreamHPIndexWriter

    • Field Detail

      • DEFAULT_TEMP_BUFFER_SIZE

        public static final int DEFAULT_TEMP_BUFFER_SIZE
        The size of the buffer for the temporary file used to build an inverted list. Inverted lists shorter than this number of bytes will be directly rebuilt from the buffer, and never flushed to disk.
        See Also:
        Constant Field Values
      • BEFORE_PAYLOAD

        protected static final int BEFORE_PAYLOAD
        This value of state can be assumed only in indices that contain payloads; it means that we are positioned just before the payload for the current document record.
        See Also:
        Constant Field Values
      • BEFORE_COUNT

        protected static final int BEFORE_COUNT
        This value of state can be assumed only in indices that contain counts; it means that we are positioned just before the count for the current document record.
        See Also:
        Constant Field Values
      • BEFORE_POSITIONS

        protected static final int BEFORE_POSITIONS
        This value of state can be assumed only in indices that contain document positions; it means that we are positioned just before the position list of the current document record.
        See Also:
        Constant Field Values
      • FIRST_UNUSED_STATE

        protected static final int FIRST_UNUSED_STATE
        This is the first unused state. Subclasses may start from this value to define new states.
        See Also:
        Constant Field Values
      • state

        protected int state
        The current state of the writer.
      • frequency

        protected long frequency
        The number of document records that the current inverted list will contain.
      • writtenDocuments

        protected long writtenDocuments
        The number of document records already written for the current inverted list.
      • currentDocument

        protected long currentDocument
        The current document pointer.
      • lastDocument

        protected long lastDocument
        The last document pointer in the current list.
      • b

        protected int b
        The parameter b for Golomb coding of pointers.
      • log2b

        protected int log2b
        The parameter log2b for Golomb coding of pointers; it is the most significant bit of b.
      • maxCount

        public int maxCount
        The maximum number of positions in a document record so far.
      • bitsForPositionsOffsets

        public long bitsForPositionsOffsets
        The number of bits written for offsets in the file of positions.
      • bitsForVariableQuanta

        public long bitsForVariableQuanta
        The number of bits written for variable quanta.
      • bitsForQuantumBitLengths

        public long bitsForQuantumBitLengths
        The number of bits written for quantum lengths.
      • bitsForPositionsQuantumBitLengths

        public long bitsForPositionsQuantumBitLengths
        The number of bits written for quantum lengths in the positions stream.
      • bitsForEntryBitLengths

        public long bitsForEntryBitLengths
        The number of bits written for entry lengths.
      • numberOfBlocks

        public long numberOfBlocks
        The number of written blocks.
      • prevEntryBitLength

        public int prevEntryBitLength
        An estimate of the number of bits occupied per tower entry in the last written cache, or -1 if no cache has been written for the current inverted list.
      • prevQuantumBitLength

        public int prevQuantumBitLength
        An estimate of the number of bits occupied per quantum in the last written cache, or -1 if no cache has been written for the current inverted list.
      • prevPositionsQuantumBitLength

        public int prevPositionsQuantumBitLength
        An estimate of the number of bits occupied per quantum in the positions stream in the last written cache, or -1 if no cache has been written for the current inverted list.
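
        The relation between the fields b and log2b described above (log2b is the most significant bit of b) can be illustrated with a minimal, standalone sketch. The class name GolombParameterSketch, the helper mostSignificantBit and the sample value of b are hypothetical and are not part of this class; the sketch only shows one way of computing the index of the highest set bit with the standard library.

```java
public final class GolombParameterSketch {

    /**
     * Returns the 0-based index of the highest set bit of x.
     * Hypothetical helper for illustration; x must be positive.
     */
    static int mostSignificantBit(int x) {
        if (x <= 0) {
            throw new IllegalArgumentException("x must be positive: " + x);
        }
        // An int has 32 bits; subtracting the leading zeros from 31
        // gives the position of the highest set bit.
        return 31 - Integer.numberOfLeadingZeros(x);
    }

    public static void main(String[] args) {
        int b = 9; // assumed Golomb modulus, chosen only for the example
        int log2b = mostSignificantBit(b);
        System.out.println("b = " + b + ", log2b = " + log2b);
    }
}
```

        For a power of two, this value coincides with the base-2 logarithm of b; for other values it is the logarithm rounded down.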
    • Constructor Detail

      • BitStreamHPIndexWriter

        public BitStreamHPIndexWriter(CharSequence basename,
                                      long numberOfDocuments,
                                      boolean writeOffsets,
                                      int tempBufferSize,
                                      Map<CompressionFlags.Component, CompressionFlags.Coding> flags,
                                      int quantum,
                                      int height)
                               throws IOException
        Creates a new index writer with the specified basename. The index will be written to a file (stemmed with .index). If writeOffsets is true, an offset file (stemmed with .offsets) will also be produced.
        Parameters:
        basename - the basename.
        numberOfDocuments - the number of documents in the collection to be indexed.
        writeOffsets - if true, the offset file will also be produced.
        tempBufferSize - the size of the write buffer of the cache.
        flags - a flag map setting the coding techniques to be used (see CompressionFlags).
        quantum - the quantum; it must be zero, or a power of two; if it is zero, a variable-quantum index is assumed.
        height - the maximum height of a skip tower; the cache will contain at most 2^h document records, where h is this height.
        Throws:
        IOException
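
        The constraints stated for the quantum and height parameters (quantum must be zero or a power of two; the cache holds at most 2^h document records) can be checked before calling the constructor. The following is a minimal sketch under those stated constraints only; the class SkipParameterSketch and its methods are hypothetical helpers, not part of this API, and the upper bound on height is an assumption made to avoid overflow in a long.

```java
public final class SkipParameterSketch {

    /**
     * Returns true if quantum is zero (variable-quantum index)
     * or a positive power of two. Hypothetical helper.
     */
    static boolean isValidQuantum(int quantum) {
        // A positive power of two has exactly one bit set,
        // so clearing its lowest set bit leaves zero.
        return quantum == 0 || (quantum > 0 && (quantum & (quantum - 1)) == 0);
    }

    /**
     * Returns 2^height, the documented upper bound on the number of
     * document records in the cache. The 0..62 range is an assumption
     * of this sketch so that the result fits in a long.
     */
    static long maxCachedRecords(int height) {
        if (height < 0 || height > 62) {
            throw new IllegalArgumentException("height out of range: " + height);
        }
        return 1L << height;
    }

    public static void main(String[] args) {
        int quantum = 64; // example value, a power of two
        int height = 10;  // example value
        System.out.println("quantum valid: " + isValidQuantum(quantum));
        System.out.println("max cached records: " + maxCachedRecords(height));
    }
}
```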