Class QuasiSuccinctIndexWriter

    • Field Detail

      • DEFAULT_CACHE_SIZE

        public static final int DEFAULT_CACHE_SIZE
        The default size of the bit cache.
        See Also:
        Constant Field Values
    • Constructor Detail

      • QuasiSuccinctIndexWriter

        public QuasiSuccinctIndexWriter(IOFactory ioFactory,
                                        CharSequence basename,
                                        long numberOfDocuments,
                                        int log2Quantum,
                                        int cacheSize,
                                        Map<CompressionFlags.Component, CompressionFlags.Coding> flags,
                                        ByteOrder byteOrder)
                                 throws IOException
        Creates a new index writer, with the specified basename.
        Parameters:
        ioFactory - the factory that will be used to perform I/O.
        basename - the basename.
        numberOfDocuments - the number of documents in the collection to be indexed.
        log2Quantum - the base-2 logarithm of the quantum.
        cacheSize - the size in bytes of the bit caches.
        flags - the compression flags, mapping each index component to its coding.
        byteOrder - the byte order of the index (if null, ByteOrder.nativeOrder() is used).
        Throws:
        IOException
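        The following minimal sketch illustrates two of the parameter conventions above: a null byteOrder falling back to ByteOrder.nativeOrder(), and log2Quantum standing for a quantum of 2^log2Quantum. The class WriterParameterSketch and its methods are hypothetical helpers, not part of the library, and reading log2Quantum as a base-2 logarithm is an assumption suggested by the parameter name.

```java
import java.nio.ByteOrder;

// Hypothetical helper class for illustration only; not part of the library.
class WriterParameterSketch {

    /** Returns the given byte order, or the native one if the argument is null. */
    static ByteOrder effectiveByteOrder(ByteOrder byteOrder) {
        return byteOrder == null ? ByteOrder.nativeOrder() : byteOrder;
    }

    /** Returns the quantum, assuming log2Quantum is its base-2 logarithm. */
    static long quantum(int log2Quantum) {
        return 1L << log2Quantum;
    }

    public static void main(String[] args) {
        System.out.println(effectiveByteOrder(null) == ByteOrder.nativeOrder()); // true
        System.out.println(quantum(8)); // 256
    }
}
```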
    • Method Detail

      • lowerBits

        public static int lowerBits(long length,
                                    long upperBound,
                                    boolean strict)
        Returns the number of lower bits for the Elias–Fano encoding of a list of given length, upper bound and strictness.
        Parameters:
        length - the number of elements of the list.
        upperBound - an upper bound for the elements of the list.
        strict - if true, the elements of the list are strictly increasing, and the returned number of bits is for the strict representation (e.g., storing the k-th element decreased by k).
        Returns:
        the number of lower bits for the Elias–Fano encoding of a list with the specified parameters.
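        As a rough illustration of what such a computation can look like, the sketch below uses the textbook Elias–Fano choice of floor(log2(upperBound / length)) lower bits. It is an assumption that this method follows the same rule; in particular, the treatment of the strict case (reducing the upper bound by the list length, since the k-th element is stored decreased by k) and the exact off-by-one conventions are guesses. LowerBitsSketch and floorLog2 are hypothetical names.

```java
// Hypothetical sketch of the textbook rule; not the library's implementation.
class LowerBitsSketch {

    /** Returns floor(log2(x)) for x > 0, and -1 for x == 0. */
    static int floorLog2(long x) {
        return 63 - Long.numberOfLeadingZeros(x);
    }

    static int lowerBits(long length, long upperBound, boolean strict) {
        if (length == 0) return 0;
        // Assumption: in the strict case the k-th element is stored decreased
        // by k, so the stored values are bounded by about upperBound - length.
        long effectiveBound = strict ? upperBound - length : upperBound;
        if (effectiveBound <= 0) return 0;
        // A quotient of zero gives floorLog2 == -1, which is clamped to 0.
        return Math.max(0, floorLog2(effectiveBound / length));
    }

    public static void main(String[] args) {
        System.out.println(lowerBits(1000, 1_000_000, false)); // 9
        System.out.println(lowerBits(4, 64, true));            // 3
    }
}
```

        With 1000 elements below one million, each element keeps its 9 lowest bits verbatim, and only the remaining high part goes into the upper-bits array.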
      • pointerSize

        public static int pointerSize(long length,
                                      long upperBound,
                                      boolean strict,
                                      boolean indexZeroes)
        Returns the size in bits of forward or skip pointers to the Elias–Fano encoding of a list of given length, upper bound and strictness.
        Parameters:
        length - the number of elements of the list.
        upperBound - an upper bound for the elements of the list.
        strict - if true, the elements of the list are strictly increasing, and the returned number of bits is for the strict representation (e.g., storing the k-th element decreased by k).
        indexZeroes - if true, the number of bits for skip pointers is returned; otherwise, the number of bits for forward pointers is returned.
        Returns:
        the size in bits of forward or skip pointers to the Elias–Fano encoding of a list with the specified parameters.
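        One plausible way to size such pointers is sketched below: a pointer into the upper-bits array must be wide enough to address any bit position in it, and in a standard Elias–Fano layout that array holds roughly length + (upperBound >> lowerBits) bits. This is an assumption about the general technique, not a statement of what this method computes; the sketch ignores the strict and indexZeroes parameters, and PointerSizeSketch and ceilLog2 are hypothetical names.

```java
// Hypothetical sketch; the real method may size pointers differently.
class PointerSizeSketch {

    /** Returns ceil(log2(x)) for x >= 1. */
    static int ceilLog2(long x) {
        return x <= 1 ? 0 : 64 - Long.numberOfLeadingZeros(x - 1);
    }

    /**
     * Bits needed to address a position in an upper-bits array that contains
     * one 1 per element and at most (upperBound >> lowerBits) zeroes.
     */
    static int pointerSize(long length, long upperBound, int lowerBits) {
        long upperBitsLength = length + (upperBound >>> lowerBits);
        return ceilLog2(upperBitsLength);
    }

    public static void main(String[] args) {
        // 1000 ones + 1953 zeroes = 2953 positions, which fit in 12 bits.
        System.out.println(pointerSize(1000, 1_000_000, 9)); // 12
    }
}
```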
      • numberOfPointers

        public static long numberOfPointers(long length,
                                            long upperBound,
                                            int log2Quantum,
                                            boolean strict,
                                            boolean indexZeroes)
        Returns the number of forward or skip pointers to the Elias–Fano encoding of a list of given length, upper bound and strictness.
        Parameters:
        length - the number of elements of the list.
        upperBound - an upper bound for the elements of the list.
        log2Quantum - the base-2 logarithm of the quantum size.
        strict - if true, the elements of the list are strictly increasing, and the returned number of pointers is for the strict representation (e.g., storing the k-th element decreased by k).
        indexZeroes - if true, an upper bound on the number of skip pointers is returned; otherwise, the (exact) number of forward pointers is returned.
        Returns:
        an upper bound on the number of skip pointers or the (exact) number of forward pointers.
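        The distinction between an exact count and an upper bound can be illustrated with a small sketch. It assumes that one forward pointer is stored every 2^log2Quantum elements (ones in the upper-bits array), and one skip pointer every 2^log2Quantum zeroes; since the number of zeroes is only known to be at most upperBound >> lowerBits, the second count is a bound. These formulas are assumptions for illustration, the strict parameter is left out, and NumberOfPointersSketch is a hypothetical name.

```java
// Hypothetical sketch; not the library's implementation.
class NumberOfPointersSketch {

    /** Exact number of forward pointers: one per full quantum of elements. */
    static long forwardPointers(long length, int log2Quantum) {
        return length >>> log2Quantum;
    }

    /** Upper bound on skip pointers: one per full quantum of possible zeroes. */
    static long skipPointersBound(long upperBound, int lowerBits, int log2Quantum) {
        long maxZeroes = upperBound >>> lowerBits;
        return maxZeroes >>> log2Quantum;
    }

    public static void main(String[] args) {
        System.out.println(forwardPointers(1000, 8));           // 3
        System.out.println(skipPointersBound(1_000_000, 9, 8)); // 7
    }
}
```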
      • newInvertedList

        public void newInvertedList(long frequency,
                                    long occurrency,
                                    long sumMaxPos)
                             throws IOException
        Starts a new inverted list. The previous inverted list, if any, is actually written to the underlying bit stream.

        This method provides additional information which is necessary to build the posting list. The information can be omitted if only part of the index is being written (e.g., no positions or even no counts and positions).

        Parameters:
        frequency - the frequency of the inverted list.
        occurrency - the occurrency of the inverted list (use -1 if you are not writing counts).
        sumMaxPos - the sum of the maximum position in each document (unused if positions are not indexed).
        Throws:
        IllegalStateException - if too few records were written for the previous inverted list.
        IOException
        See Also:
        IndexWriter.newInvertedList()
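        To make the three arguments concrete, the sketch below derives them from an in-memory posting list before any writer would be called. It assumes that frequency is the number of documents containing the term, that occurrency is the total number of occurrences over those documents, and that each document has at least one position, sorted increasingly. InvertedListStatsSketch and its stats method are hypothetical names, and no library call is made.

```java
import java.util.List;

// Hypothetical helper for illustration only; not part of the library.
class InvertedListStatsSketch {

    /** Returns {frequency, occurrency, sumMaxPos} for one term's postings. */
    static long[] stats(List<int[]> positionsPerDocument) {
        long frequency = positionsPerDocument.size();
        long occurrency = 0;
        long sumMaxPos = 0;
        for (int[] positions : positionsPerDocument) {
            occurrency += positions.length;
            // Positions are increasing, so the last entry is the maximum.
            sumMaxPos += positions[positions.length - 1];
        }
        return new long[] { frequency, occurrency, sumMaxPos };
    }

    public static void main(String[] args) {
        long[] s = stats(List.of(new int[] { 1, 5, 9 }, new int[] { 0 }, new int[] { 3, 4 }));
        System.out.println(s[0] + " " + s[1] + " " + s[2]); // 3 6 13
    }
}
```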
      • newInvertedList

        public long newInvertedList()
                             throws IOException
        Description copied from interface: IndexWriter
        Starts a new inverted list. The previous inverted list, if any, is actually written to the underlying bit stream.
        Specified by:
        newInvertedList in interface IndexWriter
        Returns:
        the position (in bits) of the underlying bit stream where the new inverted list starts.
        Throws:
        IOException
      • writeFrequency

        public void writeFrequency(long frequency)
        Description copied from interface: IndexWriter
        Writes the frequency.
        Specified by:
        writeFrequency in interface IndexWriter
        Parameters:
        frequency - the (positive) number of document records that this inverted list will contain.
      • writeDocumentPositions

        public void writeDocumentPositions(OutputBitStream unused,
                                           int[] position,
                                           int offset,
                                           int count,
                                           int docSize)
                                    throws IOException
        Description copied from interface: IndexWriter
        Writes the positions of the occurrences of the current term in the current document to the given OutputBitStream.
        Specified by:
        writeDocumentPositions in interface IndexWriter
        Parameters:
        unused - the output stream where the occurrences should be written.
        position - the position vector (a sequence of strictly increasing natural numbers).
        offset - the first valid entry in position.
        count - the number of valid entries in position starting from offset.
        docSize - the size of the current document (only for Golomb and interpolative coding; you can safely pass -1 otherwise).
        Throws:
        IOException
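        The position, offset and count parameters together select the slice position[offset], ..., position[offset + count - 1], which must be a strictly increasing sequence of natural numbers. A caller might check this before writing, along the lines of the following sketch; PositionSliceSketch and isValidSlice are hypothetical names, and the library is not claimed to perform this check itself.

```java
// Hypothetical pre-check for illustration only; not part of the library.
class PositionSliceSketch {

    /** Returns true if the slice is in range, nonnegative and strictly increasing. */
    static boolean isValidSlice(int[] position, int offset, int count) {
        // Written so that offset + count cannot overflow.
        if (offset < 0 || count < 0 || offset > position.length - count) return false;
        for (int i = offset; i < offset + count; i++) {
            if (position[i] < 0) return false;
            if (i > offset && position[i] <= position[i - 1]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int[] position = { 7, 1, 4, 9, 2 };
        System.out.println(isValidSlice(position, 1, 3)); // true: 1, 4, 9
        System.out.println(isValidSlice(position, 0, 3)); // false: 7, 1, 4
    }
}
```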
      • writtenBits

        public long writtenBits()
        Description copied from interface: IndexWriter
        Returns the overall number of bits written onto the underlying stream(s).
        Specified by:
        writtenBits in interface IndexWriter
        Returns:
        the number of bits written, according to the variables keeping statistical records.
      • close

        public void close()
                   throws IOException
        Description copied from interface: IndexWriter
        Closes this index writer, completing the index creation process and releasing all resources.
        Specified by:
        close in interface IndexWriter
        Throws:
        IOException
      • printStats

        public void printStats(PrintStream stats)
        Description copied from interface: IndexWriter
        Writes to the given print stream statistical information about the index just built. This method must be called after IndexWriter.close().
        Specified by:
        printStats in interface IndexWriter
        Parameters:
        stats - a print stream where statistical information will be written.