it.unimi.di.mg4j.index
Interface IndexWriter

All Known Implementing Classes:
AbstractBitStreamIndexWriter, BitStreamHPIndexWriter, BitStreamIndexWriter, QuasiSuccinctIndexWriter, SkipBitStreamIndexWriter

public interface IndexWriter

An interface for classes that generate indices.

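Implementations of this interface are used to write inverted lists in sequential order. Each list is started with newInvertedList(), followed by writeFrequency(int); each of its document records is then started with newDocumentRecord(), followed immediately by writeDocumentPointer(OutputBitStream, int) and then by whichever of writePayload(OutputBitStream, Payload), writePositionCount(OutputBitStream, int) and writeDocumentPositions(OutputBitStream, int[], int, int, int) the index being built needs.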

newDocumentRecord() returns an OutputBitStream that must be used to write the document-record data. Note that there is no guarantee that the returned OutputBitStream coincides with the underlying bit stream, or even that it is non-null. Moreover, there is no guarantee as to when the bits will actually be written to the underlying stream, except that when a new inverted list is started, the previous inverted list, if any, will be written to the underlying stream.
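The call order described here can be sketched with a small recording stand-in. This is only an illustration under stated assumptions: SketchWriter and RecordingWriter are hypothetical names, not MG4J classes, the bit stream is replaced by a plain Object, and the payload call is left out on the assumption that the index carries no payloads.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: mirrors the documented call order, not the MG4J classes. */
public class IndexWriterCallOrderSketch {

    /** Hypothetical stand-in for the subset of IndexWriter used below. */
    interface SketchWriter {
        long newInvertedList();
        void writeFrequency(int frequency);
        Object newDocumentRecord();
        void writeDocumentPointer(Object out, int pointer);
        void writePositionCount(Object out, int count);
        void writeDocumentPositions(Object out, int[] position, int offset, int count, int docSize);
        void close();
    }

    /** Records the name of each call instead of writing bits. */
    static final class RecordingWriter implements SketchWriter {
        final List<String> calls = new ArrayList<>();
        public long newInvertedList() { calls.add("newInvertedList"); return 0L; }
        public void writeFrequency(int frequency) { calls.add("writeFrequency(" + frequency + ")"); }
        // This recorder ignores the stream argument, so it may return null here.
        public Object newDocumentRecord() { calls.add("newDocumentRecord"); return null; }
        public void writeDocumentPointer(Object out, int pointer) { calls.add("writeDocumentPointer(" + pointer + ")"); }
        public void writePositionCount(Object out, int count) { calls.add("writePositionCount(" + count + ")"); }
        public void writeDocumentPositions(Object out, int[] position, int offset, int count, int docSize) {
            calls.add("writeDocumentPositions");
        }
        public void close() { calls.add("close"); }
    }

    /** Writes one inverted list in which document pointers[i] has the occurrences positions[i]. */
    static void writeList(SketchWriter writer, int[] pointers, int[][] positions) {
        writer.newInvertedList();
        writer.writeFrequency(pointers.length);          // one record per document
        for (int i = 0; i < pointers.length; i++) {
            Object out = writer.newDocumentRecord();     // always write through the returned stream
            writer.writeDocumentPointer(out, pointers[i]);
            writer.writePositionCount(out, positions[i].length);
            // docSize is passed as -1, assuming a coding that does not need it.
            writer.writeDocumentPositions(out, positions[i], 0, positions[i].length, -1);
        }
    }

    /** Runs the sequence for two documents and returns the recorded calls. */
    static List<String> demo() {
        RecordingWriter writer = new RecordingWriter();
        writeList(writer, new int[] { 3, 7 }, new int[][] { { 0, 4 }, { 2 } });
        writer.close();
        return writer.calls;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With a real writer, the value returned by newDocumentRecord() would be passed unchanged to the write methods, since it need not be the underlying stream.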

Indices with special needs, such as variable-quantum index writers or quasi-succinct index writers, might require ad hoc methods to start a new inverted list (e.g., QuasiSuccinctIndexWriter.newInvertedList(int, long, long)). If you want to use these writers, your code must use instanceof and act accordingly.
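The instanceof check mentioned above might look like the following sketch. PlainWriter and AdHocWriter are hypothetical stand-ins rather than MG4J types. The three arguments are named a, b and c because this page gives only their types, not their meaning, and the long return type of the ad hoc method is likewise an assumption.

```java
/** Illustrative only: dispatching on the writer type before starting a list. */
public class AdHocStartSketch {

    /** Hypothetical stand-in for a writer started with the standard method. */
    interface PlainWriter {
        long newInvertedList();
    }

    /** Hypothetical stand-in for a writer that needs extra arguments to start a list. */
    interface AdHocWriter extends PlainWriter {
        long newInvertedList(int a, long b, long c);
    }

    /** Starts a list with the ad hoc method when the writer has one, and reports which path was taken. */
    static String startList(PlainWriter writer, int a, long b, long c) {
        if (writer instanceof AdHocWriter) {
            ((AdHocWriter) writer).newInvertedList(a, b, c);
            return "ad hoc";
        }
        writer.newInvertedList();
        return "standard";
    }

    static final class Plain implements PlainWriter {
        public long newInvertedList() { return 0L; }
    }

    static final class AdHoc implements AdHocWriter {
        // In this sketch the standard method is unusable on the special writer.
        public long newInvertedList() { throw new UnsupportedOperationException("extra arguments required"); }
        public long newInvertedList(int a, long b, long c) { return 0L; }
    }

    public static void main(String[] args) {
        System.out.println(startList(new Plain(), 1, 1L, 1L));
        System.out.println(startList(new AdHoc(), 1, 1L, 1L));
    }
}
```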

Since:
1.2
Author:
Paolo Boldi, Sebastiano Vigna

Method Summary
 void close()
          Closes this index writer, completing the index creation process and releasing all resources.
 OutputBitStream newDocumentRecord()
          Starts a new document record.
 long newInvertedList()
          Starts a new inverted list.
 void printStats(PrintStream stats)
          Writes to the given print stream statistical information about the index just built.
 Properties properties()
          Returns properties of the index generated by this index writer.
 void writeDocumentPointer(OutputBitStream out, int pointer)
          Writes a document pointer.
 void writeDocumentPositions(OutputBitStream out, int[] position, int offset, int count, int docSize)
          Writes the positions of the occurrences of the current term in the current document to the given OutputBitStream.
 void writeFrequency(int frequency)
          Writes the frequency.
 void writePayload(OutputBitStream out, Payload payload)
          Writes the payload for the current document.
 void writePositionCount(OutputBitStream out, int count)
          Writes the count of the occurrences of the current term in the current document to the given OutputBitStream.
 long writtenBits()
          Returns the overall number of bits written onto the underlying stream(s).
 

Method Detail

newInvertedList

long newInvertedList()
                     throws IOException
Starts a new inverted list. The previous inverted list, if any, is actually written to the underlying bit stream.

Returns:
the position (in bits) of the underlying bit stream where the new inverted list starts.
Throws:
IllegalStateException - if too few records were written for the previous inverted list.
IOException

writeFrequency

void writeFrequency(int frequency)
                    throws IOException
Writes the frequency.

Parameters:
frequency - the (positive) number of document records that this inverted list will contain.
Throws:
IOException

newDocumentRecord

OutputBitStream newDocumentRecord()
                                  throws IOException
Starts a new document record.

This method must be called exactly f times, where f is the frequency specified with writeFrequency(int).

Returns:
the output bit stream where the next document record data should be written, if necessary, or null, if writeDocumentPointer(OutputBitStream, int) ignores its first argument.
Throws:
IllegalStateException - if too many records were written for the current inverted list, or if there is no current inverted list.
IOException
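
The IllegalStateException conditions documented for newInvertedList(), newDocumentRecord() and close() amount to counting records against the declared frequency. One way an implementation could enforce them is sketched below. RecordCountGuardSketch is a hypothetical class that does no bit-level writing, and the actual MG4J implementations may track this state differently.

```java
/** Illustrative only: record counting behind the documented IllegalStateException cases. */
public class RecordCountGuardSketch {
    private boolean listOpen;   // true once newInvertedList() has been called
    private int expected;       // frequency declared for the current list
    private int written;        // records started so far in the current list

    void newInvertedList() {
        checkPreviousListComplete();
        listOpen = true;
        expected = 0;
        written = 0;
    }

    void writeFrequency(int frequency) {
        expected = frequency;
    }

    void newDocumentRecord() {
        if (!listOpen) throw new IllegalStateException("no current inverted list");
        if (written == expected) throw new IllegalStateException("too many records for the current inverted list");
        written++;
    }

    void close() {
        checkPreviousListComplete();
        listOpen = false;
    }

    /** Fails if the current list, if any, has fewer records than its declared frequency. */
    private void checkPreviousListComplete() {
        if (listOpen && written < expected) {
            throw new IllegalStateException("too few records: " + written + " of " + expected);
        }
    }

    public static void main(String[] args) {
        RecordCountGuardSketch guard = new RecordCountGuardSketch();
        guard.newInvertedList();
        guard.writeFrequency(1);
        guard.newDocumentRecord();
        guard.close();
        System.out.println("list closed after the declared number of records");
    }
}
```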

writeDocumentPointer

void writeDocumentPointer(OutputBitStream out,
                          int pointer)
                          throws IOException
Writes a document pointer.

This method must be called immediately after newDocumentRecord().

Parameters:
out - the output bit stream where the pointer will be written.
pointer - the document pointer.
Throws:
IOException

writePayload

void writePayload(OutputBitStream out,
                  Payload payload)
                  throws IOException
Writes the payload for the current document.

This method must be called immediately after writeDocumentPointer(OutputBitStream, int).

Parameters:
out - the output bit stream where the payload will be written.
payload - the payload.
Throws:
IOException

writePositionCount

void writePositionCount(OutputBitStream out,
                        int count)
                        throws IOException
Writes the count of the occurrences of the current term in the current document to the given OutputBitStream.

Parameters:
out - the output bit stream where the count will be written.
count - the count.
Throws:
IOException

writeDocumentPositions

void writeDocumentPositions(OutputBitStream out,
                            int[] position,
                            int offset,
                            int count,
                            int docSize)
                            throws IOException
Writes the positions of the occurrences of the current term in the current document to the given OutputBitStream.

Parameters:
out - the output stream where the occurrences should be written.
position - the position vector (a sequence of strictly increasing natural numbers).
offset - the first valid entry in position.
count - the number of valid entries in position starting from offset.
docSize - the size of the current document (only for Golomb and interpolative coding; you can safely pass -1 otherwise).
Throws:
IllegalStateException - if there is no current inverted list.
IOException
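
The position, offset and count parameters together select a slice of the array that must be strictly increasing and nonnegative. A caller could check that precondition before invoking the method, as in this sketch. isValidSlice is a hypothetical helper, not part of MG4J, and this page does not say whether implementations perform such a check themselves.

```java
/** Illustrative only: checks the documented shape of a position slice. */
public class PositionSliceSketch {

    /** Returns true if position[offset .. offset + count) lies within the array and is a strictly increasing sequence of natural numbers. */
    static boolean isValidSlice(int[] position, int offset, int count) {
        if (offset < 0 || count < 0 || offset > position.length - count) return false;
        for (int i = offset; i < offset + count; i++) {
            if (position[i] < 0) return false;                             // natural numbers only
            if (i > offset && position[i] <= position[i - 1]) return false; // strictly increasing
        }
        return true;
    }

    public static void main(String[] args) {
        int[] buffer = { 9, 1, 4, 8, 2 };
        // Only entries 1..3 (values 1, 4, 8) are the valid positions in this reused buffer.
        System.out.println(isValidSlice(buffer, 1, 3));
    }
}
```

Entries outside the slice are ignored, so a larger buffer can be reused across documents.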

writtenBits

long writtenBits()
Returns the overall number of bits written onto the underlying stream(s).

Returns:
the number of bits written, according to the variables keeping statistical records.

properties

Properties properties()
Returns properties of the index generated by this index writer.

This method should only be called after close(). It returns a new property object containing values for (whenever appropriate) Index.PropertyKeys.DOCUMENTS, Index.PropertyKeys.TERMS, Index.PropertyKeys.POSTINGS, Index.PropertyKeys.MAXCOUNT, Index.PropertyKeys.INDEXCLASS, Index.PropertyKeys.CODING, Index.PropertyKeys.PAYLOADCLASS, BitStreamIndex.PropertyKeys.SKIPQUANTUM, and BitStreamIndex.PropertyKeys.SKIPHEIGHT.

Returns:
a new set of properties for the newly created index.

close

void close()
           throws IOException
Closes this index writer, completing the index creation process and releasing all resources.

Throws:
IllegalStateException - if too few records were written for the last inverted list.
IOException

printStats

void printStats(PrintStream stats)
Writes to the given print stream statistical information about the index just built. This method must be called after close().

Parameters:
stats - a print stream where statistical information will be written.