Interface IndexWriter
All Known Implementing Classes:
AbstractBitStreamIndexWriter, BitStreamHPIndexWriter, BitStreamIndexWriter, QuasiSuccinctIndexWriter, SkipBitStreamIndexWriter
public interface IndexWriter
An interface for classes that generate indices.
Implementations of this interface are used to write inverted lists in sequential order, as follows:
- to create a new inverted list, you must call newInvertedList();
- then, you must specify the frequency using writeFrequency(long);
- the document records follow; before writing a new document record, you must call newDocumentRecord(); note that, all in all, the number of calls to newDocumentRecord() must be equal to the frequency;
- for each document record, you must supply the information needed for the index you are building (pointer, payload, count, and positions, in this order).
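The call order listed above can be sketched as follows. This is only an illustration: SketchBitStream and RecordingWriter are hypothetical stand-ins (not library classes) that record calls instead of encoding bits, so that the example compiles on its own. The sketch also assumes an index without payloads, so the payload step is skipped, and it passes -1 as the document size.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the library's OutputBitStream; it only keeps a log. */
class SketchBitStream {
    final List<String> log = new ArrayList<>();
}

/** Hypothetical, trimmed-down mirror of the documented methods; it records calls instead of writing bits. */
class RecordingWriter {
    final SketchBitStream stream = new SketchBitStream();

    long newInvertedList() { stream.log.add("newInvertedList"); return 0L; } // placeholder position
    void writeFrequency(long frequency) { stream.log.add("frequency=" + frequency); }
    SketchBitStream newDocumentRecord() { stream.log.add("newDocumentRecord"); return stream; }
    void writeDocumentPointer(SketchBitStream out, long pointer) { out.log.add("pointer=" + pointer); }
    void writePositionCount(SketchBitStream out, int count) { out.log.add("count=" + count); }
    void writeDocumentPositions(SketchBitStream out, int[] position, int offset, int count, int docSize) {
        out.log.add("positions=" + count);
    }
    void close() { stream.log.add("close"); }
}

class CallSequenceSketch {
    /** Writes one inverted list for a term that occurs in documents 3 and 7. */
    static List<String> writeOneList() {
        RecordingWriter writer = new RecordingWriter();
        long[] pointers = { 3, 7 };
        int[][] positions = { { 0, 4 }, { 2 } };

        writer.newInvertedList();                // start the list
        writer.writeFrequency(pointers.length);  // declare how many records follow
        for (int i = 0; i < pointers.length; i++) {
            SketchBitStream out = writer.newDocumentRecord(); // once per record
            // Pointer, then count, then positions (payload omitted in this sketch).
            writer.writeDocumentPointer(out, pointers[i]);
            writer.writePositionCount(out, positions[i].length);
            writer.writeDocumentPositions(out, positions[i], 0, positions[i].length, -1);
        }
        writer.close();
        return writer.stream.log;
    }

    public static void main(String[] args) {
        writeOneList().forEach(System.out::println);
    }
}
```

With a real implementation, the stream returned by newDocumentRecord() would simply be passed on to the write methods, as done here, without assuming anything about what it refers to.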
newDocumentRecord() returns an OutputBitStream that must be used to write the document-record data. Note that there is no guarantee that the returned OutputBitStream coincides with the underlying bit stream; it may even be null. Moreover, there is no guarantee as to when the bits will actually be written to the underlying stream, except that when a new inverted list is started, the previous inverted list, if any, will be written to the underlying stream.
Indices with special needs, such as variable-quantum index writers or quasi-succinct index writers, might require ad hoc methods to start a new inverted list (e.g., QuasiSuccinctIndexWriter.newInvertedList(long, long, long)). If you want to use these writers, your code must use instanceof and act accordingly.
- Since:
- 1.2
- Author:
- Paolo Boldi, Sebastiano Vigna
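As an illustration of the instanceof check mentioned for writers with special needs, here is a minimal sketch. SketchIndexWriter, SketchPlainWriter and SketchSpecialWriter are hypothetical stand-ins so that the example compiles without the library. The three long arguments are named a, b and c because their meaning is not stated on this page.

```java
/** Hypothetical stand-in for the general writer interface. */
interface SketchIndexWriter {
    long newInvertedList();
}

/** Hypothetical writer that only offers the standard way of starting a list. */
class SketchPlainWriter implements SketchIndexWriter {
    String lastCall = "none";

    @Override
    public long newInvertedList() {
        lastCall = "newInvertedList()";
        return 0L; // placeholder position
    }
}

/** Hypothetical writer with an ad hoc three-argument method for starting a list. */
class SketchSpecialWriter implements SketchIndexWriter {
    String lastCall = "none";

    @Override
    public long newInvertedList() {
        lastCall = "newInvertedList()";
        return 0L; // placeholder position
    }

    long newInvertedList(long a, long b, long c) {
        lastCall = "newInvertedList(" + a + ", " + b + ", " + c + ")";
        return 0L; // placeholder position
    }
}

class DispatchSketch {
    /** Starts a list, using the ad hoc method when the writer offers one. */
    static long startList(SketchIndexWriter writer, long a, long b, long c) {
        if (writer instanceof SketchSpecialWriter) {
            return ((SketchSpecialWriter) writer).newInvertedList(a, b, c);
        }
        return writer.newInvertedList();
    }
}
```

Keeping the check in a single helper such as startList confines the special case to one place, so the rest of the writing loop can treat all writers alike.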
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| void | close() | Closes this index writer, completing the index creation process and releasing all resources. |
| OutputBitStream | newDocumentRecord() | Starts a new document record. |
| long | newInvertedList() | Starts a new inverted list. |
| void | printStats(PrintStream stats) | Writes to the given print stream statistical information about the index just built. |
| Properties | properties() | Returns properties of the index generated by this index writer. |
| void | writeDocumentPointer(OutputBitStream out, long pointer) | Writes a document pointer. |
| void | writeDocumentPositions(OutputBitStream out, int[] position, int offset, int count, int docSize) | Writes the positions of the occurrences of the current term in the current document to the given OutputBitStream. |
| void | writeFrequency(long frequency) | Writes the frequency. |
| void | writePayload(OutputBitStream out, Payload payload) | Writes the payload for the current document. |
| void | writePositionCount(OutputBitStream out, int count) | Writes the count of the occurrences of the current term in the current document to the given OutputBitStream. |
| long | writtenBits() | Returns the overall number of bits written onto the underlying stream(s). |
Method Detail
-
newInvertedList
long newInvertedList() throws IOException
Starts a new inverted list. The previous inverted list, if any, is actually written to the underlying bit stream.
- Returns:
- the position (in bits) of the underlying bit stream where the new inverted list starts.
- Throws:
IllegalStateException - if too few records were written for the previous inverted list.
IOException
-
writeFrequency
void writeFrequency(long frequency) throws IOException
Writes the frequency.
- Parameters:
frequency - the (positive) number of document records that this inverted list will contain.
- Throws:
IOException
-
newDocumentRecord
OutputBitStream newDocumentRecord() throws IOException
Starts a new document record.
This method must be called exactly f times, where f is the frequency specified with writeFrequency(long).
- Returns:
- the output bit stream where the next document record data should be written, if necessary, or null, if writeDocumentPointer(OutputBitStream, long) ignores its first argument.
- Throws:
IllegalStateException - if too many records were written for the current inverted list, or if there is no current inverted list.
IOException
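The record-count contract (exactly f calls per list, with IllegalStateException on too many or too few records, or when no list has been started) can be pictured with a small bookkeeping class. RecordCountGuard is a hypothetical helper written for this illustration, not something the library provides. Rejecting a non-positive frequency with IllegalArgumentException is an assumption of the sketch, since this page only says the frequency is positive.

```java
/** Hypothetical helper that mirrors the documented record-count checks. */
class RecordCountGuard {
    private boolean listOpen; // true once a list has been started
    private long frequency;   // number of records declared for the current list
    private long written;     // number of records started so far in the current list

    /** Mirrors newInvertedList(): the previous list, if any, must be complete. */
    void onNewInvertedList() {
        requireComplete();
        listOpen = true;
        frequency = 0;
        written = 0;
    }

    /** Mirrors writeFrequency(long). */
    void onWriteFrequency(long f) {
        if (!listOpen) throw new IllegalStateException("no current inverted list");
        // Assumption of this sketch: a non-positive frequency is rejected outright.
        if (f <= 0) throw new IllegalArgumentException("frequency must be positive");
        frequency = f;
    }

    /** Mirrors newDocumentRecord(). */
    void onNewDocumentRecord() {
        if (!listOpen) throw new IllegalStateException("no current inverted list");
        if (written == frequency) throw new IllegalStateException("too many records");
        written++;
    }

    /** Mirrors close(): the last list must be complete. */
    void onClose() {
        requireComplete();
        listOpen = false;
    }

    private void requireComplete() {
        if (listOpen && written < frequency) {
            throw new IllegalStateException("too few records: " + written + " of " + frequency);
        }
    }
}
```

The "too few" condition can only be detected when the next list starts or when the writer is closed, which matches the places where the documentation reports it.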
-
writeDocumentPointer
void writeDocumentPointer(OutputBitStream out, long pointer) throws IOException
Writes a document pointer.
This method must be called immediately after newDocumentRecord().
- Parameters:
out - the output bit stream where the pointer will be written.
pointer - the document pointer.
- Throws:
IOException
-
writePayload
void writePayload(OutputBitStream out, Payload payload) throws IOException
Writes the payload for the current document.
This method must be called immediately after writeDocumentPointer(OutputBitStream, long).
- Parameters:
out - the output bit stream where the payload will be written.
payload - the payload.
- Throws:
IOException
-
writePositionCount
void writePositionCount(OutputBitStream out, int count) throws IOException
Writes the count of the occurrences of the current term in the current document to the given OutputBitStream.
- Parameters:
out - the output bit stream where the count should be written.
count - the count.
- Throws:
IOException
-
writeDocumentPositions
void writeDocumentPositions(OutputBitStream out, int[] position, int offset, int count, int docSize) throws IOException
Writes the positions of the occurrences of the current term in the current document to the given OutputBitStream.
- Parameters:
out - the output stream where the occurrences should be written.
position - the position vector (a sequence of strictly increasing natural numbers).
offset - the first valid entry in position.
count - the number of valid entries in position starting from offset.
docSize - the size of the current document (only for Golomb and interpolative coding; you can safely pass -1 otherwise).
- Throws:
IllegalStateException - if there is no current inverted list.
IOException
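The position, offset and count parameters describe a slice of a possibly larger array. A caller might check the slice before handing it over, along the lines of the sketch below. PositionSliceSketch and isValidSlice are hypothetical names introduced here; whether implementations perform such a check themselves is not stated on this page.

```java
class PositionSliceSketch {
    /**
     * Returns true if position[offset .. offset + count) lies within the array
     * and is a strictly increasing sequence of natural numbers.
     */
    static boolean isValidSlice(int[] position, int offset, int count) {
        // Written this way to avoid overflow in offset + count.
        if (offset < 0 || count < 0 || offset > position.length - count) return false;
        for (int i = offset; i < offset + count; i++) {
            if (position[i] < 0) return false;                              // not a natural number
            if (i > offset && position[i] <= position[i - 1]) return false; // not strictly increasing
        }
        return true;
    }

    public static void main(String[] args) {
        int[] buffer = { 9, 1, 4, 8, 2 };
        System.out.println(isValidSlice(buffer, 1, 3)); // slice is 1, 4, 8
        System.out.println(isValidSlice(buffer, 0, 3)); // slice is 9, 1, 4
    }
}
```

Passing an offset and a count rather than a trimmed copy lets a caller reuse one buffer across documents, with only the entries in the slice being considered valid.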
-
writtenBits
long writtenBits()
Returns the overall number of bits written onto the underlying stream(s).
- Returns:
- the number of bits written, according to the variables keeping statistical records.
-
properties
Properties properties()
Returns properties of the index generated by this index writer.
This method should only be called after close(). It returns a new property object containing values for (whenever appropriate) Index.PropertyKeys.DOCUMENTS, Index.PropertyKeys.TERMS, Index.PropertyKeys.POSTINGS, Index.PropertyKeys.MAXCOUNT, Index.PropertyKeys.INDEXCLASS, Index.PropertyKeys.CODING, Index.PropertyKeys.PAYLOADCLASS, BitStreamIndex.PropertyKeys.SKIPQUANTUM, and BitStreamIndex.PropertyKeys.SKIPHEIGHT.
- Returns:
- a new set of properties for the just-created index.
-
close
void close() throws IOException
Closes this index writer, completing the index creation process and releasing all resources.
- Throws:
IllegalStateException - if too few records were written for the last inverted list.
IOException
-
printStats
void printStats(PrintStream stats)
Writes to the given print stream statistical information about the index just built. This method must be called after close().
- Parameters:
stats - a print stream where statistical information will be written.
-
-