it.unimi.di.mg4j.search
Class AbstractIntersectionDocumentIterator
java.lang.Object
it.unimi.di.mg4j.search.AbstractDocumentIterator
it.unimi.di.mg4j.search.AbstractIntervalDocumentIterator
it.unimi.di.mg4j.search.AbstractCompositeDocumentIterator
it.unimi.di.mg4j.search.AbstractIntersectionDocumentIterator
- All Implemented Interfaces:
- DocumentIterator
- Direct Known Subclasses:
- AbstractOrderedIntervalDocumentIterator, AndDocumentIterator
public abstract class AbstractIntersectionDocumentIterator
- extends AbstractCompositeDocumentIterator
An abstract iterator on documents, generating the intersection of the documents returned by
a number of document iterators.
The important invariant is that only after a call to DocumentIterator.nextDocument()
, a call
to DocumentIterator.intervalIterator(Index)
will return an interval iterator over the document
just returned, and that for at least one index in AbstractIntervalDocumentIterator.indices()
the iterator will not be empty
or TRUE
.
The intersection algorithm
Since MG4J 1.1, this class implements a new intersection algorithm that should be significantly
faster than the previous one. The main idea is that of letting sparser iterator interact as much
as possible to obtain a candidate common document, and then trying to align the others. At construction
time, the component iterators are sorted so that index iterators are separated, and sorted by frequency.
Then, each time we have to align the iterators we align them greedily starting from the index
iterators, in frequency order. This has the effect of skipping very quickly (and usually by
large jumps, which are handled nicely by indices with skips),
as the main interaction happens between low-frequency index iterators.
Moreover, this class treats in a special way
index iterators coming from payload-based indices. Such
iterators are checked at the end of the alignment process,
after all standard index iterators (and general document iterators)
are aligned. At that point, the special method PayloadPredicateDocumentIterator.skipUnconditionallyTo(int)
is used to position unconditionally such iterators and check whether the payload predicate is satisfied.
If this doesn't happen, the current candidate (obtained by alignment of standard iterators) is increased and the
whole process is restarted. This procedure guarantees that we will never search exhaustively in a
payload-based index a document record satisfying the predicate (unless, of course, we have a query
containing just PayloadPredicateDocumentIterator
s), which is very efficient if the payload-based
index uses skipping.
sortedIterator
protected final DocumentIterator[] sortedIterator
- The provided document iterators, suitably sorted.
lastIterator
protected final DocumentIterator lastIterator
- The last element of
sortedIterator
, which is usually the rarest term in the query.
AbstractIntersectionDocumentIterator
protected AbstractIntersectionDocumentIterator(Index index,
Object arg,
DocumentIterator[] documentIterator)
- Creates a new intersection iterator using a given array of iterators and a given index.
- Parameters:
index
- an index that will be passed to AbstractCompositeDocumentIterator.AbstractCompositeDocumentIterator(Index, Object, DocumentIterator...)
.arg
- an argument that will be passed to AbstractIntervalDocumentIterator.getIntervalIterator(Index, int, boolean, Object)
.documentIterator
- the iterators to be intersected (at least one).
AbstractIntersectionDocumentIterator
protected AbstractIntersectionDocumentIterator(Object arg,
DocumentIterator... documentIterator)
- Creates a new intersection iterator using a given array of iterators.
- Parameters:
arg
- an argument that will be passed to AbstractIntervalDocumentIterator.getIntervalIterator(Index, int, boolean, Object)
.documentIterator
- the iterators to be intersected (at least one).
align
protected final int align(int to)
throws IOException
- Advances all iterators to the first common document pointer after the one specified.
lastIterator
is assumed to be positioned on the specified document.
After a call to this method, all component iterators are positioned
on the returned document.
- Returns:
- the document on which all iterators are aligned, or
DocumentIterator.END_OF_LIST
.
- Throws:
IOException
intervalIterators
public Reference2ReferenceMap<Index,IntervalIterator> intervalIterators()
throws IOException
- Description copied from interface:
DocumentIterator
- Returns an unmodifiable map from indices to interval iterators.
After a call to DocumentIterator.nextDocument()
, this map
can be used to retrieve the intervals in the current document. An invocation of Map.get(java.lang.Object)
on this map with argument index
yields the same result as
intervalIterator(index)
.
- Returns:
- a map from indices to interval iterators over the current document.
- Throws:
IOException
- See Also:
DocumentIterator.intervalIterator(Index)