Class AbstractIntersectionDocumentIterator
- java.lang.Object
-
- it.unimi.di.big.mg4j.search.AbstractDocumentIterator
-
- it.unimi.di.big.mg4j.search.AbstractIntervalDocumentIterator
-
- it.unimi.di.big.mg4j.search.AbstractCompositeDocumentIterator
-
- it.unimi.di.big.mg4j.search.AbstractIntersectionDocumentIterator
-
- All Implemented Interfaces:
DocumentIterator
- Direct Known Subclasses:
AbstractOrderedIntervalDocumentIterator
,AndDocumentIterator
public abstract class AbstractIntersectionDocumentIterator extends AbstractCompositeDocumentIterator
An abstract iterator on documents, generating the intersection of the documents returned by a number of document iterators.The important invariant is that only after a call to
DocumentIterator.nextDocument()
, a call toDocumentIterator.intervalIterator(Index)
will return an interval iterator over the document just returned, and that for at least one index inAbstractIntervalDocumentIterator.indices()
the iterator will not be empty orTRUE
.The intersection algorithm
Since MG4J 1.1, this class implements a new intersection algorithm that should be significantly faster than the previous one. The main idea is that of letting sparser iterator interact as much as possible to obtain a candidate common document, and then trying to align the others. At construction time, the component iterators are sorted so that index iterators are separated, and sorted by frequency. Then, each time we have to align the iterators we align them greedily starting from the index iterators, in frequency order. This has the effect of skipping very quickly (and usually by large jumps, which are handled nicely by indices with skips), as the main interaction happens between low-frequency index iterators.
Moreover, this class treats in a special way index iterators coming from payload-based indices. Such iterators are checked at the end of the alignment process, after all standard index iterators (and general document iterators) are aligned. At that point, the special method
PayloadPredicateDocumentIterator.skipUnconditionallyTo(long)
is used to position unconditionally such iterators and check whether the payload predicate is satisfied. If this doesn't happen, the current candidate (obtained by alignment of standard iterators) is increased and the whole process is restarted. This procedure guarantees that we will never search exhaustively in a payload-based index a document record satisfying the predicate (unless, of course, we have a query containing justPayloadPredicateDocumentIterator
s), which is very efficient if the payload-based index uses skipping.
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class it.unimi.di.big.mg4j.search.AbstractCompositeDocumentIterator
AbstractCompositeDocumentIterator.AbstractCompositeIndexIntervalIterator, AbstractCompositeDocumentIterator.AbstractCompositeIntervalIterator
-
-
Field Summary
Fields Modifier and Type Field Description protected DocumentIterator
lastIterator
The last element ofsortedIterator
, which is usually the rarest term in the query.protected DocumentIterator[]
sortedIterator
The provided document iterators, suitably sorted.-
Fields inherited from class it.unimi.di.big.mg4j.search.AbstractCompositeDocumentIterator
documentIterator, indexIterator, indexIteratorsWithoutPositions, n
-
Fields inherited from class it.unimi.di.big.mg4j.search.AbstractIntervalDocumentIterator
currentIterators, indices, intervalIterators, soleIndex, soleIntervalIterator, unmodifiableCurrentIterators
-
Fields inherited from class it.unimi.di.big.mg4j.search.AbstractDocumentIterator
curr, weight
-
Fields inherited from interface it.unimi.di.big.mg4j.search.DocumentIterator
END_OF_LIST
-
-
Constructor Summary
Constructors Modifier Constructor Description protected
AbstractIntersectionDocumentIterator(Index index, Object arg, DocumentIterator[] documentIterator)
Creates a new intersection iterator using a given array of iterators and a given index.protected
AbstractIntersectionDocumentIterator(Object arg, DocumentIterator... documentIterator)
Creates a new intersection iterator using a given array of iterators.
-
Method Summary
Modifier and Type Method Description protected long
align(long to)
Advances all iterators to the first common document pointer after the one specified.Reference2ReferenceMap<Index,IntervalIterator>
intervalIterators()
Returns an unmodifiable map from indices to interval iterators.-
Methods inherited from class it.unimi.di.big.mg4j.search.AbstractCompositeDocumentIterator
accept, acceptOnTruePaths, dispose, toString
-
Methods inherited from class it.unimi.di.big.mg4j.search.AbstractIntervalDocumentIterator
allIndexIterators, getIntervalIterator, indices, indices
-
Methods inherited from class it.unimi.di.big.mg4j.search.AbstractDocumentIterator
document, ensureOnADocument, mayHaveNext, weight, weight
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface it.unimi.di.big.mg4j.search.DocumentIterator
document, indices, intervalIterator, intervalIterator, mayHaveNext, nextDocument, skipTo, weight, weight
-
-
-
-
Field Detail
-
sortedIterator
protected final DocumentIterator[] sortedIterator
The provided document iterators, suitably sorted.
-
lastIterator
protected final DocumentIterator lastIterator
The last element ofsortedIterator
, which is usually the rarest term in the query.
-
-
Constructor Detail
-
AbstractIntersectionDocumentIterator
protected AbstractIntersectionDocumentIterator(Index index, Object arg, DocumentIterator[] documentIterator)
Creates a new intersection iterator using a given array of iterators and a given index.- Parameters:
index
- an index that will be passed toAbstractCompositeDocumentIterator(Index, Object, DocumentIterator...)
.arg
- an argument that will be passed toAbstractIntervalDocumentIterator.getIntervalIterator(Index, int, boolean, Object)
.documentIterator
- the iterators to be intersected (at least one).
-
AbstractIntersectionDocumentIterator
protected AbstractIntersectionDocumentIterator(Object arg, DocumentIterator... documentIterator)
Creates a new intersection iterator using a given array of iterators.- Parameters:
arg
- an argument that will be passed toAbstractIntervalDocumentIterator.getIntervalIterator(Index, int, boolean, Object)
.documentIterator
- the iterators to be intersected (at least one).
-
-
Method Detail
-
align
protected final long align(long to) throws IOException
Advances all iterators to the first common document pointer after the one specified.lastIterator
is assumed to be positioned on the specified document.After a call to this method, all component iterators are positioned on the returned document.
- Returns:
- the document on which all iterators are aligned, or
DocumentIterator.END_OF_LIST
. - Throws:
IOException
-
intervalIterators
public Reference2ReferenceMap<Index,IntervalIterator> intervalIterators() throws IOException
Description copied from interface:DocumentIterator
Returns an unmodifiable map from indices to interval iterators.After a call to
DocumentIterator.nextDocument()
, this map can be used to retrieve the intervals in the current document. An invocation ofMap.get(java.lang.Object)
on this map with argumentindex
yields the same result asintervalIterator(index)
.- Returns:
- a map from indices to interval iterators over the current document.
- Throws:
IOException
- See Also:
DocumentIterator.intervalIterator(Index)
-
-