Class ConsecutiveDocumentIterator
- java.lang.Object
- it.unimi.di.big.mg4j.search.AbstractDocumentIterator
- it.unimi.di.big.mg4j.search.AbstractIntervalDocumentIterator
- it.unimi.di.big.mg4j.search.AbstractCompositeDocumentIterator
- it.unimi.di.big.mg4j.search.AbstractIntersectionDocumentIterator
- it.unimi.di.big.mg4j.search.AbstractOrderedIntervalDocumentIterator
- it.unimi.di.big.mg4j.search.ConsecutiveDocumentIterator
- All Implemented Interfaces:
DocumentIterator
public class ConsecutiveDocumentIterator extends AbstractOrderedIntervalDocumentIterator
An iterator returning documents containing consecutive intervals (in query order) satisfying the underlying queries.
As an additional service, this class makes it possible to specify gaps between intervals. If gaps are specified, a match will satisfy the condition that the left extreme of the first interval is larger than or equal to the first gap, the left extreme of the second interval is equal to the right extreme of the first interval plus the second gap plus one, the left extreme of the third interval is equal to the right extreme of the second interval plus the third gap plus one, and so on. The standard semantics thus corresponds to the everywhere-zero gap array. Note that the returned intervals will contain the leftmost gap, too.
This semantics makes it possible to perform phrasal searches “with holes”, typically because of stopwords that have not been indexed. Note that it is possible to specify a gap before the first interval, but not after the last interval, as in general the document length is not known at this level of query resolution.
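The gap condition above can be sketched in plain Java. This is only an illustrative model of the stated semantics, not MG4J code: the class `GapSemanticsSketch`, the method `match`, and the representation of intervals as inclusive `{left, right}` pairs are assumptions made for the example.

```java
import java.util.Arrays;

public class GapSemanticsSketch {
    /**
     * Returns the resulting interval {left, right} if the given intervals
     * (one per component, inclusive extremes) satisfy the gap condition,
     * or null if they do not. Hypothetical helper, not part of MG4J.
     */
    static int[] match(int[][] interval, int[] gap) {
        // The first interval must leave room for the leftmost gap.
        if (interval[0][0] < gap[0]) return null;
        for (int i = 1; i < interval.length; i++) {
            // Each later interval must start exactly gap[i] + 1 positions
            // after the right extreme of the previous interval.
            if (interval[i][0] != interval[i - 1][1] + gap[i] + 1) return null;
        }
        // The returned interval contains the leftmost gap, too.
        return new int[] { interval[0][0] - gap[0], interval[interval.length - 1][1] };
    }

    public static void main(String[] args) {
        // Two singleton intervals at positions 1 and 4.
        int[][] interval = { { 1, 1 }, { 4, 4 } };
        // One skipped position before the first interval, two between the intervals.
        System.out.println(Arrays.toString(match(interval, new int[] { 1, 2 }))); // [0, 4]
        // With the everywhere-zero gap array the intervals are not consecutive.
        System.out.println(Arrays.toString(match(interval, new int[] { 0, 0 }))); // null
    }
}
```

With the zero gap array the check reduces to plain adjacency, which is the standard consecutive semantics.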
This class handles TRUE iterators correctly; in this case, the semantics is defined as follows: an interval is in the output if it is formed by the union of disjoint intervals, one from each input list, and each gap of value k corresponds to k iterators returning all document positions as singleton intervals. Since TRUE represents a list containing just the empty interval, the result is equivalent to dropping TRUE iterators from the input; as a consequence, the gap of a TRUE iterator is merged with that of the following iterator.
Warning: In case gaps are specified, the mathematically correct semantics would require that gaps before TRUE iterators that are not followed by any non-TRUE iterators have the effect of enlarging the resulting intervals on the right side. However, this behaviour is very difficult to implement at this level because document lengths are not known. For this reason, if one or more TRUE iterators appear at the end of the component iterator list they will be simply dropped.
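The treatment of TRUE iterators can be modelled as a normalization of the gap array. The sketch below is an illustration only: it assumes that "merged" means the gaps are added together, and the class `TrueIteratorGapSketch`, the method `effectiveGaps`, and the `isTrue` flag array are hypothetical names that do not appear in MG4J.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TrueIteratorGapSketch {
    /**
     * Given a flag per component telling whether it is a TRUE iterator and
     * the parallel gap array, returns the gaps of the surviving (non-TRUE)
     * components. Assumption: merging gaps means summing them.
     */
    static int[] effectiveGaps(boolean[] isTrue, int[] gap) {
        List<Integer> result = new ArrayList<>();
        int pending = 0; // gaps accumulated from dropped TRUE components
        for (int i = 0; i < gap.length; i++) {
            pending += gap[i];
            if (!isTrue[i]) {
                // A non-TRUE component absorbs the gaps of the TRUE components before it.
                result.add(pending);
                pending = 0;
            }
        }
        // Anything left in pending belonged to trailing TRUE components and is
        // discarded, mirroring the warning about TRUE iterators at the end of the list.
        return result.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        boolean[] isTrue = { false, true, false, true };
        int[] gap = { 0, 1, 2, 3 };
        // Components 0 and 2 survive; the gap of component 1 is merged into component 2.
        System.out.println(Arrays.toString(effectiveGaps(isTrue, gap))); // [0, 3]
    }
}
```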
-
-
Nested Class Summary
Nested Classes (modifier and type, class):
protected class ConsecutiveDocumentIterator.ConsecutiveIndexIntervalIterator
protected class ConsecutiveDocumentIterator.ConsecutiveIntervalIterator
-
Nested classes/interfaces inherited from class it.unimi.di.big.mg4j.search.AbstractCompositeDocumentIterator
AbstractCompositeDocumentIterator.AbstractCompositeIndexIntervalIterator, AbstractCompositeDocumentIterator.AbstractCompositeIntervalIterator
-
-
Field Summary
-
Fields inherited from class it.unimi.di.big.mg4j.search.AbstractIntersectionDocumentIterator
lastIterator, sortedIterator
-
Fields inherited from class it.unimi.di.big.mg4j.search.AbstractCompositeDocumentIterator
documentIterator, indexIterator, indexIteratorsWithoutPositions, n
-
Fields inherited from class it.unimi.di.big.mg4j.search.AbstractIntervalDocumentIterator
currentIterators, indices, intervalIterators, soleIndex, soleIntervalIterator, unmodifiableCurrentIterators
-
Fields inherited from class it.unimi.di.big.mg4j.search.AbstractDocumentIterator
curr, weight
-
Fields inherited from interface it.unimi.di.big.mg4j.search.DocumentIterator
END_OF_LIST
-
-
Constructor Summary
Constructors (modifier, constructor):
protected ConsecutiveDocumentIterator(DocumentIterator[] documentIterator, int[] gap)
-
Method Summary
Methods (modifier and type, method, description):
static DocumentIterator getInstance(Index index, DocumentIterator... documentIterator)
Returns a document iterator that computes the consecutive AND of the given array of iterators.
static DocumentIterator getInstance(DocumentIterator... documentIterator)
Returns a document iterator that computes the consecutive AND of the given nonzero-length array of iterators.
static DocumentIterator getInstance(DocumentIterator[] documentIterator, int[] gap)
Returns a document iterator that computes the consecutive AND of the given nonzero-length array of iterators, adding gaps between intervals.
protected IntervalIterator getIntervalIterator(Index unused, int n, boolean allIndexIterators, Object arg)
Creates an interval iterator suitable for this AbstractIntervalDocumentIterator.
-
Methods inherited from class it.unimi.di.big.mg4j.search.AbstractOrderedIntervalDocumentIterator
intervalIterator, intervalIterator, intervalIterators, nextDocument, skipTo
-
Methods inherited from class it.unimi.di.big.mg4j.search.AbstractIntersectionDocumentIterator
align
-
Methods inherited from class it.unimi.di.big.mg4j.search.AbstractCompositeDocumentIterator
accept, acceptOnTruePaths, dispose, toString
-
Methods inherited from class it.unimi.di.big.mg4j.search.AbstractIntervalDocumentIterator
allIndexIterators, indices, indices
-
Methods inherited from class it.unimi.di.big.mg4j.search.AbstractDocumentIterator
document, ensureOnADocument, mayHaveNext, weight, weight
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface it.unimi.di.big.mg4j.search.DocumentIterator
document, indices, mayHaveNext, weight, weight
-
-
-
-
Constructor Detail
-
ConsecutiveDocumentIterator
protected ConsecutiveDocumentIterator(DocumentIterator[] documentIterator, int[] gap)
-
-
Method Detail
-
getInstance
public static DocumentIterator getInstance(Index index, DocumentIterator... documentIterator) throws IOException
Returns a document iterator that computes the consecutive AND of the given array of iterators.
Note that the special cases of the empty and of the singleton arrays are handled efficiently.
- Parameters:
index - the default index; relevant only if documentIterator has zero length.
documentIterator - the iterators to be composed.
- Returns:
a document iterator that computes the consecutive AND of documentIterator.
- Throws:
IOException
-
getInstance
public static DocumentIterator getInstance(DocumentIterator... documentIterator) throws IOException
Returns a document iterator that computes the consecutive AND of the given nonzero-length array of iterators.
Note that the special case of the singleton array is handled efficiently.
- Parameters:
documentIterator - the iterators to be composed (at least one).
- Returns:
a document iterator that computes the consecutive AND of documentIterator.
- Throws:
IOException
-
getInstance
public static DocumentIterator getInstance(DocumentIterator[] documentIterator, int[] gap) throws IOException
Returns a document iterator that computes the consecutive AND of the given nonzero-length array of iterators, adding gaps between intervals.
A match will satisfy the condition that the left extreme of the first interval is larger than or equal to the first gap, the left extreme of the second interval is equal to the right extreme of the first interval plus the second gap plus one, and so on. This semantics makes it possible to perform phrasal searches “with holes”, typically because of stopwords that have not been indexed.
- Parameters:
documentIterator - the iterators to be composed (at least one).
gap - an array of gaps parallel to documentIterator, or null for no gaps.
- Returns:
a document iterator that computes the consecutive AND of documentIterator using the given gaps.
- Throws:
IOException
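As an illustration of how a gap array parallel to the component iterators might be derived for a phrase containing unindexed stopwords, here is a minimal sketch. The class `PhraseGapSketch`, the method `gaps`, and the stopword set are hypothetical and not part of MG4J; the sketch only computes the array and does not build any iterator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PhraseGapSketch {
    /**
     * Returns one gap per non-stopword of the phrase, counting the stopwords
     * skipped immediately before it. Hypothetical helper, not part of MG4J.
     */
    static int[] gaps(String[] word, Set<String> stopword) {
        List<Integer> gap = new ArrayList<>();
        int skipped = 0;
        for (String w : word) {
            if (stopword.contains(w)) skipped++;
            else {
                gap.add(skipped);
                skipped = 0;
            }
        }
        // Stopwords after the last indexed word are ignored, as no gap
        // can be specified after the last interval.
        return gap.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        Set<String> stopword = new HashSet<>(Arrays.asList("the", "on"));
        String[] phrase = { "the", "cat", "on", "the", "mat" };
        // Surviving terms are "cat" and "mat".
        System.out.println(Arrays.toString(gaps(phrase, stopword))); // [1, 2]
    }
}
```

An array built this way would be passed as gap, together with one component iterator per surviving term in the same order, to getInstance(DocumentIterator[], int[]).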
-
getIntervalIterator
protected IntervalIterator getIntervalIterator(Index unused, int n, boolean allIndexIterators, Object arg)
Description copied from class: AbstractIntervalDocumentIterator
Creates an interval iterator suitable for this AbstractIntervalDocumentIterator.
- Specified by:
getIntervalIterator in class AbstractIntervalDocumentIterator
- Parameters:
unused - the reference index for the iterator, or null.
n - the number of underlying or component iterators.
allIndexIterators - whether all underlying or component iterators are index iterators.
arg - an optional argument.
- Returns:
an interval iterator suitable for this AbstractIntervalDocumentIterator.
-
-