Class TermCollectionVisitor
- java.lang.Object
  - it.unimi.di.big.mg4j.search.visitor.AbstractDocumentIteratorVisitor
    - it.unimi.di.big.mg4j.search.visitor.TermCollectionVisitor

All Implemented Interfaces:
DocumentIteratorVisitor<Boolean>
public class TermCollectionVisitor extends AbstractDocumentIteratorVisitor
A visitor collecting information about terms appearing in a DocumentIterator.

The purpose of this visitor is to explore the structure of a DocumentIterator before iteration, counting how many terms are actually used and collecting the terms appearing in all leaves of nonzero frequency (the latter condition is used to skip empty iterators), possibly considering just a subset of indices. For this visitor to work, all leaves of nonzero frequency must return a non-null value on a call to IndexIterator.term().

During the visit, we keep track of which index/term pairs have already been seen. Each pair is assigned a distinct offset (a number between zero and the overall number of distinct pairs), which is stored in each index iterator's id and is used afterwards to quickly access data about the pair. Note that duplicate index/term pairs get the same offset. The overall number of distinct pairs is returned by numberOfPairs() after a visit.

The indices appearing in some valid pair are recorded; they are accessible as a vector returned by indices(), and the map from positions in this vector to indices is inverted by indexMap().

If you need to fix the index map, there's a special prepare(ReferenceSet) method. In that case only terms associated with indices in the provided set will be collected.

Warning: the semantics of prepare(ReferenceSet) described above was introduced in MG4J 4.0. Previously, the only effect of prepare(ReferenceSet) was to artificially add indices to the index set.

The offset assigned to each index/term pair is returned by offset(Index, String). Should you need to know the terms associated with each index, they are returned by terms(Index).

After a term collection, counters are usually set up by a visit of CounterSetupVisitor.
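The offset bookkeeping described above can be modelled in a few lines of plain Java. This is only a sketch under stated assumptions, not MG4J's implementation: indices are represented by strings rather than Index objects, and the class PairOffsetSketch, its visitLeaf method, and the -1 return value for skipped leaves are all hypothetical names and conventions introduced for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of index/term pair collection; not part of MG4J.
final class PairOffsetSketch {
    // index -> (term -> offset assigned when the pair was first seen)
    private final Map<String, Map<String, Integer>> offsets = new HashMap<>();
    private int pairs = 0;

    // Models the visit of one leaf. Returns the offset of the pair,
    // or -1 (a convention of this sketch only) if the leaf is skipped.
    int visitLeaf(String index, String term, long frequency) {
        if (frequency == 0) return -1; // empty iterators are skipped
        if (term == null) {
            // The documentation requires a non-null term here; throwing is this sketch's choice.
            throw new IllegalStateException("nonzero-frequency leaf without a term");
        }
        Map<String, Integer> byTerm = offsets.get(index);
        if (byTerm == null) {
            byTerm = new HashMap<>();
            offsets.put(index, byTerm);
        }
        Integer offset = byTerm.get(term);
        if (offset == null) {
            offset = pairs++; // next unused offset
            byTerm.put(term, offset);
        }
        return offset; // a duplicate pair gets the offset it was given before
    }

    // Number of distinct pairs seen so far.
    int numberOfPairs() {
        return pairs;
    }
}
```

Because offsets are handed out consecutively starting from zero, an array of length numberOfPairs() can hold per-pair data and be indexed directly by offset.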
Constructor Summary
Constructors

| Constructor | Description |
| --- | --- |
| TermCollectionVisitor() | Creates a new term-collection visitor. |
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| Reference2IntMap<Index> | indexMap() | Returns a map from indices met during term collection to their position in indices(). |
| Index[] | indices() | Returns the indices met during pair collection. |
| int | numberOfPairs() | Returns the number of distinct index/term pairs corresponding to nonzero-frequency index iterators in the last visit. |
| int | offset(Index index, String term) | Returns the offset associated with a given index/term pair. |
| TermCollectionVisitor | prepare() | Prepares this term-collection visitor. |
| TermCollectionVisitor | prepare(ReferenceSet<Index> indices) | Prepares this term-collection visitor, possibly specifying the indices that should be collected. |
| Object2IntLinkedOpenHashMap<String> | term2Id() | Returns a map associating terms appearing in the query with ids. |
| String[] | terms(Index index) | Returns the terms associated with the given index. |
| String | toString() | |
| Boolean | visit(IndexIterator indexIterator) | Visits an IndexIterator leaf. |
Method Detail
-
prepare
public TermCollectionVisitor prepare()
Prepares this term-collection visitor.
- Specified by: prepare in interface DocumentIteratorVisitor<Boolean>
- Overrides: prepare in class AbstractDocumentIteratorVisitor
- Returns: this term-collection visitor.
-
prepare
public TermCollectionVisitor prepare(ReferenceSet<Index> indices)
Prepares this term-collection visitor, possibly specifying the indices that should be collected.
- Parameters: indices - the set of indices that will be collected; if empty, all indices will be collected (i.e., the call is equivalent to prepare()).
- Returns: this term-collection visitor.
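The filtering rule for the indices parameter (an empty set means no restriction) can be illustrated with a small self-contained model. This is a sketch, not MG4J code: IndexFilterSketch and its collects method are hypothetical, indices are modelled as strings, and membership is tested with equals, whereas a ReferenceSet compares by reference.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the index restriction set up by prepare(ReferenceSet).
final class IndexFilterSketch {
    private Set<String> allowed = Collections.emptySet();

    // Records the indices to collect; returns this so that calls can be chained,
    // mirroring the documented return value of prepare().
    IndexFilterSketch prepare(Set<String> indices) {
        allowed = new HashSet<>(indices);
        return this;
    }

    // True if terms of the given index would be collected under this sketch's rule.
    boolean collects(String index) {
        return allowed.isEmpty() || allowed.contains(index);
    }
}
```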
-
visit
public Boolean visit(IndexIterator indexIterator) throws IOException
Description copied from interface: DocumentIteratorVisitor
Visits an IndexIterator leaf.
- Parameters: indexIterator - the leaf to be visited.
- Returns: an appropriate return value if the visit should continue, or null.
- Throws: IOException
-
numberOfPairs
public int numberOfPairs()
Returns the number of distinct index/term pairs corresponding to nonzero-frequency index iterators in the last visit.
- Returns: the number of distinct index/term pairs corresponding to nonzero-frequency index iterators.
-
indices
public Index[] indices()
Returns the indices met during pair collection.
Note that the returned array does not include indices associated only with index iterators of zero frequency, unless prepare(ReferenceSet) was called with a nonempty argument.
- Returns: the indices met during term collection.
-
indexMap
public Reference2IntMap<Index> indexMap()
Returns a map from indices met during term collection to their position in indices().
Note that the returned map does not include as keys indices associated only with index iterators of zero frequency, unless prepare(ReferenceSet) was called with a nonempty argument.
- Returns: a map from indices met during term collection to their position in indices().
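The relationship between indices() and indexMap() is that of an array and its inverse: looking up an index in the map gives the position at which it sits in the array. The following sketch shows that invariant with plain Java collections; IndexMapSketch and invert are hypothetical names, strings stand in for Index objects, and a HashMap stands in for the Reference2IntMap returned by the real method.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of an array of indices and its inverse map.
final class IndexMapSketch {
    // Builds the map sending each element of the array to its position.
    static Map<String, Integer> invert(String[] indices) {
        Map<String, Integer> positions = new HashMap<>();
        for (int i = 0; i < indices.length; i++) {
            positions.put(indices[i], i);
        }
        return positions;
    }
}
```

With this model, indices[positions.get(x)] is x again for every x in the array.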
-
terms
public String[] terms(Index index)
Returns the terms associated with the given index.
- Parameters: index - an index.
- Returns: the terms associated with index, in the same order in which they appeared during the visit, skipping duplicates, if some nonzero-frequency iterator based on index was found; null otherwise.
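The return contract of terms(Index) (visit order, no duplicates, null when no nonzero-frequency iterator was found) maps naturally onto an insertion-ordered set per index. The sketch below models that contract only; TermsPerIndexSketch and visitLeaf are hypothetical, strings stand in for Index objects, and nothing here is taken from MG4J's source.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of per-index term collection.
final class TermsPerIndexSketch {
    // LinkedHashSet keeps first-insertion order and ignores duplicates.
    private final Map<String, Set<String>> termsByIndex = new HashMap<>();

    // Models the visit of one leaf; zero-frequency leaves leave no trace.
    void visitLeaf(String index, String term, long frequency) {
        if (frequency == 0) return;
        Set<String> terms = termsByIndex.get(index);
        if (terms == null) {
            terms = new LinkedHashSet<>();
            termsByIndex.put(index, terms);
        }
        terms.add(term);
    }

    // Terms in visit order without duplicates, or null if the index was never seen
    // with a nonzero-frequency leaf.
    String[] terms(String index) {
        Set<String> terms = termsByIndex.get(index);
        return terms == null ? null : terms.toArray(new String[0]);
    }
}
```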
-
term2Id
public Object2IntLinkedOpenHashMap<String> term2Id()
Returns a map associating terms appearing in the query with ids.
- Returns: a map from terms appearing in the query (in indices with counts) to ids.
-
offset
public int offset(Index index, String term)
Returns the offset associated with a given index/term pair.
- Parameters:
  index - an index appearing in indices().
  term - a term appearing in the array returned by terms(Index) with argument index.
- Returns: the offset associated with the pair index/term.