java.lang.Object
  it.unimi.di.mg4j.search.visitor.AbstractDocumentIteratorVisitor
    it.unimi.di.mg4j.search.visitor.TermCollectionVisitor
public class TermCollectionVisitor
A visitor collecting information about terms appearing in a DocumentIterator.
The purpose of this visitor is to explore the structure of a DocumentIterator before iteration, in order to count how many terms are actually used and to set up some preliminary access data. More precisely, we count the distinct index/term pairs appearing in all leaves of nonzero frequency (the latter condition is used to skip empty iterators), possibly considering just a subset of indices. For this visitor to work, all leaves of nonzero frequency must return a non-null value on a call to IndexIterator.term().
During the visit, we keep track of which index/term pairs have already been seen. Each pair is assigned a distinct offset (a number between zero and the overall number of distinct pairs), which is stored into each index iterator's id and is used afterwards to quickly access data about the pair. Note that duplicate index/term pairs get the same offset. The overall number of distinct pairs is returned by numberOfPairs() after a visit.
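The offset assignment described above can be illustrated with a minimal, self-contained sketch. It does not use MG4J: `PairOffsets` and `Leaf` are hypothetical names, indices are represented by plain strings, and offsets are assumed to be handed out in order of first appearance, which the text above does not state explicitly.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: mimics the bookkeeping described above without using MG4J. */
final class PairOffsets {
    /** Hypothetical stand-in for a leaf of the iterator tree (not an MG4J class). */
    static final class Leaf {
        final String index;   // a name standing in for an Index object
        final String term;
        final int frequency;

        Leaf(String index, String term, int frequency) {
            this.index = index;
            this.term = term;
            this.frequency = frequency;
        }
    }

    /** Gives each distinct index/term pair of a nonzero-frequency leaf the next free offset. */
    static Map<List<String>, Integer> assign(Leaf... leaves) {
        Map<List<String>, Integer> offsets = new LinkedHashMap<>();
        for (Leaf leaf : leaves) {
            if (leaf.frequency == 0) continue; // empty iterators are skipped
            // A duplicate pair keeps the offset it received the first time.
            offsets.putIfAbsent(Arrays.asList(leaf.index, leaf.term), offsets.size());
        }
        return offsets;
    }

    public static void main(String[] args) {
        Map<List<String>, Integer> offsets = assign(
            new Leaf("text", "cat", 3),
            new Leaf("title", "cat", 1),
            new Leaf("text", "cat", 2),   // duplicate of the first pair
            new Leaf("text", "dog", 0));  // zero frequency: ignored
        System.out.println(offsets.size() + " distinct pairs: " + offsets);
    }
}
```

In this sketch the size of the returned map plays the role of numberOfPairs().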
The indices appearing in some valid pair are recorded; they are accessible as a vector returned by indices(), and the map from positions in this vector to indices is inverted by indexMap().
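The relationship between indices() and indexMap() is that of an array and its inverse. The following sketch shows the idea with standard-library types only; `IndexPositions` is a hypothetical name, and `java.util.IdentityHashMap` is used as a stand-in for a reference-based map such as Reference2IntMap, on the assumption that indices are compared by identity.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/** Illustrative only: an array of objects and the map that inverts it. */
final class IndexPositions {
    /** Maps each element of the array to its position, comparing keys by identity. */
    static <T> Map<T, Integer> invert(T[] indices) {
        Map<T, Integer> positions = new IdentityHashMap<>();
        for (int i = 0; i < indices.length; i++) positions.put(indices[i], i);
        return positions;
    }

    public static void main(String[] args) {
        Object text = new Object();   // stand-ins for Index instances
        Object title = new Object();
        Object[] indices = { text, title };
        Map<Object, Integer> map = invert(indices);
        // Going from an index to its position and back yields the same object.
        System.out.println(indices[map.get(title)] == title);
    }
}
```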
If you need to fix the index map in advance, use the prepare(ReferenceSet) method: in that case, only terms associated with indices in the provided set will be collected.
Warning: the semantics of prepare(ReferenceSet) described above was implemented in MG4J 4.0. Previously, the effect of prepare(ReferenceSet) was just to artificially add indices to the index set.
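The restriction to a given set of indices can be pictured as a simple filter. The sketch below is an assumption-laden illustration, not MG4J code: `RestrictedCollection` is a hypothetical name, indices are plain strings, and an empty set is treated as "collect everything", matching the documented equivalence between an empty argument and prepare().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: filtering index/term pairs by a set of allowed indices. */
final class RestrictedCollection {
    /** Keeps the pairs {index, term} whose index is allowed; an empty set allows every index. */
    static List<String[]> collect(Set<String> allowed, String[][] pairs) {
        List<String[]> kept = new ArrayList<>();
        for (String[] pair : pairs) {
            if (allowed.isEmpty() || allowed.contains(pair[0])) kept.add(pair);
        }
        return kept;
    }

    public static void main(String[] args) {
        String[][] pairs = { { "text", "cat" }, { "title", "dog" }, { "text", "dog" } };
        Set<String> onlyText = new HashSet<>(Arrays.asList("text"));
        System.out.println(collect(onlyText, pairs).size());                        // pairs on "text" only
        System.out.println(collect(Collections.<String>emptySet(), pairs).size()); // no restriction
    }
}
```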
The offset assigned to each index/term pair is returned by offset(Index, String). Should you need to know the terms associated with each index, they are returned by terms(Index).
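As documented for terms(Index), the terms of an index are reported in order of first appearance, without duplicates, and the result is null when no nonzero-frequency iterator on that index was found. A minimal sketch of that behavior, using a hypothetical `TermsPerIndex` class with strings standing in for Index objects, might look as follows.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative only: per-index term lists in order of appearance, skipping duplicates. */
final class TermsPerIndex {
    private final Map<String, Set<String>> termsByIndex = new LinkedHashMap<>();

    /** Records a leaf; leaves of zero frequency leave no trace. */
    void see(String index, String term, int frequency) {
        if (frequency == 0) return;
        // LinkedHashSet keeps insertion order and ignores repeated terms.
        termsByIndex.computeIfAbsent(index, k -> new LinkedHashSet<>()).add(term);
    }

    /** Returns the terms seen for the index, or null if none were recorded. */
    String[] terms(String index) {
        Set<String> seen = termsByIndex.get(index);
        return seen == null ? null : seen.toArray(new String[0]);
    }

    public static void main(String[] args) {
        TermsPerIndex collector = new TermsPerIndex();
        collector.see("text", "dog", 1);
        collector.see("text", "cat", 2);
        collector.see("text", "dog", 5);   // duplicate term, ignored
        collector.see("title", "cat", 0);  // zero frequency, ignored
        System.out.println(Arrays.toString(collector.terms("text")));
        System.out.println(collector.terms("title") == null);
    }
}
```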
After a term collection, counters are usually set up by a visit of CounterSetupVisitor.
Constructor Summary | |
---|---|
TermCollectionVisitor() | Creates a new term-collection visitor. |
Method Summary | |
---|---|
Reference2IntMap<Index> | indexMap(): Returns a map from indices met during term collection to their position in indices(). |
Index[] | indices(): Returns the indices met during pair collection. |
int | numberOfPairs(): Returns the number of distinct index/term pairs corresponding to nonzero-frequency index iterators in the last visit. |
int | offset(Index index, String term): Returns the offset associated with a given index/term pair. |
TermCollectionVisitor | prepare(): Prepares this term-collection visitor. |
TermCollectionVisitor | prepare(ReferenceSet<Index> indices): Prepares this term-collection visitor, possibly specifying the indices that should be collected. |
Object2IntLinkedOpenHashMap<String> | term2Id(): Returns a map associating terms appearing in the query with ids. |
String[] | terms(Index index): Returns the terms associated with the given index. |
String | toString() |
Boolean | visit(IndexIterator indexIterator): Visits an IndexIterator leaf. |
Methods inherited from class it.unimi.di.mg4j.search.visitor.AbstractDocumentIteratorVisitor |
---|
newArray, visit, visit, visit, visitPost, visitPre |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
public TermCollectionVisitor()
Method Detail |
---|
public TermCollectionVisitor prepare()
Prepares this term-collection visitor.
Specified by:
prepare in interface DocumentIteratorVisitor<Boolean>
Overrides:
prepare in class AbstractDocumentIteratorVisitor
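public TermCollectionVisitor prepare(ReferenceSet<Index> indices)
Prepares this term-collection visitor, possibly specifying the indices that should be collected.
Parameters:
indices - the set of indices that will be collected; if empty, all indices will be collected (i.e., the call is equivalent to prepare()).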
public Boolean visit(IndexIterator indexIterator) throws IOException
Description copied from interface: DocumentIteratorVisitor
Visits an IndexIterator leaf.
Parameters:
indexIterator - the leaf to be visited.
Returns:
[...] null.
Throws:
IOException
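public int numberOfPairs()
Returns the number of distinct index/term pairs corresponding to nonzero-frequency index iterators in the last visit.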
public Index[] indices()
Returns the indices met during pair collection.
Note that the returned array does not include indices associated only with index iterators of zero frequency, unless prepare(ReferenceSet) was called with a nonempty argument.
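public Reference2IntMap<Index> indexMap()
Returns a map from indices met during term collection to their position in indices().
Note that the returned map does not include as keys indices associated only with index iterators of zero frequency, unless prepare(ReferenceSet) was called with a nonempty argument.
Returns:
a map from indices met during term collection to their position in indices().
public String[] terms(Index index)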
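Returns the terms associated with the given index.
Parameters:
index - an index.
Returns:
the terms associated with index, in the same order in which they appeared during the visit, skipping duplicates, if some nonzero-frequency iterator based on index was found; null otherwise.
public Object2IntLinkedOpenHashMap<String> term2Id()
Returns a map associating terms appearing in the query with ids.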
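public int offset(Index index, String term)
Returns the offset associated with a given index/term pair.
Parameters:
index - an index appearing in indices().
term - a term appearing in the array returned by terms(Index) with argument index.
Returns:
the offset associated with the pair index/term.
public String toString()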
Overrides:
toString in class Object