java.lang.Object
  it.unimi.di.mg4j.search.visitor.AbstractDocumentIteratorVisitor
    it.unimi.di.mg4j.search.visitor.CounterCollectionVisitor
public class CounterCollectionVisitor
A visitor collecting the counts of terms in a DocumentIterator tree.
Note: the documentation of this class also clarifies the usage of TermCollectionVisitor and CounterSetupVisitor.
Several scoring schemes, such as BM25 or cosine-based measures, require the counts (the number of occurrences in a given document) of the terms in the query. Since we do not want to restrict the user's ability to specify sophisticated constraints such as term proximity, order, consecutivity, etc., we prefer not to use a bag-of-words query model, in which the user simply inputs a number of terms (in that case, of course, the definition of the count of each term is trivial). Rather, we provide a group of visitors that make it possible to retrieve counts for each term appearing in the query.
Since MG4J provides multi-index queries, each count is actually associated with an index/term pair (e.g., the count of class in the main text might be different from the count of class in the title). Moreover, we must be careful to define a sensible semantics: when logical operators alternate, the counts of some occurrences of a term in a query might give misleading information (in particular if the same term appears several times).
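A minimal sketch, in plain Java and independent of MG4J, of why counts are keyed by index/term pair rather than by term alone. The `IndexTerm` record and the `PairCounts` class are hypothetical names introduced for illustration, and a document is modeled (as an assumption) as a map from index name to the words it contains.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key type (not an MG4J class): a term together with the index it occurs in.
record IndexTerm(String index, String term) {}

class PairCounts {
    // Counts every word of every index, keyed by (index, term), so that
    // "class" in the text and "class" in the title are kept apart.
    static Map<IndexTerm, Integer> count(Map<String, String[]> document) {
        Map<IndexTerm, Integer> counts = new HashMap<>();
        for (Map.Entry<String, String[]> field : document.entrySet()) {
            for (String word : field.getValue()) {
                counts.merge(new IndexTerm(field.getKey(), word), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

For a document whose text is "class class object" and whose title is "class", this yields a count of 2 for (text, class) and 1 for (title, class), two distinct entries that a term-only key would merge.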
Thus, we define a true path on the query tree (which parallels the composite tree of the associated DocumentIterator) as a path from the root that passes only through nodes whose associated subquery evaluates to true (in the Boolean sense). A counter-collection visitor records in the counter arrays only the counts of index/term pairs appearing at the end of a true path.
For instance, with the query a OR (title:b AND c) and a document that contains a and c in the main text but does not contain b in the title, only the count of a will be taken into consideration: the path leading to c passes through the AND node, which evaluates to false. In the same way, for a query whose outermost operation is a negation no counter will ever be written.
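The true-path rule can be sketched on a toy query tree. This is not MG4J code: `Query`, `Term`, `And`, `Or`, `Not` and `TruePathCounts` are hypothetical types, and a document is assumed to be a map from index name to per-term occurrence counts. A leaf's count is recorded only if every node from the root down to it evaluates to true.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical query tree; these are not MG4J's DocumentIterator classes.
interface Query {}
record Term(String index, String term) implements Query {}
record And(List<Query> children) implements Query {}
record Or(List<Query> children) implements Query {}
record Not(Query child) implements Query {}

class TruePathCounts {
    // Occurrences of the leaf's term in the leaf's index (0 if absent).
    static int count(Term t, Map<String, Map<String, Integer>> doc) {
        return doc.getOrDefault(t.index(), Map.of()).getOrDefault(t.term(), 0);
    }

    // Boolean value of the subquery rooted at q.
    static boolean eval(Query q, Map<String, Map<String, Integer>> doc) {
        if (q instanceof Term t) return count(t, doc) > 0;
        if (q instanceof And a) return a.children().stream().allMatch(c -> eval(c, doc));
        if (q instanceof Or o) return o.children().stream().anyMatch(c -> eval(c, doc));
        if (q instanceof Not n) return !eval(n.child(), doc);
        throw new IllegalArgumentException("unknown node " + q);
    }

    // Records leaf counts only at the end of true paths.
    static void collect(Query q, Map<String, Map<String, Integer>> doc, Map<Term, Integer> out) {
        if (!eval(q, doc)) return;                 // a false node cuts every path below it
        if (q instanceof Term t) { out.put(t, count(t, doc)); return; }
        if (q instanceof And a) a.children().forEach(c -> collect(c, doc, out));
        else if (q instanceof Or o) o.children().forEach(c -> collect(c, doc, out));
        // A true Not node has a false child, so no true path continues below it.
    }
}
```

On the example above (a and c in the main text, no b in the title), collecting over a OR (title:b AND c) records only a, and collecting over any query rooted at a Not records nothing. Re-evaluating subtrees at each level keeps the sketch short at the cost of efficiency.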
Instances of this class are useful only in connection with a CounterSetupVisitor (and, in turn, with a TermCollectionVisitor). More precisely, there are three phases:
1. after creating a DocumentIterator, prepare a TermCollectionVisitor and perform a visit to gather term information and possibly cache some data about the terms appearing in the iterator;
2. create a CounterSetupVisitor based on the previous TermCollectionVisitor, and perform a visit to read frequencies and prepare counters;
3. after each call to nextDocument(), clear the counters, perform a visit along true paths using an instance of this class, and inspect the data gathered in the CounterSetupVisitor (see, for example, the source code of CountScorer).
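The division of labor among the three phases can be sketched with hypothetical stand-ins; `TermCollection`, `CounterSetup`, `CounterCollection` and `Phases` are illustrative names, not the MG4J classes, and their methods are not MG4J's signatures. Query leaves are assumed to be strings of the form "index:term", and phase 3 receives the true-path leaf counts already computed, so only the bookkeeping (dense offsets, one counter per offset, clearing per document) is shown. Reading frequencies, which the real CounterSetupVisitor also does, is omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Phase 1 stand-in: assigns a dense offset to each distinct index/term pair.
class TermCollection {
    final Map<String, Integer> offset = new LinkedHashMap<>();
    void prepare() { offset.clear(); }
    void visit(List<String> queryLeaves) {             // leaves written as "index:term"
        for (String leaf : queryLeaves) offset.putIfAbsent(leaf, offset.size());
    }
}

// Phase 2 stand-in: one counter per offset gathered in phase 1.
class CounterSetup {
    final TermCollection terms;
    int[] count = new int[0];
    CounterSetup(TermCollection terms) { this.terms = terms; }
    void prepare() { count = new int[terms.offset.size()]; }
    void clear() { Arrays.fill(count, 0); }
}

// Phase 3 stand-in: writes the counts of leaves reached along true paths.
class CounterCollection {
    final CounterSetup setup;
    CounterCollection(CounterSetup setup) { this.setup = setup; }
    void visit(Map<String, Integer> truePathLeafCounts) {
        truePathLeafCounts.forEach((leaf, c) -> setup.count[setup.terms.offset.get(leaf)] = c);
    }
}

class Phases {
    static List<int[]> run(List<String> queryLeaves, List<Map<String, Integer>> documents) {
        TermCollection terms = new TermCollection();
        terms.prepare();
        terms.visit(queryLeaves);                       // phase 1, once per query
        CounterSetup setup = new CounterSetup(terms);
        setup.prepare();                                // phase 2, once per query
        CounterCollection collector = new CounterCollection(setup);
        List<int[]> perDocument = new ArrayList<>();
        for (Map<String, Integer> doc : documents) {    // phase 3, once per document
            setup.clear();
            collector.visit(doc);
            perDocument.add(setup.count.clone());
        }
        return perDocument;
    }
}
```

Clearing before each visit matters: a pair that is not on a true path in the current document must read 0 rather than keep the value left by the previous document.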
Note that all visitors are reusable: just prepare() them before usage, but be careful, as a CounterSetupVisitor must be prepared and visited after the associated TermCollectionVisitor has been prepared and visited. The prepare() method of this class is a no-op, so it is not necessary to call it.
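Why the order matters can be shown with two hypothetical stand-ins (`ReusableTerms`, `ReusableCounters`; not MG4J classes): the counter setup sizes its counters from whatever the term visitor currently holds, so preparing it before the term visitor has been re-prepared and re-visited for a new query leaves counters sized for the old query.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a reusable term-collection visitor.
class ReusableTerms {
    final Map<String, Integer> offset = new LinkedHashMap<>();
    void prepare() { offset.clear(); }                  // resets state so a new query can be visited
    void visit(List<String> queryLeaves) {
        for (String leaf : queryLeaves) offset.putIfAbsent(leaf, offset.size());
    }
}

// Hypothetical stand-in for a reusable counter-setup visitor.
class ReusableCounters {
    final ReusableTerms terms;
    int[] count = new int[0];
    ReusableCounters(ReusableTerms terms) { this.terms = terms; }
    // Sizes the counters from the term visitor's current contents, so this is
    // only correct after the term visitor has been prepared and visited.
    void prepare() { count = new int[terms.offset.size()]; }
}

class Reuse {
    // Reuses both visitors for a second, larger query and returns
    // {number of counters, number of index/term pairs in the new query}.
    static int[] secondQuery(boolean termsFirst) {
        ReusableTerms terms = new ReusableTerms();
        ReusableCounters counters = new ReusableCounters(terms);
        terms.prepare();
        terms.visit(List.of("text:a"));
        counters.prepare();                             // first query: one counter
        List<String> next = List.of("text:a", "title:b");
        if (termsFirst) {
            terms.prepare();
            terms.visit(next);
            counters.prepare();                         // correct order
        } else {
            counters.prepare();                         // wrong order: sized for the old query
            terms.prepare();
            terms.visit(next);
        }
        return new int[] { counters.count.length, terms.offset.size() };
    }
}
```

With the correct order both numbers are 2; with the wrong order the counters still number 1 while the new query has 2 pairs.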
Constructor Summary | Description |
---|---|
CounterCollectionVisitor(CounterSetupVisitor counterSetupVisitor) | Creates a new counter-collection visitor based on a given counter-setup visitor. |
Return type | Method Summary | Description |
---|---|---|
String | toString() | |
Boolean | visit(IndexIterator indexIterator) | Visits an IndexIterator leaf. |
Methods inherited from class it.unimi.di.mg4j.search.visitor.AbstractDocumentIteratorVisitor |
---|
newArray, prepare, visit, visit, visit, visitPost, visitPre |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
public CounterCollectionVisitor(CounterSetupVisitor counterSetupVisitor)
Creates a new counter-collection visitor based on a given counter-setup visitor.
Parameters:
counterSetupVisitor - a counter-setup visitor.

Method Detail |
---|
public Boolean visit(IndexIterator indexIterator) throws IOException
Visits an IndexIterator leaf.
Specified by:
visit in interface DocumentIteratorVisitor
Parameters:
indexIterator - the leaf to be visited.
Returns:
… null.
Throws:
IOException

public String toString()
Overrides:
toString in class Object