Package it.unimi.di.mg4j.search.visitor

Visitors for composite document iterators.

Interface Summary
DocumentIteratorVisitor<T>: A visitor for the tree defined by a DocumentIterator.

Class Summary
AbstractDocumentIteratorVisitor: An abstract implementation of a DocumentIteratorVisitor without return values.
CounterCollectionVisitor: A visitor collecting the counts of terms in a DocumentIterator tree.
CounterSetupVisitor: A visitor using the information collected by a TermCollectionVisitor to set up term frequencies and counters.
TermCollectionVisitor: A visitor collecting information about terms appearing in a DocumentIterator.
TrueTermsCollectionVisitor: A visitor collecting terms that satisfy a query for the current document.

Package it.unimi.di.mg4j.search.visitor Description

Visitors for composite document iterators.

Composites and visitors

A DocumentIterator (in particular, any of those provided by MG4J in the package it.unimi.di.mg4j.search) is usually structured as a composite, with operators as internal nodes and IndexIterators as leaves. A composite can be explored using a visitor: to this end, the DocumentIterator interface provides two methods, accept(DocumentIteratorVisitor) and acceptOnTruePaths(DocumentIteratorVisitor), that let a DocumentIteratorVisitor visit the composite structure.

A DocumentIteratorVisitor provides methods for visiting all internal nodes, both in preorder and in postorder. Leaves are visited through two methods, DocumentIteratorVisitor.visit(it.unimi.di.mg4j.index.IndexIterator) and DocumentIteratorVisitor.visit(it.unimi.di.mg4j.index.MultiTermIndexIterator).

Note that a DocumentIteratorVisitor must be (re)usable after each call to prepare().

The abstract class AbstractDocumentIteratorVisitor provides stubs implementing internal visits and prepare() as no-ops for visitors that do not return values.
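
The interplay between a composite and its visitor can be illustrated with a minimal, self-contained sketch. All types below (Node, Leaf, Operator, NodeVisitor, TraceVisitor) are hypothetical stand-ins written for this example, and their signatures are simplified assumptions: they are not the MG4J interfaces. The sketch only shows the order in which preorder, leaf and postorder callbacks fire, and how prepare() resets a visitor for reuse.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT the MG4J types.
interface NodeVisitor {
    void prepare();                 // resets internal state so the visitor can be reused
    boolean visitPre(Operator op);  // called before the children; false stops the visit
    boolean visitPost(Operator op); // called after the children
    boolean visit(Leaf leaf);       // called on each leaf
}

interface Node {
    boolean accept(NodeVisitor visitor);
}

final class Leaf implements Node {
    final String term;

    Leaf(String term) { this.term = term; }

    @Override
    public boolean accept(NodeVisitor visitor) { return visitor.visit(this); }
}

final class Operator implements Node {
    final String name;
    final List<Node> children;

    Operator(String name, Node... children) {
        this.name = name;
        this.children = Arrays.asList(children);
    }

    @Override
    public boolean accept(NodeVisitor visitor) {
        // Preorder callback first, then the children, then the postorder callback.
        if (!visitor.visitPre(this)) return false;
        for (Node child : children) {
            if (!child.accept(visitor)) return false;
        }
        return visitor.visitPost(this);
    }
}

// A visitor that records the order in which its callbacks are invoked.
final class TraceVisitor implements NodeVisitor {
    final List<String> trace = new ArrayList<>();

    @Override public void prepare() { trace.clear(); }
    @Override public boolean visitPre(Operator op) { trace.add("pre:" + op.name); return true; }
    @Override public boolean visitPost(Operator op) { trace.add("post:" + op.name); return true; }
    @Override public boolean visit(Leaf leaf) { trace.add("leaf:" + leaf.term); return true; }
}
```

Under these assumptions, visiting a tree such as new Operator("AND", new Leaf("a"), new Operator("OR", new Leaf("b"), new Leaf("c"))) with a TraceVisitor records the preorder callback of each operator before its children and the postorder callback after them. Calling prepare() before each visit clears the trace, so a single visitor instance can be reused.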

Computing true terms

A simple example of a visitor is TrueTermsCollectionVisitor, which simply collects all terms that make a query true for the current document.
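
The idea of collecting terms along true paths can be sketched as follows. The types QueryNode, TermNode, OrNode, AndNode and TermVisitor are hypothetical and exist only for this example. In particular, passing the document explicitly and storing postings in a set are simplifying assumptions, not a description of how MG4J iterators work.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for illustration only; these are NOT the MG4J types.
interface TermVisitor {
    void visit(TermNode leaf);
}

interface QueryNode {
    boolean isTrueFor(int document);

    // Visits only the leaves lying on paths that are true for the given document.
    void acceptOnTruePaths(int document, TermVisitor visitor);
}

final class TermNode implements QueryNode {
    final String term;
    final Set<Integer> documents; // documents containing the term (assumed toy postings)

    TermNode(String term, Integer... documents) {
        this.term = term;
        this.documents = new HashSet<>(Arrays.asList(documents));
    }

    @Override
    public boolean isTrueFor(int document) { return documents.contains(document); }

    @Override
    public void acceptOnTruePaths(int document, TermVisitor visitor) {
        if (isTrueFor(document)) visitor.visit(this);
    }
}

final class OrNode implements QueryNode {
    final List<QueryNode> children;

    OrNode(QueryNode... children) { this.children = Arrays.asList(children); }

    @Override
    public boolean isTrueFor(int document) {
        for (QueryNode child : children) if (child.isTrueFor(document)) return true;
        return false;
    }

    @Override
    public void acceptOnTruePaths(int document, TermVisitor visitor) {
        // Descend only into the children that are true for this document.
        for (QueryNode child : children) {
            if (child.isTrueFor(document)) child.acceptOnTruePaths(document, visitor);
        }
    }
}

final class AndNode implements QueryNode {
    final List<QueryNode> children;

    AndNode(QueryNode... children) { this.children = Arrays.asList(children); }

    @Override
    public boolean isTrueFor(int document) {
        for (QueryNode child : children) if (!child.isTrueFor(document)) return false;
        return true;
    }

    @Override
    public void acceptOnTruePaths(int document, TermVisitor visitor) {
        // A conjunction is on a true path only if every child is true.
        if (!isTrueFor(document)) return;
        for (QueryNode child : children) child.acceptOnTruePaths(document, visitor);
    }
}
```

In this sketch a visitor can be as small as a lambda that appends leaf.term to a list. For a conjunction of a term with a disjunction, only the disjuncts that actually occur in the document are reported.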

Counting term occurrences

Another example of the utility of visitors for document iterators is given by term counting: using a number of coordinated visitors, it is possible to compute a count for each term appearing in a query, no matter how complex. The count can be used as input for counting-based scoring schemes, such as BM25 or cosine-based measures. For more information, please read the documentation of CounterCollectionVisitor.
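
A loose analogue of coordinated counting visitors is sketched below, under stated assumptions. The names CountNode, CountedTerm, Composite, TermIndexer and CountCollector are hypothetical, and the two-pass scheme (first assign a slot to each distinct term, then fill a counter array) is an illustrative simplification rather than the actual behavior of the MG4J classes. Each leaf is assumed to already know the count of its term in the current document.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT the MG4J types.
interface CountVisitor {
    void visit(CountedTerm leaf);
}

interface CountNode {
    void accept(CountVisitor visitor);
}

final class CountedTerm implements CountNode {
    final String term;
    final int count; // assumed: occurrences of the term in the current document

    CountedTerm(String term, int count) {
        this.term = term;
        this.count = count;
    }

    @Override
    public void accept(CountVisitor visitor) { visitor.visit(this); }
}

final class Composite implements CountNode {
    final List<CountNode> children;

    Composite(CountNode... children) { this.children = Arrays.asList(children); }

    @Override
    public void accept(CountVisitor visitor) {
        for (CountNode child : children) child.accept(visitor);
    }
}

// First pass: give each distinct term a slot in the counter array.
final class TermIndexer implements CountVisitor {
    final Map<String, Integer> slot = new LinkedHashMap<>();

    @Override
    public void visit(CountedTerm leaf) { slot.putIfAbsent(leaf.term, slot.size()); }
}

// Second pass: fill the counter array for the current document.
final class CountCollector implements CountVisitor {
    final Map<String, Integer> slot;
    final int[] count;

    CountCollector(Map<String, Integer> slot) {
        this.slot = slot;
        this.count = new int[slot.size()];
    }

    // Resets the counters so the collector can be reused on the next document.
    void prepare() { Arrays.fill(count, 0); }

    @Override
    public void visit(CountedTerm leaf) {
        // A term repeated in the query shares one slot, so it is stored once, not summed.
        count[slot.get(leaf.term)] = leaf.count;
    }
}
```

With this arrangement, a query in which the same term appears in several places still yields a single counter per term. The resulting array could then be handed to a counting-based scorer.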