|
|||||||||
PREV PACKAGE NEXT PACKAGE | FRAMES NO FRAMES |
See:
Description
Interface Summary | |
---|---|
Query | A node of a composite representing a query. |
QueryBuilderVisitor<T> | A visitor for a composite query. |
QueryTransformer | A strategy for transforming queries. |
Class Summary | |
---|---|
AbstractQueryBuilderVisitor<T> | A QueryBuilderVisitor that
returns true on all visitPre() methods and does nothing
on AbstractQueryBuilderVisitor.prepare() . |
AbstractTermExpander | A query transformer that just requires implementing a method that expands terms (e.g., into disjunctive queries). |
Align | A node representing the alignment of the two iterators. |
And | A node representing the logical and of the underlying queries. |
CheckForSelectQueryVisitor | A QueryBuilderVisitor that
returns Boolean.FALSE only if the visited query contains a Select node
that does not lie in the aligner of an Align query (as in that case
the index is not part of the answer). |
Composite | A abstract composite node containing an array of component queries. |
Consecutive | A node representing the consecutive composition of the underlying queries. |
Difference | A node representing a difference of two queries. |
False | A node representing falseness (i.e., no documents are returned). |
LowPass | A node representing a low-pass filtering of the only underlying query. |
MultiIndexTermExpander | A term expander that replaces every term or prefix with a disjunction of queries; each query is made by the same term or prefix preceded by a selection over a different index. |
MultiTerm | A node representing a virtual term obtained by merging the occurrences of the given (possibly weighted) terms. |
Not | A node representing the logical not of the underlying query. |
Or | A node representing the logical or of the underlying queries. |
OrderedAnd | A node representing the logical ordered and of the underlying queries. |
Prefix | A node representing a set of terms defined by a common prefix. |
Queries | Static methods and objects related to queries. |
Range | A node representing a range query on a payload-only index. |
Remap | A node representing an index remapping. |
Select | A node representing an index selection. |
Term | A node representing a single term. |
True | A node representing truth (i.e., all documents are returned with interval iterator IntervalIterators.TRUE ). |
Weight | A node representing a weight selection. |
Exception Summary | |
---|---|
QueryBuilderVisitorException | A wrapper for unchecked exceptions thrown during a visit. |
This package contains the classes that represent queries as an abstract syntax tree, or, in design-pattern jargon, as a composite.
Warning: Before reading this overview, the reader is encouraged to consult the first part
of the it.unimi.di.mg4j.search
package
documentation, where the notion of query is briefly presented.
The basic idea is to perform a complete decoupling of the query construction and resolution into three levels:
it.unimi.di.mg4j.search
package.This package overview shows how to pass from the string to the tree level using the built-in
QueryParser
, and explains what is needed to implement the tree level, as well as the basic instruments
needed to convert the tree level into the iterator level.
In MG4J, a query parser is an implementation of QueryParser
:
essentially, an object with a method that transforms strings in queries.
You can use any parser you like, or build your queries programmatically.
The parser provided with MG4J and whose syntax is
described in the manual is currently generated using JavaCC: unfortunately, this tool
produces some public classes that somehow clutter the package
documentation. All the parsing logic is contained in the
SimpleParser
class, which is generated by the
JavaCC source SimpleParser.jj.
MG4J allows one to perform queries on multiple indices; by saying so, we mean that you may have many indices concerning the same document collection, and you want to perform a query that may search on some of them. Think, for example, of a collection of emails. You might have generated a number of indices, to index their subjects, sender, recipient(s), body etc. All these indices can be thought of as individual entities, their only relation being that they index collections with the same number of documents, and that same-numbered documents conceptually "come" from the same source.
To get a query tree from a string representation, you must first build a QueryParser
object.
For instance, if you use the built-in SimpleParser
you have
to provide a map whose keys are index names (called index aliases) and
with each index alias mapped to the corresponding Index
.
Moreover, one of the aliases is taken to be the default index alias used for the queries.
In our example, we will assume that we have three index aliases (say, subject, from and body), and that subject is the default index alias. The parser would then be created as follows (here, we are stipulating that index with alias x has basename mail-x):
Object2ReferenceMap<String,Index> indexAlias2Index = new Object2ReferenceOpenHashMap<String,Index>(); Index defaultIndex; indexAlias2Index.put( "subject", defaultIndex = Index.getInstance( "mail-subject" ) ); indexAlias2Index.put( "from", Index.getInstance( "mail-from" ) ); indexAlias2Index.put( "body", Index.getInstance( "mail-body" ) ); QueryParser parser = new QueryParser( indexAlias2Index.keySet(), "subject" );after which you can use it for example as follows:
Query query = parser.parse( "meeting AND body:(schedule OR urgent)" ); DocumentIteratorBuilderVisitor visitor = new DocumentIteratorBuilderVisitor( indexAlias2Index, defaultIndex, 0 ); DocumentIterator it = query.accept( visitor ); while ( it.hasNext() ) { System.out.println( "Document #: " + it.nextDocument() ); System.out.print( "\tPositions:" ); for ( Interval interval: it ) System.out.print( " " + interval ); System.out.println(); }
Query
As explained above, in this package a Query
is simply an abstract tree: its leaves will correspond to ground queries, whereas
internal nodes correspond to query operators.
Ground queries can be term queries, prefix queries,
multiterm queries (the latter usually generated by some query-expansion mechanism),
True
and False
.
Any other query is either a composite query (e.g.,
a conjunctive query, a disjunctive query etc.),
that is, a query composed by other subqueries, or it is obtained by some other operator (e.g.,
a low-pass query etc.).
The Weight class allows one to assign a weight to a query (it wraps a query and assigns a weight with it; queries that are not assigned a weight explicitly have a default weight of 1). A special treatment is reserved to weighted terms (that is, to nodes that are made by applying a Weight to a Term: they are considered essentially as variants of terms, and so they can be used to build a multiterm query.
Every query has a method that accepts
a QueryBuilderVisitor
object, through which you can visit the query tree.
More precisely, when the accept method is invoked on a query, a recursive visit of the query tree is performed: at every
node n
of type Q
(Q
being any class implementing Query
),
the following steps are performed:
visitPre(Q n)
method is called; if the method returns false, the visit is
interrupted; otherwise…T[] x
; if at a certain
point during this visit some call decides that the visit should be interrupted, we stop; otherwise…visitPost(Q n, T[] x)
method is called, and the result (of type T
)
is returned.To make the idea of a visitor easier to understand, consider the following simple example of a visitor:
static class PrinterVisitor extends AbstractQueryBuilderVisitor<String> { private static void appendArray( MutableString res, String t[], char sep ) { for ( int i = 0; i < t.length - 1; i++ ) res.append( t[ i ] + sep ); res.append( t[ t.length - 1 ] ); } public String[] newArray( int len ) { return new String[ len ]; } public String visitPost( OrderedAnd node, String[] t ) { MutableString res = new MutableString(); res.append( "OAND(" ); appendArray( res, t, ',' ); res.append( ")" ); return res.toString(); } public String visitPost( Or node, String[] t ) { MutableString res = new MutableString(); res.append( "OR(" ); appendArray( res, t, ',' ); res.append( ")" ); return res.toString(); } public String visitPost( And node, String[] t ) { MutableString res = new MutableString(); res.append( "AND(" ); appendArray( res, t, ',' ); res.append( ")" ); return res.toString(); } public String visitPost( Consecutive node, String[] t ) { MutableString res = new MutableString(); res.append( "\"" ); appendArray( res, t, ' ' ); res.append( "\"" ); return res.toString(); } public String visitPost( Not node, String t ) { return "NOT(" + t + ")"; } public String visitPost( LowPass node, String t ) { return t + "~" + node.k; } public String visitPost( Select node, String t ) { return node.index + ":" + t; } public String visitPost( Weight node, String t ) { return t + "{" + node.weight + "}"; } public String visit( Term node ) { return node.term.toString(); } public String visit( Prefix node ) { return node.prefix + "*"; } public String visit( MultiTerm node ) { return node.term.toString(); } }
Now, you can pass an instance of this visitor to a query, and as a result get a string (linear) representation
of the query itself. (Of course, there is a simple way to get the same result, that is, calling the toString()
method directly, but for this example is easy enough for illustrative purposes.)
Here is an example of how the above visitor can be used:
public static void main( String arg[] ) throws Exception { Query qa = new Term( "a" ); Query qb = new Prefix( "b" ); Query qc = new Term( "c" ); Query qd = new Term( "d" ); Query q = new LowPass( new And( qa, new Select( "another_index", qb ), new Or( qc, qd ) ), 20 ); System.out.println( q.accept( new PrinterVisitor() ) ); }that produces AND(a,another_index:b*,OR(c,d))~20.
The interface Query
and its implementations provide the basic classes
for the tree-level representation of a query. Building an iterator out of it can be thought of as a process of recursive
instantiation of a query tree into an iterator tree; this is actually a sort of a copy visit, and it is indeed implemented
as a special kind of visitor: the DocumentIteratorBuilderVisitor
.
You can create one such visitor and use it to visit a query tree, obtaining as a result a document iterator
that you can use to get the results.
To be more precise, let us recall the example presented in the last part of the overview of the search package:
Index subjectIndex = Index.getInstance( "mail-subject" ); DocumentIterator it = AndDocumentIterator.getInstance( subjectIndex.documents( "meeting" ), subjectIndex.documents( "schedule" ), subjectIndex.documents( "monday" ) ); while ( it.hasNext() ) { System.out.println( "Document #: " + it.nextDocument() ); System.out.print( "\tPositions:" ); for ( Interval interval: it ) System.out.print( " " + interval ); System.out.println(); }
Here is an equivalent piece of code that uses the idea of decoupling the tree level from the iterator level.
Index subjectIndex = Index.getInstance( "mail-subject" ); Query q = new And( new Term( "meeting" ), new Term( "schedule" ), new Term( "monday" ) ); DocumentIteratorBuilderVisitor visitor = new DocumentIteratorBuilderVisitor( null, subjectIndex, 0 ); DocumentIterator it = q.accept( visitor ); while ( it.hasNext() ) { System.out.println( "Document #: " + it.nextDocument() ); System.out.print( "\tPositions:" ); for ( Interval interval: it ) System.out.print( " " + interval ); System.out.println(); }
|
|||||||||
PREV PACKAGE NEXT PACKAGE | FRAMES NO FRAMES |