java.lang.Object it.unimi.di.mg4j.tool.URLMPHVirtualDocumentResolver
public class URLMPHVirtualDocumentResolver
A virtual-document resolver based on document URIs.
Instances of this class store in a StringMap all URIs from a collection, and consider a
virtual-document specification a (possibly relative) URI. The
virtual-document specification is resolved against the document URI, and then the perfect hash is used
to retrieve the corresponding document.
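The resolution step described above can be illustrated with a minimal sketch. It is not the MG4J implementation: the class name UriResolverSketch is hypothetical, a plain java.util.Map stands in for the StringMap, and the context is passed as a URI string rather than as a Document.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in that mimics the resolve-then-look-up idea; not the MG4J class. */
public class UriResolverSketch {
    // Assumption: a plain map from document URI to document pointer replaces the StringMap.
    private final Map<String, Integer> url2DocumentPointer;
    // URI of the context document, set by context().
    private URI contextUri;

    public UriResolverSketch(Map<String, Integer> url2DocumentPointer) {
        this.url2DocumentPointer = new HashMap<>(url2DocumentPointer);
    }

    /** Records the URI of the document in which specifications will be found. */
    public void context(String documentUri) {
        contextUri = URI.create(documentUri);
    }

    /** Resolves a (possibly relative) URI against the context URI and looks it up. */
    public int resolve(String virtualDocumentSpec) {
        // The documented behaviour without a context is undefined; this sketch chooses to fail fast.
        if (contextUri == null) throw new IllegalStateException("context() was never called");
        final URI resolved;
        try {
            resolved = contextUri.resolve(virtualDocumentSpec).normalize();
        } catch (IllegalArgumentException e) {
            return -1; // the specification is not a syntactically valid URI
        }
        final Integer pointer = url2DocumentPointer.get(resolved.toString());
        return pointer == null ? -1 : pointer;
    }
}
```

With the context set to http://example.com/a/b.html, the specification ../c.html is looked up as http://example.com/c.html in this sketch, and a URI that is absent from the map yields -1.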
This class provides a main method that helps in building serialised resolvers from URI lists.
In case of pathological document collections with duplicate URIs (most notably, the GOV2 collection
used for TREC evaluations), an option makes it possible to add random noise to duplicates, so that
minimal perfect hash construction does not go into an infinite loop. It is a rather crude solution, but it
is nonsensical to have duplicate URIs in the first place. Additional options include the kind of minimal perfect
hash function you want to use (e.g., one from sux4j) and the number of bits used to sign it.
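One way to picture the duplicate-handling option is the sketch below. It only illustrates the idea of adding random noise to repeated URIs so that every key is distinct. The class name DuplicateUriSketch, the method makeUnique and the noise format (a "#" followed by random hexadecimal digits) are assumptions made for this example, not what the tool actually does.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Hypothetical illustration of making a URI list duplicate-free by adding random noise. */
public class DuplicateUriSketch {
    /**
     * Returns a list of the same length in which every element is distinct.
     * The first occurrence of a URI is kept unchanged; later occurrences get noise appended.
     * Positions are preserved, so the index of a URI can still serve as its document pointer.
     */
    public static List<String> makeUnique(List<String> uris, Random random) {
        final Set<String> seen = new HashSet<>();
        final List<String> result = new ArrayList<>(uris.size());
        for (String uri : uris) {
            String candidate = uri;
            // Set.add returns false when the key is already present: retry with fresh noise.
            while (!seen.add(candidate)) {
                candidate = uri + "#" + Long.toHexString(random.nextLong());
            }
            result.add(candidate);
        }
        return result;
    }
}
```

The noisy copies can no longer be reached by resolving the original URI, which matches the description of the option as a crude workaround rather than a fix for the collection.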
Constructor Summary |
---|
URLMPHVirtualDocumentResolver(StringMap<? extends CharSequence> url2DocumentPointer) |
Method Summary | |
---|---|
void | context(Document document) Sets the context document. |
static void | main(String[] arg) |
int | numberOfDocuments() Returns the number of documents handled by this resolver, if it is known. |
int | resolve(CharSequence virtualDocumentSpec) Resolves a virtual document specification. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public URLMPHVirtualDocumentResolver(StringMap<? extends CharSequence> url2DocumentPointer)
Method Detail |
---|
public void context(Document document)
Description copied from interface: VirtualDocumentResolver
Sets the context document. Subsequent calls to VirtualDocumentResolver.resolve(CharSequence) will
assume the virtual-document specification was found in document.
Specified by:
context in interface VirtualDocumentResolver
Parameters:
document - the context document.
public int resolve(CharSequence virtualDocumentSpec)
Description copied from interface: VirtualDocumentResolver
Resolves a virtual document specification.
Note that the resolution process is carried out in the context of the last document
passed to VirtualDocumentResolver.context(Document) (e.g., for relative URI resolution).
If VirtualDocumentResolver.context(Document) was never called, the behaviour is undefined.
Specified by:
resolve in interface VirtualDocumentResolver
Parameters:
virtualDocumentSpec - the virtual document specification.
Returns:
the document pointer that virtualDocumentSpec refers to, or -1 if the specification could not be resolved.
public int numberOfDocuments()
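Description copied from interface: VirtualDocumentResolver
Returns the number of documents handled by this resolver, if it is known.
VirtualDocumentResolver.resolve(CharSequence) will always return a number
smaller than the one returned by this method.
Specified by:
numberOfDocuments in interface VirtualDocumentResolver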
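The bound stated above means that a caller can size an array with numberOfDocuments() and index it directly with any result of resolve other than -1. The following sketch shows that pattern under stated assumptions: the class InDegreeSketch and its method countTargets are hypothetical, the resolver is passed in as a plain function rather than as a VirtualDocumentResolver, and the number of documents is assumed to be known.

```java
import java.util.List;
import java.util.function.ToIntFunction;

/** Hypothetical caller that relies on resolve() returning -1 or a value below numberOfDocuments(). */
public class InDegreeSketch {
    /**
     * Counts how many specifications resolve to each document.
     *
     * @param resolve stand-in for a resolver's resolve method
     * @param numberOfDocuments stand-in for a resolver's numberOfDocuments() result
     * @param specs virtual-document specifications to resolve
     */
    public static int[] countTargets(ToIntFunction<CharSequence> resolve, int numberOfDocuments, List<String> specs) {
        final int[] count = new int[numberOfDocuments];
        for (String spec : specs) {
            final int pointer = resolve.applyAsInt(spec);
            if (pointer == -1) continue; // unresolved specifications are skipped
            count[pointer]++; // in range because pointer < numberOfDocuments
        }
        return count;
    }
}
```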
public static void main(String[] arg) throws com.martiansoftware.jsap.JSAPException, IOException
Throws:
com.martiansoftware.jsap.JSAPException
IOException