it.unimi.di.mg4j.tool
Class URLMPHVirtualDocumentResolver

java.lang.Object
  extended by it.unimi.di.mg4j.tool.URLMPHVirtualDocumentResolver
All Implemented Interfaces:
VirtualDocumentResolver, Serializable

public class URLMPHVirtualDocumentResolver
extends Object
implements VirtualDocumentResolver

A virtual-document resolver based on document URIs.

Instances of this class store all URIs from a collection in a StringMap, and treat a virtual-document specification as a (possibly relative) URI. The virtual-document specification is resolved against the URI of the context document, and then the perfect hash is used to retrieve the corresponding document.
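
To illustrate the first step, resolving a (possibly relative) specification against a document URI, here is a minimal sketch that uses only the standard java.net.URI class. The class name RelativeUriSketch, the helper resolveAgainst and the example.com URIs are hypothetical and are not part of MG4J; the sketch does not claim to reproduce how this class performs the resolution internally.

```java
import java.net.URI;

// Hypothetical illustration, not MG4J code: resolves a virtual-document
// specification against the URI of the document in which it was found.
public class RelativeUriSketch {

    // Returns the absolute form of spec, interpreted relative to contextUri.
    // An already-absolute spec is returned unchanged by URI.resolve.
    public static String resolveAgainst(String contextUri, String spec) {
        return URI.create(contextUri).resolve(spec).toString();
    }

    public static void main(String[] args) {
        String context = "http://example.com/a/b.html"; // assumed context document URI
        System.out.println(resolveAgainst(context, "c.html"));
        System.out.println(resolveAgainst(context, "../c.html"));
        System.out.println(resolveAgainst(context, "http://other.example.org/x"));
    }
}
```

The resulting absolute URI is the key that would then be looked up in the map of known document URIs.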

This class provides a main method that helps in building serialised resolvers from URI lists. For pathological document collections with duplicate URIs (most notably, the GOV2 collection used for TREC evaluations), an option adds random noise to duplicates, so that minimal perfect hash construction does not go into an infinite loop. It is a rather crude solution, but it is nonsensical to have duplicate URIs in the first place. Additional options include the kind of minimal perfect hash function to use (e.g., one from Sux4J) and the number of bits used to sign its keys.

See Also:
Serialized Form

Constructor Summary
URLMPHVirtualDocumentResolver(StringMap<? extends CharSequence> url2DocumentPointer)
           
 
Method Summary
 void context(Document document)
          Sets the context document.
static void main(String[] arg)
           
 int numberOfDocuments()
          Returns the number of documents handled by this resolver, if it is known.
 int resolve(CharSequence virtualDocumentSpec)
          Resolves a virtual document specification.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

URLMPHVirtualDocumentResolver

public URLMPHVirtualDocumentResolver(StringMap<? extends CharSequence> url2DocumentPointer)
Method Detail

context

public void context(Document document)
Description copied from interface: VirtualDocumentResolver
Sets the context document. All successive calls to VirtualDocumentResolver.resolve(CharSequence) will assume the virtual-document specification was found in document.

Specified by:
context in interface VirtualDocumentResolver
Parameters:
document - the context document.

resolve

public int resolve(CharSequence virtualDocumentSpec)
Description copied from interface: VirtualDocumentResolver
Resolves a virtual document specification.

Note that the resolution process is carried out in the context of the last document passed to VirtualDocumentResolver.context(Document) (e.g., for relative URI resolution). If VirtualDocumentResolver.context(Document) was never called, the behaviour is undefined.

Specified by:
resolve in interface VirtualDocumentResolver
Parameters:
virtualDocumentSpec - the virtual document specification.
Returns:
the document virtualDocumentSpec refers to, or -1 if the specification could not be resolved.
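
The contract described above (set a context, resolve a specification, get -1 when nothing matches) can be sketched with a toy class. This is not the MG4J implementation: ToyUriResolver is a hypothetical name, a plain java.util.Map stands in for the StringMap, document URIs are passed as strings rather than Document instances, and throwing when no context has been set is a choice made for this sketch only, since the interface leaves that case undefined.

```java
import java.net.URI;
import java.util.Map;

// Hypothetical stand-in for a URI-based resolver; not MG4J code.
public class ToyUriResolver {
    // Maps absolute document URIs to document pointers
    // (assumption: a plain Map replaces the StringMap).
    private final Map<String, Integer> uriToPointer;
    // URI of the last context document, or null if none was set.
    private URI context;

    public ToyUriResolver(Map<String, Integer> uriToPointer) {
        this.uriToPointer = uriToPointer;
    }

    // Sets the context; later resolve() calls are relative to this URI.
    public void context(String documentUri) {
        context = URI.create(documentUri);
    }

    // Returns the pointer of the document the specification refers to,
    // or -1 if the specification cannot be resolved.
    public int resolve(CharSequence virtualDocumentSpec) {
        if (context == null) {
            // Sketch-only choice: the interface leaves this case undefined.
            throw new IllegalStateException("context was never set");
        }
        try {
            String absolute = context.resolve(virtualDocumentSpec.toString()).toString();
            Integer pointer = uriToPointer.get(absolute);
            return pointer == null ? -1 : pointer;
        } catch (IllegalArgumentException malformedSpec) {
            return -1; // a syntactically invalid URI cannot be resolved
        }
    }

    // Number of documents known to this toy resolver.
    public int numberOfDocuments() {
        return uriToPointer.size();
    }
}
```

A caller would typically invoke context once per document being parsed, then call resolve for each specification found in that document and skip any that yield -1.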

numberOfDocuments

public int numberOfDocuments()
Description copied from interface: VirtualDocumentResolver
Returns the number of documents handled by this resolver, if it is known. A call to VirtualDocumentResolver.resolve(CharSequence) will always return a number smaller than the one returned by this method.

Specified by:
numberOfDocuments in interface VirtualDocumentResolver
Returns:
the number of documents handled by this resolver.

main

public static void main(String[] arg)
                 throws com.martiansoftware.jsap.JSAPException,
                        IOException
Throws:
com.martiansoftware.jsap.JSAPException
IOException