Class URLMPHVirtualDocumentResolver

  • All Implemented Interfaces:
    VirtualDocumentResolver, Serializable

    public class URLMPHVirtualDocumentResolver
    extends Object
    implements VirtualDocumentResolver
    A virtual-document resolver based on document URIs.

    Instances of this class store all URIs from a collection in a StringMap, and treat a virtual-document specification as a (possibly relative) URI. The virtual-document specification is resolved against the document URI, and the perfect hash is then used to retrieve the corresponding document.
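
    The two steps above (resolve, then look up) can be sketched with standard JDK classes only. This is a minimal illustration, not this class's implementation: the class name ResolveSketch, the method resolve, the plain HashMap standing in for the StringMap, and the -1 return value for unknown URIs are all assumptions made for the example.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

public class ResolveSketch {
    /**
     * Hypothetical helper: resolves spec against documentUri and looks the
     * result up in a plain map from URI strings to document indices.
     */
    public static long resolve(Map<String, Long> uriToDocument, URI documentUri, String spec) {
        // Step 1: resolve the (possibly relative) specification against the document URI.
        String target = documentUri.resolve(spec).toString();
        // Step 2: look up the resolved URI; -1 marks an unknown URI in this sketch.
        return uriToDocument.getOrDefault(target, -1L);
    }

    public static void main(String[] args) {
        // Toy collection with two documents (example.org is a placeholder host).
        Map<String, Long> uriToDocument = new HashMap<>();
        uriToDocument.put("http://example.org/dir/a.html", 0L);
        uriToDocument.put("http://example.org/dir/b.html", 1L);

        URI context = URI.create("http://example.org/dir/a.html");
        System.out.println(resolve(uriToDocument, context, "b.html"));
        System.out.println(resolve(uriToDocument, context, "http://example.org/dir/a.html"));
        System.out.println(resolve(uriToDocument, context, "missing.html"));
    }
}
```

    A relative specification such as b.html is interpreted relative to the document that contains it, while an absolute specification is looked up as it stands.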

    This class provides a main method that helps in building serialised resolvers from URI lists. For pathological document collections with duplicate URIs (most notably, the GOV2 collection used for TREC evaluations), an option makes it possible to add random noise to duplicates, so that minimal perfect hash construction does not go into an infinite loop. It is a rather crude solution, but it is nonsensical to have duplicate URIs in the first place. Additional options include the kind of minimal perfect hash function you want to use (e.g., one from it.unimi.dsi.sux4j) and the number of bits used to sign it.
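
    The idea of adding random noise to duplicates can be illustrated as follows. This is only a sketch of the general technique: the class name UniquifySketch, the method uniquify, and the choice of appending "#" plus random hexadecimal digits are assumptions made for the example, and the noise scheme actually used by the main method may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class UniquifySketch {
    /**
     * Hypothetical helper: returns a list of the same length in which every
     * repeated URI has been made distinct by appending random noise.
     */
    public static List<String> uniquify(List<String> uris, Random random) {
        Set<String> seen = new HashSet<>();
        List<String> result = new ArrayList<>(uris.size());
        for (String uri : uris) {
            String candidate = uri;
            // Set.add returns false for a key already seen; retry with fresh noise.
            while (!seen.add(candidate)) {
                candidate = uri + "#" + Long.toHexString(random.nextLong());
            }
            result.add(candidate);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> uris = Arrays.asList(
            "http://example.org/a", "http://example.org/b", "http://example.org/a");
        // The third entry is printed with a random suffix, so all keys are distinct.
        for (String uri : uniquify(uris, new Random())) System.out.println(uri);
    }
}
```

    In this sketch the first occurrence of each URI is kept unchanged, and the altered copies can no longer be found under their original URI.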

    Warning: up to version 5.2.1, this class applied URI.normalize() in the context(Document) and resolve(CharSequence) methods. It no longer does so, because normalization broke URLs such as http://en.wikipedia.org/wiki//dev/null.
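
    The effect of normalization on the URL in the warning can be checked directly with java.net.URI. The class name NormalizeSketch and the method normalized below are made up for this illustration; only URI.create(String) and URI.normalize() come from the JDK.

```java
import java.net.URI;

public class NormalizeSketch {
    /** Hypothetical helper: returns the string form of the normalized URI. */
    public static String normalized(String uri) {
        return URI.create(uri).normalize().toString();
    }

    public static void main(String[] args) {
        String original = "http://en.wikipedia.org/wiki//dev/null";
        System.out.println(original);
        // The JDK's normalization collapses the doubled slash in the path.
        System.out.println(normalized(original));
    }
}
```

    Because the doubled slash is collapsed, the normalized string no longer equals the URI stored for the document, so a lookup on it cannot find that document.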

    See Also:
    Serialized Form