Class Index

  • All Implemented Interfaces:
    Serializable
    Direct Known Subclasses:
    BitStreamIndex, IndexCluster, QuasiSuccinctIndex

    public abstract class Index
    extends Object
    implements Serializable
    An abstract representation of an index.

    Concrete subclasses of this class represent abstract index access information: for instance, the basename or IP address/port, flags, etc. This makes it easy to build index readers over the index; index readers, in turn, provide document iterators.

    This class contains only method declarations, plus attributes for the data common to every form of index. Note that we use an abstract class, rather than an interface, because interfaces cannot declare instance attributes.

    We provide static factory methods (e.g., getInstance(CharSequence)) that return an index given a suitable URI string. If the scheme part is mg4j, then the URI is assumed to point at a remote index. Otherwise, it is assumed to be the basename of a local index. In both cases, a query part introduced by ? can specify additional parameters (key=value pairs separated by ;). For instance, the URI example?inmemory=1 will load the index with basename example, caching its content in core memory. Please have a look at constants in Index.UriKeys (and analogous enums in subclasses) for additional parameters.
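
    The URI convention described above can be illustrated with a minimal, self-contained sketch. IndexUriSketch and its fields are hypothetical names introduced only for this example; this is not how getInstance(CharSequence) is implemented, and the host and port in the usage note are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of MG4J): splits an index URI into its parts. */
final class IndexUriSketch {
    /** The basename of a local index, or the remote URI without its query part. */
    final String location;
    /** True if the scheme part is mg4j, that is, the URI points at a remote index. */
    final boolean remote;
    /** The key=value pairs found after the '?', in order of appearance. */
    final Map<String, String> parameters;

    private IndexUriSketch(String location, boolean remote, Map<String, String> parameters) {
        this.location = location;
        this.remote = remote;
        this.parameters = parameters;
    }

    static IndexUriSketch parse(CharSequence uri) {
        String s = uri.toString();
        int q = s.indexOf('?');
        String location = q < 0 ? s : s.substring(0, q);
        Map<String, String> parameters = new LinkedHashMap<>();
        if (q >= 0) {
            // Parameters are separated by ';' and each one has the form key=value.
            for (String pair : s.substring(q + 1).split(";")) {
                if (pair.isEmpty()) continue;
                int eq = pair.indexOf('=');
                if (eq < 0) parameters.put(pair, "");
                else parameters.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return new IndexUriSketch(location, location.startsWith("mg4j:"), parameters);
    }
}
```

    With this sketch, IndexUriSketch.parse("example?inmemory=1") yields the location example, a local (non-remote) index, and the single parameter inmemory with value 1, whereas a string such as mg4j://localhost:9090 would be classified as remote.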

    If the index is local, by convention this class will locate a property file with extension DiskBasedIndex.PROPERTIES_EXTENSION that is expected to contain a number of key/value pairs (which are quite informative and can be examined manually). In particular, the key Index.PropertyKeys.INDEXCLASS explains which kind of index class should be used to read the index. The file might contain additional keys depending on the value of Index.PropertyKeys.INDEXCLASS (e.g., QuasiSuccinctIndex.PropertyKeys.BYTEORDER). An index usually exposes term or prefix maps and the size list, but this is not compulsory (the latter, in particular, is necessary with certain codings).
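
    As a rough illustration of the lookup described above, the following sketch extracts the index class name from the text of a property file using only java.util.Properties. The class name IndexPropertiesSketch and the literal key string "indexclass" are assumptions made for this example; the actual key is whatever Index.PropertyKeys.INDEXCLASS resolves to, and the library may read the file differently.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper (not part of MG4J): finds the index class in a property file. */
final class IndexPropertiesSketch {
    /** Assumed textual form of the key; check Index.PropertyKeys.INDEXCLASS for the real one. */
    static final String INDEXCLASS_KEY = "indexclass";

    /** Returns the class name stored under INDEXCLASS_KEY in the given property-file text. */
    static String indexClassName(String propertiesText) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        String name = properties.getProperty(INDEXCLASS_KEY);
        if (name == null) throw new IllegalArgumentException("Missing key: " + INDEXCLASS_KEY);
        return name.trim();
    }
}
```

    A caller would then use the returned name to decide which concrete index class should interpret the remaining keys in the file.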

    Thread safety

    Indices are a natural candidate for multithreaded access. An instance of this class must be thread safe as long as external data structures provided to its constructors are. For instance, the tool IndexBuilder generates a synchronized ImmutableExternalPrefixMap so that by default the resulting index is thread safe.

    For instance, a DiskBasedIndex requires a list of term offsets, term maps, etc. As long as all these data structures are thread safe, the same is true of the index. Data structures created by static factory methods such as DiskBasedIndex.getInstance(CharSequence) are thread safe.

    Note that IndexReaders returned by getReader() are not thread safe (even if the method getReader() is). The logic behind this arrangement is that you create as many readers as you need, and then close them with Closeable.close(). In a multithreaded environment, a pool of index readers can be created, and a custom QueryBuilderVisitor can be used to build DocumentIterators using the given pool of readers. In this case readers are not closed, but rather reused.
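
    The pooling arrangement mentioned above might look roughly like the following sketch. ReaderPoolSketch is a hypothetical class, generic in the reader type so that the example compiles without MG4J on the classpath; in real use the type parameter would be IndexReader and the factory would wrap a call to getReader(). The wiring into a custom QueryBuilderVisitor is not shown.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical fixed-size pool of non-thread-safe readers (not part of MG4J). */
final class ReaderPoolSketch<R> {
    private final BlockingQueue<R> idle;

    /** Creates all readers eagerly; each reader is used by at most one thread at a time. */
    ReaderPoolSketch(int size, Supplier<? extends R> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) idle.add(factory.get());
    }

    /** Borrows a reader, runs the task with it, and returns the reader to the pool. */
    <T> T withReader(Function<? super R, ? extends T> task) {
        final R reader;
        try {
            reader = idle.take(); // blocks until some reader is idle
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a reader", e);
        }
        try {
            return task.apply(reader);
        } finally {
            idle.add(reader); // the reader is reused, not closed
        }
    }

    /** Number of readers currently available. */
    int idleCount() {
        return idle.size();
    }
}
```

    Borrowing through withReader guarantees that a reader goes back to the pool even when the task throws, which matches the idea that pooled readers are reused rather than closed after each query.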

    Read-once load

    Implementations of this class are strongly encouraged to offer read-once constructors and factory methods: property files and other data related to the index (but not to an IndexReader) should be read exactly once, and sequentially. This feature is very useful when combining indices.

    Since:
    0.9
    Author:
    Paolo Boldi, Sebastiano Vigna
    See Also:
    Serialized Form