Setup Time

Once the index has been created, there are several ways to improve query resolution time. First of all, an index can be read from disk, memory-mapped, or loaded directly into main memory. These three solutions work with increasing speed and increasing main memory usage. The default is to read an index from disk, but you can add suitable options to the index URI (e.g., mapped=1 or inmemory=1; see the Index.UriKeys documentation) to force your preferences. Analogously, offsets are necessary to locate, inside the index file, the posting list of a given term. By default they are read from disk using a SemiExternalOffsetList, but you can load them into memory if you prefer. If you load sizes (e.g., because you want to run a scorer that needs them), there is a suitable URI option (e.g., succinctsizes=1) that will load sizes in a highly compact format. This is particularly useful when pasting large indices.
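
As an illustration of how such options might be attached to an index URI, here is a minimal sketch in plain Java. The class IndexUriSketch, the method withOptions and the basename myindex-text are hypothetical names introduced for this example only. The sketch also assumes that options follow a question mark and are separated by semicolons; check the Index.UriKeys documentation for the syntax and the full list of keys before relying on it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of MG4J): composes an index URI string
// from a basename and a set of key=value options.
class IndexUriSketch {

    // Returns the basename unchanged when there are no options; otherwise
    // appends "?key=value;key=value...".
    // ASSUMPTION: options are introduced by '?' and separated by ';'.
    static String withOptions(String basename, Map<String, String> options) {
        if (options.isEmpty()) return basename;
        StringJoiner query = new StringJoiner(";");
        for (Map.Entry<String, String> e : options.entrySet()) {
            query.add(e.getKey() + "=" + e.getValue());
        }
        return basename + "?" + query;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the options in insertion order.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("inmemory", "1");       // load the index into main memory
        options.put("succinctsizes", "1");  // load sizes in a compact format
        // The resulting string is what you would pass wherever an index URI
        // is expected (for instance, when opening the index).
        System.out.println(withOptions("myindex-text", options));
    }
}
```

Replacing inmemory with mapped in the sketch would request memory mapping instead, trading some speed for lower main memory usage.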

To get more options, you can partition your index. Once you have a cluster formed by several sub-indices, you can decide which sub-indices go to memory, which will be mapped, and so on.

An important source of delay in loading the index is the expansion of the dump file of an ImmutableExternalPrefixMap, which is the default term map generated by IndexBuilder. The dump file must be copied from the serialized representation to a temporary directory, and for large collections the process can be very slow. The solution is either to use a different term map (e.g., some kind of signed hash; see the minimal perfect hash classes of Sux4J) or to generate (either programmatically or using the main method of ImmutableExternalPrefixMap) a non-self-contained, synchronized instance of ImmutableExternalPrefixMap and save it using the standard suffix for term maps. Such an instance is based on a separate dump file that must be attached to the deserialized instance before use (see the documentation for details). You can attach the dump stream by invoking

((ImmutableExternalPrefixMap)index.termMap).setDumpStream( filename );

with the appropriate argument.