MG4J provides a special kind of index, called a
payload-based index, that is used to store not text
but rather metadata (dates, integers, etc.) related to a document. It is
the default way of storing non-textual fields. Essentially, a
payload-based index leverages the structure of a text-based index: it
has no counts or positions, but each posting has a
payload, that is, a piece of data related to the document
referred to by the posting. In this way, by creating an index with a single
posting list (related to the term #), we are
effectively storing metadata related to each document. The main
advantage of this approach is that we get the sophisticated skipping
structure of MG4J's indices almost for free, along with support for
splitting, combination, and so on.
From the user's viewpoint there is no particular difference between standard and payload-based indices, with two exceptions. First, payload-based indices do not provide some files that would be nonsensical, such as the file of sizes or the global occurrence count. Second, searching a payload-based index is rather different from searching a standard index: instead of term-based operators and Boolean combinators you just get range queries.
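To make the idea concrete, here is a minimal conceptual sketch in Java of a single posting list whose postings carry a payload, together with a range query over those payloads. It is not MG4J's API: the class `PayloadIndexSketch`, its nested `Posting` class, and the methods `add` and `range` are hypothetical names chosen for illustration, the payload is assumed to be a `long` (for example, a year), and the linear scan ignores the skipping structure that a real index would use.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch only: NOT MG4J's API. All names here are hypothetical.
final class PayloadIndexSketch {
    // One posting: a document pointer plus its payload; no counts, no positions.
    static final class Posting {
        final int document;
        final long payload;

        Posting(int document, long payload) {
            this.document = document;
            this.payload = payload;
        }
    }

    // The single posting list of the index, kept in increasing document order.
    private final List<Posting> postings = new ArrayList<>();

    // Appends the payload of a document; documents must arrive in increasing order.
    void add(int document, long payload) {
        if (!postings.isEmpty() && postings.get(postings.size() - 1).document >= document)
            throw new IllegalArgumentException("Documents must be added in increasing order");
        postings.add(new Posting(document, payload));
    }

    // Range query: documents whose payload lies in [low, high], in document order.
    List<Integer> range(long low, long high) {
        List<Integer> result = new ArrayList<>();
        for (Posting p : postings)
            if (p.payload >= low && p.payload <= high) result.add(p.document);
        return result;
    }

    public static void main(String[] args) {
        PayloadIndexSketch index = new PayloadIndexSketch();
        index.add(0, 1998L);
        index.add(1, 2005L);
        index.add(3, 2010L); // document 2 has no metadata, so it has no posting
        index.add(4, 2021L);
        System.out.println(index.range(2000L, 2010L)); // prints [1, 3]
    }
}
```

The sketch mirrors the description above: there is exactly one list, each entry pairs a document with its metadata, and the only query available is a range over the payload values.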