Chapter 2. Behind the scenes: The indexing process

Table of Contents

Introduction

Preamble: terms, dictionaries and term-related maps

Scan: Building batches

Time/space requirements

Combining batches

Splitting indices

Virtual fields in MG4J

Virtual fields and virtual fragments
Document resolvers
What is a document resolver actually doing: virtual texts and gaps

Payload-based indices

Introduction

The main purpose of MG4J is the construction of inverted indices. An inverted index, much like the index at the end of a book, is a list of the occurrences in the text of every term. Building an inverted index is a complex process that MG4J performs essentially in two phases. A further step, called term map construction, is optional: whether you need it depends on the functionality you require of your index.
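To make the idea of an inverted index concrete, here is a minimal in-memory sketch in plain Java. It is not how MG4J builds or stores its indices: the class name TinyInvertedIndex, the naive tokenization (lowercasing and splitting on anything that is not a letter or a digit) and the use of nested sorted maps are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration class; it is not part of MG4J.
public class TinyInvertedIndex {

    /**
     * Maps each term to the documents in which it occurs and, for each
     * document, to the positions at which the term appears.
     */
    public static SortedMap<String, SortedMap<Integer, List<Integer>>> build(List<String> documents) {
        SortedMap<String, SortedMap<Integer, List<Integer>>> index = new TreeMap<>();
        for (int doc = 0; doc < documents.size(); doc++) {
            // Assumed tokenization: lowercase, split on runs of non-letters/non-digits.
            String[] tokens = documents.get(doc).toLowerCase().split("[^\\p{L}\\p{N}]+");
            int position = 0;
            for (String token : tokens) {
                if (token.isEmpty()) continue; // a leading separator yields an empty token
                index.computeIfAbsent(token, t -> new TreeMap<>())
                     .computeIfAbsent(doc, d -> new ArrayList<>())
                     .add(position++);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> documents = List.of("the quick brown fox", "the lazy dog", "quick quick dog");
        // Print every term followed by its occurrences (document -> positions).
        build(documents).forEach((term, postings) -> System.out.println(term + " -> " + postings));
    }
}
```

With the three sample documents above, the entry for the term quick records position 1 in document 0 and positions 0 and 1 in document 2. A real index has to handle collections far too large for a structure like this to fit in memory, which is one reason the actual construction is split into phases.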

Besides traditional indices, MG4J provides payload-based indices, which store metadata associated with documents, such as dates and integers.

In this chapter we will try to dissect the whole process to give you an idea of what happens when you run the Index class.
