Command-line tools for index construction.
The classes in this package each provide a main method and can be used to build indices starting from a document sequence. See the MG4J manual to learn how to build an index.
Class Summary

Class                                     Description
Combine                                   Combines several indices.
Combine.GammaCodedIntIterator             A partial IntIterator implementation based on γ-coded integers.
ComputeNumBitsPositions                   Computes the number of bits used by a high-performance index for positions.
Concatenate                               Concatenates several indices.
DumpVirtualDocumentFragments              Scans a document sequence and prints on standard output virtual document fragments as a document specifier (usually, a URL) TAB-separated from the associated text.
FilterOutWikipediaDuplicates              Reads a Wikipedia XML dump and outputs the same dump after eliminating duplicate pages.
IndexBuilder                              An index builder.
Merge                                     Merges several indices.
PartitionDocumentally                     Partitions an index documentally.
PartitionLexically                        Partitions an index lexically.
PartitionLexically.LongWordInputBitStream
Paste                                     Pastes several indices.
Scan                                      Scans a document sequence, dividing it into batches of occurrences and writing for each batch a corresponding subindex.
Scan.PayloadAccumulator                   An accumulator for payloads.
ScanMetadata                              Scans a document sequence and prints on standard output the corresponding URIs.
URLMPHVirtualDocumentResolver             A virtual-document resolver based on document URIs.
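
The Combine.GammaCodedIntIterator entry refers to γ-coded integers. As background, here is a minimal sketch of the classic Elias γ code for positive integers: a number whose binary form has n bits is written as n-1 zeros followed by those n bits. The class GammaCodeSketch and its methods are hypothetical names made up for this illustration and are not part of MG4J. The sketch represents bits as a string of '0' and '1' characters for readability, whereas a real implementation would read and write a bit stream. Implementations that must also represent zero typically encode x + 1 instead of x; whether and how MG4J does so is not stated here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of the MG4J API.
public class GammaCodeSketch {

    /** Encodes a positive integer as an Elias gamma code ('0'/'1' characters). */
    static String encode(int x) {
        if (x <= 0) throw new IllegalArgumentException("gamma codes here require x >= 1");
        String binary = Integer.toBinaryString(x); // always starts with '1'
        StringBuilder sb = new StringBuilder();
        // Unary length prefix: one '0' for each bit after the leading '1'.
        for (int i = 1; i < binary.length(); i++) sb.append('0');
        return sb.append(binary).toString();
    }

    /** Decodes a concatenation of gamma codes; assumes the input is well formed. */
    static List<Integer> decodeAll(String bits) {
        List<Integer> out = new ArrayList<>();
        int pos = 0;
        while (pos < bits.length()) {
            int zeros = 0;
            // Count leading zeros to learn how many bits follow the leading '1'.
            while (bits.charAt(pos) == '0') { zeros++; pos++; }
            // The next zeros + 1 bits are the binary representation of the value.
            out.add(Integer.parseInt(bits.substring(pos, pos + zeros + 1), 2));
            pos += zeros + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        String stream = encode(1) + encode(2) + encode(5);
        System.out.println(stream);
        System.out.println(decodeAll(stream));
    }
}
```

Because each code announces its own length, codes can be concatenated without separators and read back sequentially, which is what makes an iterator over a γ-coded sequence possible.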