Package it.unimi.di.big.mg4j.tool
Command-line tools for index construction.
The classes in this package contain a main method and can be used to build indices starting from a document sequence. See the MG4J manual to learn how to build an index.

Interface Summary
Interface                        Description
Scan.VirtualDocumentFragment     An interface that describes a virtual document fragment.
VirtualDocumentResolver          A resolver for virtual documents.

Class Summary
Class                                       Description
Combine                                     Combines several indices.
Combine.GammaCodedIntIterator               A partial IntIterator implementation based on γ-coded integers.
ComputeNumBitsPositions                     Computes the number of bits used by a high-performance index for positions.
Concatenate                                 Concatenates several indices.
DumpVirtualDocumentFragments                Scans a document sequence and prints on standard output virtual document fragments as a document specifier (usually, a URL) TAB-separated from the associated text.
FilterOutWikipediaDuplicates                Reads a Wikipedia XML dump and outputs the same dump after eliminating duplicate pages.
IndexBuilder                                An index builder.
Merge                                       Merges several indices.
PartitionDocumentally                       Partitions an index documentally.
PartitionLexically                          Partitions an index lexically.
PartitionLexically.LongWordInputBitStream
Paste                                       Pastes several indices.
Scan                                        Scans a document sequence, dividing it in batches of occurrences and writing for each batch a corresponding subindex.
Scan.PayloadAccumulator                     An accumulator for payloads.
ScanMetadata                                Scans a document sequence and prints on standard output the corresponding URIs.
URLMPHVirtualDocumentResolver               A virtual-document resolver based on document URIs.

Enum Summary
Enum                     Description
Combine.IndexType
Scan.Completeness
Scan.IndexingType
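
Combine.GammaCodedIntIterator is described in the class summary as based on γ-coded integers. For readers unfamiliar with the term, here is a minimal sketch of the textbook Elias γ code for positive integers. It is an illustration only, not MG4J's implementation: the class name GammaSketch and its methods are hypothetical, bits are held in a String for readability rather than in a bit stream, and the convention used by MG4J's own bit-stream classes (for example, how zero is represented) may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of MG4J.
final class GammaSketch {

    // Textbook Elias gamma code of n >= 1: if n needs L binary digits,
    // emit L-1 zeros followed by those L digits.
    static String encode(int n) {
        if (n < 1) throw new IllegalArgumentException("gamma code is defined for n >= 1");
        String binary = Integer.toBinaryString(n);
        StringBuilder out = new StringBuilder();
        for (int i = 1; i < binary.length(); i++) out.append('0');
        return out.append(binary).toString();
    }

    // Decodes a concatenation of gamma codes back into the list of integers.
    static List<Integer> decode(String bits) {
        List<Integer> values = new ArrayList<>();
        int i = 0;
        while (i < bits.length()) {
            // Count the leading zeros; they tell us how many digits follow the first 1.
            int zeros = 0;
            while (i < bits.length() && bits.charAt(i) == '0') {
                zeros++;
                i++;
            }
            int end = i + zeros + 1;
            if (end > bits.length()) throw new IllegalArgumentException("truncated gamma code");
            values.add(Integer.parseInt(bits.substring(i, end), 2));
            i = end;
        }
        return values;
    }

    public static void main(String[] args) {
        String stream = encode(1) + encode(2) + encode(5);
        System.out.println(stream);         // prints 101000101
        System.out.println(decode(stream)); // prints [1, 2, 5]
    }
}
```

Small values get short codes (1 takes a single bit, 5 takes five), which is why codes of this kind suit sequences dominated by small integers.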