java.lang.Object
  extended by it.unimi.di.mg4j.tool.IndexBuilder
public class IndexBuilder
An index builder.
An instance of this class exposes a run() method that will index the DocumentSequence provided at construction time by calling Scan and Combine in sequence.
Additionally, a main method provides easy access to index construction from the command line.
All indexing parameters are available either as chainable setters, which can optionally be called before invoking run(), or as public mutable collections and maps. For instance,

    new IndexBuilder( "foo", sequence ).skips( true ).run();

will build an index with basename foo using skips. If instead we want to index just the first field of the sequence, and use a ShiftAddXorSignedStringMap as a term map, we can use the following code:

    new IndexBuilder( "foo", sequence )
        .termMapClass( ShiftAddXorSignedMinimalPerfectHash.class )
        .indexedFields( 0 ).run();

More sophisticated modifications can be applied using the public maps:

    IndexBuilder indexBuilder = new IndexBuilder( "foo", sequence );
    indexBuilder.virtualDocumentGaps.put( 0, 30 );
    indexBuilder.virtualDocumentResolvers.put( 0, someVirtualDocumentResolver );
    indexBuilder.run();

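The combination of chainable setters and public mutable collections can be sketched in plain Java. MiniIndexBuilder is a hypothetical class, not part of MG4J: it uses java.util collections in place of the fastutil types (IntSortedSet, Int2IntMap) of the real class and models only a few setters. Each setter returns this, indexedFields(int...) replaces the current contents of the set (as documented for the real method), and bufferSize(int) delegates to the two specific buffer-size setters.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical, simplified stand-in for IndexBuilder's configuration surface (not MG4J code).
// java.util collections replace the fastutil types used by the real class.
class MiniIndexBuilder {
    // Public mutable collections, modifiable directly by client code.
    public final SortedSet<Integer> indexedFields = new TreeSet<>();
    public final Map<Integer, Integer> virtualDocumentGaps = new HashMap<>();

    private boolean skips = true; // mirrors the documented default
    private int scanBufferSize;
    private int combineBufferSize;

    // Chainable setter: mutate state, then return this.
    public MiniIndexBuilder skips(boolean skips) {
        this.skips = skips;
        return this;
    }

    // Replaces, rather than extends, the current contents of indexedFields.
    public MiniIndexBuilder indexedFields(int... field) {
        indexedFields.clear();
        for (int f : field) indexedFields.add(f);
        return this;
    }

    public MiniIndexBuilder scanBufferSize(int bufferSize) {
        this.scanBufferSize = bufferSize;
        return this;
    }

    public MiniIndexBuilder combineBufferSize(int bufferSize) {
        this.combineBufferSize = bufferSize;
        return this;
    }

    // Sets both buffer sizes by delegating to the two specific setters.
    public MiniIndexBuilder bufferSize(int bufferSize) {
        return scanBufferSize(bufferSize).combineBufferSize(bufferSize);
    }

    // Plain getters, used only to inspect the configuration.
    public boolean skips() { return skips; }
    public int scanBufferSize() { return scanBufferSize; }
    public int combineBufferSize() { return combineBufferSize; }
}
```

For example, new MiniIndexBuilder().skips(false).indexedFields(2, 0).bufferSize(1024) yields a builder with skips disabled, indexed fields {0, 2} and both buffer sizes set to 1024, while virtualDocumentGaps can still be modified directly through the public field.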
Field Summary | |
---|---|
IntSortedSet | indexedFields: The set of indexed fields (expressed as field indices). |
Int2IntMap | virtualDocumentGaps: A map from field indices to virtual gaps. |
Int2ObjectMap<VirtualDocumentResolver> | virtualDocumentResolvers: A map from field indices to a corresponding VirtualDocumentResolver. |
Constructor Summary | |
---|---|
IndexBuilder(String basename, DocumentSequence documentSequence) | Creates a new index builder with default parameters. |
Method Summary | |
---|---|
IndexBuilder | batchDirName(String batchDirName): Sets the temporary directory for batches (default: the directory containing the basename). |
IndexBuilder | bufferSize(int bufferSize): Sets both the scan buffer size and the combine buffer size. |
IndexBuilder | builder(DocumentCollectionBuilder builder): Sets the document collection builder (default: null). |
IndexBuilder | combineBufferSize(int bufferSize): Sets the Combine buffer size (default: Combine.DEFAULT_BUFFER_SIZE). |
IndexBuilder | documentsPerBatch(int documentsPerBatch): Sets the number of documents per batch (default: Scan.DEFAULT_BATCH_SIZE). |
IndexBuilder | height(int height): Sets the skip height (default: BitStreamIndex.DEFAULT_HEIGHT). |
IndexBuilder | indexedFields(int... field): Sets the indexed fields to those provided (default: all fields, but see indexedFields). |
IndexBuilder | indexType(Combine.IndexType indexType): Sets the type of the index to be built (default: Combine.IndexType.QUASI_SUCCINCT). |
IndexBuilder | interleaved(boolean interleaved): Sets the interleaved flag (default: false). |
IndexBuilder | ioFactory(IOFactory ioFactory): Sets the I/O factory (default: IOFactory.FILESYSTEM_FACTORY). |
IndexBuilder | keepBatches(boolean keepBatches): Sets the “keep batches” flag (default: false). |
IndexBuilder | logInterval(long logInterval): Sets the logging time interval (default: ProgressLogger.DEFAULT_LOG_INTERVAL). |
static void | main(String[] arg) |
IndexBuilder | mapFile(String mapFile): Sets the name of a file containing a map on the document indices (default: null). |
IndexBuilder | maxTerms(int maxTerms): Sets the maximum number of overall (i.e., cross-field) terms per batch (default: Scan.DEFAULT_BATCH_SIZE). |
IndexBuilder | pasteBufferSize(int bufferSize): Sets the size in bytes of the internal buffer used when pasting indices (default: Paste.DEFAULT_MEMORY_BUFFER_SIZE). |
IndexBuilder | payloadWriterFlags(Map<CompressionFlags.Component,CompressionFlags.Coding> payloadWriterFlags): Sets the writer compression flags for payload-based indices (default: CompressionFlags.DEFAULT_PAYLOAD_INDEX). |
IndexBuilder | quantum(int quantum): Sets the skip quantum (default: BitStreamIndex.DEFAULT_QUANTUM). |
IndexBuilder | quasiSuccinctWriterFlags(Map<CompressionFlags.Component,CompressionFlags.Coding> quasiSuccinctWriterFlags): Sets the writer compression flags for quasi-succinct indices (default: CompressionFlags.DEFAULT_QUASI_SUCCINCT_INDEX). |
void | run(): Builds the index. |
IndexBuilder | scanBufferSize(int bufferSize): Sets the Scan buffer size (default: Scan.DEFAULT_BUFFER_SIZE). |
IndexBuilder | skipBufferSize(int bufferSize): Sets the size in bytes of the internal buffer used during the construction of an index with skips (default: SkipBitStreamIndexWriter.DEFAULT_TEMP_BUFFER_SIZE). |
IndexBuilder | skips(boolean skips): Sets the skip flag (default: true). |
IndexBuilder | standardWriterFlags(Map<CompressionFlags.Component,CompressionFlags.Coding> standardWriterFlags): Sets the writer compression flags for standard indices (default: CompressionFlags.DEFAULT_STANDARD_INDEX). |
IndexBuilder | termMapClass(Class<? extends StringMap<? extends CharSequence>> termMapClass): Sets the class used to build the index term map (default: ImmutableExternalPrefixMap). |
IndexBuilder | termProcessor(TermProcessor termProcessor): Sets the term processor (default: DowncaseTermProcessor). |
IndexBuilder | virtualDocumentResolver(int field, VirtualDocumentResolver virtualDocumentResolver): Adds a virtual document resolver to virtualDocumentResolvers. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public IntSortedSet indexedFields
The set of indexed fields (expressed as field indices). Fields of type DocumentFactory.FieldType.VIRTUAL will be indexed only if they have a corresponding VirtualDocumentResolver.
An alternative, chainable way to set this field is provided by the method indexedFields(int...).
After calling run(), this set will contain the fields actually indexed.
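The rule for virtual fields can be sketched as a filtering step over the requested fields. FieldKind, IndexedFieldsSketch and resolve are hypothetical names; FieldKind stands in for DocumentFactory.FieldType, and treating an empty request as "all fields" is an assumption based on the documented default of indexedFields(int...), not on MG4J's actual code.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical stand-in for DocumentFactory.FieldType (the real enum has more values).
enum FieldKind { TEXT, VIRTUAL }

final class IndexedFieldsSketch {
    // Returns the fields that would actually be indexed.
    // Assumption: an empty request means "all fields" (the documented default).
    static SortedSet<Integer> resolve(FieldKind[] fieldKinds, SortedSet<Integer> requested,
                                      Map<Integer, ?> resolvers) {
        SortedSet<Integer> result = new TreeSet<>();
        if (requested.isEmpty()) {
            for (int f = 0; f < fieldKinds.length; f++) result.add(f);
        } else {
            result.addAll(requested);
        }
        // Virtual fields survive only if a resolver is registered for them.
        result.removeIf(f -> fieldKinds[f] == FieldKind.VIRTUAL && !resolvers.containsKey(f));
        return result;
    }
}
```

With field kinds {TEXT, VIRTUAL, VIRTUAL} and a resolver registered only for field 2, an empty request resolves to {0, 2}, mirroring the statement that after run() the set contains the fields actually indexed.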
public Int2ObjectMap<VirtualDocumentResolver> virtualDocumentResolvers
A map from field indices to a corresponding VirtualDocumentResolver.
public Int2IntMap virtualDocumentGaps
A map from field indices to virtual gaps. Only values associated with fields of type DocumentFactory.FieldType.VIRTUAL are meaningful, and the default return value is set to Scan.DEFAULT_VIRTUAL_DOCUMENT_GAP. You can either add entries or change the default return value.
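The "default return value" is a fastutil map feature (Int2IntMap inherits defaultReturnValue(int) from Int2IntFunction). The sketch below emulates it with java.util so that it runs without fastutil; GapMap is a hypothetical name, and 64 in the usage is an arbitrary stand-in, not the actual value of Scan.DEFAULT_VIRTUAL_DOCUMENT_GAP.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical java.util emulation of a fastutil Int2IntMap with a default return value.
final class GapMap {
    private final Map<Integer, Integer> entries = new HashMap<>();
    private int defaultReturnValue;

    GapMap(int defaultReturnValue) { this.defaultReturnValue = defaultReturnValue; }

    // Option 1: add an explicit gap for one field.
    void put(int field, int gap) { entries.put(field, gap); }

    // Option 2: change the gap returned for every field without an entry.
    void defaultReturnValue(int value) { defaultReturnValue = value; }

    // Unmapped fields fall back to the current default.
    int get(int field) { return entries.getOrDefault(field, defaultReturnValue); }
}
```

After new GapMap(64) and put(0, 30), field 0 has gap 30 and every other field 64; calling defaultReturnValue(100) changes only the fields without an explicit entry.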
Constructor Detail |
---|
public IndexBuilder(String basename, DocumentSequence documentSequence)
Creates a new index builder with default parameters.
Note, in particular, that the resulting index will be a BitStreamHPIndex (unless you require payloads, in which case it will be a BitStreamIndex with skips), and that all terms will be downcased. You can set the type of index more finely using interleaved(boolean) and skips(boolean).
basename
- the basename from which all files will be stemmed.
documentSequence
- the document sequence to be indexed.
Method Detail |
---|
public IndexBuilder ioFactory(IOFactory ioFactory)
Sets the I/O factory (default: IOFactory.FILESYSTEM_FACTORY).
ioFactory
- the I/O factory.
public IndexBuilder termProcessor(TermProcessor termProcessor)
Sets the term processor (default: DowncaseTermProcessor).
termProcessor
- the term processor.
public IndexBuilder builder(DocumentCollectionBuilder builder)
Sets the document collection builder (default: null).
builder
- a document-collection builder that will be used to build a collection during the indexing phase.
public IndexBuilder indexedFields(int... field)
Sets the indexed fields to those provided (default: all fields, but see indexedFields).
This is a utility method that provides a chainable way to set indexedFields.
field
- a list of fields to be indexed, which will replace the current contents of indexedFields.
See Also:
indexedFields
public IndexBuilder virtualDocumentResolver(int field, VirtualDocumentResolver virtualDocumentResolver)
Adds a virtual document resolver to virtualDocumentResolvers.
This is a utility method that provides a chainable way to put an element into virtualDocumentResolvers.
field
- a field index.
virtualDocumentResolver
- a virtual document resolver.
See Also:
virtualDocumentResolvers
public IndexBuilder scanBufferSize(int bufferSize)
Sets the Scan buffer size (default: Scan.DEFAULT_BUFFER_SIZE).
bufferSize
- a buffer size for Scan.
public IndexBuilder combineBufferSize(int bufferSize)
Sets the Combine buffer size (default: Combine.DEFAULT_BUFFER_SIZE).
bufferSize
- a buffer size for Combine.
public IndexBuilder bufferSize(int bufferSize)
Sets both the scan buffer size and the combine buffer size.
bufferSize
- a buffer size.
public IndexBuilder skipBufferSize(int bufferSize)
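Sets the size in bytes of the internal buffer used during the construction of an index with skips (default: SkipBitStreamIndexWriter.DEFAULT_TEMP_BUFFER_SIZE).
bufferSize
- a buffer size for SkipBitStreamIndexWriter.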
public IndexBuilder pasteBufferSize(int bufferSize)
Sets the size in bytes of the internal buffer used when pasting indices (default: Paste.DEFAULT_MEMORY_BUFFER_SIZE).
bufferSize
- a buffer size for Paste.
public IndexBuilder documentsPerBatch(int documentsPerBatch)
Sets the number of documents per batch (default: Scan.DEFAULT_BATCH_SIZE).
documentsPerBatch
- the number of documents Scan will attempt to add to each batch.
public IndexBuilder maxTerms(int maxTerms)
Sets the maximum number of overall (i.e., cross-field) terms per batch (default: Scan.DEFAULT_BATCH_SIZE).
maxTerms
- the maximum number of overall (i.e., cross-field) terms Scan will attempt to add to each batch.
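How the two per-batch limits could interact is sketched below, under the assumption (not stated in this documentation) that a batch is closed as soon as either limit is reached; Scan may apply further criteria, such as available memory. BatchPlanner and plan are hypothetical names, and termsPerDocument counts the terms each document contributes across all fields.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical batching policy: close the current batch as soon as it holds
// documentsPerBatch documents or at least maxTerms terms (summed over all fields).
final class BatchPlanner {
    // Returns the number of documents placed in each batch, in order.
    static List<Integer> plan(int[] termsPerDocument, int documentsPerBatch, int maxTerms) {
        List<Integer> batchSizes = new ArrayList<>();
        int docs = 0;
        long terms = 0;
        for (int t : termsPerDocument) {
            docs++;
            terms += t;
            if (docs >= documentsPerBatch || terms >= maxTerms) {
                batchSizes.add(docs); // flush the current batch
                docs = 0;
                terms = 0;
            }
        }
        if (docs > 0) batchSizes.add(docs); // final, possibly partial, batch
        return batchSizes;
    }
}
```

With documentsPerBatch = 2 the document limit dominates (five small documents give batches of 2, 2 and 1); with documentsPerBatch = 10 and maxTerms = 100, documents of 40, 70, 10 and 10 terms give two batches of two documents, because the term limit is hit after the second document.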
public IndexBuilder keepBatches(boolean keepBatches)
Sets the “keep batches” flag (default: false).
keepBatches
- the new value for the “keep batches” flag.
public IndexBuilder standardWriterFlags(Map<CompressionFlags.Component,CompressionFlags.Coding> standardWriterFlags)
Sets the writer compression flags for standard indices (default: CompressionFlags.DEFAULT_STANDARD_INDEX).
standardWriterFlags
- the flags for standard indices.
public IndexBuilder quasiSuccinctWriterFlags(Map<CompressionFlags.Component,CompressionFlags.Coding> quasiSuccinctWriterFlags)
Sets the writer compression flags for quasi-succinct indices (default: CompressionFlags.DEFAULT_QUASI_SUCCINCT_INDEX).
quasiSuccinctWriterFlags
- the flags for quasi-succinct indices.
public IndexBuilder payloadWriterFlags(Map<CompressionFlags.Component,CompressionFlags.Coding> payloadWriterFlags)
Sets the writer compression flags for payload-based indices (default: CompressionFlags.DEFAULT_PAYLOAD_INDEX).
payloadWriterFlags
- the flags for payload-based indices.
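The three writer-flag setters take a Map from components to codings. The sketch below shows a copy-and-override idiom with an EnumMap, which leaves shared defaults untouched; Component, Coding and their constants are hypothetical stand-ins for CompressionFlags.Component and CompressionFlags.Coding, and DEFAULTS does not reproduce the content of any real CompressionFlags constant.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-ins for CompressionFlags.Component and CompressionFlags.Coding.
enum Component { POINTERS, COUNTS, POSITIONS }
enum Coding { GAMMA, DELTA, UNARY }

final class FlagsSketch {
    // Hypothetical shared defaults, made unmodifiable so callers cannot alter them.
    static final Map<Component, Coding> DEFAULTS;
    static {
        Map<Component, Coding> m = new EnumMap<>(Component.class);
        m.put(Component.POINTERS, Coding.GAMMA);
        m.put(Component.COUNTS, Coding.GAMMA);
        m.put(Component.POSITIONS, Coding.DELTA);
        DEFAULTS = Collections.unmodifiableMap(m);
    }

    // Copies the defaults into a fresh EnumMap and overrides one component's coding.
    static Map<Component, Coding> withCoding(Component component, Coding coding) {
        Map<Component, Coding> flags = new EnumMap<>(Component.class);
        flags.putAll(DEFAULTS);
        flags.put(component, coding);
        return flags;
    }
}
```

A map built the same way from the real enums would then be passed to a setter such as standardWriterFlags(Map).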
public IndexBuilder skips(boolean skips)
Sets the skip flag (default: true).
skips
- the new value for the skip flag.
public IndexBuilder interleaved(boolean interleaved)
Sets the interleaved flag (default: false).
interleaved
- the new value for the interleaved flag.
public IndexBuilder indexType(Combine.IndexType indexType)
Sets the type of the index to be built (default: Combine.IndexType.QUASI_SUCCINCT).
indexType
- the desired index type.
public IndexBuilder quantum(int quantum)
Sets the skip quantum (default: BitStreamIndex.DEFAULT_QUANTUM).
quantum
- the skip quantum.
public IndexBuilder height(int height)
Sets the skip height (default: BitStreamIndex.DEFAULT_HEIGHT).
height
- the skip height.
public IndexBuilder mapFile(String mapFile)
Sets the name of a file containing a map on the document indices (default: null).
The provided file must contain integers in DataOutput format. There must be as many integers as there are documents in the document sequence provided at construction time, and the resulting function must be injective (i.e., there must be no duplicates).
mapFile
- a file representing a document map (or null
for no mapping).
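The constraints on the map file can be checked with java.io alone: DataOutput.writeInt writes each integer as four bytes, high byte first, so the file length must be exactly four times the number of documents. DocumentMapFile, write and readAndValidate are hypothetical names; MG4J's own loading code may differ.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

final class DocumentMapFile {
    // Writes one int per document in DataOutput (big-endian) format.
    static void write(File file, int[] map) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(file)))) {
            for (int value : map) out.writeInt(value);
        }
    }

    // Reads the map back, checking that it has one entry per document and no duplicates.
    static int[] readAndValidate(File file, int numberOfDocuments) throws IOException {
        if (file.length() != 4L * numberOfDocuments)
            throw new IllegalArgumentException("Expected " + numberOfDocuments
                    + " integers, but the file has " + file.length() + " bytes");
        int[] map = new int[numberOfDocuments];
        Set<Integer> seen = new HashSet<>();
        try (DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(file)))) {
            for (int i = 0; i < numberOfDocuments; i++) {
                map[i] = in.readInt();
                // Injectivity: each value may appear only once.
                if (!seen.add(map[i]))
                    throw new IllegalArgumentException("Duplicate value " + map[i] + " at position " + i);
            }
        }
        return map;
    }
}
```

Writing {2, 0, 1} for a three-document sequence round-trips cleanly, whereas {1, 1, 0} is rejected because the value 1 occurs twice.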
public IndexBuilder logInterval(long logInterval)
Sets the logging time interval (default: ProgressLogger.DEFAULT_LOG_INTERVAL).
logInterval
- the logging time interval.
public IndexBuilder batchDirName(String batchDirName)
Sets the temporary directory for batches (default: the directory containing the basename).
batchDirName
- the name of the temporary directory for batches, or null
for the directory containing the basename.
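The default for batchDirName can be expressed with java.nio.file. BatchDirs.resolve is a hypothetical helper, and resolving a relative basename against the current working directory is an assumption of this sketch.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class BatchDirs {
    // An explicit batch directory wins; null means "the directory containing the basename".
    static Path resolve(String basename, String batchDirName) {
        if (batchDirName != null) return Paths.get(batchDirName);
        // Assumption: a relative basename is taken relative to the working directory.
        return Paths.get(basename).toAbsolutePath().getParent();
    }
}
```

For instance, the basename /tmp/idx/foo with a null batch directory places batches in /tmp/idx.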
public IndexBuilder termMapClass(Class<? extends StringMap<? extends CharSequence>> termMapClass)
Sets the class used to build the index term map (default: ImmutableExternalPrefixMap).
The only requirement for termMapClass (besides, of course, implementing StringMap) is that it have a public constructor accepting a single parameter of type Iterable<CharSequence>.
termMapClass
- the class used to build the index term map.
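The constructor requirement suggests reflective instantiation, which is consistent with the reflection-related exceptions declared by run(), although MG4J's actual mechanism is not shown in this documentation. TermMap, ListTermMap and TermMaps.build are hypothetical: TermMap is a toy stand-in for StringMap, and the linear-scan ListTermMap exists only to satisfy the constructor contract.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal stand-in for StringMap: term -> term number, or -1 if absent.
interface TermMap {
    long getIndex(CharSequence term);
}

// Satisfies the requirement: a public constructor taking a single Iterable<CharSequence>.
final class ListTermMap implements TermMap {
    private final List<String> terms = new ArrayList<>();

    public ListTermMap(Iterable<CharSequence> terms) {
        for (CharSequence t : terms) this.terms.add(t.toString());
    }

    @Override
    public long getIndex(CharSequence term) {
        return terms.indexOf(term.toString()); // linear scan: illustration only
    }
}

final class TermMaps {
    // Looks up the Iterable constructor (erased parameter type) and invokes it.
    static TermMap build(Class<? extends TermMap> termMapClass, Iterable<CharSequence> terms)
            throws NoSuchMethodException, InstantiationException, IllegalAccessException,
                   InvocationTargetException {
        Constructor<? extends TermMap> constructor = termMapClass.getConstructor(Iterable.class);
        return constructor.newInstance(terms);
    }
}
```

A class lacking such a constructor makes getConstructor throw NoSuchMethodException, which is one of the exceptions listed for run().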
public void run() throws ConfigurationException, SecurityException, IOException, URISyntaxException, ClassNotFoundException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException
Builds the index.
This method simply invokes Scan and Combine using the internally stored settings, and finally builds a StringMap.
If the provided document sequence can be iterated over several times, this method can be called several times, too, rebuilding the index each time.
Throws:
ConfigurationException
SecurityException
IOException
URISyntaxException
ClassNotFoundException
InstantiationException
IllegalAccessException
InvocationTargetException
NoSuchMethodException
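A toy model of the pipeline that run() drives is sketched below: a scan phase that builds one small inverted index per batch, a combine phase that merges the batches, and a final term map assigning each term its lexicographic rank. MiniPipeline and its methods are hypothetical; everything is kept in memory (the real Scan and Combine work with batches stored in a temporary directory), and lowercasing before splitting on whitespace only loosely imitates the default DowncaseTermProcessor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory model of run(): scan, combine, then build a term map.
final class MiniPipeline {
    static final class Result {
        final SortedMap<String, List<Integer>> index;
        final Map<String, Integer> termMap;

        Result(SortedMap<String, List<Integer>> index, Map<String, Integer> termMap) {
            this.index = index;
            this.termMap = termMap;
        }
    }

    // Scan phase: one inverted index (term -> sorted document numbers) per batch.
    static List<SortedMap<String, List<Integer>>> scan(Iterable<String> documents, int documentsPerBatch) {
        List<SortedMap<String, List<Integer>>> batches = new ArrayList<>();
        SortedMap<String, List<Integer>> current = new TreeMap<>();
        int doc = 0, inBatch = 0;
        for (String text : documents) {
            for (String term : text.toLowerCase().split("\\s+")) {
                if (term.isEmpty()) continue;
                List<Integer> postings = current.computeIfAbsent(term, k -> new ArrayList<>());
                // Record each document at most once per term.
                if (postings.isEmpty() || postings.get(postings.size() - 1) != doc) postings.add(doc);
            }
            doc++;
            if (++inBatch == documentsPerBatch) { // batch full: flush it
                batches.add(current);
                current = new TreeMap<>();
                inBatch = 0;
            }
        }
        if (!current.isEmpty()) batches.add(current);
        return batches;
    }

    // Combine phase: document numbers are global and batches are visited in order,
    // so appending postings keeps every merged list sorted.
    static SortedMap<String, List<Integer>> combine(List<SortedMap<String, List<Integer>>> batches) {
        SortedMap<String, List<Integer>> index = new TreeMap<>();
        for (SortedMap<String, List<Integer>> batch : batches)
            for (Map.Entry<String, List<Integer>> e : batch.entrySet())
                index.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
        return index;
    }

    // Term map: each term is mapped to its rank in lexicographic order.
    static Map<String, Integer> termMap(SortedMap<String, List<Integer>> index) {
        Map<String, Integer> map = new TreeMap<>();
        int rank = 0;
        for (String term : index.keySet()) map.put(term, rank++);
        return map;
    }

    // run(): re-reads the whole document sequence on every call.
    static Result run(Iterable<String> documents, int documentsPerBatch) {
        SortedMap<String, List<Integer>> index = combine(scan(documents, documentsPerBatch));
        return new Result(index, termMap(index));
    }
}
```

Calling run twice on a List of documents yields identical indices, whereas an Iterable backed by a single Iterator is exhausted by the first call, so the second call produces an empty index. This is why repeated calls require a document sequence that can be iterated over several times.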
public static void main(String[] arg) throws com.martiansoftware.jsap.JSAPException, InvocationTargetException, NoSuchMethodException, IllegalAccessException, ConfigurationException, ClassNotFoundException, IOException, InstantiationException, URISyntaxException, SecurityException, IllegalArgumentException
Throws:
com.martiansoftware.jsap.JSAPException
InvocationTargetException
NoSuchMethodException
IllegalAccessException
ConfigurationException
ClassNotFoundException
IOException
InstantiationException
URISyntaxException
SecurityException
IllegalArgumentException