Class DiskBasedIndex
- java.lang.Object
-
- it.unimi.di.big.mg4j.index.DiskBasedIndex
-
public class DiskBasedIndex extends Object
A static container providing facilities to load an index based on data stored on disk.
This class contains several useful static methods, such as readOffsets(InputBitStream, long), readSizes(CharSequence, long) and loadLongBigList(CharSequence, ByteOrder), and static factory methods, such as getInstance(CharSequence, boolean, boolean, boolean, EnumMap), that take care of reading the properties associated with the index, identifying the correct Index implementation that should be used to load the index, and loading the necessary data into memory.
As an option, a disk-based index can be loaded into main memory (key: Index.UriKeys.INMEMORY) or mapped into main memory (key: Index.UriKeys.MAPPED); the value assigned to these keys is irrelevant. Note that quasi-succinct indices are memory-mapped by default, and that in-memory bitstream indices are limited to two gigabytes.
By default the term-offset list is accessed using a SemiExternalOffsetBigList with a step of DEFAULT_OFFSET_STEP. This behaviour can be changed using the URI key Index.UriKeys.OFFSETSTEP.
Disk-based indices are the workhorse of MG4J. All other indices (clustered, remote, etc.) ultimately rely on disk-based indices to provide results.
Note that not all data produced by Scan and by the other indexing utilities are actually necessary to run a disk-based index. Usually the property file and the index files are sufficient. If random access is needed, the offsets file must also be present; if the compression method requires document sizes, or if sizes are requested explicitly, the sizes file must also be present. A StringMap and possibly a PrefixMap will be fetched automatically by getInstance(CharSequence, boolean, boolean) using standard extensions.
Thread safety
A disk-based index is thread safe as long as the offset list, the size list and the term/prefix map are. The static factory methods provided by this class load offsets and sizes using data structures that are thread safe. If you use a constructor directly, instead, it is your responsibility to pass thread-safe data structures.
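The offset step mentioned above trades memory for access time. The following is only a conceptual sketch of that trade-off, not the implementation of SemiExternalOffsetBigList: the class name SampledOffsets is hypothetical, and an in-memory array of gaps stands in for the data that a semi-external list would read from disk.

```java
/** Conceptual sketch: keep one absolute offset every `step` entries, rebuild the rest from gaps. */
final class SampledOffsets {
    private final long[] samples; // every step-th absolute offset, kept in memory
    private final long[] gaps;    // stand-in for the on-disk stream of gaps (assumption)
    private final int step;

    SampledOffsets(long[] offsets, int step) {
        this.step = step;
        this.samples = new long[(offsets.length + step - 1) / step];
        this.gaps = new long[offsets.length];
        for (int i = 0; i < offsets.length; i++) {
            if (i % step == 0) samples[i / step] = offsets[i];
            gaps[i] = i == 0 ? offsets[0] : offsets[i] - offsets[i - 1];
        }
    }

    /** Returns the offset of the given index by replaying at most step - 1 gaps after a sample. */
    long get(int index) {
        long value = samples[index / step];
        for (int i = index - index % step + 1; i <= index; i++) value += gaps[i];
        return value;
    }
}
```

A larger step keeps fewer samples in memory but replays more gaps on each access; a step of 1 degenerates into a fully in-memory list.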
- Since:
- 1.1
- Author:
- Sebastiano Vigna
-
-
Field Summary
Fields
Modifier and Type Field - Description
static int BUFFER_SIZE - The size of the buffer used by loadLongBigList(ReadableByteChannel, long, ByteOrder).
static String COUNTS_EXTENSION - The extension for the counts bitstream.
static int DEFAULT_OFFSET_STEP - The default value for the query parameter Index.UriKeys.OFFSETSTEP.
static String FREQUENCIES_EXTENSION - Standard extension for the file of frequencies.
static String INDEX_EXTENSION - Standard extension for the index bitstream.
static String OCCURRENCIES_EXTENSION - Standard extension for the file of global counts.
static String OFFSETS_EXTENSION - Standard extension for the file of offsets.
static String OFFSETS_POSTFIX - The postfix to be added to POINTERS_EXTENSIONS, COUNTS_EXTENSION and POSITIONS_EXTENSION for offsets.
static String POINTERS_EXTENSIONS - The extension for the pointers bitstream.
static String POSITIONS_EXTENSION - Standard extension for the positions bitstream of a high-performance index.
static String POSITIONS_NUMBER_OF_BITS_EXTENSION - Standard extension for the file of lengths of positions.
static String PREFIXMAP_EXTENSION - Standard extension for the prefix map.
static String PROPERTIES_EXTENSION - Standard extension for the index properties.
static String SIZES_EXTENSION - Standard extension for the file of sizes.
static String STATS_EXTENSION - Standard extension for the stats file.
static String SUMS_MAX_POSITION_EXTENSION - Standard extension for the file of lengths of positions.
static String TERMMAP_EXTENSION - Standard extension for the term map.
static String TERMS_EXTENSION - Standard extension for the file of terms.
static String UNSORTED_TERMS_EXTENSION - Standard extension for the file of terms, unsorted.
-
Method Summary
Modifier and Type Method - Description
static ByteOrder byteOrder(String s) - Parses a ByteOrder value.
static Index getInstance(IOFactory ioFactory, CharSequence basename, Properties properties, boolean randomAccess, boolean documentSizes, boolean maps, EnumMap<Index.UriKeys,String> queryProperties) - Returns a new disk-based index, using preloaded Properties and possibly guessing reasonable term and prefix maps from the basename.
static Index getInstance(IOFactory ioFactory, CharSequence basename, Properties properties, StringMap<? extends CharSequence> termMap, PrefixMap<? extends CharSequence> prefixMap, boolean randomAccess, boolean documentSizes, EnumMap<Index.UriKeys,String> queryProperties) - Returns a new disk-based index, loading exactly the specified parts and using preloaded Properties.
static Index getInstance(CharSequence basename) - Returns a new local index, trying to guess reasonable term and prefix maps from the basename, loading offsets but loading document sizes only if it is necessary.
static Index getInstance(CharSequence basename, boolean randomAccess) - Returns a new local index, trying to guess reasonable term and prefix maps from the basename, and loading document sizes only if it is necessary.
static Index getInstance(CharSequence basename, boolean randomAccess, boolean documentSizes) - Returns a new disk-based index, guessing reasonable term and prefix maps from the basename.
static Index getInstance(CharSequence basename, boolean randomAccess, boolean documentSizes, boolean maps) - Returns a new disk-based index, possibly guessing reasonable term and prefix maps from the basename.
static Index getInstance(CharSequence basename, boolean randomAccess, boolean documentSizes, boolean maps, EnumMap<Index.UriKeys,String> queryProperties) - Returns a new disk-based index, possibly guessing reasonable term and prefix maps from the basename.
static Index getInstance(CharSequence basename, Properties properties, boolean randomAccess, boolean documentSizes, boolean maps, EnumMap<Index.UriKeys,String> queryProperties) - Returns a new disk-based index, using preloaded Properties and possibly guessing reasonable term and prefix maps from the basename.
static Index getInstance(CharSequence basename, Properties properties, StringMap<? extends CharSequence> termMap, PrefixMap<? extends CharSequence> prefixMap, boolean randomAccess, boolean documentSizes, EnumMap<Index.UriKeys,String> queryProperties) - Returns a new disk-based index, loading exactly the specified parts and using preloaded Properties and the IOFactory.FILESYSTEM_FACTORY.
static LongBigArrayBigList loadLongBigList(IOFactory ioFactory, CharSequence filename, ByteOrder byteOrder) - Commodity method for loading a big list of binary longs with specified endianness into a long big array.
static LongBigArrayBigList loadLongBigList(CharSequence filename, ByteOrder byteOrder) - Commodity method for loading a big list of binary longs with specified endianness into a long big array using the IOFactory.FILESYSTEM_FACTORY.
static LongBigArrayBigList loadLongBigList(ReadableByteChannel channel, long length, ByteOrder byteOrder) - Commodity method for loading from a channel a big list of binary longs with specified endianness into a long big array.
static PrefixMap<? extends CharSequence> loadPrefixMap(IOFactory ioFactory, String filename) - Utility static method that loads a prefix map.
static PrefixMap<? extends CharSequence> loadPrefixMap(String filename) - Utility static method that loads a prefix map using the IOFactory.FILESYSTEM_FACTORY.
static StringMap<? extends CharSequence> loadStringMap(IOFactory ioFactory, String filename) - Utility static method that loads a term map.
static StringMap<? extends CharSequence> loadStringMap(String filename) - Utility static method that loads a term map using the IOFactory.FILESYSTEM_FACTORY.
static LongBigList offsets(IOFactory ioFactory, String filename, long numberOfTerms, int offsetStep) - Returns the list of offsets.
static LongBigList offsets(String filename, long numberOfTerms, int offsetStep) - Returns the list of offsets using the IOFactory.FILESYSTEM_FACTORY.
static LongBigList readOffsets(IOFactory ioFactory, CharSequence filename, long T) - Utility method to load a compressed offset file into a list.
static LongBigList readOffsets(InputBitStream in, long T) - Utility method to load a compressed offset file into a list.
static LongBigList readOffsets(CharSequence filename, long T) - Utility method to load a compressed offset file into a list using the IOFactory.FILESYSTEM_FACTORY.
static IntBigArrayBigList readSizes(IOFactory ioFactory, CharSequence filename, long n) - Utility method to load a compressed size file into a list.
static IntBigArrayBigList readSizes(CharSequence filename, long N) - Utility method to load a compressed size file into a list using the IOFactory.FILESYSTEM_FACTORY.
static IntBigList readSizesSuccinct(CharSequence filename, long N) - Deprecated. This method is an ancestral residue.
-
-
-
Field Detail
-
DEFAULT_OFFSET_STEP
public static final int DEFAULT_OFFSET_STEP
The default value for the query parameter Index.UriKeys.OFFSETSTEP.
- See Also:
- Constant Field Values
-
INDEX_EXTENSION
public static final String INDEX_EXTENSION
Standard extension for the index bitstream.
- See Also:
- Constant Field Values
-
POSITIONS_EXTENSION
public static final String POSITIONS_EXTENSION
Standard extension for the positions bitstream of a high-performance index.
- See Also:
- Constant Field Values
-
PROPERTIES_EXTENSION
public static final String PROPERTIES_EXTENSION
Standard extension for the index properties.
- See Also:
- Constant Field Values
-
SIZES_EXTENSION
public static final String SIZES_EXTENSION
Standard extension for the file of sizes.
- See Also:
- Constant Field Values
-
OFFSETS_EXTENSION
public static final String OFFSETS_EXTENSION
Standard extension for the file of offsets.
- See Also:
- Constant Field Values
-
POSITIONS_NUMBER_OF_BITS_EXTENSION
public static final String POSITIONS_NUMBER_OF_BITS_EXTENSION
Standard extension for the file of lengths of positions.
- See Also:
- Constant Field Values
-
SUMS_MAX_POSITION_EXTENSION
public static final String SUMS_MAX_POSITION_EXTENSION
Standard extension for the file of lengths of positions.
- See Also:
- Constant Field Values
-
OCCURRENCIES_EXTENSION
public static final String OCCURRENCIES_EXTENSION
Standard extension for the file of global counts.
- See Also:
- Constant Field Values
-
FREQUENCIES_EXTENSION
public static final String FREQUENCIES_EXTENSION
Standard extension for the file of frequencies.
- See Also:
- Constant Field Values
-
TERMS_EXTENSION
public static final String TERMS_EXTENSION
Standard extension for the file of terms.
- See Also:
- Constant Field Values
-
UNSORTED_TERMS_EXTENSION
public static final String UNSORTED_TERMS_EXTENSION
Standard extension for the file of terms, unsorted.
- See Also:
- Constant Field Values
-
TERMMAP_EXTENSION
public static final String TERMMAP_EXTENSION
Standard extension for the term map.
- See Also:
- Constant Field Values
-
PREFIXMAP_EXTENSION
public static final String PREFIXMAP_EXTENSION
Standard extension for the prefix map.
- See Also:
- Constant Field Values
-
STATS_EXTENSION
public static final String STATS_EXTENSION
Standard extension for the stats file.
- See Also:
- Constant Field Values
-
POINTERS_EXTENSIONS
public static final String POINTERS_EXTENSIONS
The extension for the pointers bitstream.
- See Also:
- Constant Field Values
-
COUNTS_EXTENSION
public static final String COUNTS_EXTENSION
The extension for the counts bitstream.
- See Also:
- Constant Field Values
-
OFFSETS_POSTFIX
public static final String OFFSETS_POSTFIX
The postfix to be added to POINTERS_EXTENSIONS, COUNTS_EXTENSION and POSITIONS_EXTENSION for offsets.
- See Also:
- Constant Field Values
-
BUFFER_SIZE
public static final int BUFFER_SIZE
The size of the buffer used by loadLongBigList(ReadableByteChannel, long, ByteOrder).
- See Also:
- Constant Field Values
-
-
Method Detail
-
readOffsets
public static LongBigList readOffsets(InputBitStream in, long T) throws IOException
Utility method to load a compressed offset file into a list.
- Parameters:
in - the input bit stream providing the offsets (see BitStreamIndexWriter).
T - the number of terms indexed.
- Returns:
- a list of longs backed by an array; the list has an additional final element of index T that gives the number of bytes of the index file.
- Throws:
IOException
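Because the returned list has T + 1 elements, the extent of each term's data can be obtained by subtracting adjacent entries. The sketch below shows this on a plain long array; the class name OffsetSpans is hypothetical, and the result is expressed in whatever unit the offsets themselves use.

```java
/** Sketch: derive per-term extents from a list of T + 1 offsets. */
final class OffsetSpans {
    private OffsetSpans() {}

    /** Returns, for each of the T terms, the distance between its offset and the next one. */
    static long[] spans(long[] offsets) {
        int numberOfTerms = offsets.length - 1; // the last entry marks the end of the index file
        long[] spans = new long[numberOfTerms];
        for (int i = 0; i < numberOfTerms; i++) spans[i] = offsets[i + 1] - offsets[i];
        return spans;
    }
}
```

The additional final element is what makes the extent of the last term computable without consulting the file length separately.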
-
readOffsets
public static LongBigList readOffsets(IOFactory ioFactory, CharSequence filename, long T) throws IOException
Utility method to load a compressed offset file into a list.
- Parameters:
ioFactory - the factory that will be used to perform I/O.
filename - the file containing the offsets (see BitStreamIndexWriter).
T - the number of terms indexed.
- Returns:
- a list of longs backed by an array; the list has an additional final element of index T that gives the number of bytes of the index file.
- Throws:
IOException
-
readOffsets
public static LongBigList readOffsets(CharSequence filename, long T) throws IOException
Utility method to load a compressed offset file into a list using the IOFactory.FILESYSTEM_FACTORY.
- Parameters:
filename - the file containing the offsets (see BitStreamIndexWriter).
T - the number of terms indexed.
- Returns:
- a list of longs backed by an array; the list has an additional final element of index T that gives the number of bytes of the index file.
- Throws:
IOException
-
readSizes
public static IntBigArrayBigList readSizes(IOFactory ioFactory, CharSequence filename, long n) throws IOException
Utility method to load a compressed size file into a list.
- Parameters:
ioFactory - the factory that will be used to perform I/O.
filename - the file containing the γ-coded sizes (see BitStreamIndexWriter).
n - the number of documents.
- Returns:
- a list of integers backed by an array.
- Throws:
IOException
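For readers unfamiliar with γ coding, here is a minimal, self-contained decoder sketch. It is not the library's code: the class name GammaDecoder is hypothetical, and it assumes bits are read most-significant first and that a natural number x is stored as the Elias γ code of x + 1 (a run of zeros, then the binary representation of x + 1).

```java
/** Sketch of an Elias γ decoder over a byte array (bit order and +1 shift are assumptions). */
final class GammaDecoder {
    private final byte[] data;
    private long bitPosition;

    GammaDecoder(byte[] data) {
        this.data = data;
    }

    /** Reads the next bit, most-significant bit of each byte first. */
    private int readBit() {
        int bit = (data[(int) (bitPosition >>> 3)] >>> (7 - (int) (bitPosition & 7))) & 1;
        bitPosition++;
        return bit;
    }

    /** Reads one γ-coded natural number. */
    int readGamma() {
        int zeros = 0;
        while (readBit() == 0) zeros++;          // unary prefix: number of bits that follow
        int value = 1;                           // the terminating 1 is the leading bit of x + 1
        for (int i = 0; i < zeros; i++) value = (value << 1) | readBit();
        return value - 1;
    }

    /** Decodes n consecutive γ-coded numbers, e.g. n document sizes. */
    static int[] decode(byte[] data, int n) {
        GammaDecoder decoder = new GammaDecoder(data);
        int[] sizes = new int[n];
        for (int i = 0; i < n; i++) sizes[i] = decoder.readGamma();
        return sizes;
    }
}
```

Under these assumptions the sizes 0, 1, 2, 3 are coded as the bit strings 1, 010, 011 and 00100, which pack into the two bytes 0xA6 0x40.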
-
readSizes
public static IntBigArrayBigList readSizes(CharSequence filename, long N) throws IOException
Utility method to load a compressed size file into a list using the IOFactory.FILESYSTEM_FACTORY.
- Parameters:
filename - the file containing the γ-coded sizes (see BitStreamIndexWriter).
N - the number of documents.
- Returns:
- a list of integers backed by an array.
- Throws:
IOException
-
readSizesSuccinct
@Deprecated public static IntBigList readSizesSuccinct(CharSequence filename, long N) throws IOException
Deprecated. This method is an ancestral residue.
Utility method to load a compressed size file into an Elias–Fano compressed list.
- Parameters:
filename - the filename containing the γ-coded sizes (see BitStreamIndexWriter).
N - the number of documents indexed.
- Returns:
- a list of integers backed by an Elias–Fano compressed list.
- Throws:
IllegalStateException - if ioFactory is not IOFactory.FILESYSTEM_FACTORY.
IOException
-
loadLongBigList
public static LongBigArrayBigList loadLongBigList(IOFactory ioFactory, CharSequence filename, ByteOrder byteOrder) throws IOException
Commodity method for loading a big list of binary longs with specified endianness into a long big array.
- Parameters:
ioFactory - the factory that will be used to perform I/O.
filename - the file containing the longs.
byteOrder - the endianness of the longs.
- Returns:
- a big list of longs containing the longs in filename.
- Throws:
IOException
-
loadLongBigList
public static LongBigArrayBigList loadLongBigList(CharSequence filename, ByteOrder byteOrder) throws IOException
Commodity method for loading a big list of binary longs with specified endianness into a long big array using the IOFactory.FILESYSTEM_FACTORY.
- Parameters:
filename - the file containing the longs.
byteOrder - the endianness of the longs.
- Returns:
- a big list of longs containing the longs in filename.
- Throws:
IOException
-
loadLongBigList
public static LongBigArrayBigList loadLongBigList(ReadableByteChannel channel, long length, ByteOrder byteOrder) throws IOException
Commodity method for loading from a channel a big list of binary longs with specified endianness into a long big array.
- Parameters:
channel - the channel.
byteOrder - the endianness of the longs.
- Returns:
- a big list of longs containing the longs returned by channel.
- Throws:
IOException
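The general technique of reading binary longs of a given endianness from a channel through a fixed-size buffer can be sketched with the standard java.nio classes alone. This is an illustration, not the library's implementation: the class name LongChannelReader, the buffer size, the use of a plain long[] instead of a big list, and the reading of length as a count of longs are all assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

/** Sketch: buffered reading of binary longs with a given endianness from a channel. */
final class LongChannelReader {
    /** Hypothetical buffer size in bytes; must be a multiple of Long.BYTES. */
    static final int BUFFER_BYTES = 8 * 1024;

    private LongChannelReader() {}

    /** Reads exactly `length` longs (assumed meaning of the parameter) from the channel. */
    static long[] readLongs(ReadableByteChannel channel, int length, ByteOrder byteOrder) throws IOException {
        long[] result = new long[length];
        ByteBuffer buffer = ByteBuffer.allocate(BUFFER_BYTES).order(byteOrder);
        int filled = 0;
        while (filled < length) {
            buffer.clear();
            // Never request more bytes than the longs still missing.
            buffer.limit((int) Math.min((long) BUFFER_BYTES, (long) (length - filled) * Long.BYTES));
            while (buffer.hasRemaining()) {
                if (channel.read(buffer) < 0) throw new EOFException("Channel ended after " + filled + " longs");
            }
            buffer.flip();
            while (buffer.hasRemaining()) result[filled++] = buffer.getLong();
        }
        return result;
    }

    /** Convenience overload for in-memory data; wraps I/O failures in an unchecked exception. */
    static long[] readLongs(byte[] bytes, ByteOrder byteOrder) {
        try {
            return readLongs(Channels.newChannel(new ByteArrayInputStream(bytes)), bytes.length / Long.BYTES, byteOrder);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Setting the byte order on the buffer once lets getLong() perform the endianness conversion, so the loop body is identical for both orders.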
-
byteOrder
public static ByteOrder byteOrder(String s)
Parses a ByteOrder value.
- Parameters:
s - a string (either BIG_ENDIAN or LITTLE_ENDIAN).
- Returns:
- the corresponding byte order (ByteOrder.BIG_ENDIAN or ByteOrder.LITTLE_ENDIAN).
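As an illustration, a parser of this kind might look like the following sketch. The class name ByteOrderParsing is hypothetical, and rejecting unknown strings with an IllegalArgumentException is an assumption; the documentation above does not say how the real method treats them.

```java
import java.nio.ByteOrder;

/** Sketch: map the strings BIG_ENDIAN and LITTLE_ENDIAN to java.nio.ByteOrder constants. */
final class ByteOrderParsing {
    private ByteOrderParsing() {}

    static ByteOrder parse(String s) {
        if ("BIG_ENDIAN".equals(s)) return ByteOrder.BIG_ENDIAN;
        if ("LITTLE_ENDIAN".equals(s)) return ByteOrder.LITTLE_ENDIAN;
        // Assumption: anything else is rejected; the real method may behave differently.
        throw new IllegalArgumentException("Unknown byte order: " + s);
    }
}
```

The two accepted strings are the same ones that ByteOrder.toString() produces, so a value written with toString() can be parsed back.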
-
loadStringMap
public static StringMap<? extends CharSequence> loadStringMap(IOFactory ioFactory, String filename) throws IOException
Utility static method that loads a term map.
- Parameters:
ioFactory - the factory that will be used to perform I/O.
filename - the name of the file containing the term map.
- Returns:
- the map, or null if the file did not exist.
- Throws:
IOException - if some IOException (other than FileNotFoundException) occurred.
-
loadStringMap
public static StringMap<? extends CharSequence> loadStringMap(String filename) throws IOException
Utility static method that loads a term map using the IOFactory.FILESYSTEM_FACTORY.
- Parameters:
filename - the name of the file containing the term map.
- Returns:
- the map, or null if the file did not exist.
- Throws:
IOException - if some IOException (other than FileNotFoundException) occurred.
-
loadPrefixMap
public static PrefixMap<? extends CharSequence> loadPrefixMap(IOFactory ioFactory, String filename) throws IOException
Utility static method that loads a prefix map.
- Parameters:
ioFactory - the factory that will be used to perform I/O.
filename - the name of the file containing the prefix map.
- Returns:
- the map, or null if the file did not exist.
- Throws:
IOException - if some IOException (other than FileNotFoundException) occurred.
-
loadPrefixMap
public static PrefixMap<? extends CharSequence> loadPrefixMap(String filename) throws IOException
Utility static method that loads a prefix map using the IOFactory.FILESYSTEM_FACTORY.
- Parameters:
filename - the name of the file containing the prefix map.
- Returns:
- the map, or null if the file did not exist.
- Throws:
IOException - if some IOException (other than FileNotFoundException) occurred.
-
offsets
public static LongBigList offsets(IOFactory ioFactory, String filename, long numberOfTerms, int offsetStep) throws FileNotFoundException, IOException
Returns the list of offsets.
- Parameters:
ioFactory - the factory that will be used to perform I/O.
filename - the file containing the offsets.
numberOfTerms - the number of terms.
offsetStep - the offset step.
- Returns:
- if offsetStep is less than zero, a memory-mapped, synchronized SemiExternalOffsetBigList with offset step equal to -offsetStep; if it is zero, an in-memory list; if it is greater than zero, a synchronized SemiExternalOffsetBigList with offset step equal to offsetStep.
- Throws:
FileNotFoundException
IOException
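The three cases selected by the sign of offsetStep can be summarised in a few lines of code. The sketch below only models the decision, not the actual list construction; the class and enum names are hypothetical, and it assumes that a positive offsetStep is used as the step unchanged.

```java
/** Sketch: classify the kind of offset list selected by the sign of offsetStep. */
final class OffsetStepPolicy {
    enum Kind { MEMORY_MAPPED_SEMI_EXTERNAL, IN_MEMORY, SEMI_EXTERNAL }

    private OffsetStepPolicy() {}

    static Kind kind(int offsetStep) {
        if (offsetStep < 0) return Kind.MEMORY_MAPPED_SEMI_EXTERNAL; // step is -offsetStep
        if (offsetStep == 0) return Kind.IN_MEMORY;                  // no step involved
        return Kind.SEMI_EXTERNAL;                                   // assumed: step is offsetStep itself
    }

    /** Returns the step used by the semi-external variants (0 for the in-memory case). */
    static int effectiveStep(int offsetStep) {
        return offsetStep < 0 ? -offsetStep : offsetStep;
    }
}
```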
-
offsets
public static LongBigList offsets(String filename, long numberOfTerms, int offsetStep) throws FileNotFoundException, IOException
Returns the list of offsets using the IOFactory.FILESYSTEM_FACTORY.
- Parameters:
filename - the file containing the offsets.
numberOfTerms - the number of terms.
offsetStep - the offset step.
- Returns:
- if offsetStep is less than zero, a memory-mapped, synchronized SemiExternalOffsetBigList with offset step equal to -offsetStep; if it is zero, an in-memory list; if it is greater than zero, a synchronized SemiExternalOffsetBigList with offset step equal to offsetStep.
- Throws:
FileNotFoundException
IOException
-
getInstance
public static Index getInstance(IOFactory ioFactory, CharSequence basename, Properties properties, StringMap<? extends CharSequence> termMap, PrefixMap<? extends CharSequence> prefixMap, boolean randomAccess, boolean documentSizes, EnumMap<Index.UriKeys,String> queryProperties) throws ClassNotFoundException, IOException, InstantiationException, IllegalAccessException
Returns a new disk-based index, loading exactly the specified parts and using preloaded Properties.
- Parameters:
ioFactory - the factory that will be used to perform I/O.
basename - the basename of the index.
properties - the properties obtained from the given basename.
termMap - the term map for this index, or null for no term map.
prefixMap - the prefix map for this index, or null for no prefix map.
randomAccess - whether the index should be accessible randomly (e.g., if it will be possible to call IndexReader.documents(long) on the index readers returned by the index).
documentSizes - if true, document sizes will be loaded (note that sometimes document sizes might be loaded anyway because the compression method for positions requires it).
queryProperties - a map containing associations between Index.UriKeys and values, or null.
- Throws:
ClassNotFoundException
IOException
InstantiationException
IllegalAccessException
-
getInstance
public static Index getInstance(CharSequence basename, Properties properties, StringMap<? extends CharSequence> termMap, PrefixMap<? extends CharSequence> prefixMap, boolean randomAccess, boolean documentSizes, EnumMap<Index.UriKeys,String> queryProperties) throws ClassNotFoundException, IOException, InstantiationException, IllegalAccessException
Returns a new disk-based index, loading exactly the specified parts and using preloaded Properties and the IOFactory.FILESYSTEM_FACTORY.
- Parameters:
basename - the basename of the index.
properties - the properties obtained from the given basename.
termMap - the term map for this index, or null for no term map.
prefixMap - the prefix map for this index, or null for no prefix map.
randomAccess - whether the index should be accessible randomly (e.g., if it will be possible to call IndexReader.documents(long) on the index readers returned by the index).
documentSizes - if true, document sizes will be loaded (note that sometimes document sizes might be loaded anyway because the compression method for positions requires it).
queryProperties - a map containing associations between Index.UriKeys and values, or null.
- Throws:
ClassNotFoundException
IOException
InstantiationException
IllegalAccessException
-
getInstance
public static Index getInstance(IOFactory ioFactory, CharSequence basename, Properties properties, boolean randomAccess, boolean documentSizes, boolean maps, EnumMap<Index.UriKeys,String> queryProperties) throws ClassNotFoundException, IOException, InstantiationException, IllegalAccessException
Returns a new disk-based index, using preloaded Properties and possibly guessing reasonable term and prefix maps from the basename.
- Parameters:
ioFactory - the factory that will be used to perform I/O.
basename - the basename of the index.
properties - the properties obtained by stemming basename.
randomAccess - whether the index should be accessible randomly.
documentSizes - if true, document sizes will be loaded.
maps - if true, term and prefix maps will be guessed and loaded.
queryProperties - a map containing associations between Index.UriKeys and values, or null.
- Throws:
IllegalAccessException
InstantiationException
ClassNotFoundException
IOException
- See Also:
getInstance(CharSequence, Properties, StringMap, PrefixMap, boolean, boolean, EnumMap)
-
getInstance
public static Index getInstance(CharSequence basename, Properties properties, boolean randomAccess, boolean documentSizes, boolean maps, EnumMap<Index.UriKeys,String> queryProperties) throws ClassNotFoundException, IOException, InstantiationException, IllegalAccessException
Returns a new disk-based index, using preloaded Properties and possibly guessing reasonable term and prefix maps from the basename.
- Parameters:
basename - the basename of the index.
properties - the properties obtained by stemming basename.
randomAccess - whether the index should be accessible randomly.
documentSizes - if true, document sizes will be loaded.
maps - if true, term and prefix maps will be guessed and loaded.
queryProperties - a map containing associations between Index.UriKeys and values, or null.
- Throws:
IllegalAccessException
InstantiationException
ClassNotFoundException
IOException
- See Also:
getInstance(CharSequence, Properties, StringMap, PrefixMap, boolean, boolean, EnumMap)
-
getInstance
public static Index getInstance(CharSequence basename, boolean randomAccess, boolean documentSizes, boolean maps, EnumMap<Index.UriKeys,String> queryProperties) throws org.apache.commons.configuration.ConfigurationException, ClassNotFoundException, IOException, InstantiationException, IllegalAccessException
Returns a new disk-based index, possibly guessing reasonable term and prefix maps from the basename.
If there is a term map file (basename stemmed with .termmap), it is used as term map and, in case it implements PrefixMap, also as prefix map. Otherwise, we search for a prefix map (basename stemmed with .prefixmap) and, if it implements StringMap and no term map has been found, we also use it as term map.
- Parameters:
basename - the basename of the index.
randomAccess - whether the index should be accessible randomly (e.g., if it will be possible to call IndexReader.documents(long) on the index readers returned by the index).
documentSizes - if true, document sizes will be loaded (note that sometimes document sizes might be loaded anyway because the compression method for positions requires it).
maps - if true, term and prefix maps will be guessed and loaded (this feature might not be available with some kinds of index).
queryProperties - a map containing associations between Index.UriKeys and values, or null.
- Throws:
org.apache.commons.configuration.ConfigurationException
ClassNotFoundException
IOException
InstantiationException
IllegalAccessException
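The file lookup order described for map guessing can be sketched as follows. This models only which file would be preferred as the source of the term map, leaving out deserialization and the interface checks; the class name MapFileGuess and the existence predicate are hypothetical, while the .termmap and .prefixmap suffixes come from the description above.

```java
import java.util.Optional;
import java.util.function.Predicate;

/** Sketch: choose the file that would supply the term map for a given basename. */
final class MapFileGuess {
    private MapFileGuess() {}

    /**
     * @param basename the basename of the index.
     * @param exists a predicate telling whether a file name exists (hypothetical hook for testing).
     * @return the preferred file name, or an empty optional if neither file exists.
     */
    static Optional<String> termMapSource(String basename, Predicate<String> exists) {
        String termMap = basename + ".termmap";
        if (exists.test(termMap)) return Optional.of(termMap);
        // Fall back to the prefix map, which can double as a term map if it implements StringMap.
        String prefixMap = basename + ".prefixmap";
        if (exists.test(prefixMap)) return Optional.of(prefixMap);
        return Optional.empty();
    }
}
```

In real use the predicate could be something like name -> java.nio.file.Files.exists(java.nio.file.Paths.get(name)).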
-
getInstance
public static Index getInstance(CharSequence basename, boolean randomAccess, boolean documentSizes, boolean maps) throws org.apache.commons.configuration.ConfigurationException, ClassNotFoundException, IOException, InstantiationException, IllegalAccessException
Returns a new disk-based index, possibly guessing reasonable term and prefix maps from the basename.
If there is a term map file (basename stemmed with .termmap), it is used as term map and, in case it implements PrefixMap, also as prefix map. Otherwise, we search for a prefix map (basename stemmed with .prefixmap) and, if it implements StringMap and no term map has been found, we also use it as term map.
- Parameters:
basename - the basename of the index.
randomAccess - whether the index should be accessible randomly (e.g., if it will be possible to call IndexReader.documents(long) on the index readers returned by the index).
documentSizes - if true, document sizes will be loaded (note that sometimes document sizes might be loaded anyway because the compression method for positions requires it).
maps - if true, term and prefix maps will be guessed and loaded (this feature might not be available with some kinds of index).
- Throws:
org.apache.commons.configuration.ConfigurationException
ClassNotFoundException
IOException
InstantiationException
IllegalAccessException
- See Also:
getInstance(CharSequence, boolean, boolean, boolean, EnumMap)
-
getInstance
public static Index getInstance(CharSequence basename, boolean randomAccess, boolean documentSizes) throws org.apache.commons.configuration.ConfigurationException, ClassNotFoundException, IOException, InstantiationException, IllegalAccessException
Returns a new disk-based index, guessing reasonable term and prefix maps from the basename.
- Parameters:
basename - the basename of the index.
randomAccess - whether the index should be accessible randomly (e.g., if it will be possible to call IndexReader.documents(long) on the index readers returned by the index).
documentSizes - if true, document sizes will be loaded (note that sometimes document sizes might be loaded anyway because the compression method for positions requires it).
- Throws:
org.apache.commons.configuration.ConfigurationException
ClassNotFoundException
IOException
InstantiationException
IllegalAccessException
-
getInstance
public static Index getInstance(CharSequence basename, boolean randomAccess) throws org.apache.commons.configuration.ConfigurationException, ClassNotFoundException, IOException, InstantiationException, IllegalAccessException
Returns a new local index, trying to guess reasonable term and prefix maps from the basename, and loading document sizes only if it is necessary.
- Parameters:
basename - the basename of the index.
randomAccess - whether the index should be accessible randomly (e.g., if it will be possible to call IndexReader.documents(long) on the index readers returned by the index).
- Throws:
org.apache.commons.configuration.ConfigurationException
ClassNotFoundException
IOException
InstantiationException
IllegalAccessException
-
getInstance
public static Index getInstance(CharSequence basename) throws org.apache.commons.configuration.ConfigurationException, ClassNotFoundException, IOException, InstantiationException, IllegalAccessException
Returns a new local index, trying to guess reasonable term and prefix maps from the basename, loading offsets but loading document sizes only if it is necessary.
- Parameters:
basename - the basename of the index.
- Throws:
org.apache.commons.configuration.ConfigurationException
ClassNotFoundException
IOException
InstantiationException
IllegalAccessException
-
-