java.lang.Object
it.unimi.di.mg4j.index.Index
public abstract class Index
An abstract representation of an index.
Concrete subclasses of this class represent abstract index access information: for instance, the basename or IP address/port, flags, etc. The class makes it easy to build index readers over the index; index readers, in turn, provide document iterators.
This class contains just method declarations, and attributes for all data that is common to any form of index. Note that we use an abstract class, rather than an interface, because interfaces cannot declare instance attributes.
We provide static factory methods (e.g., getInstance(CharSequence)) that return an index given a suitable URI string. If the scheme part is mg4j, then the URI is assumed to point at a remote index. Otherwise, it is assumed to be the basename of a local index. In both cases, a query part introduced by ? can specify additional parameters (key=value pairs separated by ;). For instance, the URI example?inmemory=1 will load the index with basename example, caching its content in core memory. See the constants in Index.UriKeys (and analogous enums in subclasses) for additional parameters.
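The URI convention described above can be illustrated with a small sketch. It is not MG4J's parser: the class IndexUriSketch and its fields are hypothetical, and the handling of a key without a value is an assumption of the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of MG4J: splits an index URI into its base and its parameters. */
final class IndexUriSketch {
    final boolean remote;
    final String base;
    final Map<String, String> params = new LinkedHashMap<>();

    IndexUriSketch(String uri) {
        int q = uri.indexOf('?');
        base = q < 0 ? uri : uri.substring(0, q);
        // A URI whose scheme is mg4j denotes a remote index; anything else is a local basename.
        remote = base.startsWith("mg4j:");
        if (q >= 0) {
            // The query part holds key=value pairs separated by semicolons.
            for (String pair : uri.substring(q + 1).split(";")) {
                if (pair.isEmpty()) continue;
                int eq = pair.indexOf('=');
                // Assumption of this sketch: a key without '=' gets an empty value.
                String key = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                params.put(key, value);
            }
        }
    }

    public static void main(String[] args) {
        IndexUriSketch u = new IndexUriSketch("example?inmemory=1");
        System.out.println(u.base + " " + u.remote + " " + u.params); // example false {inmemory=1}
    }
}
```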
If the index is local, by convention this class will locate a property file with extension DiskBasedIndex.PROPERTIES_EXTENSION that is expected to contain a number of key/value pairs (which are quite informative and can be examined manually). In particular, the key Index.PropertyKeys.INDEXCLASS specifies which kind of index class should be used to read the index. The file might contain additional keys depending on the value of Index.PropertyKeys.INDEXCLASS (e.g., QuasiSuccinctIndex.PropertyKeys.BYTEORDER).
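As a rough illustration of reading such a key/value file, the sketch below uses java.util.Properties as a stand-in; MG4J's own Properties type and the actual key names and values may differ. The class IndexPropertiesSketch, the key string "indexclass", and the file content are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper, not part of MG4J: loads key/value pairs and looks up a required key. */
final class IndexPropertiesSketch {
    /** Parses property-file text into a java.util.Properties object. */
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    /** Returns the value stored under the given key, failing loudly if it is missing. */
    static String require(Properties p, String key) {
        String v = p.getProperty(key);
        if (v == null) throw new IllegalArgumentException("Missing property: " + key);
        return v;
    }

    public static void main(String[] args) {
        // Invented file content; key names and values are placeholders.
        Properties p = parse("indexclass=com.example.SomeIndex\ndocuments=1000\n");
        System.out.println(require(p, "indexclass"));
    }
}
```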
An index usually exposes term or prefix maps and the size list, but this is not compulsory (the size list, in particular, is necessary with certain codings).
Indices are a natural candidate for multithreaded access. An instance of this class must be thread safe as long as the external data structures provided to its constructors are. For instance, the tool IndexBuilder generates a synchronized ImmutableExternalPrefixMap so that by default the resulting index is thread safe.
Similarly, a DiskBasedIndex requires a list of term offsets, term maps, etc. As long as all these data structures are thread safe, the same is true of the index. Data structures created by static factory methods such as DiskBasedIndex.getInstance(CharSequence) are thread safe.
Note that the IndexReaders returned by getReader() are not thread safe (even if the method getReader() is). The logic behind this arrangement is that you create as many readers as you need, and then Closeable.close() them. In a multithreaded environment, a pool of index readers can be created, and a custom QueryBuilderVisitor can be used to build DocumentIterators using the given pool of readers. In this case readers are not closed, but rather reused.
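The pooling idea can be sketched with standard concurrency classes. This is only an illustration under stated assumptions: ReaderPoolSketch is a hypothetical class, the type parameter R stands in for a non-thread-safe reader, and the wiring into a custom QueryBuilderVisitor is not shown.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical pool, not part of MG4J; R stands in for a non-thread-safe reader type. */
final class ReaderPoolSketch<R> {
    private final BlockingQueue<R> idle;

    ReaderPoolSketch(int size, Supplier<R> factory) {
        idle = new ArrayBlockingQueue<>(size);
        // Readers are created once, up front, and are reused rather than closed.
        for (int i = 0; i < size; i++) idle.add(factory.get());
    }

    /** Borrows a reader, runs the task on it, and always returns the reader to the pool. */
    <T> T withReader(Function<R, T> task) {
        R reader;
        try {
            reader = idle.take(); // blocks until a reader is free
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a reader", e);
        }
        try {
            return task.apply(reader);
        } finally {
            idle.add(reader); // each reader is used by one thread at a time
        }
    }

    /** Number of readers currently available. */
    int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) {
        ReaderPoolSketch<StringBuilder> pool = new ReaderPoolSketch<>(2, StringBuilder::new);
        System.out.println(pool.withReader(r -> r.append("x").length()));
    }
}
```

Because a reader is handed to exactly one task at a time, the readers themselves never need to be thread safe; only the queue is shared.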
Implementations of this class are strongly encouraged to offer read-once constructors and factory methods: property files and other data related to the index (but not to an IndexReader) should be read exactly once, and sequentially. This feature is very useful when combining indices.
Nested Class Summary | |
---|---|
class | Index.EmptyIndexIterator: An iterator returning no documents based on this index. |
static class | Index.PropertyKeys: Symbolic names for properties of an Index. |
static class | Index.UriKeys: Keys to be used (downcased) in specifying additional parameters to an MG4J URI. |
Field Summary | |
---|---|
String | field: The field indexed by this index, or null. |
boolean | hasCounts: Whether this index contains counts. |
boolean | hasPayloads: Whether this index contains payloads; if true, payload is non-null. |
boolean | hasPositions: Whether this index contains positions. |
Index | keyIndex: The index used as a key to retrieve intervals. |
int | maxCount: The maximum number of positions in a position list, or possibly -1 if this index does not have positions. |
int | numberOfDocuments: The number of documents of the collection. |
long | numberOfOccurrences: The number of occurrences of the collection, or possibly -1 if it is unknown. |
long | numberOfPostings: The number of postings (pairs term/document) of the collection. |
int | numberOfTerms: The number of terms of the collection. |
Payload | payload: The payload for this index, or null. |
PrefixMap<? extends CharSequence> | prefixMap: The prefix map for this index, or null if the prefix map was not loaded. |
Properties | properties: The properties of this index. |
ReferenceSet<Index> | singletonSet: An immutable singleton set containing just keyIndex. |
IntList | sizes: The size of each document, or null if sizes are not necessary or not loaded in this index. |
StringMap<? extends CharSequence> | termMap: The term map for this index, or null if the term map was not loaded. |
TermProcessor | termProcessor: The term processor used to build this index. |
Constructor Summary | |
---|---|
protected | Index(int numberOfDocuments, int numberOfTerms, long numberOfPostings, long numberOfOccurrences, int maxCount, Payload payload, boolean hasCounts, boolean hasPositions, TermProcessor termProcessor, String field, StringMap<? extends CharSequence> termMap, PrefixMap<? extends CharSequence> prefixMap, IntList sizes, Properties properties): Creates a new instance, initialising all fields. |
Method Summary | |
---|---|
IndexIterator | documents(CharSequence term): Creates a new IndexReader for this index and uses it to return an index iterator over the documents containing a term; the term is given explicitly, and the index term map is used, if present. |
IndexIterator | documents(CharSequence prefix, int limit): Creates a number of instances of IndexReader for this index and uses them to return a MultiTermIndexIterator over the documents containing any term out of a set of terms defined by a prefix; the prefix is given explicitly, and unless the index has a prefix map, an UnsupportedOperationException will be thrown. |
IndexIterator | documents(int term): Creates a new IndexReader for this index and uses it to return an index iterator over the documents containing a term. |
IndexIterator | getEmptyIndexIterator() |
IndexIterator | getEmptyIndexIterator(CharSequence term) |
IndexIterator | getEmptyIndexIterator(CharSequence term, int termNumber) |
IndexIterator | getEmptyIndexIterator(int term) |
static Index | getInstance(CharSequence uri): Returns a new index using the given URI, searching dynamically for term and prefix maps, loading offsets but loading document sizes only if it is necessary. |
static Index | getInstance(CharSequence uri, boolean randomAccess): Returns a new index using the given URI, searching dynamically for term and prefix maps and loading document sizes only if it is necessary. |
static Index | getInstance(CharSequence uri, boolean randomAccess, boolean documentSizes): Returns a new index using the given URI, searching dynamically for term and prefix maps. |
static Index | getInstance(CharSequence uri, boolean randomAccess, boolean documentSizes, boolean maps): Returns a new index using the given URI and no IOFactory. |
static Index | getInstance(IOFactory ioFactory, CharSequence uri, boolean randomAccess, boolean documentSizes, boolean maps): Returns a new index using the given URI. |
IndexReader | getReader(): Creates and returns a new IndexReader based on this index, using the default buffer size. |
abstract IndexReader | getReader(int bufferSize): Creates and returns a new IndexReader based on this index. |
protected static TermProcessor | getTermProcessor(Properties properties) |
void | keyIndex(Index newKeyIndex): Sets the index used as a key to retrieve intervals from iterators generated from this index. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
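public final String field
The field indexed by this index, or null.
public final Properties properties
The properties of this index, or null.
public final int numberOfDocuments
The number of documents of the collection.
public final int numberOfTerms
The number of terms of the collection.
public final long numberOfOccurrences
The number of occurrences of the collection, or possibly -1 if it is unknown.
public final long numberOfPostings
The number of postings (pairs term/document) of the collection.
public final int maxCount
The maximum number of positions in a position list, or possibly -1 if this index does not have positions.
public final Payload payload
The payload for this index, or null.
public final boolean hasPayloads
Whether this index contains payloads; if true, payload is non-null.
public final boolean hasCounts
Whether this index contains counts.
public final boolean hasPositions
Whether this index contains positions.
public final TermProcessor termProcessor
The term processor used to build this index.
public ReferenceSet<Index> singletonSet
An immutable singleton set containing just keyIndex.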
public Index keyIndex
The index used as a key to retrieve intervals. Usually it is this, but it is settable.
public final StringMap<? extends CharSequence> termMap
The term map for this index, or null if the term map was not loaded.
public final PrefixMap<? extends CharSequence> prefixMap
The prefix map for this index, or null if the prefix map was not loaded.
public final IntList sizes
The size of each document, or null if sizes are not necessary or not loaded in this index.
Constructor Detail |
---|
protected Index(int numberOfDocuments, int numberOfTerms, long numberOfPostings, long numberOfOccurrences, int maxCount, Payload payload, boolean hasCounts, boolean hasPositions, TermProcessor termProcessor, String field, StringMap<? extends CharSequence> termMap, PrefixMap<? extends CharSequence> prefixMap, IntList sizes, Properties properties)
Method Detail |
---|
protected static TermProcessor getTermProcessor(Properties properties)
public static Index getInstance(IOFactory ioFactory, CharSequence uri, boolean randomAccess, boolean documentSizes, boolean maps) throws IOException, ConfigurationException, URISyntaxException, ClassNotFoundException, SecurityException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException
Returns a new index using the given URI.
If uri has scheme mg4j, the index is considered to be remote and index creation is delegated to IndexServer.getIndex(String, int, boolean, boolean). Otherwise, we delegate to DiskBasedIndex.getInstance(CharSequence, boolean, boolean, boolean, EnumMap).
Parameters:
ioFactory - the factory that will be used to perform I/O, or null (implying the IOFactory.FILESYSTEM_FACTORY for disk-based indices).
uri - the URI defining the index.
randomAccess - whether the index should be accessible randomly.
documentSizes - if true, document sizes will be loaded (note that sometimes document sizes might be loaded anyway because the compression method for positions requires it).
maps - if true, term and prefix maps will be guessed and loaded (this feature might not be available with some kinds of index).
Throws:
IOException
ConfigurationException
URISyntaxException
ClassNotFoundException
SecurityException
InstantiationException
IllegalAccessException
InvocationTargetException
NoSuchMethodException
public static Index getInstance(CharSequence uri, boolean randomAccess, boolean documentSizes, boolean maps) throws IOException, ConfigurationException, URISyntaxException, ClassNotFoundException, SecurityException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException
Returns a new index using the given URI and no IOFactory.
If uri has scheme mg4j, the index is considered to be remote and index creation is delegated to IndexServer.getIndex(String, int, boolean, boolean). Otherwise, we delegate to DiskBasedIndex.getInstance(CharSequence, boolean, boolean, boolean, EnumMap).
Parameters:
uri - the URI defining the index.
randomAccess - whether the index should be accessible randomly.
documentSizes - if true, document sizes will be loaded (note that sometimes document sizes might be loaded anyway because the compression method for positions requires it).
maps - if true, term and prefix maps will be guessed and loaded (this feature might not be available with some kinds of index).
Throws:
IOException
ConfigurationException
URISyntaxException
ClassNotFoundException
SecurityException
InstantiationException
IllegalAccessException
InvocationTargetException
NoSuchMethodException
public static Index getInstance(CharSequence uri, boolean randomAccess, boolean documentSizes) throws IOException, ConfigurationException, URISyntaxException, ClassNotFoundException, SecurityException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException
Returns a new index using the given URI, searching dynamically for term and prefix maps.
Parameters:
uri - the URI defining the index.
randomAccess - whether the index should be accessible randomly.
documentSizes - if true, document sizes will be loaded (note that sometimes document sizes might be loaded anyway because the compression method for positions requires it).
Throws:
IOException
ConfigurationException
URISyntaxException
ClassNotFoundException
SecurityException
InstantiationException
IllegalAccessException
InvocationTargetException
NoSuchMethodException
See Also:
getInstance(CharSequence, boolean, boolean, boolean)
public static Index getInstance(CharSequence uri, boolean randomAccess) throws ConfigurationException, IOException, URISyntaxException, ClassNotFoundException, SecurityException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException
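Returns a new index using the given URI, searching dynamically for term and prefix maps and loading document sizes only if it is necessary.
Parameters:
uri - the URI defining the index.
randomAccess - whether the index should be accessible randomly.
Throws:
ConfigurationException
IOException
URISyntaxException
ClassNotFoundException
SecurityException
InstantiationException
IllegalAccessException
InvocationTargetException
NoSuchMethodException
See Also:
getInstance(CharSequence, boolean, boolean)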
public static Index getInstance(CharSequence uri) throws ConfigurationException, IOException, URISyntaxException, ClassNotFoundException, SecurityException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException
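Returns a new index using the given URI, searching dynamically for term and prefix maps, loading offsets but loading document sizes only if it is necessary.
Parameters:
uri - the URI defining the index.
Throws:
ConfigurationException
IOException
URISyntaxException
ClassNotFoundException
SecurityException
InstantiationException
IllegalAccessException
InvocationTargetException
NoSuchMethodException
See Also:
getInstance(CharSequence, boolean)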
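The getInstance overloads above form a chain in which each shorter signature supplies a default for the argument it omits. The sketch below models that chain with a hypothetical class, FactoryDefaultsSketch, that only records the effective flags. The default values are inferred from the summaries (loading offsets is read as randomAccess = true, sizes "only if necessary" as documentSizes = false, searching for maps as maps = true) and should be treated as assumptions rather than as the library's code.

```java
/** Hypothetical model of the overload chain; it records the effective flags instead of opening an index. */
final class FactoryDefaultsSketch {
    final String uri;
    final boolean randomAccess;
    final boolean documentSizes;
    final boolean maps;

    private FactoryDefaultsSketch(String uri, boolean randomAccess, boolean documentSizes, boolean maps) {
        this.uri = uri;
        this.randomAccess = randomAccess;
        this.documentSizes = documentSizes;
        this.maps = maps;
    }

    /** Fully specified variant: every flag is given by the caller. */
    static FactoryDefaultsSketch getInstance(String uri, boolean randomAccess, boolean documentSizes, boolean maps) {
        return new FactoryDefaultsSketch(uri, randomAccess, documentSizes, maps);
    }

    /** Assumed default: term and prefix maps are searched for. */
    static FactoryDefaultsSketch getInstance(String uri, boolean randomAccess, boolean documentSizes) {
        return getInstance(uri, randomAccess, documentSizes, true);
    }

    /** Assumed default: document sizes are not requested explicitly. */
    static FactoryDefaultsSketch getInstance(String uri, boolean randomAccess) {
        return getInstance(uri, randomAccess, false);
    }

    /** Assumed default: random access (offsets loaded). */
    static FactoryDefaultsSketch getInstance(String uri) {
        return getInstance(uri, true);
    }

    public static void main(String[] args) {
        FactoryDefaultsSketch d = getInstance("example");
        System.out.println(d.randomAccess + " " + d.documentSizes + " " + d.maps); // true false true
    }
}
```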
public IndexIterator getEmptyIndexIterator()
public IndexIterator getEmptyIndexIterator(int term)
public IndexIterator getEmptyIndexIterator(CharSequence term)
public IndexIterator getEmptyIndexIterator(CharSequence term, int termNumber)
public IndexReader getReader() throws IOException
Creates and returns a new IndexReader based on this index, using the default buffer size. After that, you can use the reader to read this index.
Returns:
an IndexReader to read this index.
Throws:
IOException
public abstract IndexReader getReader(int bufferSize) throws IOException
Creates and returns a new IndexReader based on this index. After that, you can use the reader to read this index.
Parameters:
bufferSize - the size of the buffer to be used accessing the reader, or -1 for a default buffer size.
Returns:
an IndexReader to read this index.
Throws:
IOException
public IndexIterator documents(int term) throws IOException
Creates a new IndexReader for this index and uses it to return an index iterator over the documents containing a term.
Since the reader is created from scratch, it is essential to dispose of the returned iterator after usage. See IndexReader.documents(int) for a method with the same semantics, but making reader reuse possible.
Parameters:
term - a term.
Throws:
IOException - if an exception occurred while accessing the index.
UnsupportedOperationException - if this index is not accessible by term number.
See Also:
IndexReader.documents(int)
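Why disposal matters can be seen in a small model in which the iterator owns the reader it was created from. Everything here is hypothetical: OwnedReaderSketch and its nested classes are not MG4J types, and close() stands in for the iterator's disposal method.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical model, not MG4J code: an iterator that owns the reader it was built from. */
final class OwnedReaderSketch {
    /** Stand-in for a reader that holds a resource until it is closed. */
    static final class Reader implements AutoCloseable {
        boolean closed;

        Iterator<Integer> documents(List<Integer> postings) {
            return postings.iterator();
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Disposing of this iterator (close() in this sketch) also releases its private reader. */
    static final class OwningIterator implements Iterator<Integer>, AutoCloseable {
        final Reader reader = new Reader();
        private final Iterator<Integer> delegate;

        OwningIterator(List<Integer> postings) {
            delegate = reader.documents(postings);
        }

        @Override
        public boolean hasNext() {
            return delegate.hasNext();
        }

        @Override
        public Integer next() {
            return delegate.next();
        }

        @Override
        public void close() {
            reader.close();
        }
    }

    /** Counts the documents, releasing the iterator even if iteration fails. */
    static int count(List<Integer> postings) {
        try (OwningIterator it = new OwningIterator(postings)) {
            int n = 0;
            while (it.hasNext()) {
                it.next();
                n++;
            }
            return n;
        }
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList(3, 7, 42))); // 3
    }
}
```

If the iterator were simply dropped, nothing else would hold a reference to the reader, so its resources would stay open until garbage collection at best.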
public IndexIterator documents(CharSequence term) throws IOException
Creates a new IndexReader for this index and uses it to return an index iterator over the documents containing a term; the term is given explicitly, and the index term map is used, if present.
Since the reader is created from scratch, it is essential to dispose of the returned iterator after usage. See IndexReader.documents(int) for a method with the same semantics, but making reader reuse possible.
Unless the term processor of this index is null, words coming from a query will have to be processed before being used with this method.
Parameters:
term - a term.
Throws:
IOException - if an exception occurred while accessing the index.
UnsupportedOperationException - if the term map is not available for this index.
See Also:
IndexReader.documents(CharSequence)
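The remark about term processing can be made concrete with a toy lookup. The sketch assumes a downcasing processor and a plain HashMap as term map; TermLookupSketch and its method names are hypothetical and do not mirror the TermProcessor or StringMap interfaces.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical model, not MG4J code: query words must be processed like indexed terms before lookup. */
final class TermLookupSketch {
    private final Map<String, Integer> termMap = new HashMap<>();
    private final UnaryOperator<String> termProcessor; // may be null, meaning "no processing"

    /** Builds a term map numbering the given (distinct, already processed) terms from 0. */
    TermLookupSketch(UnaryOperator<String> termProcessor, String... terms) {
        this.termProcessor = termProcessor;
        for (String t : terms) termMap.put(t, termMap.size());
    }

    /** Returns the term number for a raw query word, or -1 if the processed word is not indexed. */
    int termNumber(String queryWord) {
        String term = termProcessor == null ? queryWord : termProcessor.apply(queryWord);
        Integer n = termMap.get(term);
        return n == null ? -1 : n;
    }

    public static void main(String[] args) {
        TermLookupSketch index = new TermLookupSketch(s -> s.toLowerCase(Locale.ROOT), "apple", "banana");
        System.out.println(index.termNumber("Banana")); // 1
    }
}
```

Looking up the raw word "Banana" in an index built with a downcasing processor would miss the term, which is exactly the failure the note warns about.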
public IndexIterator documents(CharSequence prefix, int limit) throws IOException, TooManyTermsException
Creates a number of instances of IndexReader for this index and uses them to return a MultiTermIndexIterator over the documents containing any term out of a set of terms defined by a prefix; the prefix is given explicitly, and unless the index has a prefix map, an UnsupportedOperationException will be thrown.
Parameters:
prefix - a prefix.
limit - a limit on the number of terms that will be used to resolve the prefix query; if the terms starting with prefix are more than limit, a TooManyTermsException will be thrown.
Throws:
UnsupportedOperationException - if this index cannot resolve prefixes.
TooManyTermsException - if there are more than limit terms starting with prefix.
IOException
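The role of limit can be sketched over a sorted set of terms. This is a simplified model, not the library's prefix map: PrefixLimitSketch and its unchecked TooManyTerms exception are hypothetical stand-ins, and a TreeSet replaces the PrefixMap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical model, not MG4J code: resolves a prefix into terms, refusing to exceed a limit. */
final class PrefixLimitSketch {
    /** Stand-in for TooManyTermsException (unchecked here to keep the sketch short). */
    static final class TooManyTerms extends RuntimeException {
        TooManyTerms(int limit) {
            super("More than " + limit + " terms match the prefix");
        }
    }

    /** Returns the terms starting with the prefix, or throws if there are more than limit. */
    static List<String> resolve(TreeSet<String> terms, String prefix, int limit) {
        List<String> matches = new ArrayList<>();
        // tailSet starts at the first term >= prefix; matching terms are contiguous in sorted order.
        for (String t : terms.tailSet(prefix)) {
            if (!t.startsWith(prefix)) break;
            if (matches.size() == limit) throw new TooManyTerms(limit);
            matches.add(t);
        }
        return matches;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(Arrays.asList("car", "card", "care", "cat", "dog"));
        System.out.println(resolve(terms, "car", 3)); // [car, card, care]
    }
}
```

Each resolved term would then need its own reader, which is why an unbounded prefix such as a single letter is worth rejecting early.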
public void keyIndex(Index newKeyIndex)
Sets the index used as a key to retrieve intervals from iterators generated from this index.
This setter is a compromise between clarity of design and efficiency. Each index iterator is based on an index, and when that index is passed to DocumentIterator.intervalIterator(Index), intervals corresponding to the positions of the term in the current document are returned. Analogously, DocumentIterator.indices() returns a singleton set containing the index. However, when composing indices into clusters, often iterators generated by a local index must act as if they really belong to the global index. This method lets you set the index that is used as a key to return intervals, and that is contained in singletonSet.
Note that setting this value will only influence index readers created afterwards.
Parameters:
newKeyIndex - the new index to be used as a key for interval retrieval.
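The "readers created afterwards" rule can be modelled by having each reader capture the key index at creation time. The class below is a hypothetical simplification: KeyIndexSketch is not an MG4J type, and Collections.singleton stands in for the ReferenceSet used by the real field.

```java
import java.util.Collections;
import java.util.Set;

/** Hypothetical model, not MG4J code: a local index that can pose as another (global) index. */
final class KeyIndexSketch {
    final String name;
    KeyIndexSketch keyIndex = this; // the key is the index itself until it is replaced
    Set<KeyIndexSketch> singletonSet = Collections.singleton(this);

    KeyIndexSketch(String name) {
        this.name = name;
    }

    /** Replaces the key index and the singleton set that contains it. */
    void keyIndex(KeyIndexSketch newKeyIndex) {
        keyIndex = newKeyIndex;
        singletonSet = Collections.singleton(newKeyIndex);
    }

    /** A reader captures the key index at creation time, so later changes do not affect it. */
    final class Reader {
        final KeyIndexSketch key = keyIndex;
    }

    Reader getReader() {
        return new Reader();
    }

    public static void main(String[] args) {
        KeyIndexSketch local = new KeyIndexSketch("local");
        KeyIndexSketch global = new KeyIndexSketch("global");
        Reader before = local.getReader();
        local.keyIndex(global);
        Reader after = local.getReader();
        System.out.println(before.key.name + " " + after.key.name); // local global
    }
}
```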