public interface DocumentSequence
A sequence of documents.
This is the most basic class available in MG4J for representing a sequence of documents to be indexed. Its only duty is to return, once, an iterator over the documents in the sequence.
The iterator returned by iterator() must always return the same documents in the same order, given the same external conditions (standard input, file system, etc.).
Document sequences must always return documents of the same type. This is usually accomplished by providing at construction time a DocumentFactory that will be used to build and parse documents. Of course, it is possible to create document sequences with a hardwired factory (see, e.g., ZipDocumentCollection).
Some sequences might require invoking filename(CharSequence) to access ancillary data. AbstractDocumentSequence.load(CharSequence) is the suggested method for deserialising sequences, as it takes care of that invocation for you.
Method Summary | |
---|---|
void | close(): Closes this document sequence, releasing all resources. |
DocumentFactory | factory(): Returns the factory used by this sequence. |
void | filename(CharSequence filename): Sets the filename of this document sequence. |
DocumentIterator | iterator(): Returns an iterator over the sequence of documents. |
Method Detail |
---|
DocumentIterator iterator() throws IOException
Warning: this method can safely be called only once. For instance, implementations based on standard input will usually throw an exception if it is called twice.
Implementations may lift this restriction (in particular, if they implement DocumentCollection). Usually, however, it is not possible to obtain two iterators on a collection at the same time.
Throws: IOException
See Also: DocumentCollection
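The call-once contract can be illustrated with a minimal sketch. The class below is a hypothetical, simplified stand-in (OneShotSequence and countDocuments are invented names, and documents are reduced to strings); it does not use the MG4J types. Rejecting the second call with an IllegalStateException is also only an assumption about one possible behaviour.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class SingleUseSequenceSketch {
    /** Hypothetical stand-in for a sequence that can hand out its iterator only once. */
    static final class OneShotSequence {
        private final List<String> documents;
        private boolean iteratorRequested;

        OneShotSequence(List<String> documents) {
            this.documents = documents;
        }

        /** Returns the iterator on the first call and rejects any later call. */
        Iterator<String> iterator() {
            if (iteratorRequested) {
                throw new IllegalStateException("iterator() has already been called");
            }
            iteratorRequested = true;
            return documents.iterator();
        }
    }

    /** Consumes the single iterator of the sequence and counts the documents it returns. */
    static int countDocuments(OneShotSequence sequence) {
        Iterator<String> iterator = sequence.iterator();
        int count = 0;
        while (iterator.hasNext()) {
            iterator.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        OneShotSequence sequence = new OneShotSequence(Arrays.asList("first", "second"));
        System.out.println("documents: " + countDocuments(sequence));
        try {
            sequence.iterator(); // second request: rejected by this sketch
        } catch (IllegalStateException e) {
            System.out.println("second call rejected: " + e.getMessage());
        }
    }
}
```

Client code written against the basic sequence contract should therefore do all its work in a single pass, unless it knows that the implementation at hand allows more.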
DocumentFactory factory()
Every document sequence is based on a document factory that transforms raw bytes into a sequence of characters. The factory contains useful information such as the number of fields.
void close() throws IOException
You should always call this method after having finished with this document sequence. Implementations are invited to call this method in a finaliser as a safety net (even better, implement SafelyCloseable), but since there is no guarantee as to when finalisers are invoked, you should not depend on this behaviour.
Specified by: close in interface Closeable
Throws: IOException
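Since close() is specified by Closeable, one way to guarantee an explicit close is a try-with-resources statement. The following is a minimal sketch using a hypothetical stand-in class (TrackedSequence, processAll and useAndClose are invented for illustration) rather than a real MG4J sequence.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

class CloseSequenceSketch {
    /** Hypothetical stand-in for a document sequence; it only records whether it was closed. */
    static final class TrackedSequence implements Closeable {
        boolean closed;
        int processed;

        /** Pretends to read three documents; fails if the sequence is already closed. */
        void processAll() throws IOException {
            if (closed) {
                throw new IOException("sequence already closed");
            }
            processed = 3;
        }

        @Override
        public void close() throws IOException {
            closed = true;
        }
    }

    /** Uses the sequence and closes it explicitly, without relying on finalisation. */
    static TrackedSequence useAndClose() {
        TrackedSequence sequence = new TrackedSequence();
        try (TrackedSequence resource = sequence) {
            resource.processAll();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sequence;
    }

    public static void main(String[] args) {
        TrackedSequence sequence = useAndClose();
        System.out.println("closed=" + sequence.closed + ", processed=" + sequence.processed);
    }
}
```

The resource is closed even if processing throws, which is the guarantee a finaliser cannot give.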
void filename(CharSequence filename) throws IOException
Several document sequences (or collections) are stored using Java's standard serialisation mechanism; nonetheless, they require access to support files whose names are stored inside the serialised instance. If all pieces are in the current directory, this works as expected. However, if the sequence was specified using a complete pathname, during deserialisation it will be impossible to recover the associated files. In this case, the class expects this method to be invoked on the newly deserialised instance so that pathnames can be relativised to the given filename. Classes that need this mechanism should not fail upon deserialisation if they do not find some support file, but rather wait for the first access.
In several cases, this method can be a no-op (e.g., for an InputStreamDocumentSequence or a FileSetDocumentCollection). Other implementations, such as SimpleCompressedDocumentCollection or ZipDocumentCollection, require a specific treatment. AbstractDocumentSequence implements this method as a no-op.
Parameters: filename - the filename of this document sequence.
Throws: IOException
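As a rough illustration of relativisation, the sketch below re-anchors a stored support-file name next to the file the instance was loaded from. StoredSequence and its supportFile field are hypothetical, and resolving against the sibling directory is an assumption about one plausible strategy, not a description of how any MG4J class implements filename(CharSequence).

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class FilenameRelativisationSketch {
    /** Hypothetical stand-in for a serialised sequence that remembers one support file. */
    static final class StoredSequence {
        private String supportFile;

        StoredSequence(String supportFile) {
            this.supportFile = supportFile;
        }

        /** Re-anchors the support file in the directory of the given sequence filename. */
        void filename(CharSequence filename) {
            Path loadedFrom = Paths.get(filename.toString());
            // Keep only the last path component of the stored name.
            Path name = Paths.get(supportFile).getFileName();
            supportFile = loadedFrom.resolveSibling(name).toString();
        }

        /** The support file is only located here, not opened; access is deferred. */
        Path supportFile() {
            return Paths.get(supportFile);
        }
    }

    public static void main(String[] args) {
        // The path recorded at serialisation time no longer exists on this machine.
        StoredSequence sequence = new StoredSequence("/old/place/docs.zip");
        sequence.filename("/data/corpus/docs.sequence");
        System.out.println(sequence.supportFile());
    }
}
```

Note that the sketch never touches the file system in filename(), in keeping with the advice to defer failures until the first access.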