it.unimi.di.mg4j.document
Interface DocumentCollection

All Superinterfaces:
Closeable, DocumentSequence, FlyweightPrototype<DocumentCollection>
All Known Implementing Classes:
AbstractDocumentCollection, ConcatenatedDocumentCollection, FileSetDocumentCollection, JavamailDocumentCollection, JdbcDocumentCollection, SimpleCompressedDocumentCollection, SubDocumentCollection, TRECDocumentCollection, WikipediaDocumentCollection, ZipDocumentCollection

public interface DocumentCollection
extends DocumentSequence, FlyweightPrototype<DocumentCollection>

A collection of documents.

Compared with DocumentSequence, classes implementing this interface have additional responsibilities: they must provide random access to the documents, and they must guarantee that DocumentSequence.iterator() can be called multiple times.

Note, however, that the objects returned by iterator(), stream(int) and document(int) are, unless explicitly stated otherwise, mutually exclusive. They share a single resource managed by the collection (and disposed of by a call to close()), so each time a stream or a document is returned by some method, the ones previously returned are no longer valid, and calling their methods will cause unpredictable behaviour. If you need several documents at the same time, you can obtain a flyweight copy of the collection with copy().
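
The sharing rule above can be illustrated with a small self-contained model. This is a sketch under stated assumptions, not MG4J code: ToyCollection, ToyDocument and InMemoryCollection are hypothetical stand-ins that mirror only size(), document(int) and copy(), and the stale-access check throws an exception purely to make the problem visible, whereas the real contract only promises unpredictable behaviour.

```java
import java.util.List;

/** Hypothetical stand-in mirroring size(), document(int) and copy(); not the MG4J interface. */
interface ToyCollection {
    int size();
    ToyDocument document(int index);
    ToyCollection copy();
}

/** A document that stays valid only until its collection returns another one. */
final class ToyDocument {
    private final InMemoryCollection owner;
    private final int ticket;
    private final String text;

    ToyDocument(InMemoryCollection owner, int ticket, String text) {
        this.owner = owner;
        this.ticket = ticket;
        this.text = text;
    }

    String text() {
        // Assumption of this model: stale access fails loudly instead of misbehaving silently.
        if (!owner.isCurrent(ticket)) throw new IllegalStateException("stale document");
        return text;
    }
}

final class InMemoryCollection implements ToyCollection {
    private final List<String> texts; // immutable data, shared by all flyweight copies
    private int currentTicket;        // per-instance state, standing in for the shared resource

    InMemoryCollection(List<String> texts) { this.texts = texts; }

    public int size() { return texts.size(); }

    public ToyDocument document(int index) {
        // Handing out a new document invalidates the one returned before.
        return new ToyDocument(this, ++currentTicket, texts.get(index));
    }

    public ToyCollection copy() { return new InMemoryCollection(texts); }

    boolean isCurrent(int ticket) { return ticket == currentTicket; }
}

final class FlyweightDemo {
    /** Holds two documents at once by reading the second one from a flyweight copy. */
    static String firstTwo(ToyCollection collection) {
        ToyCollection other = collection.copy();
        ToyDocument a = collection.document(0);
        ToyDocument b = other.document(1); // a remains valid: other has its own state
        return a.text() + " | " + b.text();
    }

    public static void main(String[] args) {
        System.out.println(firstTwo(new InMemoryCollection(List.of("alpha", "beta"))));
    }
}
```

In this model the copy shares the immutable document data but keeps its own access state, which is the usual shape of a flyweight copy; how a given implementing class divides shared and per-copy state is not specified here.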

Warning: implementations of this interface are not required to be thread-safe, but they provide flyweight copies. The copy() method is strengthened so as to return an instance of this interface.
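
One common way to work with a collection that is not thread-safe is to give each worker thread its own flyweight copy. The following is a minimal sketch of that pattern, not MG4J code: CopyableTexts and ListTexts are hypothetical stand-ins that expose document text directly, and the striding scheme used to split the work is an arbitrary choice made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in with a copy() method; not the MG4J interface. */
interface CopyableTexts {
    int size();
    String text(int index);
    CopyableTexts copy();
}

final class ListTexts implements CopyableTexts {
    private final List<String> texts; // immutable data shared by all copies

    ListTexts(List<String> texts) { this.texts = texts; }

    public int size() { return texts.size(); }
    public String text(int index) { return texts.get(index); }
    public CopyableTexts copy() { return new ListTexts(texts); }
}

final class PerThreadCopies {
    /** Sums document lengths in parallel; every task reads only from its own copy. */
    static long totalLength(CopyableTexts collection, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int offset = t;
                // The copy is created here, before the task starts, and never shared.
                final CopyableTexts mine = collection.copy();
                Callable<Long> task = () -> {
                    long sum = 0;
                    for (int i = offset; i < mine.size(); i += threads) {
                        sum += mine.text(i).length();
                    }
                    return sum;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) total += part.get();
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(totalLength(new ListTexts(List.of("ab", "cde", "f")), 2));
    }
}
```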


Field Summary
static String DEFAULT_EXTENSION
          The default extension for a serialised collection (including the dot).
 
Method Summary
 DocumentCollection copy()
           
 Document document(int index)
          Returns the document given its index.
 Reference2ObjectMap<Enum<?>,Object> metadata(int index)
          Returns the metadata map for a document.
 int size()
          Returns the number of documents in this collection.
 InputStream stream(int index)
          Returns an input stream for the raw content of a document.
 
Methods inherited from interface it.unimi.di.mg4j.document.DocumentSequence
close, factory, filename, iterator
 

Field Detail

DEFAULT_EXTENSION

static final String DEFAULT_EXTENSION
The default extension for a serialised collection (including the dot).

See Also:
Constant Field Values
Method Detail

size

int size()
Returns the number of documents in this collection.

Returns:
the number of documents in this collection.

document

Document document(int index)
                  throws IOException
Returns the document given its index.

Parameters:
index - an index between 0 (inclusive) and size() (exclusive).
Returns:
the index-th document.
Throws:
IOException

stream

InputStream stream(int index)
                   throws IOException
Returns an input stream for the raw content of a document.

Parameters:
index - an index between 0 (inclusive) and size() (exclusive).
Returns:
the raw content of the document as an input stream.
Throws:
IOException
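
Because a returned stream stops being valid once the collection hands out another stream or document, a cautious caller consumes it completely before asking for anything else. The sketch below shows that pattern with a hypothetical stand-in, RawSource, which mirrors only size() and stream(int); it is not MG4J code. It also assumes that closing the returned stream is safe, which may depend on the implementing class.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in mirroring size() and stream(int); not the MG4J interface. */
interface RawSource {
    int size();
    InputStream stream(int index) throws IOException;
}

final class RawContent {
    /** Reads the whole raw content of one document before any other stream is requested. */
    static byte[] read(RawSource source, int index) {
        // Valid indices run from 0 (inclusive) to size() (exclusive).
        if (index < 0 || index >= source.size()) {
            throw new IndexOutOfBoundsException("index " + index + " not in [0, " + source.size() + ")");
        }
        try (InputStream in = source.stream(index)) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds an in-memory source for the example; purely illustrative. */
    static RawSource inMemory(String... docs) {
        return new RawSource() {
            public int size() { return docs.length; }
            public InputStream stream(int index) {
                return new ByteArrayInputStream(docs[index].getBytes(StandardCharsets.UTF_8));
            }
        };
    }

    public static void main(String[] args) {
        RawSource source = inMemory("first", "second");
        for (int i = 0; i < source.size(); i++) {
            System.out.println(new String(read(source, i), StandardCharsets.UTF_8));
        }
    }
}
```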

metadata

Reference2ObjectMap<Enum<?>,Object> metadata(int index)
                                             throws IOException
Returns the metadata map for a document.

Parameters:
index - an index between 0 (inclusive) and size() (exclusive).
Returns:
the metadata map for the document.
Throws:
IOException
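
The return type maps enum constants to arbitrary objects, so callers typically look up a known key and check the type of the value. The sketch below illustrates this with the standard library only: java.util.IdentityHashMap stands in for the reference-keyed Reference2ObjectMap, and the Key enum with its TITLE and URI constants is hypothetical. The keys actually present depend on the collection and its document factory.

```java
import java.util.IdentityHashMap;
import java.util.Map;

final class MetadataSketch {
    /** Hypothetical metadata keys, invented for this example. */
    enum Key { TITLE, URI }

    /** Builds a map keyed by reference; IdentityHashMap stands in for Reference2ObjectMap. */
    static Map<Enum<?>, Object> metadataFor(String title, String uri) {
        Map<Enum<?>, Object> metadata = new IdentityHashMap<>();
        metadata.put(Key.TITLE, title);
        metadata.put(Key.URI, uri);
        return metadata;
    }

    /** Values are plain Objects, so the type is checked before use; returns null if absent. */
    static String title(Map<Enum<?>, Object> metadata) {
        Object value = metadata.get(Key.TITLE);
        return value instanceof String ? (String) value : null;
    }

    public static void main(String[] args) {
        Map<Enum<?>, Object> metadata = metadataFor("Moby Dick", "file:/moby.txt");
        System.out.println(title(metadata));
    }
}
```

Enum constants are singletons, so comparing keys by reference gives the same result as comparing them with equals().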

copy

DocumentCollection copy()
Specified by:
copy in interface FlyweightPrototype<DocumentCollection>