Interface DocumentCollection
-
- All Superinterfaces:
AutoCloseable, Closeable, DocumentSequence, FlyweightPrototype<DocumentCollection>
- All Known Implementing Classes:
AbstractDocumentCollection, ConcatenatedDocumentCollection, FileSetDocumentCollection, JavamailDocumentCollection, JdbcDocumentCollection, SimpleCompressedDocumentCollection, SubDocumentCollection, TRECDocumentCollection, WikipediaDocumentCollection, ZipDocumentCollection
public interface DocumentCollection extends DocumentSequence, FlyweightPrototype<DocumentCollection>
A collection of documents.

Classes implementing this interface have additional responsibilities with respect to DocumentSequence: they must provide random access to the documents, and they must guarantee that DocumentSequence.iterator() can be called multiple times.

Note, however, that the objects returned by iterator(), stream(long) and document(long) are, unless explicitly stated otherwise, mutually exclusive. They share a single resource managed by the collection (and disposed of by a call to close()), so each time a method returns a stream or a document, those previously returned are no longer valid, and calling their methods causes unpredictable behaviour. If you need several documents at the same time, you can obtain a flyweight copy of the collection.

Warning: implementations of this interface are not required to be thread-safe, but they provide flyweight copies. The copy() method is strengthened so as to return an instance of this interface.
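The invalidation rule and the role of flyweight copies can be illustrated with a toy model. The sketch below does not use MG4J: ToyCollection, ToyDocument and pair are hypothetical names invented for this example, and the toy deliberately throws on a stale document, whereas the contract above only promises unpredictable behaviour. It assumes Java 9 or later (for List.of).

```java
import java.util.List;

final class FlyweightCopyDemo {
    /** Toy stand-in (not an MG4J class): one shared cursor per collection instance. */
    static final class ToyCollection implements AutoCloseable {
        private final List<String> texts; // read-only data shared by all copies
        private long generation;          // bumped by every document(long) call

        ToyCollection(List<String> texts) { this.texts = texts; }

        /** A document that stays valid only until the next document(long) call. */
        final class ToyDocument {
            private final int index;
            private final long bornAt;

            ToyDocument(int index, long bornAt) { this.index = index; this.bornAt = bornAt; }

            String text() {
                // The toy fails fast; the real contract only promises unpredictable behaviour.
                if (bornAt != generation) throw new IllegalStateException("stale document");
                return texts.get(index);
            }
        }

        long size() { return texts.size(); }

        ToyDocument document(long index) {
            if (index < 0 || index >= size()) throw new IndexOutOfBoundsException("index " + index);
            return new ToyDocument((int) index, ++generation);
        }

        /** Flyweight copy: shares the texts, owns a separate cursor. */
        ToyCollection copy() { return new ToyCollection(texts); }

        @Override public void close() { }
    }

    /** Holds two documents at once by reading the second one from a copy. */
    static String[] pair(ToyCollection collection, long a, long b) {
        try (ToyCollection second = collection.copy()) {
            ToyCollection.ToyDocument first = collection.document(a);
            ToyCollection.ToyDocument other = second.document(b);
            return new String[] { first.text(), other.text() };
        }
    }

    public static void main(String[] args) {
        try (ToyCollection collection = new ToyCollection(List.of("alpha", "beta", "gamma"))) {
            System.out.println(String.join(" ", pair(collection, 0, 2)));
        }
    }
}
```

The same pattern would apply to threads: since implementations need not be thread-safe, each thread would work on its own copy rather than share one instance.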
Field Summary
Modifier and Type    Field                Description
static String        DEFAULT_EXTENSION    The default extension for a serialised collection (including the dot).
-
Method Summary
Modifier and Type                      Method                  Description
DocumentCollection                     copy()
Document                               document(long index)    Returns the document given its index.
Reference2ObjectMap<Enum<?>,Object>    metadata(long index)    Returns the metadata map for a document.
long                                   size()                  Returns the number of documents in this collection.
InputStream                            stream(long index)      Returns an input stream for the raw content of a document.
Methods inherited from interface it.unimi.di.big.mg4j.document.DocumentSequence
close, factory, filename, iterator
Field Detail
-
DEFAULT_EXTENSION
static final String DEFAULT_EXTENSION
The default extension for a serialised collection (including the dot).
- See Also:
Constant Field Values
Method Detail
-
size
long size()
Returns the number of documents in this collection.
- Returns:
the number of documents in this collection.
-
document
Document document(long index) throws IOException
Returns the document given its index.
- Parameters:
index - an index between 0 (inclusive) and size() (exclusive).
- Returns:
the index-th document.
- Throws:
IOException
-
stream
InputStream stream(long index) throws IOException
Returns an input stream for the raw content of a document.
- Parameters:
index - an index between 0 (inclusive) and size() (exclusive).
- Returns:
the raw content of the document as an input stream.
- Throws:
IOException
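Because a returned stream is only valid until the next stream or document is requested, one reasonable pattern is to copy the bytes out before asking for anything else. The following is a minimal sketch under stated assumptions: RawSource, rawBytes and of are hypothetical names for this example, and RawSource mirrors only the size() and stream(long) signatures shown above so that the code compiles without MG4J. It assumes Java 9 or later (for InputStream.readAllBytes()).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class RawContentDemo {
    /** Hypothetical stand-in exposing only the two methods used here. */
    interface RawSource {
        long size();
        InputStream stream(long index) throws IOException;
    }

    /** Checks the documented index range, then drains the stream into a byte array. */
    static byte[] rawBytes(RawSource source, long index) throws IOException {
        if (index < 0 || index >= source.size())
            throw new IndexOutOfBoundsException("index " + index + " not in [0, " + source.size() + ")");
        // Consume the stream fully before any other stream or document is requested.
        return source.stream(index).readAllBytes();
    }

    /** In-memory source used only to make the sketch runnable. */
    static RawSource of(String... texts) {
        return new RawSource() {
            @Override public long size() { return texts.length; }
            @Override public InputStream stream(long index) {
                return new ByteArrayInputStream(texts[(int) index].getBytes(StandardCharsets.UTF_8));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        RawSource source = of("first", "second");
        for (long i = 0; i < source.size(); i++)
            System.out.println(i + ": " + new String(rawBytes(source, i), StandardCharsets.UTF_8));
    }
}
```

The sketch does not close the returned stream, since the text above states that the underlying resource is managed by the collection and disposed of by its close() method; whether closing an individual stream is expected should be checked against the implementation in use.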
-
metadata
Reference2ObjectMap<Enum<?>,Object> metadata(long index) throws IOException
Returns the metadata map for a document.
- Parameters:
index - an index between 0 (inclusive) and size() (exclusive).
- Returns:
the metadata map for the document.
- Throws:
IOException
-
copy
DocumentCollection copy()
- Specified by:
copy in interface FlyweightPrototype<DocumentCollection>
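Redeclaring copy() with a more specific return type relies on Java's covariant return types. The sketch below shows the mechanism with stand-in types: Prototype, Docs and ArrayDocs are hypothetical names for this example and are not the MG4J or dsiutils declarations.

```java
final class CovariantCopyDemo {
    /** Stand-in for a generic prototype interface: copy() returns the type parameter. */
    interface Prototype<T> {
        T copy();
    }

    /** Stand-in for a collection interface that redeclares copy() with its own type. */
    interface Docs extends Prototype<Docs> {
        long size();

        @Override
        Docs copy();
    }

    /** An implementation may narrow the return type once more. */
    static final class ArrayDocs implements Docs {
        private final String[] texts;

        ArrayDocs(String... texts) { this.texts = texts; }

        @Override public long size() { return texts.length; }

        @Override public ArrayDocs copy() { return new ArrayDocs(texts); }
    }

    public static void main(String[] args) {
        Docs docs = new ArrayDocs("a", "b");
        Docs copy = docs.copy(); // no cast needed: the declared return type is already Docs
        System.out.println(copy != docs && copy.size() == docs.size());
    }
}
```

Callers holding a reference of the collection type therefore get a copy of that same type without casting.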