Package it.unimi.di.big.mg4j.document
Class SubDocumentCollection
- java.lang.Object
  - it.unimi.di.big.mg4j.document.AbstractDocumentSequence
    - it.unimi.di.big.mg4j.document.AbstractDocumentCollection
      - it.unimi.di.big.mg4j.document.SubDocumentCollection

All Implemented Interfaces:
DocumentCollection, DocumentSequence, SafelyCloseable, FlyweightPrototype<DocumentCollection>, Closeable, AutoCloseable
public class SubDocumentCollection extends AbstractDocumentCollection
A collection that exhibits a contiguous subset of documents from a given collection.

This class provides several string-based constructors that use the ObjectParser conventions; they make it easy to generate subcollections from the command line.

Author:
Sebastiano Vigna
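The core idea can be sketched without MG4J: a subcollection is a window [first, last) over another collection that translates each local index by adding first, without copying any documents. The names SimpleCollection, ListCollection and ContiguousView are hypothetical stand-ins, not MG4J API, and documents are plain strings here.

```java
import java.util.List;

// Hypothetical stand-in for MG4J's DocumentCollection: documents are plain strings.
interface SimpleCollection {
    long size();
    String document(long index);
}

// A list-backed collection used only for this demo.
final class ListCollection implements SimpleCollection {
    private final List<String> docs;
    ListCollection(List<String> docs) { this.docs = docs; }
    public long size() { return docs.size(); }
    public String document(long index) { return docs.get(Math.toIntExact(index)); }
}

// A contiguous view [first, last) over an underlying collection; nothing is copied.
final class ContiguousView implements SimpleCollection {
    private final SimpleCollection underlying;
    private final long first, last;

    ContiguousView(SimpleCollection underlying, long first, long last) {
        this.underlying = underlying;
        this.first = first;
        this.last = last;
    }

    public long size() { return last - first; }

    // Local index i corresponds to underlying index first + i.
    public String document(long index) { return underlying.document(first + index); }
}

class ContiguousViewDemo {
    public static void main(String[] args) {
        SimpleCollection all = new ListCollection(List.of("a", "b", "c", "d", "e"));
        SimpleCollection sub = new ContiguousView(all, 1, 4);
        for (long i = 0; i < sub.size(); i++) System.out.println(i + " -> " + sub.document(i));
        // prints 0 -> b, 1 -> c, 2 -> d (one per line)
    }
}
```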

Nested Class Summary
Nested classes/interfaces inherited from class it.unimi.di.big.mg4j.document.AbstractDocumentCollection:
AbstractDocumentCollection.PropertyKeys

Field Summary
Fields inherited from interface it.unimi.di.big.mg4j.document.DocumentCollection:
DEFAULT_EXTENSION

Constructor Summary
- SubDocumentCollection(DocumentCollection underlyingCollection, long first)
  Creates a new subcollection starting from a given document.
- SubDocumentCollection(DocumentCollection underlyingCollection, long first, long last)
  Creates a new subcollection.
- SubDocumentCollection(String underlyingCollectionFilename, String first)
  Creates a new subcollection starting from a given document.
- SubDocumentCollection(String underlyingCollectionFilename, String first, String last)
  Creates a new subcollection.

Method Summary
- DocumentCollection copy()
- Document document(long index)
  Returns the document given its index.
- DocumentFactory factory()
  Returns the factory used by this sequence.
- Reference2ObjectMap<Enum<?>,Object> metadata(long index)
  Returns the metadata map for a document.
- long size()
  Returns the number of documents in this collection.
- InputStream stream(long index)
  Returns an input stream for the raw content of a document.

Methods inherited from class it.unimi.di.big.mg4j.document.AbstractDocumentCollection:
ensureDocumentIndex, iterator, main, printAllDocuments, toString

Methods inherited from class it.unimi.di.big.mg4j.document.AbstractDocumentSequence:
close, filename, finalize, load

Methods inherited from class java.lang.Object:
clone, equals, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface it.unimi.di.big.mg4j.document.DocumentSequence:
close, filename

Constructor Detail

SubDocumentCollection
public SubDocumentCollection(DocumentCollection underlyingCollection, long first, long last)
Creates a new subcollection.
Parameters:
- underlyingCollection: the underlying document collection.
- first: the first document (inclusive) in the subcollection.
- last: the last document (exclusive) in this subcollection.
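Because first is inclusive and last is exclusive, the subcollection covers the half-open range [first, last) and holds exactly last - first documents. The sketch below makes that arithmetic explicit; HalfOpenRange is a hypothetical helper, and the argument checks are an assumption, since the Javadoc does not say whether the real constructor validates its bounds.

```java
// Hypothetical helper: a half-open range [first, last) over a collection of a given size.
final class HalfOpenRange {
    final long first, last;

    HalfOpenRange(long first, long last, long underlyingSize) {
        // Assumed checks; argument validation is not documented for SubDocumentCollection.
        if (first < 0 || first > last || last > underlyingSize)
            throw new IllegalArgumentException(
                "Invalid range [" + first + ", " + last + ") for size " + underlyingSize);
        this.first = first;
        this.last = last;
    }

    // Since last is exclusive, the number of documents is simply last - first.
    long size() { return last - first; }

    // Maps a subcollection index to the corresponding underlying index.
    long toUnderlying(long index) { return first + index; }
}

class HalfOpenRangeDemo {
    public static void main(String[] args) {
        HalfOpenRange r = new HalfOpenRange(10, 20, 100);
        System.out.println(r.size());          // 10 documents: 10, 11, ..., 19
        System.out.println(r.toUnderlying(9)); // 19, the last one included
        System.out.println(new HalfOpenRange(5, 5, 100).size()); // 0: an empty subcollection
    }
}
```

A side benefit of half-open bounds is that consecutive ranges such as [0, k) and [k, n) split a collection into disjoint pieces with no off-by-one adjustments.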

SubDocumentCollection
public SubDocumentCollection(DocumentCollection underlyingCollection, long first)
Creates a new subcollection starting from a given document. The new subcollection will contain all documents from the given one onwards.
Parameters:
- underlyingCollection: the underlying document collection.
- first: the first document (inclusive) in the subcollection.
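The open-ended form behaves like the two-bound form with last set to the size of the underlying collection. The sketch below expresses that by constructor delegation; Slice is a hypothetical class over a List<String>, and fixing the end at construction time is an assumption about the real class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: "from first onwards" defined via the two-bound constructor.
final class Slice {
    final List<String> underlying;
    final long first, last;

    Slice(List<String> underlying, long first, long last) {
        this.underlying = underlying;
        this.first = first;
        this.last = last;
    }

    // Open-ended form: last is taken to be the underlying size (assumed fixed here).
    Slice(List<String> underlying, long first) {
        this(underlying, first, underlying.size());
    }

    long size() { return last - first; }

    // Collects the documents of the slice, in order.
    List<String> documents() {
        List<String> out = new ArrayList<>();
        for (long i = first; i < last; i++) out.add(underlying.get(Math.toIntExact(i)));
        return out;
    }
}

class SliceDemo {
    public static void main(String[] args) {
        List<String> docs = List.of("d0", "d1", "d2", "d3", "d4");
        // A head [0, 2) and an open-ended tail from 2 together cover every document once.
        System.out.println(new Slice(docs, 0, 2).documents()); // [d0, d1]
        System.out.println(new Slice(docs, 2).documents());    // [d2, d3, d4]
    }
}
```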

SubDocumentCollection
public SubDocumentCollection(String underlyingCollectionFilename, String first, String last) throws NumberFormatException, IllegalArgumentException, SecurityException, IOException, ClassNotFoundException
Creates a new subcollection.
Parameters:
- underlyingCollectionFilename: the filename of the underlying document collection.
- first: the first document (inclusive) in the subcollection.
- last: the last document (exclusive) in this subcollection.
Throws:
NumberFormatException, IllegalArgumentException, SecurityException, IOException, ClassNotFoundException
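The string-based constructors receive their bounds as text. The declared NumberFormatException suggests the bounds are parsed as numbers; the sketch below assumes Long.parseLong and models only the bound handling, not loading the collection from underlyingCollectionFilename (where the IOException and ClassNotFoundException would come from). StringBounds is a hypothetical name, and the ordering check is an assumption.

```java
// Hypothetical sketch: turning string constructor arguments into numeric bounds.
final class StringBounds {
    final long first, last;

    StringBounds(String first, String last) {
        // Assumption: plain decimal parsing; Long.parseLong throws NumberFormatException on bad input.
        this.first = Long.parseLong(first);
        this.last = Long.parseLong(last);
        // Assumed ordering check, mirroring first (inclusive) <= last (exclusive).
        if (this.first > this.last)
            throw new IllegalArgumentException("first (" + this.first + ") > last (" + this.last + ")");
    }
}

class StringBoundsDemo {
    public static void main(String[] args) {
        StringBounds b = new StringBounds("1000", "2000");
        System.out.println(b.last - b.first); // 1000
        try {
            new StringBounds("1k", "2000");
        } catch (NumberFormatException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```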

SubDocumentCollection
public SubDocumentCollection(String underlyingCollectionFilename, String first) throws NumberFormatException, IllegalArgumentException, SecurityException, IOException, ClassNotFoundException
Creates a new subcollection starting from a given document. The new subcollection will contain all documents from the given one onwards.
Parameters:
- underlyingCollectionFilename: the filename of the underlying document collection.
- first: the first document (inclusive) in the subcollection.
Throws:
NumberFormatException, IllegalArgumentException, SecurityException, IOException, ClassNotFoundException
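To see why constructors taking only String arguments suit the command line, here is a toy reflective parser. It assumes a spec shaped like ClassName(arg1,arg2,...), picks the public constructor whose parameters are all String and whose count matches the arguments, and does no quoting or escaping. It is not dsiutils' ObjectParser; ToySpecParser, DemoSubCollection and the filename docs.collection are hypothetical.

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;

// Hypothetical target with string-only public constructors; the filename is never opened.
class DemoSubCollection {
    final String filename;
    final long first;
    final long last; // -1 means "up to the end" in this toy class only

    public DemoSubCollection(String filename, String first, String last) {
        this(filename, Long.parseLong(first), Long.parseLong(last));
    }

    public DemoSubCollection(String filename, String first) {
        this(filename, Long.parseLong(first), -1L);
    }

    private DemoSubCollection(String filename, long first, long last) {
        this.filename = filename;
        this.first = first;
        this.last = last;
    }
}

final class ToySpecParser {
    // Parses "ClassName(a,b,...)" and calls the public constructor with that many String parameters.
    static Object fromSpec(String spec) {
        int open = spec.indexOf('('), close = spec.lastIndexOf(')');
        if (open < 0 || close != spec.length() - 1)
            throw new IllegalArgumentException("Malformed spec: " + spec);
        String inside = spec.substring(open + 1, close);
        String[] args = inside.isEmpty() ? new String[0] : inside.split(",", -1);
        Class<?>[] types = new Class<?>[args.length];
        Arrays.fill(types, String.class);
        try {
            Constructor<?> c = Class.forName(spec.substring(0, open)).getConstructor(types);
            return c.newInstance((Object[]) args);
        } catch (ReflectiveOperationException e) {
            // Also covers constructor failures, which reflection wraps in InvocationTargetException.
            throw new IllegalArgumentException("Cannot build object from spec: " + spec, e);
        }
    }
}

class ToySpecParserDemo {
    public static void main(String[] args) {
        String name = DemoSubCollection.class.getName();
        DemoSubCollection a = (DemoSubCollection) ToySpecParser.fromSpec(name + "(docs.collection,10,20)");
        DemoSubCollection b = (DemoSubCollection) ToySpecParser.fromSpec(name + "(docs.collection,10)");
        System.out.println(a.first + ".." + a.last); // 10..20
        System.out.println(b.first + ".." + b.last); // 10..-1
    }
}
```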

Method Detail

copy
public DocumentCollection copy()

document
public Document document(long index) throws IOException
Description copied from interface: DocumentCollection
Returns the document given its index.
Parameters:
- index: an index between 0 (inclusive) and DocumentCollection.size() (exclusive).
Returns:
the index-th document.
Throws:
IOException
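For a subcollection, "between 0 (inclusive) and size() (exclusive)" refers to the subcollection's own size, so an index must be checked against the window before being shifted by first. The sketch writes that check by hand; ShiftedLookup is hypothetical, and whether the real method uses the inherited ensureDocumentIndex for this is not stated in the Javadoc.

```java
import java.util.List;

// Hypothetical sketch of index translation in a subcollection's document(long) method.
final class ShiftedLookup {
    private final List<String> underlying;
    private final long first, last;

    ShiftedLookup(List<String> underlying, long first, long last) {
        this.underlying = underlying;
        this.first = first;
        this.last = last;
    }

    long size() { return last - first; }

    String document(long index) {
        // Reject indices outside [0, size()) before shifting, so the window cannot leak.
        if (index < 0 || index >= size())
            throw new IndexOutOfBoundsException("Index " + index + " not in [0, " + size() + ")");
        return underlying.get(Math.toIntExact(first + index));
    }
}

class ShiftedLookupDemo {
    public static void main(String[] args) {
        ShiftedLookup s = new ShiftedLookup(List.of("alpha", "beta", "gamma", "delta"), 1, 3);
        System.out.println(s.document(0)); // beta
        System.out.println(s.document(1)); // gamma
        // s.document(2) would map to "delta" in the underlying list, but it lies
        // outside the window, so it is rejected with IndexOutOfBoundsException.
    }
}
```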

size
public long size()
Description copied from interface: DocumentCollection
Returns the number of documents in this collection.
Returns:
the number of documents in this collection.

metadata
public Reference2ObjectMap<Enum<?>,Object> metadata(long index) throws IOException
Description copied from interface: DocumentCollection
Returns the metadata map for a document.
Parameters:
- index: an index between 0 (inclusive) and DocumentCollection.size() (exclusive).
Returns:
the metadata map for the document.
Throws:
IOException
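The metadata map is keyed by enum constants; fastutil's Reference2ObjectMap compares keys by reference, which java.util.IdentityHashMap also does, so it stands in for it below. The sketch assumes the subcollection returns the underlying metadata of document first + index; the Meta enum, MetadataWindow and the example URIs are hypothetical, and range checks are omitted for brevity.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical metadata keys; MG4J defines its own key enums, not reproduced here.
enum Meta { TITLE, URI }

final class MetadataWindow {
    private final List<Map<Enum<?>, Object>> underlying; // metadata of every underlying document
    private final long first;

    MetadataWindow(List<Map<Enum<?>, Object>> underlying, long first) {
        this.underlying = underlying;
        this.first = first;
    }

    // The metadata of local document i is that of underlying document first + i.
    Map<Enum<?>, Object> metadata(long index) {
        return underlying.get(Math.toIntExact(first + index));
    }

    // IdentityHashMap compares keys by reference, like fastutil's Reference2ObjectMap.
    static Map<Enum<?>, Object> entry(String title, String uri) {
        Map<Enum<?>, Object> m = new IdentityHashMap<>();
        m.put(Meta.TITLE, title);
        m.put(Meta.URI, uri);
        return m;
    }
}

class MetadataWindowDemo {
    public static void main(String[] args) {
        List<Map<Enum<?>, Object>> all = List.of(
            MetadataWindow.entry("First", "http://example.com/0"),
            MetadataWindow.entry("Second", "http://example.com/1"),
            MetadataWindow.entry("Third", "http://example.com/2"));
        MetadataWindow w = new MetadataWindow(all, 1);
        System.out.println(w.metadata(0).get(Meta.TITLE)); // Second
    }
}
```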

stream
public InputStream stream(long index) throws IOException
Description copied from interface: DocumentCollection
Returns an input stream for the raw content of a document.
Parameters:
- index: an index between 0 (inclusive) and DocumentCollection.size() (exclusive).
Returns:
the raw content of the document as an input stream.
Throws:
IOException
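Unlike document(long), stream(long) hands back raw bytes; turning them into characters is left to the document factory. The sketch below shows a shifted lookup that returns undecoded bytes. RawWindow and its in-memory byte[] storage are hypothetical, and returning a fresh stream on every call is an assumption of this toy.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch: raw, undecoded document contents served through a shifted index.
final class RawWindow {
    private final List<byte[]> raw; // raw bytes of every underlying document (toy storage)
    private final long first;

    RawWindow(List<byte[]> raw, long first) {
        this.raw = raw;
        this.first = first;
    }

    // Returns a fresh stream over the raw bytes of underlying document first + index;
    // no decoding happens here, since turning bytes into characters is the factory's job.
    InputStream stream(long index) {
        return new ByteArrayInputStream(raw.get(Math.toIntExact(first + index)));
    }
}

class RawWindowDemo {
    public static void main(String[] args) throws IOException {
        List<byte[]> raw = List.of(
            "zero".getBytes(StandardCharsets.UTF_8),
            "one".getBytes(StandardCharsets.UTF_8));
        RawWindow w = new RawWindow(raw, 1);
        try (InputStream in = w.stream(0)) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8)); // one
        }
    }
}
```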

factory
public DocumentFactory factory()
Description copied from interface: DocumentSequence
Returns the factory used by this sequence. Every document sequence is based on a document factory that transforms raw bytes into a sequence of characters. The factory contains useful information such as the number of fields.
Returns:
the factory used by this sequence.
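A plausible design for a subcollection is to return the underlying collection's factory, so that documents are decoded identically and field numbering matches; the Javadoc does not state this, so treat it as an assumption. ToyFactory, Utf8TextFactory, BaseCollection and SubCollectionOf are hypothetical, much-simplified stand-ins for the MG4J types.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, much-simplified stand-in for MG4J's DocumentFactory.
interface ToyFactory {
    int numberOfFields();
    String decode(byte[] raw); // raw bytes -> characters
}

final class Utf8TextFactory implements ToyFactory {
    public int numberOfFields() { return 1; } // a single text field in this toy
    public String decode(byte[] raw) { return new String(raw, StandardCharsets.UTF_8); }
}

final class BaseCollection {
    private final ToyFactory factory = new Utf8TextFactory();
    ToyFactory factory() { return factory; }
}

// Assumption: the subcollection reuses the underlying factory instance.
final class SubCollectionOf {
    private final BaseCollection underlying;
    SubCollectionOf(BaseCollection underlying) { this.underlying = underlying; }
    ToyFactory factory() { return underlying.factory(); }
}

class FactoryDemo {
    public static void main(String[] args) {
        BaseCollection base = new BaseCollection();
        SubCollectionOf sub = new SubCollectionOf(base);
        System.out.println(sub.factory() == base.factory()); // true
        System.out.println(sub.factory().numberOfFields());   // 1
        System.out.println(sub.factory().decode("hello".getBytes(StandardCharsets.UTF_8))); // hello
    }
}
```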