java.lang.Object
  it.unimi.di.mg4j.document.AbstractDocumentSequence
    it.unimi.di.mg4j.document.AbstractDocumentCollection
      it.unimi.di.mg4j.document.SubDocumentCollection
public class SubDocumentCollection
A collection that exposes a contiguous subset of the documents of a given collection.
This class provides several string-based constructors that follow the ObjectParser
conventions; they make it easy to create subcollections from the command line.
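As an illustration of what a string-based constructor has to do with its arguments, here is a minimal sketch. It assumes the bounds are plain decimal integers parsed with `Integer.parseInt`, which would be consistent with the `NumberFormatException` declared by those constructors. The class `BoundsParser`, the method `parseBounds`, and the range check are hypothetical and are not part of MG4J.

```java
public class BoundsParser {
    /**
     * Parses two string bounds into the pair {first, last}.
     * Hypothetical helper: it only illustrates string-to-int conversion.
     */
    public static int[] parseBounds(String first, String last) {
        // Integer.parseInt throws NumberFormatException on non-numeric input.
        int f = Integer.parseInt(first);
        int l = Integer.parseInt(last);
        // Assumed sanity check: first is inclusive, last is exclusive.
        if (f < 0 || l < f) {
            throw new IllegalArgumentException("Invalid range [" + f + ".." + l + ")");
        }
        return new int[] { f, l };
    }

    public static void main(String[] args) {
        int[] bounds = parseBounds("10", "20");
        System.out.println("first=" + bounds[0] + ", last=" + bounds[1]);
    }
}
```

Whether the real constructors validate the range in this way is not stated in this page; the check is included only to make the sketch safe to run.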
Nested Class Summary |
---|
Nested classes/interfaces inherited from class it.unimi.di.mg4j.document.AbstractDocumentCollection |
---|
AbstractDocumentCollection.PropertyKeys |
Field Summary |
---|
Fields inherited from interface it.unimi.di.mg4j.document.DocumentCollection |
---|
DEFAULT_EXTENSION |
Constructor Summary | |
---|---|
SubDocumentCollection(DocumentCollection underlyingCollection, int first) | Creates a new subcollection starting from a given document. |
SubDocumentCollection(DocumentCollection underlyingCollection, int first, int last) | Creates a new subcollection. |
SubDocumentCollection(String underlyingCollectionBasename, String first) | Creates a new subcollection starting from a given document. |
SubDocumentCollection(String underlyingCollectionBasename, String first, String last) | Creates a new subcollection. |
Method Summary | |
---|---|
DocumentCollection | copy() |
Document | document(int index): Returns the document given its index. |
DocumentFactory | factory(): Returns the factory used by this sequence. |
Reference2ObjectMap<Enum<?>,Object> | metadata(int index): Returns the metadata map for a document. |
int | size(): Returns the number of documents in this collection. |
InputStream | stream(int index): Returns an input stream for the raw content of a document. |
Methods inherited from class it.unimi.di.mg4j.document.AbstractDocumentCollection |
---|
ensureDocumentIndex, iterator, main, printAllDocuments, toString |
Methods inherited from class it.unimi.di.mg4j.document.AbstractDocumentSequence |
---|
close, filename, finalize, load |
Methods inherited from class java.lang.Object |
---|
clone, equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Methods inherited from interface it.unimi.di.mg4j.document.DocumentSequence |
---|
close, filename |
Constructor Detail |
---|
public SubDocumentCollection(DocumentCollection underlyingCollection, int first, int last)
Creates a new subcollection.
Parameters:
underlyingCollection - the underlying document collection.
first - the first document (inclusive) in the subcollection.
last - the last document (exclusive) in the subcollection.

public SubDocumentCollection(DocumentCollection underlyingCollection, int first)
Creates a new subcollection starting from a given document. The new subcollection will contain all documents from the given one onwards.
Parameters:
underlyingCollection - the underlying document collection.
first - the first document (inclusive) in the subcollection.

public SubDocumentCollection(String underlyingCollectionBasename, String first, String last) throws NumberFormatException, IllegalArgumentException, SecurityException, IOException, ClassNotFoundException
Creates a new subcollection.
Parameters:
underlyingCollectionBasename - the basename of the underlying document collection.
first - the first document (inclusive) in the subcollection.
last - the last document (exclusive) in the subcollection.
Throws:
NumberFormatException
IllegalArgumentException
SecurityException
IOException
ClassNotFoundException

public SubDocumentCollection(String underlyingCollectionBasename, String first) throws NumberFormatException, IllegalArgumentException, SecurityException, IOException, ClassNotFoundException
Creates a new subcollection starting from a given document. The new subcollection will contain all documents from the given one onwards.
Parameters:
underlyingCollectionBasename - the basename of the underlying document collection.
first - the first document (inclusive) in the subcollection.
Throws:
NumberFormatException
IllegalArgumentException
SecurityException
IOException
ClassNotFoundException
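The half-open range convention used by the constructors (first inclusive, last exclusive) can be sketched without MG4J on the classpath. The following is a simplified stand-in, not the actual implementation: `SubRangeView` is a hypothetical class that wraps a `java.util.List` instead of a `DocumentCollection`, and the argument checks are assumptions.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in that exposes the range [first, last) of a list. */
public class SubRangeView<T> {
    private final List<T> underlying;
    private final int first; // inclusive
    private final int last;  // exclusive

    public SubRangeView(List<T> underlying, int first, int last) {
        // Assumed validation; this page does not say what the real class checks.
        if (first < 0 || last > underlying.size() || first > last) {
            throw new IllegalArgumentException("Invalid range [" + first + ".." + last + ")");
        }
        this.underlying = underlying;
        this.first = first;
        this.last = last;
    }

    /** Mirrors the two-argument form: everything from first onwards. */
    public SubRangeView(List<T> underlying, int first) {
        this(underlying, first, underlying.size());
    }

    /** Number of elements visible through the view. */
    public int size() {
        return last - first;
    }

    /** Local index 0 maps to underlying index first. */
    public T get(int index) {
        if (index < 0 || index >= size()) {
            throw new IndexOutOfBoundsException("Index: " + index);
        }
        return underlying.get(first + index);
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("d0", "d1", "d2", "d3", "d4");
        SubRangeView<String> view = new SubRangeView<>(docs, 1, 4);
        System.out.println(view.size() + " documents, starting at " + view.get(0));
    }
}
```

With this convention the size of the view is simply `last - first`, and a local index `i` refers to element `first + i` of the underlying collection.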
Method Detail |
---|
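public DocumentCollection copy()

public Document document(int index) throws IOException
Description copied from interface: DocumentCollection
Returns the document given its index.
Parameters:
index - an index between 0 (inclusive) and DocumentCollection.size() (exclusive).
Returns:
the index-th document.
Throws:
IOException

public int size()
Description copied from interface: DocumentCollection
Returns the number of documents in this collection.

public Reference2ObjectMap<Enum<?>,Object> metadata(int index) throws IOException
Description copied from interface: DocumentCollection
Returns the metadata map for a document.
Parameters:
index - an index between 0 (inclusive) and DocumentCollection.size() (exclusive).
Throws:
IOException

public InputStream stream(int index) throws IOException
Description copied from interface: DocumentCollection
Returns an input stream for the raw content of a document.
Parameters:
index - an index between 0 (inclusive) and DocumentCollection.size() (exclusive).
Throws:
IOException

public DocumentFactory factory()
Description copied from interface: DocumentSequence
Returns the factory used by this sequence.
Every document sequence is based on a document factory that transforms raw bytes into a sequence of characters. The factory contains useful information such as the number of fields.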