it.unimi.di.mg4j.document
Class SubDocumentCollection

java.lang.Object
  extended by it.unimi.di.mg4j.document.AbstractDocumentSequence
      extended by it.unimi.di.mg4j.document.AbstractDocumentCollection
          extended by it.unimi.di.mg4j.document.SubDocumentCollection
All Implemented Interfaces:
DocumentCollection, DocumentSequence, SafelyCloseable, FlyweightPrototype<DocumentCollection>, Closeable

public class SubDocumentCollection
extends AbstractDocumentCollection

A collection that exhibits a contiguous subset of documents from a given collection.

This class provides several string-based constructors that follow the ObjectParser conventions; they make it easy to generate subcollections from the command line.

Author:
Sebastiano Vigna

Nested Class Summary
 
Nested classes/interfaces inherited from class it.unimi.di.mg4j.document.AbstractDocumentCollection
AbstractDocumentCollection.PropertyKeys
 
Field Summary
 
Fields inherited from interface it.unimi.di.mg4j.document.DocumentCollection
DEFAULT_EXTENSION
 
Constructor Summary
SubDocumentCollection(DocumentCollection underlyingCollection, int first)
          Creates a new subcollection starting from a given document.
SubDocumentCollection(DocumentCollection underlyingCollection, int first, int last)
          Creates a new subcollection.
SubDocumentCollection(String underlyingCollectionBasename, String first)
          Creates a new subcollection starting from a given document.
SubDocumentCollection(String underlyingCollectionBasename, String first, String last)
          Creates a new subcollection.
 
Method Summary
 DocumentCollection copy()
           
 Document document(int index)
          Returns the document given its index.
 DocumentFactory factory()
          Returns the factory used by this sequence.
 Reference2ObjectMap<Enum<?>,Object> metadata(int index)
          Returns the metadata map for a document.
 int size()
          Returns the number of documents in this collection.
 InputStream stream(int index)
          Returns an input stream for the raw content of a document.
 
Methods inherited from class it.unimi.di.mg4j.document.AbstractDocumentCollection
ensureDocumentIndex, iterator, main, printAllDocuments, toString
 
Methods inherited from class it.unimi.di.mg4j.document.AbstractDocumentSequence
close, filename, finalize, load
 
Methods inherited from class java.lang.Object
clone, equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface it.unimi.di.mg4j.document.DocumentSequence
close, filename
 

Constructor Detail

SubDocumentCollection

public SubDocumentCollection(DocumentCollection underlyingCollection,
                             int first,
                             int last)
Creates a new subcollection.

Parameters:
underlyingCollection - the underlying document collection.
first - the first document (inclusive) in the subcollection.
last - the last document (exclusive) in this subcollection.
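
The half-open convention above (first inclusive, last exclusive) can be illustrated with a minimal, self-contained sketch. It does not use MG4J: SubRangeSketch is a hypothetical stand-in in which a List<String> plays the role of the underlying collection, and the range validation is an assumption rather than documented behavior of this class.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a subcollection: a half-open view [first..last) over a list. */
final class SubRangeSketch {
    private final List<String> underlying; // stands in for the underlying DocumentCollection
    private final int first;               // inclusive
    private final int last;                // exclusive

    SubRangeSketch(List<String> underlying, int first, int last) {
        // Assumption: the sketch rejects invalid ranges eagerly.
        if (first < 0 || last > underlying.size() || first > last)
            throw new IllegalArgumentException("Invalid range [" + first + ".." + last + ")");
        this.underlying = underlying;
        this.first = first;
        this.last = last;
    }

    /** Since last is exclusive, the size is simply last - first. */
    int size() {
        return last - first;
    }

    /** Local index 0 corresponds to index first in the underlying list. */
    String document(int index) {
        if (index < 0 || index >= size())
            throw new IndexOutOfBoundsException("Index " + index + " out of [0.." + size() + ")");
        return underlying.get(first + index);
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("d0", "d1", "d2", "d3", "d4");
        SubRangeSketch sub = new SubRangeSketch(docs, 1, 4);
        System.out.println(sub.size());      // 3
        System.out.println(sub.document(0)); // d1
    }
}
```

With first = 1 and last = 4 the view exposes three documents, and local indices run from 0 to size() (exclusive), matching the index contract stated for document(int).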

SubDocumentCollection

public SubDocumentCollection(DocumentCollection underlyingCollection,
                             int first)
Creates a new subcollection starting from a given document.

The new subcollection will contain all documents from the given one onwards.

Parameters:
underlyingCollection - the underlying document collection.
first - the first document (inclusive) in the subcollection.
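
As a rough analogy, "all documents from the given one onwards" behaves like a half-open range whose upper bound is the size of the underlying collection. The sketch below shows the idea with the standard java.util.List.subList method; TailSketch and from are hypothetical names, and nothing here is taken from the MG4J implementation.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of an open-ended subrange using plain Java lists. */
final class TailSketch {
    /** Returns a view of all elements from index first (inclusive) to the end. */
    static <T> List<T> from(List<T> underlying, int first) {
        // Equivalent to the two-bound form with last = underlying.size().
        return underlying.subList(first, underlying.size());
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("a", "b", "c", "d");
        System.out.println(from(docs, 2)); // [c, d]
    }
}
```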

SubDocumentCollection

public SubDocumentCollection(String underlyingCollectionBasename,
                             String first,
                             String last)
                      throws NumberFormatException,
                             IllegalArgumentException,
                             SecurityException,
                             IOException,
                             ClassNotFoundException
Creates a new subcollection.

Parameters:
underlyingCollectionBasename - the basename of the underlying document collection.
first - the first document (inclusive) in the subcollection.
last - the last document (exclusive) in this subcollection.
Throws:
NumberFormatException
IllegalArgumentException
SecurityException
IOException
ClassNotFoundException
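
The string-based constructors take their bounds as text, which is why NumberFormatException appears in the throws clause. The following sketch shows that failure mode in isolation, assuming the bounds are decimal integers parsed with Integer.parseInt; StringBoundsSketch and parseBounds are hypothetical names, and how this class actually parses its arguments or loads the underlying collection is not shown here.

```java
/** Hypothetical helper showing how string bounds might be turned into integers. */
final class StringBoundsSketch {
    /**
     * Parses the two bounds and returns them as {first, last}.
     * Integer.parseInt throws NumberFormatException on non-numeric input.
     */
    static int[] parseBounds(String first, String last) {
        return new int[] { Integer.parseInt(first), Integer.parseInt(last) };
    }

    public static void main(String[] args) {
        int[] bounds = parseBounds("10", "20");
        System.out.println(bounds[0] + " " + bounds[1]); // 10 20
        try {
            parseBounds("10", "abc");
        } catch (NumberFormatException e) {
            System.out.println("Invalid bound: " + e.getMessage());
        }
    }
}
```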

SubDocumentCollection

public SubDocumentCollection(String underlyingCollectionBasename,
                             String first)
                      throws NumberFormatException,
                             IllegalArgumentException,
                             SecurityException,
                             IOException,
                             ClassNotFoundException
Creates a new subcollection starting from a given document.

The new subcollection will contain all documents from the given one onwards.

Parameters:
underlyingCollectionBasename - the basename of the underlying document collection.
first - the first document (inclusive) in the subcollection.
Throws:
NumberFormatException
IllegalArgumentException
SecurityException
IOException
ClassNotFoundException

Method Detail

copy

public DocumentCollection copy()

document

public Document document(int index)
                  throws IOException
Description copied from interface: DocumentCollection
Returns the document given its index.

Parameters:
index - an index between 0 (inclusive) and DocumentCollection.size() (exclusive).
Returns:
the index-th document.
Throws:
IOException

size

public int size()
Description copied from interface: DocumentCollection
Returns the number of documents in this collection.

Returns:
the number of documents in this collection.

metadata

public Reference2ObjectMap<Enum<?>,Object> metadata(int index)
                                             throws IOException
Description copied from interface: DocumentCollection
Returns the metadata map for a document.

Parameters:
index - an index between 0 (inclusive) and DocumentCollection.size() (exclusive).
Returns:
the metadata map for the document.
Throws:
IOException

stream

public InputStream stream(int index)
                   throws IOException
Description copied from interface: DocumentCollection
Returns an input stream for the raw content of a document.

Parameters:
index - an index between 0 (inclusive) and DocumentCollection.size() (exclusive).
Returns:
the raw content of the document as an input stream.
Throws:
IOException

factory

public DocumentFactory factory()
Description copied from interface: DocumentSequence
Returns the factory used by this sequence.

Every document sequence is based on a document factory that transforms raw bytes into a sequence of characters. The factory contains useful information such as the number of fields.

Returns:
the factory used by this sequence.