it.unimi.di.mg4j.document
Class ConcatenatedDocumentCollection

java.lang.Object
  extended by it.unimi.di.mg4j.document.AbstractDocumentSequence
      extended by it.unimi.di.mg4j.document.AbstractDocumentCollection
          extended by it.unimi.di.mg4j.document.ConcatenatedDocumentCollection
All Implemented Interfaces:
DocumentCollection, DocumentSequence, SafelyCloseable, FlyweightPrototype<DocumentCollection>, Closeable, Serializable

public class ConcatenatedDocumentCollection
extends AbstractDocumentCollection
implements Serializable

A document collection that presents a list of underlying document collections, called segments, as a single collection. The underlying collections are (virtually) concatenated: the first document of the second collection is renumbered to the size of the first collection, and so on. All underlying collections must use the same factory class.
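
The renumbering described above can be illustrated with a small, self-contained sketch. It is not the MG4J implementation: the class ConcatenationSketch and its methods segmentOf and localIndex are hypothetical names introduced here, and each segment is represented only by its size.

```java
/** Hypothetical helper, not part of MG4J: maps a global document index to a segment and a local index. */
final class ConcatenationSketch {
    /** cumulative[i] is the global index of the first document of segment i; the last entry is the total size. */
    private final int[] cumulative;

    ConcatenationSketch(int... segmentSizes) {
        cumulative = new int[segmentSizes.length + 1];
        for (int i = 0; i < segmentSizes.length; i++) {
            cumulative[i + 1] = cumulative[i] + segmentSizes[i];
        }
    }

    /** Total number of documents across all segments. */
    int size() {
        return cumulative[cumulative.length - 1];
    }

    /** Returns the segment containing the given global index (linear scan, kept simple for clarity). */
    int segmentOf(int globalIndex) {
        if (globalIndex < 0 || globalIndex >= size()) {
            throw new IndexOutOfBoundsException("Index: " + globalIndex + ", size: " + size());
        }
        for (int i = 0; i < cumulative.length - 1; i++) {
            if (globalIndex < cumulative[i + 1]) {
                return i;
            }
        }
        throw new AssertionError("unreachable: index was range-checked");
    }

    /** Returns the index of the document inside its own segment. */
    int localIndex(int globalIndex) {
        return globalIndex - cumulative[segmentOf(globalIndex)];
    }

    public static void main(String[] args) {
        // Three segments holding 3, 2 and 4 documents.
        ConcatenationSketch c = new ConcatenationSketch(3, 2, 4);
        // The first document of the second segment has global index 3, the size of the first segment.
        System.out.println("segment " + c.segmentOf(3) + ", local index " + c.localIndex(3));
    }
}
```

With segment sizes 3, 2 and 4, global index 3 maps to local index 0 of the second segment. For a large number of segments, a binary search over the cumulative sizes would be a natural replacement for the linear scan.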

Author:
Sebastiano Vigna
See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class it.unimi.di.mg4j.document.AbstractDocumentCollection
AbstractDocumentCollection.PropertyKeys
 
Field Summary
 
Fields inherited from interface it.unimi.di.mg4j.document.DocumentCollection
DEFAULT_EXTENSION
 
Constructor Summary
  ConcatenatedDocumentCollection(String... collectionName)
          Creates a new, partially uninitialised concatenated document collection using the given component collection names.
protected ConcatenatedDocumentCollection(String[] collectionName, DocumentCollection[] collection)
          Creates a new concatenated document collection using the given component collections.
 
Method Summary
 void close()
          Closes this document sequence, releasing all resources.
 DocumentCollection copy()
           
 Document document(int index)
          Returns the document given its index.
 DocumentFactory factory()
          Returns the factory used by this sequence.
 void filename(CharSequence filename)
          Does nothing.
 Reference2ObjectMap<Enum<?>,Object> metadata(int index)
          Returns the metadata map for a document.
 int size()
          Returns the number of documents in this collection.
 InputStream stream(int index)
          Returns an input stream for the raw content of a document.
 
Methods inherited from class it.unimi.di.mg4j.document.AbstractDocumentCollection
ensureDocumentIndex, iterator, main, printAllDocuments, toString
 
Methods inherited from class it.unimi.di.mg4j.document.AbstractDocumentSequence
finalize, load
 
Methods inherited from class java.lang.Object
clone, equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

ConcatenatedDocumentCollection

protected ConcatenatedDocumentCollection(String[] collectionName,
                                         DocumentCollection[] collection)
Creates a new concatenated document collection using the given component collections.

Parameters:
collectionName - a list of names of component collections.
collection - a list of component collections.

ConcatenatedDocumentCollection

public ConcatenatedDocumentCollection(String... collectionName)
                               throws IllegalArgumentException,
                                      SecurityException
Creates a new, partially uninitialised concatenated document collection using the given component collection names.

Parameters:
collectionName - a list of names of component collections.
Throws:
IllegalArgumentException
SecurityException
Method Detail

filename

public void filename(CharSequence filename)
Description copied from class: AbstractDocumentSequence
Does nothing.

Specified by:
filename in interface DocumentSequence
Overrides:
filename in class AbstractDocumentSequence
Parameters:
filename - the filename of this document sequence.

copy

public DocumentCollection copy()
Specified by:
copy in interface DocumentCollection
Specified by:
copy in interface FlyweightPrototype<DocumentCollection>

document

public Document document(int index)
                  throws IOException
Description copied from interface: DocumentCollection
Returns the document given its index.

Specified by:
document in interface DocumentCollection
Parameters:
index - an index between 0 (inclusive) and DocumentCollection.size() (exclusive).
Returns:
the index-th document.
Throws:
IOException

metadata

public Reference2ObjectMap<Enum<?>,Object> metadata(int index)
                                             throws IOException
Description copied from interface: DocumentCollection
Returns the metadata map for a document.

Specified by:
metadata in interface DocumentCollection
Parameters:
index - an index between 0 (inclusive) and DocumentCollection.size() (exclusive).
Returns:
the metadata map for the document.
Throws:
IOException

size

public int size()
Description copied from interface: DocumentCollection
Returns the number of documents in this collection.

Specified by:
size in interface DocumentCollection
Returns:
the number of documents in this collection.

stream

public InputStream stream(int index)
                   throws IOException
Description copied from interface: DocumentCollection
Returns an input stream for the raw content of a document.

Specified by:
stream in interface DocumentCollection
Parameters:
index - an index between 0 (inclusive) and DocumentCollection.size() (exclusive).
Returns:
the raw content of the document as an input stream.
Throws:
IOException

factory

public DocumentFactory factory()
Description copied from interface: DocumentSequence
Returns the factory used by this sequence.

Every document sequence is based on a document factory that transforms raw bytes into a sequence of characters. The factory contains useful information such as the number of fields.

Specified by:
factory in interface DocumentSequence
Returns:
the factory used by this sequence.

close

public void close()
           throws IOException
Description copied from interface: DocumentSequence
Closes this document sequence, releasing all resources.

You should always call this method after having finished with this document sequence. Implementations are invited to call this method in a finaliser as a safety net (even better, implement SafelyCloseable), but since there is no guarantee as to when finalisers are invoked, you should not depend on this behaviour.

Specified by:
close in interface DocumentSequence
Specified by:
close in interface Closeable
Overrides:
close in class AbstractDocumentSequence
Throws:
IOException
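
Because the class implements Closeable, one way to make sure close() is always called is a try-with-resources statement. The sketch below is only an illustration under that assumption: FakeSequence is a hypothetical stand-in for a document sequence, not an MG4J class, and its close() merely records that it was invoked.

```java
import java.io.Closeable;

/** Hypothetical illustration, not part of MG4J: closing a Closeable resource deterministically. */
final class CloseSketch {
    /** Stand-in for a document sequence; it only records whether close() was called. */
    static final class FakeSequence implements Closeable {
        boolean closed;

        int size() {
            return 3;
        }

        // A real sequence's close() may throw IOException; this stand-in does not.
        @Override
        public void close() {
            closed = true;
        }
    }

    /** Uses the sequence and closes it on exit, whether or not an exception is thrown. */
    static int countAndClose(FakeSequence sequence) {
        try (FakeSequence s = sequence) {
            return s.size();
        }
    }

    public static void main(String[] args) {
        FakeSequence sequence = new FakeSequence();
        int n = countAndClose(sequence);
        System.out.println(n + " documents, closed = " + sequence.closed);
    }
}
```

With a real collection, the enclosing method would also need to handle or declare the IOException that close() can throw.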