Package it.unimi.di.big.mg4j.document
Class ConcatenatedDocumentCollection
- java.lang.Object
  - it.unimi.di.big.mg4j.document.AbstractDocumentSequence
    - it.unimi.di.big.mg4j.document.AbstractDocumentCollection
      - it.unimi.di.big.mg4j.document.ConcatenatedDocumentCollection
- All Implemented Interfaces:
DocumentCollection, DocumentSequence, SafelyCloseable, FlyweightPrototype<DocumentCollection>, Closeable, Serializable, AutoCloseable
public class ConcatenatedDocumentCollection extends AbstractDocumentCollection implements Serializable
A document collection that presents a list of underlying document collections, called segments, as a single collection. The underlying collections are (virtually) concatenated: the first document of the second collection is renumbered to the size of the first collection, and so on. All underlying collections must use the same factory class.
A main method makes it easy to create concatenated collections given the filenames of the component collections.
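The renumbering described above can be illustrated with a small, self-contained sketch. It is not the MG4J implementation: `SegmentOffsets` and its methods are hypothetical names introduced only for illustration. The sketch assumes that a global index is resolved by keeping cumulative segment sizes and searching for the segment whose range contains the index.

```java
/** Hypothetical helper (not part of MG4J): maps a global index to a segment and a local index. */
final class SegmentOffsets {
    // cumulative[i] is the number of documents in segments 0..i-1;
    // the last entry is the total size of the concatenation.
    private final long[] cumulative;

    SegmentOffsets(long... segmentSizes) {
        cumulative = new long[segmentSizes.length + 1];
        for (int i = 0; i < segmentSizes.length; i++) {
            cumulative[i + 1] = cumulative[i] + segmentSizes[i];
        }
    }

    /** Total number of documents across all segments. */
    long size() {
        return cumulative[cumulative.length - 1];
    }

    /** Returns the segment containing the given global index. */
    int segmentOf(long globalIndex) {
        if (globalIndex < 0 || globalIndex >= size()) {
            throw new IndexOutOfBoundsException("Index out of range: " + globalIndex);
        }
        // Invariant: cumulative[lo] <= globalIndex < cumulative[hi].
        int lo = 0, hi = cumulative.length - 1;
        while (hi - lo > 1) {
            int mid = (lo + hi) >>> 1;
            if (cumulative[mid] <= globalIndex) lo = mid; else hi = mid;
        }
        return lo; // empty segments are skipped automatically
    }

    /** Returns the index of the document inside its own segment. */
    long localIndex(long globalIndex) {
        return globalIndex - cumulative[segmentOf(globalIndex)];
    }
}
```

With two segments of sizes 3 and 2, global index 3 resolves to segment 1, local index 0, which matches the statement that the first document of the second collection is renumbered to the size of the first collection.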
- See Also:
- Serialized Form
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class it.unimi.di.big.mg4j.document.AbstractDocumentCollection
AbstractDocumentCollection.PropertyKeys
-
-
Field Summary
-
Fields inherited from interface it.unimi.di.big.mg4j.document.DocumentCollection
DEFAULT_EXTENSION
-
-
Constructor Summary
Constructors (Modifier, Constructor, Description)
- ConcatenatedDocumentCollection(String... collectionName)
Creates a new, partially uninitialised concatenated document collection using the given component collection names.
- protected ConcatenatedDocumentCollection(String[] collectionName, DocumentCollection[] collection)
Creates a new concatenated document collection using the given component collections.
-
Method Summary
Methods (Modifier and Type, Method, Description)
- void close()
Closes this document sequence, releasing all resources.
- DocumentCollection copy()
- Document document(long index)
Returns the document given its index.
- DocumentFactory factory()
Returns the factory used by this sequence.
- void filename(CharSequence filename)
Does nothing.
- static void main(String[] arg)
- Reference2ObjectMap<Enum<?>,Object> metadata(long index)
Returns the metadata map for a document.
- long size()
Returns the number of documents in this collection.
- InputStream stream(long index)
Returns an input stream for the raw content of a document.
-
Methods inherited from class it.unimi.di.big.mg4j.document.AbstractDocumentCollection
ensureDocumentIndex, iterator, printAllDocuments, toString
-
Methods inherited from class it.unimi.di.big.mg4j.document.AbstractDocumentSequence
finalize, load
-
-
-
-
Constructor Detail
-
ConcatenatedDocumentCollection
protected ConcatenatedDocumentCollection(String[] collectionName, DocumentCollection[] collection)
Creates a new concatenated document collection using the given component collections.
- Parameters:
collection - a list of component collections.
-
ConcatenatedDocumentCollection
public ConcatenatedDocumentCollection(String... collectionName) throws IllegalArgumentException, SecurityException
Creates a new, partially uninitialised concatenated document collection using the given component collection names.
- Parameters:
collectionName - a list of names of component collections.
- Throws:
IllegalArgumentException
SecurityException
-
-
Method Detail
-
filename
public void filename(CharSequence filename)
Description copied from class: AbstractDocumentSequence
Does nothing.
- Specified by:
filename in interface DocumentSequence
- Overrides:
filename in class AbstractDocumentSequence
- Parameters:
filename - the filename of this document sequence.
-
copy
public DocumentCollection copy()
- Specified by:
copy in interface DocumentCollection
- Specified by:
copy in interface FlyweightPrototype<DocumentCollection>
-
document
public Document document(long index) throws IOException
Description copied from interface: DocumentCollection
Returns the document given its index.
- Specified by:
document in interface DocumentCollection
- Parameters:
index - an index between 0 (inclusive) and DocumentCollection.size() (exclusive).
- Returns:
the index-th document.
- Throws:
IOException
-
metadata
public Reference2ObjectMap<Enum<?>,Object> metadata(long index) throws IOException
Description copied from interface: DocumentCollection
Returns the metadata map for a document.
- Specified by:
metadata in interface DocumentCollection
- Parameters:
index - an index between 0 (inclusive) and DocumentCollection.size() (exclusive).
- Returns:
the metadata map for the document.
- Throws:
IOException
-
size
public long size()
Description copied from interface: DocumentCollection
Returns the number of documents in this collection.
- Specified by:
size in interface DocumentCollection
- Returns:
the number of documents in this collection.
-
stream
public InputStream stream(long index) throws IOException
Description copied from interface: DocumentCollection
Returns an input stream for the raw content of a document.
- Specified by:
stream in interface DocumentCollection
- Parameters:
index - an index between 0 (inclusive) and DocumentCollection.size() (exclusive).
- Returns:
the raw content of the document as an input stream.
- Throws:
IOException
-
factory
public DocumentFactory factory()
Description copied from interface: DocumentSequence
Returns the factory used by this sequence.
Every document sequence is based on a document factory that transforms raw bytes into a sequence of characters. The factory contains useful information such as the number of fields.
- Specified by:
factory in interface DocumentSequence
- Returns:
the factory used by this sequence.
-
close
public void close() throws IOException
Description copied from interface: DocumentSequence
Closes this document sequence, releasing all resources.
You should always call this method after having finished with this document sequence. Implementations are invited to call this method in a finaliser as a safety net (even better, implement SafelyCloseable), but since there is no guarantee as to when finalisers are invoked, you should not depend on this behaviour.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Specified by:
close in interface DocumentSequence
- Overrides:
close in class AbstractDocumentSequence
- Throws:
IOException
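Because the class implements AutoCloseable, the advice to always call close() can be followed with a try-with-resources statement rather than by relying on a finaliser. The sketch below shows the pattern only. `DemoSequence` and `CloseDemo` are hypothetical stand-ins introduced for illustration, not MG4J classes, so the example runs without the library.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical stand-in for a document sequence; not an MG4J class. */
final class DemoSequence implements Closeable {
    private boolean closed;

    /** Pretends to report a size; fails if the sequence was already closed. */
    long size() {
        if (closed) throw new IllegalStateException("already closed");
        return 3;
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() throws IOException {
        // A real implementation would release files, streams, and so on.
        closed = true;
    }
}

final class CloseDemo {
    /** Uses the sequence and guarantees that close() runs, even if the body throws. */
    static long countAndClose(DemoSequence sequence) throws IOException {
        try (DemoSequence s = sequence) {
            return s.size();
        }
    }
}
```

The same shape should apply to any Closeable document sequence: the resource is declared in the try header and closed deterministically when the block exits.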
-
main
public static void main(String[] arg) throws IOException, com.martiansoftware.jsap.JSAPException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException
- Throws:
IOException
com.martiansoftware.jsap.JSAPException
InstantiationException
IllegalAccessException
InvocationTargetException
NoSuchMethodException
-
-