Class SubsetDocumentSequence
- java.lang.Object
  - it.unimi.di.big.mg4j.document.AbstractDocumentSequence
    - it.unimi.di.big.mg4j.document.SubsetDocumentSequence
- All Implemented Interfaces:
DocumentSequence, SafelyCloseable, Closeable, Serializable, AutoCloseable
public class SubsetDocumentSequence extends AbstractDocumentSequence implements Serializable
A document sequence that exposes a subset of documents (not necessarily contiguous) from a given underlying sequence.
This class provides several string-based constructors that follow the ObjectParser conventions; they make it easy to build subcollections from the command line.
- Author:
- Paolo Boldi
- See Also:
- Serialized Form
-
Constructor Summary
Constructors
Constructor and Description
SubsetDocumentSequence(DocumentSequence underlyingSequence, long first, long last)
Creates a new subsequence.
SubsetDocumentSequence(DocumentSequence underlyingSequence, LongSet documents)
Creates a new subsequence.
SubsetDocumentSequence(String underlyingSequenceFilename, String documentFileFilename)
Creates a new subsequence.
SubsetDocumentSequence(String underlyingSequenceFilename, String first, String last)
Creates a new subsequence.
-
Method Summary
Modifier and Type, Method, and Description
void close()
Closes this document sequence, releasing all resources.
DocumentFactory factory()
Returns the factory used by this sequence.
DocumentIterator iterator()
Returns an iterator over the sequence of documents.
Methods inherited from class it.unimi.di.big.mg4j.document.AbstractDocumentSequence
filename, finalize, load
-
Constructor Detail
-
SubsetDocumentSequence
public SubsetDocumentSequence(DocumentSequence underlyingSequence, LongSet documents)
Creates a new subsequence.
- Parameters:
underlyingSequence - the underlying document sequence.
documents - the set of pointers of the documents in the subsequence.
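As a rough illustration of what selecting documents by a set of pointers means, the following sketch keeps only the items whose position appears in a given set. It relies on the standard library alone: the names `SubsetSketch` and `select`, the use of `Set<Long>` in place of fastutil's `LongSet`, and the zero-based numbering of positions are all assumptions made for the example. It is not the actual implementation of this class.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class SubsetSketch {
    /** Returns the items whose zero-based position in the source is in {@code documents}. */
    public static <T> List<T> select(Iterator<T> source, Set<Long> documents) {
        List<T> kept = new ArrayList<>();
        long pointer = 0; // position of the current item in the underlying sequence
        while (source.hasNext()) {
            T item = source.next();
            if (documents.contains(pointer)) kept.add(item);
            pointer++;
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> all = List.of("d0", "d1", "d2", "d3");
        // Keeps the items at positions 1 and 3.
        System.out.println(select(all.iterator(), Set.of(1L, 3L)));
    }
}
```

The sketch scans the underlying sequence once and in order, which matches the sequential access a document sequence offers; the retained positions need not be contiguous.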
-
SubsetDocumentSequence
public SubsetDocumentSequence(DocumentSequence underlyingSequence, long first, long last)
Creates a new subsequence.
- Parameters:
underlyingSequence - the underlying document sequence.
first - the first document (inclusive) in the subsequence.
last - the last document (exclusive) in the subsequence.
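Because `first` is inclusive and `last` is exclusive, the range covers `last - first` documents, and an empty range results when the two are equal. The short sketch below enumerates the pointers of such a half-open range with the standard library; `RangeSketch` and `pointers` are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.stream.LongStream;

public class RangeSketch {
    /** Lists the document pointers covered by the half-open range [first, last). */
    public static long[] pointers(long first, long last) {
        return LongStream.range(first, last).toArray();
    }

    public static void main(String[] args) {
        // first = 2 is included, last = 5 is not.
        System.out.println(Arrays.toString(pointers(2, 5)));
    }
}
```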
-
SubsetDocumentSequence
public SubsetDocumentSequence(String underlyingSequenceFilename, String documentFileFilename) throws NumberFormatException, IllegalArgumentException, SecurityException, IOException, ClassNotFoundException
Creates a new subsequence.
- Parameters:
underlyingSequenceFilename - the filename of the underlying document sequence.
documentFileFilename - the filename of a file containing a serialized version of the set of document pointers to be retained.
- Throws:
NumberFormatException
IllegalArgumentException
SecurityException
IOException
ClassNotFoundException
-
SubsetDocumentSequence
public SubsetDocumentSequence(String underlyingSequenceFilename, String first, String last) throws NumberFormatException, IllegalArgumentException, SecurityException, IOException, ClassNotFoundException
Creates a new subsequence.
- Parameters:
underlyingSequenceFilename - the filename of the underlying document sequence.
first - the first document (inclusive) in the subsequence.
last - the last document (exclusive) in the subsequence.
- Throws:
NumberFormatException
IllegalArgumentException
SecurityException
IOException
ClassNotFoundException
-
Method Detail
-
iterator
public DocumentIterator iterator() throws IOException
Description copied from interface: DocumentSequence
Returns an iterator over the sequence of documents.
Warning: this method can be safely called only once. For instance, implementations based on standard input will usually throw an exception if this method is called twice.
Implementations may decide to lift this restriction (in particular, if they implement DocumentCollection). Usually, however, it is not possible to obtain two iterators at the same time on a collection.
- Specified by:
iterator in interface DocumentSequence
- Returns:
- an iterator over the sequence of documents.
- Throws:
IOException
- See Also:
DocumentCollection
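The single-call restriction described above can be pictured with a small stand-in written against the standard library only. `OneShotSketch` and `OneShotSequence` are hypothetical names, and the choice of `IllegalStateException` is an assumption: the documentation only says that implementations will usually throw an exception on a second call.

```java
import java.util.Iterator;
import java.util.List;

public class OneShotSketch {
    /** A hypothetical sequence that hands out its iterator only once. */
    public static final class OneShotSequence<T> {
        private final Iterator<T> source;
        private boolean consumed;

        public OneShotSequence(Iterator<T> source) {
            this.source = source;
        }

        public Iterator<T> iterator() {
            if (consumed) throw new IllegalStateException("iterator() was already called");
            consumed = true;
            return source;
        }
    }

    /** Returns true if a second call to iterator() is rejected. */
    public static boolean secondCallFails() {
        OneShotSequence<String> seq = new OneShotSequence<>(List.of("d0", "d1").iterator());
        seq.iterator(); // first call succeeds
        try {
            seq.iterator(); // second call is expected to fail
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondCallFails());
    }
}
```

Callers that need a second pass would typically reopen or reload the underlying sequence rather than ask the same instance for another iterator.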
-
factory
public DocumentFactory factory()
Description copied from interface: DocumentSequence
Returns the factory used by this sequence.
Every document sequence is based on a document factory that transforms raw bytes into a sequence of characters. The factory contains useful information such as the number of fields.
- Specified by:
factory in interface DocumentSequence
- Returns:
- the factory used by this sequence.
-
close
public void close() throws IOException
Description copied from interface: DocumentSequence
Closes this document sequence, releasing all resources.
You should always call this method after having finished with this document sequence. Implementations are invited to call this method in a finaliser as a safety net (even better, implement SafelyCloseable), but since there is no guarantee as to when finalisers are invoked, you should not depend on this behaviour.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Specified by:
close in interface DocumentSequence
- Overrides:
close in class AbstractDocumentSequence
- Throws:
IOException
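Since the class implements AutoCloseable, one way to make sure close() is always called is a try-with-resources block. The sketch below shows the pattern with a hypothetical stand-in resource (`FakeSequence`), because building a real sequence requires the MG4J library and an underlying collection; all names in it are assumptions for illustration.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

public class CloseSketch {
    /** A hypothetical resource standing in for a document sequence. */
    public static final class FakeSequence implements Closeable {
        private boolean closed;

        @Override
        public void close() throws IOException {
            closed = true; // a real sequence would release files or streams here
        }

        public boolean isClosed() {
            return closed;
        }
    }

    /** Uses the resource inside try-with-resources and reports whether it was closed. */
    public static boolean closedAfterUse() {
        FakeSequence seq = new FakeSequence();
        try (FakeSequence s = seq) {
            // work with the sequence here, e.g. iterate over its documents
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seq.isClosed();
    }

    public static void main(String[] args) {
        System.out.println(closedAfterUse());
    }
}
```

The block closes the resource even when the body throws, so there is no need to rely on a finaliser.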
-