it.unimi.di.mg4j.document
Class SubsetDocumentSequence

java.lang.Object
  extended by it.unimi.di.mg4j.document.AbstractDocumentSequence
      extended by it.unimi.di.mg4j.document.SubsetDocumentSequence
All Implemented Interfaces:
DocumentSequence, SafelyCloseable, Closeable, Serializable

public class SubsetDocumentSequence
extends AbstractDocumentSequence
implements Serializable

A document sequence that exposes a subset of the documents (not necessarily contiguous) of a given sequence.

This class provides a string-based constructor that follows the ObjectParser conventions, so that subcollections can be generated easily from the command line.
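
The selection idea described above can be sketched with plain JDK types. This is a minimal illustration, not the class's actual implementation: documents are modelled as strings, the set of retained documents as a java.util.Set<Integer> of 0-based positions (an assumption), and the names SubsetSketch, subset and collect are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in: not part of MG4J.
final class SubsetSketch {
    /** Returns an iterator over the elements of underlying whose 0-based position is in retained. */
    static <T> Iterator<T> subset(final Iterator<T> underlying, final Set<Integer> retained) {
        return new Iterator<T>() {
            private int position = 0;      // position of the next element read from underlying
            private T next;                // the retained element found by hasNext(), if any
            private boolean ready = false; // whether next holds an element not yet returned

            @Override
            public boolean hasNext() {
                // Skip elements whose position is not in the retained set.
                while (!ready && underlying.hasNext()) {
                    T candidate = underlying.next();
                    if (retained.contains(position++)) {
                        next = candidate;
                        ready = true;
                    }
                }
                return ready;
            }

            @Override
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                ready = false;
                return next;
            }
        };
    }

    /** Drains an iterator into a list, preserving order. */
    static <T> List<T> collect(Iterator<T> it) {
        List<T> result = new ArrayList<>();
        while (it.hasNext()) result.add(it.next());
        return result;
    }

    public static void main(String[] args) {
        List<String> documents = Arrays.asList("d0", "d1", "d2", "d3", "d4");
        Set<Integer> retained = new TreeSet<>(Arrays.asList(0, 2, 4)); // non-contiguous subset
        System.out.println(collect(subset(documents.iterator(), retained))); // prints [d0, d2, d4]
    }
}
```

The sketch makes a single pass over the underlying iterator, so the retained documents come out in the order of the underlying sequence regardless of how the set itself is ordered.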

Author:
Paolo Boldi
See Also:
Serialized Form

Constructor Summary
SubsetDocumentSequence(DocumentSequence underlyingSequence, IntSet documents)
          Creates a new subsequence.
SubsetDocumentSequence(String underlyingSequenceBasename, String documentFileBasename)
          Creates a new subsequence.
 
Method Summary
 void close()
          Closes this document sequence, releasing all resources.
 DocumentFactory factory()
          Returns the factory used by this sequence.
 DocumentIterator iterator()
          Returns an iterator over the sequence of documents.
 
Methods inherited from class it.unimi.di.mg4j.document.AbstractDocumentSequence
filename, finalize, load
 
Methods inherited from class java.lang.Object
clone, equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SubsetDocumentSequence

public SubsetDocumentSequence(DocumentSequence underlyingSequence,
                              IntSet documents)
Creates a new subsequence.

Parameters:
underlyingSequence - the underlying document sequence.
documents - the set of document pointers to be retained in the subsequence.

SubsetDocumentSequence

public SubsetDocumentSequence(String underlyingSequenceBasename,
                              String documentFileBasename)
                       throws NumberFormatException,
                              IllegalArgumentException,
                              SecurityException,
                              IOException,
                              ClassNotFoundException
Creates a new subsequence.

Parameters:
underlyingSequenceBasename - the basename of the underlying document sequence.
documentFileBasename - the basename of a file containing a serialized version of the set of document pointers to be retained.
Throws:
NumberFormatException
IllegalArgumentException
SecurityException
IOException
ClassNotFoundException
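
To illustrate what "a file containing a serialized version of the set of document pointers" might look like, the following sketch writes a set of integers with standard Java serialization and reads it back. It is only an analogy: the set type (java.util.TreeSet<Integer>), the file name, and the class PointerSetFileSketch are assumptions, and a file written this way should not be assumed to be loadable by this constructor.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in: not part of MG4J.
final class PointerSetFileSketch {
    /** Serializes a copy of the given pointer set to the given file. */
    static void store(Set<Integer> pointers, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(new TreeSet<>(pointers)); // TreeSet and Integer are both Serializable
        }
    }

    /** Reads back a pointer set written by store(). */
    @SuppressWarnings("unchecked")
    static Set<Integer> load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (Set<Integer>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("pointers", ".ser"); // illustrative file name
        store(new TreeSet<>(Arrays.asList(3, 1, 7)), file);
        System.out.println(load(file)); // prints [1, 3, 7]
        Files.delete(file);
    }
}
```

The checked exceptions of load() mirror two of those declared by the constructor: IOException for a missing or unreadable file, and ClassNotFoundException when the serialized class is not on the classpath.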
Method Detail

iterator

public DocumentIterator iterator()
                          throws IOException
Description copied from interface: DocumentSequence
Returns an iterator over the sequence of documents.

Warning: this method can be safely called only once. For instance, implementations based on standard input will usually throw an exception if this method is called twice.

Implementations may decide to override this restriction (in particular, if they implement DocumentCollection). Usually, however, it is not possible to obtain two iterators at the same time on a collection.

Specified by:
iterator in interface DocumentSequence
Returns:
an iterator over the sequence of documents.
Throws:
IOException
See Also:
DocumentCollection
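
The single-call restriction on iterator() can be pictured with the following sketch. It assumes a one-shot source modelled as a plain java.util.Iterator<String>; the class name OneShotSequenceSketch and the choice of IllegalStateException are illustrative only, since the documentation does not say which exception an implementation throws.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical stand-in: not part of MG4J.
final class OneShotSequenceSketch implements Iterable<String> {
    private final Iterator<String> source; // one-shot source, e.g. lines read from standard input
    private boolean consumed = false;      // set once iterator() has been called

    OneShotSequenceSketch(Iterator<String> source) {
        this.source = source;
    }

    @Override
    public Iterator<String> iterator() {
        // The source cannot be rewound, so a second call is rejected.
        if (consumed) throw new IllegalStateException("iterator() has already been called");
        consumed = true;
        return source;
    }

    public static void main(String[] args) {
        OneShotSequenceSketch sequence =
            new OneShotSequenceSketch(Arrays.asList("d0", "d1").iterator());
        for (String document : sequence) System.out.println(document); // first call succeeds
        try {
            sequence.iterator(); // second call fails
        } catch (IllegalStateException e) {
            System.out.println("second call rejected");
        }
    }
}
```

Callers that need several passes should therefore either buffer what they read or rely on an implementation that explicitly lifts the restriction.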

factory

public DocumentFactory factory()
Description copied from interface: DocumentSequence
Returns the factory used by this sequence.

Every document sequence is based on a document factory that transforms raw bytes into a sequence of characters. The factory contains useful information such as the number of fields.

Specified by:
factory in interface DocumentSequence
Returns:
the factory used by this sequence.

close

public void close()
           throws IOException
Description copied from interface: DocumentSequence
Closes this document sequence, releasing all resources.

You should always call this method after having finished with this document sequence. Implementations are invited to call this method in a finaliser as a safety net (even better, implement SafelyCloseable), but since there is no guarantee as to when finalisers are invoked, you should not depend on this behaviour.

Specified by:
close in interface DocumentSequence
Specified by:
close in interface Closeable
Overrides:
close in class AbstractDocumentSequence
Throws:
IOException
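
Because the class implements Closeable, the advice to always call close() can be followed with a try-with-resources statement rather than relying on finalisers. The sketch below demonstrates the pattern on a hypothetical stand-in resource (TrackedSequence), not on a real document sequence.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical stand-in: not part of MG4J.
final class CloseSketch {
    /** A trivial Closeable that records whether close() was called. */
    static final class TrackedSequence implements Closeable {
        boolean closed = false;

        @Override
        public void close() throws IOException {
            closed = true; // a real sequence would release files, streams, etc. here
        }
    }

    /** Uses a sequence inside try-with-resources and returns it for inspection. */
    static TrackedSequence useAndClose() throws IOException {
        TrackedSequence sequence = new TrackedSequence();
        try (TrackedSequence s = sequence) {
            // ... iterate over the documents of s here ...
        } // close() is invoked automatically, even if the body throws
        return sequence;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(useAndClose().closed); // prints true
    }
}
```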