it.unimi.di.mg4j.document
Class ReplicatedDocumentFactory

java.lang.Object
  extended by it.unimi.di.mg4j.document.AbstractDocumentFactory
      extended by it.unimi.di.mg4j.document.ReplicatedDocumentFactory
All Implemented Interfaces:
DocumentFactory, FlyweightPrototype<DocumentFactory>, Serializable

public class ReplicatedDocumentFactory
extends AbstractDocumentFactory

A factory that replicates a given factory several times. A special case of a composite factory.

Note that, in general, replicated factories support only sequential access to field content (although skipping items is allowed).

See Also:
Serialized Form

Nested Class Summary
protected  class ReplicatedDocumentFactory.ReplicatedDocument
          A document obtained by replication of the underlying-factory document.
 
Nested classes/interfaces inherited from interface it.unimi.di.mg4j.document.DocumentFactory
DocumentFactory.FieldType
 
Field Summary
 DocumentFactory documentFactory
          The document factory that will be replicated.
 int numberOfCopies
          The number of copies.
 
Constructor Summary
protected ReplicatedDocumentFactory(DocumentFactory documentFactory, int numberOfCopies, String[] fieldName, Object2IntOpenHashMap<String> field2Index)
           
 
Method Summary
 ReplicatedDocumentFactory copy()
           
 int fieldIndex(String fieldName)
          Returns the index of a field, given its symbolic name.
 String fieldName(int field)
          Returns the symbolic name of a field.
 DocumentFactory.FieldType fieldType(int field)
          Returns the type of a field.
 Document getDocument(InputStream rawContent, Reference2ObjectMap<Enum<?>,Object> metadata)
          Returns the document obtained by parsing the given byte stream.
static DocumentFactory getFactory(DocumentFactory documentFactory, int numberOfCopies, String[] fieldName)
          Returns a document factory replicating the given factory.
 int numberOfFields()
          Returns the number of fields present in the documents produced by this factory.
 
Methods inherited from class it.unimi.di.mg4j.document.AbstractDocumentFactory
ensureFieldIndex, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

documentFactory

public final DocumentFactory documentFactory
The document factory that will be replicated.


numberOfCopies

public final int numberOfCopies
The number of copies.

Constructor Detail

ReplicatedDocumentFactory

protected ReplicatedDocumentFactory(DocumentFactory documentFactory,
                                    int numberOfCopies,
                                    String[] fieldName,
                                    Object2IntOpenHashMap<String> field2Index)
Method Detail

getFactory

public static DocumentFactory getFactory(DocumentFactory documentFactory,
                                         int numberOfCopies,
                                         String[] fieldName)
Returns a document factory replicating the given factory.

Parameters:
documentFactory - the factory that will be replicated.
numberOfCopies - the number of copies.
fieldName - the names of the fields of the replicated factory.
Returns:
a replicated document factory.
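
As an illustration of the parameters above, the following minimal sketch models how a replicating factory might keep track of its field names. It is not MG4J code: the class ReplicatedFieldNames and its methods are hypothetical, and the requirement that fieldName supply one unique name for each field of each copy is an assumption made here for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the name bookkeeping of a replicating factory; not part of MG4J. */
final class ReplicatedFieldNames {
    private final String[] fieldName;
    private final Map<String, Integer> field2Index = new HashMap<>();

    /**
     * @param underlyingFields the number of fields of the factory being replicated.
     * @param numberOfCopies the number of copies.
     * @param fieldName assumed to hold one unique name per replicated field.
     */
    ReplicatedFieldNames(int underlyingFields, int numberOfCopies, String[] fieldName) {
        if (underlyingFields * numberOfCopies != fieldName.length)
            throw new IllegalArgumentException("Expected " + underlyingFields * numberOfCopies
                    + " names, got " + fieldName.length);
        for (int i = 0; i < fieldName.length; i++)
            // put() returns the previous mapping, so a non-null result signals a duplicate name.
            if (field2Index.put(fieldName[i], i) != null)
                throw new IllegalArgumentException("Duplicate field name: " + fieldName[i]);
        this.fieldName = fieldName.clone();
    }

    int numberOfFields() { return fieldName.length; }

    String fieldName(int field) { return fieldName[field]; }

    /** Returns the index of the given name, or -1 if there is no field with that name. */
    int fieldIndex(String name) { return field2Index.getOrDefault(name, -1); }

    public static void main(String[] args) {
        ReplicatedFieldNames names =
                new ReplicatedFieldNames(2, 2, new String[] { "title", "text", "title2", "text2" });
        System.out.println(names.numberOfFields());
        System.out.println(names.fieldIndex("text2"));
        System.out.println(names.fieldIndex("missing"));
    }
}
```

Rejecting mismatched lengths and duplicate names at construction time keeps fieldName(int) and fieldIndex(String) inverse to each other for every valid index.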

copy

public ReplicatedDocumentFactory copy()

numberOfFields

public int numberOfFields()
Description copied from interface: DocumentFactory
Returns the number of fields present in the documents produced by this factory.

Returns:
the number of fields present in the documents produced by this factory.

fieldName

public String fieldName(int field)
Description copied from interface: DocumentFactory
Returns the symbolic name of a field.

Parameters:
field - the index of a field (between 0 inclusive and DocumentFactory.numberOfFields() exclusive).
Returns:
the symbolic name of the field-th field.

fieldIndex

public int fieldIndex(String fieldName)
Description copied from interface: DocumentFactory
Returns the index of a field, given its symbolic name.

Parameters:
fieldName - the name of a field of this factory.
Returns:
the corresponding index, or -1 if there is no field with name fieldName.

fieldType

public DocumentFactory.FieldType fieldType(int field)
Description copied from interface: DocumentFactory
Returns the type of a field.

The possible types are defined in DocumentFactory.FieldType.

Parameters:
field - the index of a field (between 0 inclusive and DocumentFactory.numberOfFields() exclusive).
Returns:
the type of the field-th field.
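
A replicated field index can be related to the underlying factory by simple arithmetic. The sketch below assumes that the copies are laid out one after another, so that index field corresponds to underlying field field % n of copy field / n, where n is the number of fields of the underlying factory. This layout is an assumption made for illustration and is not stated on this page; the class ReplicatedIndex is hypothetical.

```java
/** Hypothetical helper illustrating one possible index layout; not part of MG4J. */
final class ReplicatedIndex {
    private ReplicatedIndex() {}

    /** Checks that field lies between 0 inclusive and numberOfFields exclusive. */
    static void ensureFieldIndex(int field, int numberOfFields) {
        if (field < 0 || field >= numberOfFields)
            throw new IndexOutOfBoundsException(
                    "Field index " + field + " is not in [0.." + numberOfFields + ")");
    }

    /** Index of the corresponding field in the underlying factory (assumed layout). */
    static int underlyingField(int field, int underlyingFields, int numberOfCopies) {
        ensureFieldIndex(field, underlyingFields * numberOfCopies);
        return field % underlyingFields;
    }

    /** Zero-based copy to which the field belongs (assumed layout). */
    static int copyOf(int field, int underlyingFields, int numberOfCopies) {
        ensureFieldIndex(field, underlyingFields * numberOfCopies);
        return field / underlyingFields;
    }

    public static void main(String[] args) {
        // Three underlying fields replicated twice give replicated indices 0..5.
        for (int field = 0; field < 6; field++)
            System.out.println(field + " -> copy " + copyOf(field, 3, 2)
                    + ", underlying field " + underlyingField(field, 3, 2));
    }
}
```

Under this assumed layout, the type of a replicated field would simply be the type of the underlying field returned by underlyingField.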

getDocument

public Document getDocument(InputStream rawContent,
                            Reference2ObjectMap<Enum<?>,Object> metadata)
                     throws IOException
Description copied from interface: DocumentFactory
Returns the document obtained by parsing the given byte stream.

The metadata parameter makes up for the lack of a simple keyword-based parameter-passing system in Java. This method might take several different types of “suggestions” that have been gathered by the collection: typically, the document title, a URI representing the document, its MIME type, its encoding, and so on. Some of this information might be set by default (as happens, for instance, in a PropertyBasedDocumentFactory). Implementations of this method must consult the metadata provided by the collection, possibly complete it with default factory metadata, and then proceed to construct the document.

Parameters:
rawContent - the raw content from which the document should be extracted; it must not be closed, as resource management is a responsibility of the DocumentCollection.
metadata - a map from enums (e.g., keys taken in PropertyBasedDocumentFactory) to various kinds of objects.
Returns:
the document obtained by parsing the given byte stream.
Throws:
IOException
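
The rule that collection-provided metadata takes precedence and factory defaults fill the gaps can be sketched as follows. This is a minimal illustration, not MG4J code: the class MetadataResolution, the enum MetadataKey and the method resolve are hypothetical, and java.util.Map is used as a stand-in for the Reference2ObjectMap in the signature above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of metadata lookup with factory defaults; not part of MG4J. */
final class MetadataResolution {
    /** Hypothetical metadata keys; a real factory defines its own enum. */
    enum MetadataKey { TITLE, URI, MIMETYPE, ENCODING }

    private MetadataResolution() {}

    /** Returns the value supplied by the collection, falling back to the factory default (or null). */
    static Object resolve(Enum<?> key, Map<Enum<?>, Object> metadata, Map<Enum<?>, Object> defaultMetadata) {
        Object value = metadata.get(key);
        return value != null ? value : defaultMetadata.get(key);
    }

    public static void main(String[] args) {
        // Defaults that a factory might hold.
        Map<Enum<?>, Object> defaults = new HashMap<>();
        defaults.put(MetadataKey.ENCODING, "UTF-8");

        // Suggestions gathered by the collection for one document.
        Map<Enum<?>, Object> fromCollection = new HashMap<>();
        fromCollection.put(MetadataKey.TITLE, "A title");

        System.out.println(resolve(MetadataKey.TITLE, fromCollection, defaults));
        System.out.println(resolve(MetadataKey.ENCODING, fromCollection, defaults));
    }
}
```

Keying the map by enum constants gives each “suggestion” a name that the compiler checks, which is the role keyword parameters would otherwise play.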