Class CompositeDocumentFactory
- java.lang.Object
- it.unimi.di.big.mg4j.document.AbstractDocumentFactory
- it.unimi.di.big.mg4j.document.CompositeDocumentFactory
- All Implemented Interfaces: DocumentFactory, FlyweightPrototype<DocumentFactory>, Serializable
public class CompositeDocumentFactory extends AbstractDocumentFactory
A composite factory that passes the input stream to a sequence of factories in turn.
Factories can be composed. A composite factory passes the input stream given to getDocument(InputStream, Reference2ObjectMap) to the underlying factories in turn, after calling InputStream.reset(). Document sequences using composite factories must pass to getDocument(InputStream, Reference2ObjectMap) a MultipleInputStream that can be reset enough times.
Note that in general composite factories support only sequential access to field content (although skipping items is allowed).
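The reset-then-parse cycle described above can be illustrated with plain java.io classes. This is a minimal sketch, not MG4J code: the StreamParser interface, the UPPER and LENGTH parsers and the parseWithAll helper are hypothetical stand-ins for the underlying factories, and a ByteArrayInputStream plays the role of a stream that can be reset repeatedly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class ResetAndReparseSketch {
    /** Hypothetical stand-in for an underlying factory: reads the stream and yields one value. */
    interface StreamParser {
        String parse(InputStream in) throws IOException;
    }

    /** Reads the stream to exhaustion. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) out.write(buffer, 0, n);
        return out.toByteArray();
    }

    /** Hypothetical parser: the content in upper case. */
    static final StreamParser UPPER =
        in -> new String(readAll(in), StandardCharsets.UTF_8).toUpperCase(Locale.ROOT);

    /** Hypothetical parser: the content length in bytes. */
    static final StreamParser LENGTH = in -> Integer.toString(readAll(in).length);

    /** Passes the same stream to every parser in turn, calling reset() before each pass. */
    static List<String> parseWithAll(InputStream in, List<StreamParser> parsers) throws IOException {
        List<String> results = new ArrayList<>();
        for (StreamParser parser : parsers) {
            in.reset(); // rewind so that each parser sees the content from the start
            results.add(parser.parse(in));
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("Hello composite".getBytes(StandardCharsets.UTF_8));
        System.out.println(parseWithAll(in, Arrays.asList(UPPER, LENGTH))); // prints [HELLO COMPOSITE, 15]
    }
}
```

Without the reset() call the second parser would see an exhausted stream, which is why the stream passed to a composite factory has to tolerate as many resets as there are underlying factories.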
- See Also: Serialized Form
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
protected class | CompositeDocumentFactory.CompositeDocument | A document obtained by composition of documents of underlying factories.
Nested classes/interfaces inherited from interface it.unimi.di.big.mg4j.document.DocumentFactory
DocumentFactory.FieldType
Constructor Summary
Constructors
Modifier | Constructor | Description
protected | CompositeDocumentFactory(DocumentFactory[] documentFactory, String[] fieldName) | Creates a new composite document factory using the factories in a given array.
Method Summary
Modifier and Type | Method | Description
CompositeDocumentFactory | copy() |
int | fieldIndex(String fieldName) | Returns the index of a field, given its symbolic name.
String | fieldName(int field) | Returns the symbolic name of a field.
DocumentFactory.FieldType | fieldType(int field) | Returns the type of a field.
Document | getDocument(InputStream rawContent, Reference2ObjectMap<Enum<?>,Object> metadata) | Returns the document obtained by parsing the given byte stream.
static DocumentFactory | getFactory(DocumentFactory... documentFactory) | Returns a document factory composing the given document factories.
static DocumentFactory | getFactory(DocumentFactory[] documentFactory, String[] fieldName) | Returns a document factory composing the given document factories.
int | numberOfFields() | Returns the number of fields present in the documents produced by this factory.
Methods inherited from class it.unimi.di.big.mg4j.document.AbstractDocumentFactory
ensureFieldIndex, toString
Constructor Detail
CompositeDocumentFactory
protected CompositeDocumentFactory(DocumentFactory[] documentFactory, String[] fieldName)
Creates a new composite document factory using the factories in a given array.
- Parameters:
documentFactory - an array of document factories that will be composed.
fieldName - an array of names for the resulting fields, or null.
Method Detail
copy
public CompositeDocumentFactory copy()
getFactory
public static DocumentFactory getFactory(DocumentFactory[] documentFactory, String[] fieldName)
Returns a document factory composing the given document factories.
By passing an optional array of field names, it is possible to rename the fields of the composing factories.
- Parameters:
documentFactory - an array of document factories that will be composed.
fieldName - an array of names for the resulting fields, or null.
- Returns:
a composed document factory (the first element of the argument, for arguments of length 1).
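The effect of the optional fieldName array can be sketched without the library. The example below is an illustration under stated assumptions, not the MG4J implementation: it assumes that the fields of the composed factories are laid out one factory after the other, and that a non-null array supplies one replacement name per resulting field. The composeFieldNames and fieldIndex helpers are hypothetical; only the -1 result for unknown names follows the documented fieldIndex contract.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ComposedFieldNamesSketch {
    /**
     * Concatenates the field names of the composed factories (assumed layout).
     * If renamed is non-null, it supplies the name of every resulting field instead.
     */
    static List<String> composeFieldNames(List<List<String>> perFactoryNames, String[] renamed) {
        List<String> names = new ArrayList<>();
        for (List<String> factoryNames : perFactoryNames) names.addAll(factoryNames);
        if (renamed == null) return names;
        if (renamed.length != names.size())
            throw new IllegalArgumentException("Expected " + names.size() + " names, got " + renamed.length);
        return new ArrayList<>(Arrays.asList(renamed));
    }

    /** Returns the index of a field name, or -1 if there is no field with that name. */
    static int fieldIndex(List<String> names, String fieldName) {
        return names.indexOf(fieldName);
    }

    public static void main(String[] args) {
        // Two hypothetical factories: one with fields "title" and "text", one with field "anchor".
        List<List<String>> perFactory = Arrays.asList(Arrays.asList("title", "text"), Arrays.asList("anchor"));

        List<String> plain = composeFieldNames(perFactory, null);
        System.out.println(plain + " anchor=" + fieldIndex(plain, "anchor"));

        List<String> renamed = composeFieldNames(perFactory, new String[] { "t", "body", "links" });
        System.out.println(renamed + " body=" + fieldIndex(renamed, "body") + " title=" + fieldIndex(renamed, "title"));
    }
}
```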
getFactory
public static DocumentFactory getFactory(DocumentFactory... documentFactory)
Returns a document factory composing the given document factories.
- Parameters:
documentFactory - document factories that will be composed.
- Returns:
a composed document factory (the first element of the argument, for arguments of length 1).
numberOfFields
public int numberOfFields()
Description copied from interface: DocumentFactory
Returns the number of fields present in the documents produced by this factory.
- Returns:
the number of fields present in the documents produced by this factory.
fieldName
public String fieldName(int field)
Description copied from interface: DocumentFactory
Returns the symbolic name of a field.
- Parameters:
field - the index of a field (between 0 inclusive and DocumentFactory.numberOfFields() exclusive).
- Returns:
the symbolic name of the field-th field.
fieldIndex
public int fieldIndex(String fieldName)
Description copied from interface: DocumentFactory
Returns the index of a field, given its symbolic name.
- Parameters:
fieldName - the name of a field of this factory.
- Returns:
the corresponding index, or -1 if there is no field with name fieldName.
fieldType
public DocumentFactory.FieldType fieldType(int field)
Description copied from interface: DocumentFactory
Returns the type of a field.
The possible types are defined in DocumentFactory.FieldType.
- Parameters:
field - the index of a field (between 0 inclusive and DocumentFactory.numberOfFields() exclusive).
- Returns:
the type of the field-th field.
getDocument
public Document getDocument(InputStream rawContent, Reference2ObjectMap<Enum<?>,Object> metadata) throws IOException
Description copied from interface: DocumentFactory
Returns the document obtained by parsing the given byte stream.
The parameter metadata makes up for the lack of a simple keyword-based parameter-passing system in Java. This method might take several different types of “suggestions” which have been collected by the collection: typically, the document title, a URI representing the document, its MIME type, its encoding and so on. Some of this information might be set by default (as happens, for instance, in a PropertyBasedDocumentFactory). Implementations of this method must consult the metadata provided by the collection, possibly complete them with default factory metadata, and proceed to the document construction.
- Parameters:
rawContent - the raw content from which the document should be extracted; it must not be closed, as resource management is a responsibility of the DocumentCollection.
metadata - a map from enums (e.g., keys taken in PropertyBasedDocumentFactory) to various kinds of objects.
- Returns:
the document obtained by parsing the given byte stream.
- Throws:
IOException
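The "consult the collection metadata, then fall back to factory defaults" rule from the getDocument description can be sketched as follows. This is only an illustration under stated assumptions: the Keys enum and the resolve helper are hypothetical, and a java.util.IdentityHashMap stands in for the Reference2ObjectMap of the real signature so that the example runs with the standard library alone.

```java
import java.util.IdentityHashMap;
import java.util.Map;

class MetadataDefaultsSketch {
    /** Hypothetical metadata keys; real factories define their own enums. */
    enum Keys { TITLE, ENCODING }

    /** Returns the value supplied by the collection, or the factory default if none was supplied. */
    static Object resolve(Map<Enum<?>, Object> metadata, Map<Enum<?>, Object> defaults, Enum<?> key) {
        Object value = metadata.get(key);
        return value != null ? value : defaults.get(key);
    }

    public static void main(String[] args) {
        // Metadata collected by the (hypothetical) collection for one document.
        Map<Enum<?>, Object> metadata = new IdentityHashMap<>();
        metadata.put(Keys.TITLE, "A sample document");

        // Defaults configured on the (hypothetical) factory.
        Map<Enum<?>, Object> defaults = new IdentityHashMap<>();
        defaults.put(Keys.TITLE, "Untitled");
        defaults.put(Keys.ENCODING, "UTF-8");

        System.out.println(resolve(metadata, defaults, Keys.TITLE));    // value from the collection
        System.out.println(resolve(metadata, defaults, Keys.ENCODING)); // value from the defaults
    }
}
```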