Class DispatchingDocumentFactory
- java.lang.Object
-
- it.unimi.di.big.mg4j.document.AbstractDocumentFactory
-
- it.unimi.di.big.mg4j.document.PropertyBasedDocumentFactory
-
- it.unimi.di.big.mg4j.document.DispatchingDocumentFactory
-
- All Implemented Interfaces:
DocumentFactory, FlyweightPrototype<DocumentFactory>, Serializable
public class DispatchingDocumentFactory extends PropertyBasedDocumentFactory
A document factory that dispatches the task of building documents to several subfactories according to a strategy.
The strategy is specified as (an object embedding) a method that determines which factory should be used on the basis of the metadata provided to the getDocument(InputStream, Reference2ObjectMap) method. Since the strategy will usually have to resolve metadata names, it is also passed this factory, so that the correct PropertyBasedDocumentFactory.resolve(Enum,Reference2ObjectMap) method can be invoked.
Moreover, at construction time one must specify, for each subfactory and for each field of this factory, which field of the subfactory should be used. Note that, to guarantee sequential access, the fields specified for each subfactory should appear in increasing order.
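The dispatching idea described above can be illustrated with a minimal, library-free sketch. It does not use the MG4J classes: `DispatchSketch`, `Key`, `when` and `chooseFactory` are hypothetical names introduced here, and plain `java.util` maps stand in for `Reference2ObjectMap`. The sketch assumes a string-based rule that maps the value of a single metadata key to the index of a subfactory, with an optional default.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, library-free illustration of dispatching on metadata; not the MG4J API.
final class DispatchSketch {
    // Hypothetical metadata keys standing in for a metadata-key enum.
    enum Key { MIMETYPE, TITLE }

    private final Map<String, Integer> rule = new HashMap<>();
    private final int defaultFactory; // -1 means "no default factory"

    DispatchSketch(int defaultFactory) {
        this.defaultFactory = defaultFactory;
    }

    // Associates a metadata value with the index of the subfactory that should handle it.
    DispatchSketch when(String value, int factoryIndex) {
        rule.put(value, factoryIndex);
        return this;
    }

    // Returns the index of the subfactory chosen for the given metadata,
    // falling back to the default (possibly -1) when nothing matches.
    int chooseFactory(Map<Key, Object> metadata) {
        Object value = metadata.get(Key.MIMETYPE);
        Integer index = value == null ? null : rule.get(value.toString());
        return index != null ? index : defaultFactory;
    }

    public static void main(String[] args) {
        DispatchSketch strategy = new DispatchSketch(2)
                .when("text/html", 0)
                .when("application/pdf", 1);
        Map<Key, Object> metadata = new EnumMap<>(Key.class);
        metadata.put(Key.MIMETYPE, "application/pdf");
        System.out.println(strategy.chooseFactory(metadata));
    }
}
```

Returning an index rather than a factory object keeps the sketch independent of any document-building code; a real strategy would hand back the chosen subfactory itself.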
- See Also:
- Serialized Form
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description)
- static interface DispatchingDocumentFactory.DispatchingStrategy: A strategy that decides which factory is appropriate using the document metadata.
- static class DispatchingDocumentFactory.MetadataKeys: Case-insensitive keys for metadata.
- static class DispatchingDocumentFactory.StringBasedDispatchingStrategy: A strategy that is based on trying to match the value of the metadata with a given key with respect to a certain set of values.
Nested classes/interfaces inherited from interface it.unimi.di.big.mg4j.document.DocumentFactory
DocumentFactory.FieldType
-
-
Field Summary
Fields (Modifier and Type, Field, Description)
- static String OTHERWISE_IN_RULE: The value to be used in RULE to introduce the default factory.
Fields inherited from class it.unimi.di.big.mg4j.document.PropertyBasedDocumentFactory
defaultMetadata
-
-
Constructor Summary
Constructors (Constructor, Description)
- DispatchingDocumentFactory()
- DispatchingDocumentFactory(DocumentFactory[] documentFactory, String[] fieldName, DocumentFactory.FieldType[] fieldType, int[][] rename, DispatchingDocumentFactory.DispatchingStrategy strategy): Creates a new dispatching factory.
- DispatchingDocumentFactory(Reference2ObjectMap<Enum<?>,Object> defaultMetadata)
- DispatchingDocumentFactory(Properties properties)
- DispatchingDocumentFactory(String[] property)
-
Method Summary
Methods (Modifier and Type, Method, Description)
- DispatchingDocumentFactory copy()
- int fieldIndex(String fieldName): Returns the index of a field, given its symbolic name.
- String fieldName(int field): Returns the symbolic name of a field.
- DocumentFactory.FieldType fieldType(int field): Returns the type of a field.
- Document getDocument(InputStream rawContent, Reference2ObjectMap<Enum<?>,Object> metadata): Returns the document obtained by parsing the given byte stream.
- static void main(String[] arg)
- int numberOfFields(): Returns the number of fields present in the documents produced by this factory.
- protected boolean parseProperty(String key, String[] values, Reference2ObjectMap<Enum<?>,Object> metadata): Parses a property with given key and value, adding it to the given map.
Methods inherited from class it.unimi.di.big.mg4j.document.PropertyBasedDocumentFactory
ensureJustOne, getInstance, getInstance, getInstance, getInstance, parseProperties, parseProperties, resolve, resolve, resolveNotNull, sameKey, toString
-
Methods inherited from class it.unimi.di.big.mg4j.document.AbstractDocumentFactory
ensureFieldIndex
-
-
-
-
Field Detail
-
OTHERWISE_IN_RULE
public static final String OTHERWISE_IN_RULE
The value to be used in RULE to introduce the default factory. Otherwise, no default factory is provided for documents that do not match.
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
DispatchingDocumentFactory
public DispatchingDocumentFactory(DocumentFactory[] documentFactory, String[] fieldName, DocumentFactory.FieldType[] fieldType, int[][] rename, DispatchingDocumentFactory.DispatchingStrategy strategy)
Creates a new dispatching factory.
- Parameters:
documentFactory - the array of subfactories.
fieldName - the names of this factory's fields.
fieldType - the types of this factory's fields.
rename - the way fields of this class are mapped to fields of the subfactories.
strategy - the strategy to decide which factory should be used.
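To make the rename parameter more concrete, here is a small sketch of the ordering constraint mentioned in the class description. It assumes that `rename[i][j]` holds the index of the field of subfactory `i` that is used for field `j` of this factory, and that "increasing" means strictly increasing. Both points, as well as the names `RenameSketch` and `isIncreasing`, are assumptions made for illustration and are not taken from the MG4J source.

```java
// Hypothetical helper; the layout of the rename matrix is an assumption.
final class RenameSketch {
    private RenameSketch() {}

    // Returns true if every row lists subfactory field indices in strictly
    // increasing order, so each subfactory's fields can be read sequentially.
    static boolean isIncreasing(int[][] rename) {
        for (int[] row : rename) {
            for (int j = 1; j < row.length; j++) {
                if (row[j] <= row[j - 1]) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] ordered = { { 0, 1 }, { 1, 3 } };   // both rows increase
        int[][] unordered = { { 1, 0 }, { 1, 3 } }; // first row goes backwards
        System.out.println(isIncreasing(ordered));
        System.out.println(isIncreasing(unordered));
    }
}
```

A check of this kind could be run before constructing the factory, so that a badly ordered mapping is caught early rather than when a document is read.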
-
DispatchingDocumentFactory
public DispatchingDocumentFactory(Properties properties) throws org.apache.commons.configuration.ConfigurationException
- Throws:
org.apache.commons.configuration.ConfigurationException
-
DispatchingDocumentFactory
public DispatchingDocumentFactory(String[] property) throws org.apache.commons.configuration.ConfigurationException
- Throws:
org.apache.commons.configuration.ConfigurationException
-
DispatchingDocumentFactory
public DispatchingDocumentFactory(Reference2ObjectMap<Enum<?>,Object> defaultMetadata)
-
DispatchingDocumentFactory
public DispatchingDocumentFactory()
-
-
Method Detail
-
copy
public DispatchingDocumentFactory copy()
-
parseProperty
protected boolean parseProperty(String key, String[] values, Reference2ObjectMap<Enum<?>,Object> metadata) throws org.apache.commons.configuration.ConfigurationException
Description copied from class: PropertyBasedDocumentFactory
Parses a property with given key and value, adding it to the given map.
Currently this implementation just parses the PropertyBasedDocumentFactory.MetadataKeys.LOCALE property.
Subclasses should do their own parsing, returning true in case of success and returning super.parseProperty() otherwise.
- Overrides:
parseProperty in class PropertyBasedDocumentFactory
- Parameters:
key - the property key.
values - the property value; this is an array, because properties may have a list of comma-separated values.
metadata - the metadata map.
- Returns:
- true if the property was parsed correctly, false if it was ignored.
- Throws:
org.apache.commons.configuration.ConfigurationException
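The "parse your own keys, otherwise defer to the superclass" convention described above can be sketched as follows. This is a simplified illustration, not the MG4J implementation: `BaseParserSketch` and `SubParserSketch` are hypothetical classes, the metadata map is keyed by plain strings instead of enums, the `rule` key is chosen only as an example, and no `ConfigurationException` is declared.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical base class: it understands only the "locale" key.
class BaseParserSketch {
    protected boolean parseProperty(String key, String[] values, Map<String, Object> metadata) {
        if ("locale".equalsIgnoreCase(key)) {
            metadata.put("locale", Locale.forLanguageTag(values[0]));
            return true;
        }
        return false; // the property was ignored
    }
}

// Hypothetical subclass: it handles its own key and delegates everything else.
class SubParserSketch extends BaseParserSketch {
    @Override
    protected boolean parseProperty(String key, String[] values, Map<String, Object> metadata) {
        if ("rule".equalsIgnoreCase(key)) {
            // Values arrive as an array because a property may be comma-separated.
            metadata.put("rule", String.join(",", values));
            return true;
        }
        return super.parseProperty(key, values, metadata);
    }

    public static void main(String[] args) {
        Map<String, Object> metadata = new HashMap<>();
        SubParserSketch parser = new SubParserSketch();
        System.out.println(parser.parseProperty("rule", new String[] { "a", "b" }, metadata));
        System.out.println(parser.parseProperty("unknown", new String[] { "x" }, metadata));
    }
}
```

Delegating last means that keys understood by any ancestor keep working in every subclass without being re-implemented.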
-
numberOfFields
public int numberOfFields()
Description copied from interface: DocumentFactory
Returns the number of fields present in the documents produced by this factory.
- Returns:
- the number of fields present in the documents produced by this factory.
-
fieldName
public String fieldName(int field)
Description copied from interface: DocumentFactory
Returns the symbolic name of a field.
- Parameters:
field - the index of a field (between 0 inclusive and DocumentFactory.numberOfFields() exclusive).
- Returns:
- the symbolic name of the field-th field.
-
fieldIndex
public int fieldIndex(String fieldName)
Description copied from interface: DocumentFactory
Returns the index of a field, given its symbolic name.
- Parameters:
fieldName - the name of a field of this factory.
- Returns:
- the corresponding index, or -1 if there is no field with name fieldName.
-
fieldType
public DocumentFactory.FieldType fieldType(int field)
Description copied from interface: DocumentFactory
Returns the type of a field.
The possible types are defined in DocumentFactory.FieldType.
- Parameters:
field - the index of a field (between 0 inclusive and DocumentFactory.numberOfFields() exclusive).
- Returns:
- the type of the field-th field.
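The contracts of numberOfFields(), fieldName(), fieldIndex() and fieldType() fit together as a simple lookup table over parallel arrays. The sketch below is only an illustration under that assumption: `FieldTableSketch` and its `Kind` enum are hypothetical stand-ins, not the MG4J classes, and throwing `IndexOutOfBoundsException` for an out-of-range index is a choice made here rather than documented behaviour.

```java
// Hypothetical lookup table illustrating the field-related contracts.
final class FieldTableSketch {
    // Hypothetical stand-in for a field-type enum.
    enum Kind { TEXT, INT }

    private final String[] names;
    private final Kind[] kinds;

    FieldTableSketch(String[] names, Kind[] kinds) {
        if (names.length != kinds.length) throw new IllegalArgumentException("length mismatch");
        this.names = names.clone();
        this.kinds = kinds.clone();
    }

    int numberOfFields() {
        return names.length;
    }

    // Valid indices run from 0 inclusive to numberOfFields() exclusive.
    String fieldName(int field) {
        ensureFieldIndex(field);
        return names[field];
    }

    // Returns the index of the named field, or -1 if there is no such field.
    int fieldIndex(String fieldName) {
        for (int i = 0; i < names.length; i++) {
            if (names[i].equals(fieldName)) return i;
        }
        return -1;
    }

    Kind fieldType(int field) {
        ensureFieldIndex(field);
        return kinds[field];
    }

    private void ensureFieldIndex(int field) {
        if (field < 0 || field >= names.length) {
            throw new IndexOutOfBoundsException("No such field: " + field);
        }
    }
}
```

With such a table, `fieldIndex(fieldName(i))` returns `i` for every valid index as long as field names are distinct.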
-
getDocument
public Document getDocument(InputStream rawContent, Reference2ObjectMap<Enum<?>,Object> metadata) throws IOException
Description copied from interface: DocumentFactory
Returns the document obtained by parsing the given byte stream.
The parameter metadata makes up for the lack of a simple keyword-based parameter-passing system in Java. This method might take several different types of “suggestions” gathered by the collection: typically, the document title, a URI representing the document, its MIME type, its encoding and so on. Some of this information might be set by default (as happens, for instance, in a PropertyBasedDocumentFactory). Implementations of this method must consult the metadata provided by the collection, possibly complete them with default factory metadata, and proceed to the document construction.
- Parameters:
rawContent - the raw content from which the document should be extracted; it must not be closed, as resource management is a responsibility of the DocumentCollection.
metadata - a map from enums (e.g., keys taken from PropertyBasedDocumentFactory) to various kinds of objects.
- Returns:
- the document obtained by parsing the given byte stream.
- Throws:
IOException
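The requirement to consult the per-document metadata first and then complete them with factory defaults can be sketched as a two-level lookup. The code below is an assumption-laden illustration: `ResolveSketch`, its `Key` enum, `withDefault` and this `resolve` method are hypothetical, and `java.util.Map` replaces `Reference2ObjectMap`; it is not the actual `resolve` implementation.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical illustration of "document metadata first, factory defaults second".
final class ResolveSketch {
    // Hypothetical metadata keys.
    enum Key { ENCODING, TITLE }

    private final Map<Key, Object> defaults = new EnumMap<>(Key.class);

    // Registers a factory-wide default for the given key.
    ResolveSketch withDefault(Key key, Object value) {
        defaults.put(key, value);
        return this;
    }

    // Returns the value supplied with the document if present,
    // otherwise the factory default, otherwise null.
    Object resolve(Key key, Map<Key, Object> metadata) {
        Object value = metadata.get(key);
        return value != null ? value : defaults.get(key);
    }

    public static void main(String[] args) {
        ResolveSketch factory = new ResolveSketch().withDefault(Key.ENCODING, "UTF-8");
        Map<Key, Object> metadata = new EnumMap<>(Key.class);
        metadata.put(Key.TITLE, "A document");
        System.out.println(factory.resolve(Key.ENCODING, metadata));
        System.out.println(factory.resolve(Key.TITLE, metadata));
    }
}
```

Keeping the defaults inside the factory means a collection only needs to supply the metadata it actually knows for each document.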
-
main
public static void main(String[] arg) throws IOException, org.apache.commons.configuration.ConfigurationException
- Throws:
IOException
org.apache.commons.configuration.ConfigurationException
-
-