Class PropertyBasedDocumentFactory
- java.lang.Object
  - it.unimi.di.big.mg4j.document.AbstractDocumentFactory
    - it.unimi.di.big.mg4j.document.PropertyBasedDocumentFactory
- All Implemented Interfaces:
DocumentFactory, FlyweightPrototype<DocumentFactory>, Serializable
- Direct Known Subclasses:
AbstractTikaDocumentFactory, DispatchingDocumentFactory, HtmlDocumentFactory, IdentityDocumentFactory
public abstract class PropertyBasedDocumentFactory extends AbstractDocumentFactory
A document factory initialised by default properties.
Many document factories need a number of default values that are used when the metadata passed to DocumentFactory.getDocument(java.io.InputStream,Reference2ObjectMap) is insufficient or lacks some key. This abstract class provides a common base for all such factories.
All concrete implementations of this class should have:
- an empty constructor;
- a constructor taking a Reference2ObjectMap having Enum keys;
- a constructor taking a Properties object;
- a constructor taking a string array.
In the third case, the properties will be parsed by the parseProperties(Properties) method; in the fourth case, by the parseProperties(String[]) method.
Since all implementations are expected to provide such constructors, corresponding static factory methods have been provided to simplify factory instantiation.
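The following is a minimal sketch of how a static factory method could pick one of the expected constructors reflectively. It is not the MG4J implementation: ToyFactory and ReflectiveInstantiation are hypothetical names, the sketch assumes the concrete class exposes public constructors, and only the empty and string-array variants are shown.

```java
import java.lang.reflect.InvocationTargetException;
import java.util.Arrays;

// Hypothetical helper; it only illustrates constructor lookup by argument type.
final class ReflectiveInstantiation {
    // Uses the empty constructor.
    static Object getInstance(Class<?> klass)
            throws InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException {
        return klass.getConstructor().newInstance();
    }

    // Uses the constructor taking a string array.
    static Object getInstance(Class<?> klass, String[] property)
            throws InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException {
        // The cast to Object stops the array from being spread as varargs.
        return klass.getConstructor(String[].class).newInstance((Object) property);
    }

    public static void main(String[] args) throws Exception {
        ToyFactory byDefault = (ToyFactory) getInstance(ToyFactory.class);
        ToyFactory byStrings = (ToyFactory) getInstance(ToyFactory.class, new String[] { "locale=en_US" });
        System.out.println(byDefault.description);
        System.out.println(byStrings.description);
    }
}

// Hypothetical stand-in for a concrete factory; it is not an MG4J class.
class ToyFactory {
    final String description;

    public ToyFactory() {
        description = "defaults";
    }

    public ToyFactory(String[] property) {
        description = "strings:" + Arrays.toString(property);
    }
}
```

The four checked exceptions in the sketch are the same ones the getInstance methods of this class declare, which is consistent with a reflective lookup, although the page does not say how the lookup is done.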
If the implementation needs to read and parse some key, it must override the parseProperty(String, String[], Reference2ObjectMap) method.
Keys are specified with a dotted notation. The last dot-separated token is the actual key. The prefix is used to select properties: only properties whose prefix is a prefix of the current class name are considered. Moreover, if a property with a completely specified prefix (i.e., a prefix that is a class name) is not parsed, an exception will be thrown.
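The selection rule for dotted keys can be sketched as follows. This is one literal reading of the description (a plain string-prefix test against the class name), not the actual MG4J code; KeySelection and its methods are hypothetical names.

```java
// Hypothetical helper illustrating the dotted-key selection rule.
final class KeySelection {
    // Returns the bare key if the property applies to className, or null if it is skipped.
    static String select(String dottedKey, String className) {
        int lastDot = dottedKey.lastIndexOf('.');
        String prefix = lastDot == -1 ? "" : dottedKey.substring(0, lastDot);
        String key = dottedKey.substring(lastDot + 1);
        // Assumption: "prefix of the class name" means a plain string prefix.
        return className.startsWith(prefix) ? key : null;
    }

    // True when the prefix is the whole class name, the case in which an unparsed key is an error.
    static boolean isFullySpecified(String dottedKey, String className) {
        int lastDot = dottedKey.lastIndexOf('.');
        return lastDot != -1 && dottedKey.substring(0, lastDot).equals(className);
    }

    public static void main(String[] args) {
        String className = "it.unimi.di.big.mg4j.document.HtmlDocumentFactory";
        System.out.println(select(className + ".locale", className));
        System.out.println(select("org.example.Other.locale", className));
        System.out.println(isFullySpecified(className + ".locale", className));
    }
}
```

Under this reading a key with no prefix at all, such as locale, applies to every factory, because the empty string is a prefix of any class name.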
This class provides the helper methods resolve(Enum, Reference2ObjectMap) and resolveNotNull(Enum, Reference2ObjectMap) to help in writing implementations of DocumentFactory.getDocument(java.io.InputStream,Reference2ObjectMap) that handle default metadata correctly.
- See Also:
- Serialized Form
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | PropertyBasedDocumentFactory.MetadataKeys | Case-insensitive keys for metadata passed to DocumentFactory.getDocument(java.io.InputStream,it.unimi.dsi.fastutil.objects.Reference2ObjectMap).
-
Nested classes/interfaces inherited from interface it.unimi.di.big.mg4j.document.DocumentFactory
DocumentFactory.FieldType
-
-
Field Summary
Fields
Modifier and Type | Field | Description
protected Reference2ObjectMap<Enum<?>,Object> | defaultMetadata | The set of default metadata for this factory.
-
Constructor Summary
Constructors
Modifier | Constructor | Description
protected | PropertyBasedDocumentFactory() |
protected | PropertyBasedDocumentFactory(Reference2ObjectMap<Enum<?>,Object> defaultMetadata) |
protected | PropertyBasedDocumentFactory(Properties properties) |
protected | PropertyBasedDocumentFactory(String[] property) |
-
Method Summary
Modifier and Type | Method | Description
protected static String | ensureJustOne(String key, String[] values) | This method checks that the array of values contains just one element, and returns the element.
static PropertyBasedDocumentFactory | getInstance(Class<?> klass) |
static PropertyBasedDocumentFactory | getInstance(Class<?> klass, Reference2ObjectMap<Enum<?>,Object> metadata) |
static PropertyBasedDocumentFactory | getInstance(Class<?> klass, Properties properties) |
static PropertyBasedDocumentFactory | getInstance(Class<?> klass, String[] property) |
Reference2ObjectMap<Enum<?>,Object> | parseProperties(Properties properties) | Scans the property set, parsing the properties that concern this class.
Reference2ObjectMap<Enum<?>,Object> | parseProperties(String[] property) | Parses the given list of properties either as key=value specs (value may be a list of comma-separated values), or as filenames.
protected boolean | parseProperty(String key, String[] valuesUnused, Reference2ObjectMap<Enum<?>,Object> metadataUnused) | Parses a property with given key and value, adding it to the given map.
protected Object | resolve(Enum<?> key, Reference2ObjectMap<Enum<?>,Object> metadata) | Resolves the given key against the given metadata, falling back to the default metadata.
protected Object | resolve(Enum<?> key, Reference2ObjectMap<Enum<?>,Object> metadata, Object o) | Resolves the given key against the given metadata, falling back to the provided object.
protected Object | resolveNotNull(Enum<?> key, Reference2ObjectMap<Enum<?>,Object> metadata) | Resolves the given key against the given metadata, falling back to the default metadata and guaranteeing a non-null result.
static boolean | sameKey(Enum<?> enumKey, String key) | A utility method checking whether the downcased name of an Enum is equal to a given string.
String | toString() |
-
Methods inherited from class it.unimi.di.big.mg4j.document.AbstractDocumentFactory
ensureFieldIndex
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface it.unimi.di.big.mg4j.document.DocumentFactory
copy, fieldIndex, fieldName, fieldType, getDocument, numberOfFields
-
-
-
-
Field Detail
-
defaultMetadata
protected Reference2ObjectMap<Enum<?>,Object> defaultMetadata
The set of default metadata for this factory. It is initialised by parseProperties(Properties).
-
-
Constructor Detail
-
PropertyBasedDocumentFactory
protected PropertyBasedDocumentFactory(Reference2ObjectMap<Enum<?>,Object> defaultMetadata)
-
PropertyBasedDocumentFactory
protected PropertyBasedDocumentFactory(Properties properties) throws org.apache.commons.configuration.ConfigurationException
- Throws:
org.apache.commons.configuration.ConfigurationException
-
PropertyBasedDocumentFactory
protected PropertyBasedDocumentFactory(String[] property) throws org.apache.commons.configuration.ConfigurationException
- Throws:
org.apache.commons.configuration.ConfigurationException
-
PropertyBasedDocumentFactory
protected PropertyBasedDocumentFactory()
-
-
Method Detail
-
getInstance
public static PropertyBasedDocumentFactory getInstance(Class<?> klass, String[] property) throws InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException
-
getInstance
public static PropertyBasedDocumentFactory getInstance(Class<?> klass, Properties properties) throws InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException
-
getInstance
public static PropertyBasedDocumentFactory getInstance(Class<?> klass) throws InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException
-
getInstance
public static PropertyBasedDocumentFactory getInstance(Class<?> klass, Reference2ObjectMap<Enum<?>,Object> metadata) throws InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException
-
sameKey
public static boolean sameKey(Enum<?> enumKey, String key)
A utility method checking whether the downcased name of an Enum is equal to a given string.
This class uses an Enum (PropertyBasedDocumentFactory.MetadataKeys) to store valid property keys. We follow both the uppercase naming convention for enums and the lowercase naming convention for properties, and this method encapsulates the method calls that are necessary to correctly handle key parsing.
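A minimal sketch of the comparison described above, assuming it amounts to lowercasing the enum constant name and comparing it with the property key. SameKeySketch and its Keys enum are hypothetical; only LOCALE is a key named on this page, and TITLE is added purely for illustration.

```java
import java.util.Locale;

// Hypothetical illustration of matching an uppercase enum constant against a lowercase property key.
final class SameKeySketch {
    // LOCALE appears on this page; TITLE is a made-up second key.
    enum Keys { LOCALE, TITLE }

    static boolean sameKey(Enum<?> enumKey, String key) {
        // Locale.ROOT keeps the lowercasing independent of the platform locale.
        return enumKey.name().toLowerCase(Locale.ROOT).equals(key);
    }

    public static void main(String[] args) {
        System.out.println(sameKey(Keys.LOCALE, "locale"));
        System.out.println(sameKey(Keys.LOCALE, "title"));
    }
}
```

Note that under this reading the string argument itself is not lowercased, so the sketch would reject "LOCALE"; whether the real method behaves the same way is not stated here.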
-
parseProperty
protected boolean parseProperty(String key, String[] valuesUnused, Reference2ObjectMap<Enum<?>,Object> metadataUnused) throws org.apache.commons.configuration.ConfigurationException
Parses a property with given key and value, adding it to the given map.
Currently this implementation just parses the PropertyBasedDocumentFactory.MetadataKeys.LOCALE property.
Subclasses should do their own parsing, returning true in case of success and returning super.parseProperty() otherwise.
- Parameters:
key - the property key.
valuesUnused - the property value; this is an array, because properties may have a list of comma-separated values.
metadataUnused - the metadata map.
- Returns:
- true if the property was parsed correctly, false if it was ignored.
- Throws:
org.apache.commons.configuration.ConfigurationException
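The override pattern recommended above (handle your own keys, otherwise delegate to the superclass) might look like the following sketch. It uses plain java.util maps and string keys instead of the MG4J types, and the maxlength key, BaseSketch, and SubclassSketch are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; none of these types belong to MG4J.
final class ParsePropertyDemo {
    public static void main(String[] args) {
        Map<String, Object> metadata = new HashMap<>();
        SubclassSketch factory = new SubclassSketch();
        System.out.println(factory.parseProperty("maxlength", new String[] { "10" }, metadata));
        System.out.println(factory.parseProperty("locale", new String[] { "en_US" }, metadata));
        System.out.println(factory.parseProperty("unknown", new String[] { "x" }, metadata));
    }
}

class BaseSketch {
    // Stand-in for the base behaviour: only the locale key is understood.
    protected boolean parseProperty(String key, String[] values, Map<String, Object> metadata) {
        if (key.equals("locale")) {
            metadata.put(key, values[0]);
            return true;
        }
        return false; // ignored
    }
}

class SubclassSketch extends BaseSketch {
    @Override
    protected boolean parseProperty(String key, String[] values, Map<String, Object> metadata) {
        if (key.equals("maxlength")) { // made-up key owned by the subclass
            metadata.put(key, Integer.valueOf(values[0]));
            return true;
        }
        // Anything else is left to the superclass.
        return super.parseProperty(key, values, metadata);
    }
}
```

Delegating last means keys understood by the base class keep working in every subclass, while a false result still signals that nobody recognised the key.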
-
ensureJustOne
protected static String ensureJustOne(String key, String[] values) throws org.apache.commons.configuration.ConfigurationException
This method checks that the array of values contains just one element, and returns the element.
- Parameters:
key - the property name (used to build the exception message).
values - the array of values.
- Returns:
- the only value (if the array contains exactly one element).
- Throws:
org.apache.commons.configuration.ConfigurationException - if values does not contain exactly one element.
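A sketch of the single-value check, assuming nothing beyond the contract above. To stay free of external dependencies it throws IllegalArgumentException instead of the Apache Commons ConfigurationException, and the message text is invented.

```java
// Hypothetical stand-alone version of the single-value check.
final class EnsureJustOneSketch {
    static String ensureJustOne(String key, String[] values) {
        if (values.length != 1) {
            // The real method throws ConfigurationException; this sketch substitutes an unchecked exception.
            throw new IllegalArgumentException(
                    "Property " + key + " should have exactly one value, but has " + values.length);
        }
        return values[0];
    }

    public static void main(String[] args) {
        System.out.println(ensureJustOne("locale", new String[] { "en_US" }));
    }
}
```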
-
parseProperties
public Reference2ObjectMap<Enum<?>,Object> parseProperties(Properties properties) throws org.apache.commons.configuration.ConfigurationException
Scans the property set, parsing the properties that concern this class.
- Parameters:
properties - a set of properties.
- Returns:
- a metadata map.
- Throws:
org.apache.commons.configuration.ConfigurationException
-
parseProperties
public Reference2ObjectMap<Enum<?>,Object> parseProperties(String[] property) throws org.apache.commons.configuration.ConfigurationException
Parses the given list of properties either as key=value specs (value may be a list of comma-separated values), or as filenames.
- Parameters:
property - an array of strings specifying properties.
- Returns:
- a metadata map.
- Throws:
org.apache.commons.configuration.ConfigurationException
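To make the two accepted forms concrete, here is a rough sketch that separates key=value specs from filenames. The rule used to tell them apart (presence of an equals sign) is an assumption, the sketch only records filenames rather than loading them, and PropertySpecSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical splitter for an array of property specifications.
final class PropertySpecSketch {
    final Map<String, String[]> values = new LinkedHashMap<>();
    final List<String> filenames = new ArrayList<>();

    static PropertySpecSketch parse(String[] property) {
        PropertySpecSketch result = new PropertySpecSketch();
        for (String spec : property) {
            int eq = spec.indexOf('=');
            if (eq == -1) {
                // Assumption: a string without '=' names a property file.
                result.filenames.add(spec);
            } else {
                // The value may be a comma-separated list, so it is kept as an array.
                result.values.put(spec.substring(0, eq).trim(), spec.substring(eq + 1).split(","));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PropertySpecSketch parsed = parse(new String[] { "locale=en_US", "fields=title,text", "factory.properties" });
        System.out.println(parsed.values.keySet());
        System.out.println(parsed.filenames);
    }
}
```

Keeping every value as an array matches the String[] parameter of parseProperty, which receives a list even when only one value was given.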
-
resolve
protected Object resolve(Enum<?> key, Reference2ObjectMap<Enum<?>,Object> metadata)
Resolves the given key against the given metadata, falling back to the default metadata.
- Parameters:
key - a key.
metadata - a metadata map.
- Returns:
- the value returned by metadata for key, or the value returned by defaultMetadata for key if the former is null (the latter, of course, might be null).
-
resolve
protected Object resolve(Enum<?> key, Reference2ObjectMap<Enum<?>,Object> metadata, Object o)
Resolves the given key against the given metadata, falling back to the provided object.
- Parameters:
key - a key.
metadata - a metadata map.
o - a default object.
- Returns:
- the value returned by metadata for key, or o if the former is null.
-
resolveNotNull
protected Object resolveNotNull(Enum<?> key, Reference2ObjectMap<Enum<?>,Object> metadata)
Resolves the given key against the given metadata, falling back to the default metadata and guaranteeing a non-null result.
- Parameters:
key - a key.
metadata - a metadata map.
- Returns:
- the value returned by metadata for key, or the value returned by defaultMetadata for key if the former is null; if the latter is null, too, a NoSuchElementException will be thrown.
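The three resolution variants documented above can be summarised in a short sketch. It substitutes java.util.HashMap for Reference2ObjectMap (for enum keys, equality and reference identity coincide), and ResolveSketch, its Keys enum, and the TITLE key are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical stand-in showing the fallback order of the resolve methods.
final class ResolveSketch {
    // LOCALE appears on this page; TITLE is a made-up second key.
    enum Keys { LOCALE, TITLE }

    final Map<Enum<?>, Object> defaultMetadata = new HashMap<>();

    // Per-document metadata first, then the factory defaults (which may also yield null).
    Object resolve(Enum<?> key, Map<Enum<?>, Object> metadata) {
        Object value = metadata.get(key);
        return value != null ? value : defaultMetadata.get(key);
    }

    // Per-document metadata first, then the caller-supplied object.
    Object resolve(Enum<?> key, Map<Enum<?>, Object> metadata, Object o) {
        Object value = metadata.get(key);
        return value != null ? value : o;
    }

    // Same as the two-argument resolve, but a missing value is an error.
    Object resolveNotNull(Enum<?> key, Map<Enum<?>, Object> metadata) {
        Object value = resolve(key, metadata);
        if (value == null) throw new NoSuchElementException("No value for key " + key);
        return value;
    }

    public static void main(String[] args) {
        ResolveSketch factory = new ResolveSketch();
        factory.defaultMetadata.put(Keys.LOCALE, "en");
        Map<Enum<?>, Object> metadata = new HashMap<>();
        System.out.println(factory.resolve(Keys.LOCALE, metadata));
        System.out.println(factory.resolve(Keys.TITLE, metadata, "untitled"));
    }
}
```

A getDocument implementation would typically call resolveNotNull for metadata it cannot work without and the plain resolve variants for optional values.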
-
toString
public String toString()
- Overrides:
toString in class AbstractDocumentFactory
-
-