it.unimi.di.mg4j.document
Class PropertyBasedDocumentFactory

java.lang.Object
  extended by it.unimi.di.mg4j.document.AbstractDocumentFactory
      extended by it.unimi.di.mg4j.document.PropertyBasedDocumentFactory
All Implemented Interfaces:
DocumentFactory, FlyweightPrototype<DocumentFactory>, Serializable
Direct Known Subclasses:
AbstractTikaDocumentFactory, DispatchingDocumentFactory, HtmlDocumentFactory, IdentityDocumentFactory

public abstract class PropertyBasedDocumentFactory
extends AbstractDocumentFactory

A document factory initialised by default properties.

Many document factories need a number of default values that are used when the metadata passed to DocumentFactory.getDocument(java.io.InputStream,Reference2ObjectMap) is not sufficient or lacks some key. This abstract class provides a common base for all such factories.

All concrete implementations of this class should have:

  1. an empty constructor;
  2. a constructor taking a Reference2ObjectMap having Enum keys;
  3. a constructor taking a Properties object;
  4. a constructor taking a string array.

In the third case, the properties will be parsed by the parseProperties(Properties) method. In the fourth case, by the parseProperties(String[]) method.

Since all implementations are expected to provide such constructors, corresponding static factory methods have been provided to simplify factory instantiation.
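
As a rough illustration of why the constructor convention matters, the sketch below shows how a static factory method can locate a constructor reflectively from its parameter types. It does not reproduce the MG4J implementation: DemoFactory and ReflectiveInstantiationSketch are hypothetical stand-ins, and only the empty and string-array constructors are modelled.

```java
import java.lang.reflect.InvocationTargetException;

// Hypothetical stand-in for a concrete factory; not an MG4J class.
class DemoFactory {
    final String origin;

    public DemoFactory() {
        origin = "empty";
    }

    public DemoFactory(String[] property) {
        origin = "strings";
    }
}

class ReflectiveInstantiationSketch {
    // Looks up the public no-argument constructor and invokes it.
    static DemoFactory getInstance(Class<?> klass)
            throws InstantiationException, IllegalAccessException,
                   InvocationTargetException, NoSuchMethodException {
        return klass.asSubclass(DemoFactory.class).getConstructor().newInstance();
    }

    // Looks up the public constructor taking a String[] and invokes it.
    static DemoFactory getInstance(Class<?> klass, String[] property)
            throws InstantiationException, IllegalAccessException,
                   InvocationTargetException, NoSuchMethodException {
        // The cast to Object stops the array from being spread as varargs.
        return klass.asSubclass(DemoFactory.class)
                    .getConstructor(String[].class)
                    .newInstance((Object) property);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println(getInstance(DemoFactory.class).origin);
        System.out.println(getInstance(DemoFactory.class, new String[] { "locale=en" }).origin);
    }
}
```

A class that lacks the requested constructor makes the lookup fail with NoSuchMethodException, which is consistent with the exceptions declared by the getInstance methods.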

If the implementation needs to read and parse some key, it must override the parseProperty(String, String[], Reference2ObjectMap) method.

Keys are specified with a dotted notation. The last dot-separated token is the actual key. The prefix is used to select properties: only properties with a prefix that is a prefix of the current class name are considered. Moreover, if a property with a completely specified prefix (i.e., a prefix that is a class name) is not parsed, an exception will be thrown.
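
The selection rule can be sketched with plain string operations. This is one possible reading of the rule, not the MG4J code: PrefixSelectionSketch and its method names are hypothetical, the prefix test is done character by character, and a key without any dot is assumed to have an empty prefix (which matches every class).

```java
class PrefixSelectionSketch {
    // The actual key is the last dot-separated token.
    static String actualKey(String dottedKey) {
        return dottedKey.substring(dottedKey.lastIndexOf('.') + 1);
    }

    // Everything before the last dot, or the empty string if there is no dot.
    static String prefix(String dottedKey) {
        int dot = dottedKey.lastIndexOf('.');
        return dot < 0 ? "" : dottedKey.substring(0, dot);
    }

    // A property is considered only if its prefix is a prefix of the class name.
    static boolean concerns(String className, String dottedKey) {
        return className.startsWith(prefix(dottedKey));
    }

    // A completely specified prefix is exactly the class name.
    static boolean fullySpecified(String className, String dottedKey) {
        return className.equals(prefix(dottedKey));
    }

    public static void main(String[] args) {
        String className = "it.unimi.di.mg4j.document.HtmlDocumentFactory";
        String[] keys = {
            "it.unimi.di.mg4j.document.locale",
            className + ".locale",
            "org.example.locale" // hypothetical unrelated prefix
        };
        for (String key : keys) {
            System.out.println(key + ": key=" + actualKey(key)
                + ", considered=" + concerns(className, key)
                + ", fullySpecified=" + fullySpecified(className, key));
        }
    }
}
```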

This class provides the helper methods resolve(Enum, Reference2ObjectMap) and resolveNotNull(Enum, Reference2ObjectMap) to help in writing implementations of DocumentFactory.getDocument(java.io.InputStream,Reference2ObjectMap) that handle default metadata correctly.

See Also:
Serialized Form

Nested Class Summary
static class PropertyBasedDocumentFactory.MetadataKeys
          Case-insensitive keys for metadata passed to DocumentFactory.getDocument(java.io.InputStream,it.unimi.dsi.fastutil.objects.Reference2ObjectMap).
 
Nested classes/interfaces inherited from interface it.unimi.di.mg4j.document.DocumentFactory
DocumentFactory.FieldType
 
Field Summary
protected  Reference2ObjectMap<Enum<?>,Object> defaultMetadata
          The set of default metadata for this factory.
 
Constructor Summary
protected PropertyBasedDocumentFactory()
           
protected PropertyBasedDocumentFactory(Properties properties)
           
protected PropertyBasedDocumentFactory(Reference2ObjectMap<Enum<?>,Object> defaultMetadata)
           
protected PropertyBasedDocumentFactory(String[] property)
           
 
Method Summary
protected static String ensureJustOne(String key, String[] values)
          This method checks that the array of values contains just one element, and returns the element.
static PropertyBasedDocumentFactory getInstance(Class<?> klass)
           
static PropertyBasedDocumentFactory getInstance(Class<?> klass, Properties properties)
           
static PropertyBasedDocumentFactory getInstance(Class<?> klass, Reference2ObjectMap<Enum<?>,Object> metadata)
           
static PropertyBasedDocumentFactory getInstance(Class<?> klass, String[] property)
           
 Reference2ObjectMap<Enum<?>,Object> parseProperties(Properties properties)
          Scans the property set, parsing the properties that concern this class.
 Reference2ObjectMap<Enum<?>,Object> parseProperties(String[] property)
          Parses the given list of properties either as key=value specs (value may be a list of comma-separated values), or as filenames.
protected  boolean parseProperty(String key, String[] valuesUnused, Reference2ObjectMap<Enum<?>,Object> metadataUnused)
          Parses a property with given key and value, adding it to the given map.
protected  Object resolve(Enum<?> key, Reference2ObjectMap<Enum<?>,Object> metadata)
          Resolves the given key against the given metadata, falling back to the default metadata.
protected  Object resolve(Enum<?> key, Reference2ObjectMap<Enum<?>,Object> metadata, Object o)
          Resolves the given key against the given metadata, falling back to the provided object.
protected  Object resolveNotNull(Enum<?> key, Reference2ObjectMap<Enum<?>,Object> metadata)
          Resolves the given key against the given metadata, falling back to the default metadata and guaranteeing a non-null result.
static boolean sameKey(Enum<?> enumKey, String key)
          A utility method checking whether the downcased name of an Enum is equal to a given string.
 
Methods inherited from class it.unimi.di.mg4j.document.AbstractDocumentFactory
ensureFieldIndex, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface it.unimi.di.mg4j.document.DocumentFactory
copy, fieldIndex, fieldName, fieldType, getDocument, numberOfFields
 

Field Detail

defaultMetadata

protected Reference2ObjectMap<Enum<?>,Object> defaultMetadata
The set of default metadata for this factory. It is initialised by parseProperties(Properties).

Constructor Detail

PropertyBasedDocumentFactory

protected PropertyBasedDocumentFactory(Reference2ObjectMap<Enum<?>,Object> defaultMetadata)

PropertyBasedDocumentFactory

protected PropertyBasedDocumentFactory(Properties properties)
                                throws ConfigurationException
Throws:
ConfigurationException

PropertyBasedDocumentFactory

protected PropertyBasedDocumentFactory(String[] property)
                                throws ConfigurationException
Throws:
ConfigurationException

PropertyBasedDocumentFactory

protected PropertyBasedDocumentFactory()
Method Detail

getInstance

public static PropertyBasedDocumentFactory getInstance(Class<?> klass,
                                                       String[] property)
                                                throws InstantiationException,
                                                       IllegalAccessException,
                                                       InvocationTargetException,
                                                       NoSuchMethodException
Throws:
InstantiationException
IllegalAccessException
InvocationTargetException
NoSuchMethodException

getInstance

public static PropertyBasedDocumentFactory getInstance(Class<?> klass,
                                                       Properties properties)
                                                throws InstantiationException,
                                                       IllegalAccessException,
                                                       InvocationTargetException,
                                                       NoSuchMethodException
Throws:
InstantiationException
IllegalAccessException
InvocationTargetException
NoSuchMethodException

getInstance

public static PropertyBasedDocumentFactory getInstance(Class<?> klass)
                                                throws InstantiationException,
                                                       IllegalAccessException,
                                                       InvocationTargetException,
                                                       NoSuchMethodException
Throws:
InstantiationException
IllegalAccessException
InvocationTargetException
NoSuchMethodException

getInstance

public static PropertyBasedDocumentFactory getInstance(Class<?> klass,
                                                       Reference2ObjectMap<Enum<?>,Object> metadata)
                                                throws InstantiationException,
                                                       IllegalAccessException,
                                                       InvocationTargetException,
                                                       NoSuchMethodException
Throws:
InstantiationException
IllegalAccessException
InvocationTargetException
NoSuchMethodException

sameKey

public static boolean sameKey(Enum<?> enumKey,
                              String key)
A utility method checking whether the downcased name of an Enum is equal to a given string.

This class uses an Enum (PropertyBasedDocumentFactory.MetadataKeys) to store valid property keys. We follow both the uppercase naming convention for enums and the lowercase naming convention for properties, and this method encapsulates the method calls that are necessary to handle key parsing correctly.

Parameters:
enumKey - a key expressed as an Enum.
key - a key expressed as a string.
Returns:
true if key is equal to the downcased name of enumKey.
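
A minimal sketch of the comparison described above, assuming the check is a plain equality test against the lowercased enum name. SameKeySketch and DemoKeys are hypothetical; the use of Locale.ROOT for lowercasing is also an assumption, made so that the result does not depend on the platform locale.

```java
import java.util.Locale;

class SameKeySketch {
    // Hypothetical key enum following the uppercase naming convention.
    enum DemoKeys { LOCALE, TITLE }

    // True if key equals the downcased name of enumKey.
    static boolean sameKey(Enum<?> enumKey, String key) {
        return key.equals(enumKey.name().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(sameKey(DemoKeys.LOCALE, "locale"));
        System.out.println(sameKey(DemoKeys.LOCALE, "title"));
    }
}
```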

parseProperty

protected boolean parseProperty(String key,
                                String[] valuesUnused,
                                Reference2ObjectMap<Enum<?>,Object> metadataUnused)
                         throws ConfigurationException
Parses a property with given key and value, adding it to the given map.

Currently this implementation just parses the PropertyBasedDocumentFactory.MetadataKeys.LOCALE property.

Subclasses should do their own parsing, returning true in case of success and returning the result of super.parseProperty() otherwise.

Parameters:
key - the property key.
valuesUnused - the property value; this is an array, because properties may have a list of comma-separated values.
metadataUnused - the metadata map.
Returns:
true if the property was parsed correctly, false if it was ignored.
Throws:
ConfigurationException
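
The chaining convention for overriding parseProperty can be sketched as follows. All class and enum names here are hypothetical, a java.util.IdentityHashMap stands in for the Reference2ObjectMap, and the locale handling in the base class is only an assumption about what parsing LOCALE might look like.

```java
import java.util.IdentityHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical base class that understands only the "locale" key.
class BaseFactorySketch {
    enum BaseKeys { LOCALE }

    protected boolean parseProperty(String key, String[] values, Map<Enum<?>, Object> metadata) {
        if (key.equals("locale") && values.length == 1) {
            metadata.put(BaseKeys.LOCALE, Locale.forLanguageTag(values[0]));
            return true;
        }
        return false; // the property was ignored
    }
}

// Hypothetical subclass that adds a "title" key and delegates everything else.
class TitledFactorySketch extends BaseFactorySketch {
    enum ExtraKeys { TITLE }

    @Override
    protected boolean parseProperty(String key, String[] values, Map<Enum<?>, Object> metadata) {
        if (key.equals("title") && values.length == 1) {
            metadata.put(ExtraKeys.TITLE, values[0]);
            return true;
        }
        // Not one of ours: let the superclass try.
        return super.parseProperty(key, values, metadata);
    }
}

class ParsePropertySketch {
    public static void main(String[] args) {
        Map<Enum<?>, Object> metadata = new IdentityHashMap<>();
        TitledFactorySketch factory = new TitledFactorySketch();
        System.out.println(factory.parseProperty("title", new String[] { "Untitled" }, metadata));
        System.out.println(factory.parseProperty("locale", new String[] { "en" }, metadata));
        System.out.println(factory.parseProperty("unknown", new String[] { "x" }, metadata));
        System.out.println(metadata.size());
    }
}
```

Delegating to the superclass last means that keys handled higher in the hierarchy keep working in every subclass without being re-implemented.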

ensureJustOne

protected static String ensureJustOne(String key,
                                      String[] values)
                               throws ConfigurationException
This method checks that the array of values contains just one element, and returns the element.

Parameters:
key - the property name (used to build the exception message).
values - the array of values.
Returns:
the only value (if the array contains exactly one element).
Throws:
ConfigurationException - if values does not contain exactly one element.
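
A minimal sketch of such a check, assuming nothing beyond the contract stated above. To stay self-contained it throws IllegalArgumentException as a stand-in for ConfigurationException; the class name and the message wording are hypothetical.

```java
class EnsureJustOneSketch {
    // Returns the only element of values, or fails if there is not exactly one.
    static String ensureJustOne(String key, String[] values) {
        if (values.length != 1) {
            // Stand-in for ConfigurationException; the key only serves the message.
            throw new IllegalArgumentException(
                "Property " + key + " should have exactly one value, but has " + values.length);
        }
        return values[0];
    }

    public static void main(String[] args) {
        System.out.println(ensureJustOne("locale", new String[] { "en" }));
        try {
            ensureJustOne("locale", new String[] { "en", "it" });
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```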

parseProperties

public Reference2ObjectMap<Enum<?>,Object> parseProperties(Properties properties)
                                                    throws ConfigurationException
Scans the property set, parsing the properties that concern this class.

Parameters:
properties - a set of properties.
Returns:
a metadata map.
Throws:
ConfigurationException

parseProperties

public Reference2ObjectMap<Enum<?>,Object> parseProperties(String[] property)
                                                    throws ConfigurationException
Parses the given list of properties either as key=value specs (value may be a list of comma-separated values), or as filenames.

Parameters:
property - an array of strings specifying properties.
Returns:
a metadata map.
Throws:
ConfigurationException
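
The two kinds of specification can be told apart as sketched below. This is an illustration under stated assumptions rather than the MG4J logic: a string containing '=' is treated as a key=value spec whose value is split on commas, any other string is treated as a filename and merely collected (no file is read), and PropertySpecSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PropertySpecSketch {
    // Parsed key=value specs; each value is a list of comma-separated items.
    final Map<String, String[]> keyValues = new LinkedHashMap<>();
    // Strings without '=' are assumed to be filenames and are only recorded.
    final List<String> filenames = new ArrayList<>();

    static PropertySpecSketch parse(String[] property) {
        PropertySpecSketch result = new PropertySpecSketch();
        for (String spec : property) {
            int eq = spec.indexOf('=');
            if (eq < 0) {
                result.filenames.add(spec);
            } else {
                String key = spec.substring(0, eq).trim();
                String[] values = spec.substring(eq + 1).trim().split("\\s*,\\s*");
                result.keyValues.put(key, values);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PropertySpecSketch parsed = parse(new String[] {
            "locale=en", "tags=a, b", "defaults.properties" // hypothetical inputs
        });
        for (Map.Entry<String, String[]> e : parsed.keyValues.entrySet()) {
            System.out.println(e.getKey() + " -> " + String.join("|", e.getValue()));
        }
        System.out.println("files: " + parsed.filenames);
    }
}
```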

resolve

protected Object resolve(Enum<?> key,
                         Reference2ObjectMap<Enum<?>,Object> metadata)
Resolves the given key against the given metadata, falling back to the default metadata.

Parameters:
key - a key.
metadata - a metadata map.
Returns:
the value returned by metadata for key, or the value returned by defaultMetadata for key if the former is null (the latter, of course, might be null).

resolve

protected Object resolve(Enum<?> key,
                         Reference2ObjectMap<Enum<?>,Object> metadata,
                         Object o)
Resolves the given key against the given metadata, falling back to the provided object.

Parameters:
key - a key.
metadata - a metadata map.
o - a default object.
Returns:
the value returned by metadata for key, or o if the former is null.

resolveNotNull

protected Object resolveNotNull(Enum<?> key,
                                Reference2ObjectMap<Enum<?>,Object> metadata)
Resolves the given key against the given metadata, falling back to the default metadata and guaranteeing a non-null result.

Parameters:
key - a key.
metadata - a metadata map.
Returns:
the value returned by metadata for key, or the value returned by defaultMetadata for key if the former is null; if the latter is null, too, a NoSuchElementException will be thrown.
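
The fallback behaviour of the three resolution methods can be sketched with ordinary maps. ResolveSketch and DemoKeys are hypothetical, a java.util.IdentityHashMap stands in for the Reference2ObjectMap, and the metadata argument is assumed to be non-null.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

class ResolveSketch {
    enum DemoKeys { LOCALE, TITLE } // hypothetical keys

    final Map<Enum<?>, Object> defaultMetadata = new IdentityHashMap<>();

    // Falls back to the default metadata; the result may still be null.
    Object resolve(Enum<?> key, Map<Enum<?>, Object> metadata) {
        Object value = metadata.get(key);
        return value != null ? value : defaultMetadata.get(key);
    }

    // Falls back to the provided object instead of the default metadata.
    Object resolve(Enum<?> key, Map<Enum<?>, Object> metadata, Object o) {
        Object value = metadata.get(key);
        return value != null ? value : o;
    }

    // Same as the two-argument resolve, but never returns null.
    Object resolveNotNull(Enum<?> key, Map<Enum<?>, Object> metadata) {
        Object value = resolve(key, metadata);
        if (value == null) throw new NoSuchElementException("No value for key " + key);
        return value;
    }

    public static void main(String[] args) {
        ResolveSketch factory = new ResolveSketch();
        factory.defaultMetadata.put(DemoKeys.LOCALE, "en");
        Map<Enum<?>, Object> metadata = new IdentityHashMap<>();
        metadata.put(DemoKeys.TITLE, "A title");
        System.out.println(factory.resolve(DemoKeys.TITLE, metadata));
        System.out.println(factory.resolve(DemoKeys.LOCALE, metadata));
        System.out.println(factory.resolve(DemoKeys.LOCALE, new IdentityHashMap<>(), "it"));
    }
}
```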