it.unimi.di.mg4j.document.tika
Class AbstractTikaDocumentFactory

java.lang.Object
  extended by it.unimi.di.mg4j.document.AbstractDocumentFactory
      extended by it.unimi.di.mg4j.document.PropertyBasedDocumentFactory
          extended by it.unimi.di.mg4j.document.tika.AbstractTikaDocumentFactory
All Implemented Interfaces:
DocumentFactory, FlyweightPrototype<DocumentFactory>, Serializable
Direct Known Subclasses:
AbstractSimpleTikaDocumentFactory

public abstract class AbstractTikaDocumentFactory
extends PropertyBasedDocumentFactory

An abstract document factory that provides the mapping from field names to field indices.

Concrete subclasses must implement the method fields(), providing the list of Tika fields.
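This split between base class and subclass can be sketched as a template method. The base class derives the field count and the field names from whatever list the subclass returns. This is a minimal sketch, not MG4J code. SketchTikaFactory and TitleAndBodyFactory are hypothetical names, and plain String names stand in for TikaField, whose constructor is not documented on this page.

```java
import java.util.List;

// Hypothetical stand-in for AbstractTikaDocumentFactory: field descriptors are
// reduced to plain names (the real class works with TikaField objects).
abstract class SketchTikaFactory {
    // Subclasses supply the ordered field list; position in the list = field index.
    protected abstract List<String> fields();

    public int numberOfFields() {
        return fields().size();
    }

    public String fieldName(int field) {
        return fields().get(field);
    }
}

// Hypothetical concrete subclass: only fields() has to be implemented.
class TitleAndBodyFactory extends SketchTikaFactory {
    // Built once, so every call to fields() returns the same order.
    private static final List<String> FIELDS = List.of("title", "body");

    @Override
    protected List<String> fields() {
        return FIELDS;
    }
}
```

Because FIELDS is a static final, immutable list, each field keeps the same index for the lifetime of the factory; with this sketch, new TitleAndBodyFactory().fieldName(1) returns "body".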

Author:
Salvatore Insalaco
See Also:
Serialized Form

Nested Class Summary

Nested classes/interfaces inherited from class it.unimi.di.mg4j.document.PropertyBasedDocumentFactory
PropertyBasedDocumentFactory.MetadataKeys

Nested classes/interfaces inherited from interface it.unimi.di.mg4j.document.DocumentFactory
DocumentFactory.FieldType

Field Summary

Fields inherited from class it.unimi.di.mg4j.document.PropertyBasedDocumentFactory
defaultMetadata

Constructor Summary
AbstractTikaDocumentFactory()
AbstractTikaDocumentFactory(Properties properties)
AbstractTikaDocumentFactory(Reference2ObjectMap<Enum<?>,Object> defaultMetadata)
AbstractTikaDocumentFactory(String[] property)

Method Summary
 int fieldIndex(String fieldName)
          Returns the index of a field, given its symbolic name.
 String fieldName(int field)
          Returns the symbolic name of a field.
protected abstract  List<TikaField> fields()
          Returns the list of Tika fields; each is mapped to the MG4J field whose index equals its position in the list.
 DocumentFactory.FieldType fieldType(int field)
          Returns the type of a field.
 int numberOfFields()
          Returns the number of fields present in the documents produced by this factory.

Methods inherited from class it.unimi.di.mg4j.document.PropertyBasedDocumentFactory
ensureJustOne, getInstance, getInstance, getInstance, getInstance, parseProperties, parseProperties, parseProperty, resolve, resolve, resolveNotNull, sameKey

Methods inherited from class it.unimi.di.mg4j.document.AbstractDocumentFactory
ensureFieldIndex, toString

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface it.unimi.di.mg4j.document.DocumentFactory
copy, getDocument


Constructor Detail

AbstractTikaDocumentFactory

public AbstractTikaDocumentFactory(Properties properties)
                            throws ConfigurationException
Throws:
ConfigurationException

AbstractTikaDocumentFactory

public AbstractTikaDocumentFactory(Reference2ObjectMap<Enum<?>,Object> defaultMetadata)

AbstractTikaDocumentFactory

public AbstractTikaDocumentFactory(String[] property)
                            throws ConfigurationException
Throws:
ConfigurationException

AbstractTikaDocumentFactory

public AbstractTikaDocumentFactory()

Method Detail

numberOfFields

public int numberOfFields()
Description copied from interface: DocumentFactory
Returns the number of fields present in the documents produced by this factory.

Returns:
the number of fields present in the documents produced by this factory.

fieldName

public String fieldName(int field)
Description copied from interface: DocumentFactory
Returns the symbolic name of a field.

Parameters:
field - the index of a field (between 0 inclusive and DocumentFactory.numberOfFields() exclusive).
Returns:
the symbolic name of the field-th field.
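The documented index range (0 inclusive to numberOfFields() exclusive) can be enforced explicitly with java.util.Objects.checkIndex (Java 9+). FieldNameLookup is a hypothetical helper for illustration only; this page does not say how, or whether, AbstractTikaDocumentFactory itself validates the argument.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical helper illustrating the documented contract for field indices:
// valid values lie in [0, numberOfFields()). Not MG4J's own code.
final class FieldNameLookup {
    private final List<String> fieldNames;

    FieldNameLookup(List<String> fieldNames) {
        // Defensive, immutable copy so the index mapping cannot change later.
        this.fieldNames = List.copyOf(fieldNames);
    }

    int numberOfFields() {
        return fieldNames.size();
    }

    String fieldName(int field) {
        // Throws IndexOutOfBoundsException unless 0 <= field < numberOfFields().
        Objects.checkIndex(field, numberOfFields());
        return fieldNames.get(field);
    }
}
```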

fieldIndex

public int fieldIndex(String fieldName)
Description copied from interface: DocumentFactory
Returns the index of a field, given its symbolic name.

Parameters:
fieldName - the name of a field of this factory.
Returns:
the corresponding index, or -1 if there is no field with name fieldName.
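Since returning -1 for an unknown name is part of the contract, one way to implement this method is to build a name-to-index map once and fall back to -1 on a miss. FieldIndexLookup is hypothetical; this page does not state whether MG4J precomputes such a map or scans the field list on each call.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: precompute name -> index once so that fieldIndex()
// is a hash lookup instead of a linear scan; unknown names map to -1.
final class FieldIndexLookup {
    private final Map<String, Integer> indexByName = new HashMap<>();

    FieldIndexLookup(List<String> fieldNames) {
        for (int i = 0; i < fieldNames.size(); i++) {
            // putIfAbsent keeps the first index if a name were ever duplicated.
            indexByName.putIfAbsent(fieldNames.get(i), i);
        }
    }

    int fieldIndex(String fieldName) {
        return indexByName.getOrDefault(fieldName, -1);
    }
}
```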

fieldType

public DocumentFactory.FieldType fieldType(int field)
Description copied from interface: DocumentFactory
Returns the type of a field.

The possible types are defined in DocumentFactory.FieldType.

Parameters:
field - the index of a field (between 0 inclusive and DocumentFactory.numberOfFields() exclusive).
Returns:
the type of the field-th field.
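A natural way to support this method is to let each field descriptor carry its own type, so the type of the field-th field is read from position field of the list. SketchFieldType and SketchField are hypothetical stand-ins (SketchField is a Java 16+ record); they do not reproduce the constants of DocumentFactory.FieldType or the API of TikaField.

```java
import java.util.List;

// Hypothetical stand-ins: neither MG4J's DocumentFactory.FieldType nor TikaField
// is reproduced here; they only model "each field has a name and a type".
enum SketchFieldType { TEXT, INT, DATE }

record SketchField(String name, SketchFieldType type) {}

final class FieldTypeLookup {
    private final List<SketchField> fields;

    FieldTypeLookup(List<SketchField> fields) {
        this.fields = List.copyOf(fields);
    }

    // The type of the field-th field is taken from the descriptor at that position.
    SketchFieldType fieldType(int field) {
        return fields.get(field).type();
    }
}
```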

fields

protected abstract List<TikaField> fields()
Returns the list of Tika fields; each is mapped to the MG4J field whose index equals its position in the list.

Returns:
the list of Tika fields.