it.unimi.di.mg4j.document.tika
Class AbstractTikaDocumentFactory
java.lang.Object
it.unimi.di.mg4j.document.AbstractDocumentFactory
it.unimi.di.mg4j.document.PropertyBasedDocumentFactory
it.unimi.di.mg4j.document.tika.AbstractTikaDocumentFactory
- All Implemented Interfaces:
- DocumentFactory, FlyweightPrototype<DocumentFactory>, Serializable
- Direct Known Subclasses:
- AbstractSimpleTikaDocumentFactory
public abstract class AbstractTikaDocumentFactory
- extends PropertyBasedDocumentFactory
An abstract document factory that provides the mapping from field names to field indices.
Concrete subclasses must implement the method fields(), which provides the list of Tika fields.
- Author:
- Salvatore Insalaco
- See Also:
- Serialized Form
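The following rough illustration of the name-to-index mapping described above is self-contained and does not depend on MG4J or Tika. SketchField, FieldType, AbstractSketchFactory and ExampleFactory are hypothetical stand-ins for TikaField, DocumentFactory.FieldType and the real classes. Deriving every method from fields(), including caching the name-to-index map, is an assumption about how the class works, not a copy of its source. The record requires Java 16 or later.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the field mapping; NOT the MG4J implementation.
// SketchField and FieldType stand in for TikaField and DocumentFactory.FieldType.
class TikaFieldMappingSketch {
    enum FieldType { TEXT, INT } // invented constants for illustration only

    record SketchField(String name, FieldType type) {}

    abstract static class AbstractSketchFactory {
        private Map<String, Integer> nameToIndex; // built on first use from fields()

        // Subclasses list their fields; a field's position in the list is its index.
        protected abstract List<SketchField> fields();

        public int numberOfFields() {
            return fields().size();
        }

        public String fieldName(int field) {
            return fields().get(field).name();
        }

        public FieldType fieldType(int field) {
            return fields().get(field).type();
        }

        public int fieldIndex(String fieldName) {
            if (nameToIndex == null) {
                Map<String, Integer> map = new HashMap<>();
                List<SketchField> list = fields();
                for (int i = 0; i < list.size(); i++) {
                    map.putIfAbsent(list.get(i).name(), i); // this sketch keeps the first occurrence
                }
                nameToIndex = map;
            }
            return nameToIndex.getOrDefault(fieldName, -1); // -1 for unknown names
        }
    }

    // A concrete factory only has to say which fields it extracts, in order.
    static class ExampleFactory extends AbstractSketchFactory {
        private static final List<SketchField> FIELDS = List.of(
                new SketchField("title", FieldType.TEXT),
                new SketchField("body", FieldType.TEXT),
                new SketchField("size", FieldType.INT));

        @Override
        protected List<SketchField> fields() {
            return FIELDS;
        }
    }

    public static void main(String[] args) {
        ExampleFactory f = new ExampleFactory();
        System.out.println(f.numberOfFields());     // 3
        System.out.println(f.fieldName(1));         // body
        System.out.println(f.fieldType(2));         // INT
        System.out.println(f.fieldIndex("size"));   // 2
        System.out.println(f.fieldIndex("author")); // -1
    }
}
```

Because a field's index is simply its position in the list, a concrete factory in this sketch must return the same list, in the same order, on every call; otherwise names and indices would drift apart.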
Method Summary
- int fieldIndex(String fieldName): Returns the index of a field, given its symbolic name.
- String fieldName(int field): Returns the symbolic name of a field.
- protected abstract List<TikaField> fields(): Returns the list of Tika fields (they will be mapped to MG4J fields whose index is their index in the list).
- DocumentFactory.FieldType fieldType(int field): Returns the type of a field.
- int numberOfFields(): Returns the number of fields present in the documents produced by this factory.
Methods inherited from class it.unimi.di.mg4j.document.PropertyBasedDocumentFactory
- ensureJustOne, getInstance, getInstance, getInstance, getInstance, parseProperties, parseProperties, parseProperty, resolve, resolve, resolveNotNull, sameKey
AbstractTikaDocumentFactory
public AbstractTikaDocumentFactory(Properties properties)
throws ConfigurationException
- Throws:
ConfigurationException
AbstractTikaDocumentFactory
public AbstractTikaDocumentFactory(Reference2ObjectMap<Enum<?>,Object> defaultMetadata)
AbstractTikaDocumentFactory
public AbstractTikaDocumentFactory(String[] property)
throws ConfigurationException
- Throws:
ConfigurationException
AbstractTikaDocumentFactory
public AbstractTikaDocumentFactory()
numberOfFields
public int numberOfFields()
- Description copied from interface: DocumentFactory
- Returns the number of fields present in the documents produced by this factory.
- Returns:
- the number of fields present in the documents produced by this factory.
fieldName
public String fieldName(int field)
- Description copied from interface: DocumentFactory
- Returns the symbolic name of a field.
- Parameters:
- field - the index of a field (between 0 inclusive and DocumentFactory.numberOfFields() exclusive).
- Returns:
- the symbolic name of the field-th field.
fieldIndex
public int fieldIndex(String fieldName)
- Description copied from interface: DocumentFactory
- Returns the index of a field, given its symbolic name.
- Parameters:
- fieldName - the name of a field of this factory.
- Returns:
- the corresponding index, or -1 if there is no field with name fieldName.
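Since fieldIndex() reports a missing field by returning -1 rather than by throwing, a caller has to check the result before using it as an index. The following hypothetical caller-side sketch turns the sentinel into a descriptive exception; FieldIndexer is a minimal stand-in written for this example, not an MG4J interface, and requireField is an invented helper name.

```java
import java.util.List;

// Hypothetical caller-side sketch of the -1 contract of fieldIndex(String).
class FieldIndexUsage {
    // Minimal stand-in for the fieldIndex(String) method (not the MG4J interface).
    interface FieldIndexer {
        int fieldIndex(String fieldName);
    }

    // Resolves a field name, turning the -1 sentinel into a descriptive exception.
    static int requireField(FieldIndexer factory, String name) {
        int index = factory.fieldIndex(name);
        if (index == -1) {
            throw new IllegalArgumentException("No field named \"" + name + "\"");
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> names = List.of("title", "body", "size");
        FieldIndexer factory = names::indexOf; // List.indexOf also returns -1 when absent
        System.out.println(requireField(factory, "body")); // 1
        try {
            requireField(factory, "author");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // No field named "author"
        }
    }
}
```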
fieldType
public DocumentFactory.FieldType fieldType(int field)
- Description copied from interface: DocumentFactory
- Returns the type of a field. The possible types are defined in DocumentFactory.FieldType.
- Parameters:
- field - the index of a field (between 0 inclusive and DocumentFactory.numberOfFields() exclusive).
- Returns:
- the type of the field-th field.
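The parameter descriptions of fieldName() and fieldType() fix the valid index range as 0 (inclusive) to numberOfFields() (exclusive). The hypothetical sketch below enumerates every field's name and type with a plain loop over that range. FieldDescriptor and FieldType are minimal stand-ins written for this example, not MG4J types, and the FieldType constants are invented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: enumerate all fields over the index range [0, numberOfFields()).
class FieldEnumerationSketch {
    // Stand-in for DocumentFactory.FieldType (invented constants, not the real enum).
    enum FieldType { TEXT, INT }

    // Minimal stand-in for the three DocumentFactory methods used here.
    interface FieldDescriptor {
        int numberOfFields();
        String fieldName(int field);
        FieldType fieldType(int field);
    }

    // Returns "index:name:type" for every field, in index order.
    static List<String> describe(FieldDescriptor factory) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < factory.numberOfFields(); i++) {
            out.add(i + ":" + factory.fieldName(i) + ":" + factory.fieldType(i));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> names = List.of("title", "body", "size");
        List<FieldType> types = List.of(FieldType.TEXT, FieldType.TEXT, FieldType.INT);
        FieldDescriptor factory = new FieldDescriptor() {
            public int numberOfFields() { return names.size(); }
            public String fieldName(int field) { return names.get(field); }
            public FieldType fieldType(int field) { return types.get(field); }
        };
        System.out.println(describe(factory)); // [0:title:TEXT, 1:body:TEXT, 2:size:INT]
    }
}
```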
fields
protected abstract List<TikaField> fields()
- Returns the list of Tika fields (they will be mapped to MG4J fields whose index is their index in the list).
- Returns:
- the list of Tika fields.