Class AbstractTikaDocumentFactory
java.lang.Object
  it.unimi.di.big.mg4j.document.AbstractDocumentFactory
    it.unimi.di.big.mg4j.document.PropertyBasedDocumentFactory
      it.unimi.di.big.mg4j.document.tika.AbstractTikaDocumentFactory
All Implemented Interfaces: DocumentFactory, FlyweightPrototype<DocumentFactory>, Serializable
Direct Known Subclasses: AbstractSimpleTikaDocumentFactory
public abstract class AbstractTikaDocumentFactory extends PropertyBasedDocumentFactory
An abstract document factory that provides the mapping from field names to field indices. Concrete subclasses must implement the method fields(), which provides the list of Tika fields.
Author: Salvatore Insalaco
See Also: Serialized Form
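The pattern described above, in which a subclass supplies an ordered field list and the base class derives names and indices from list positions, can be sketched in isolation. The following is a minimal, self-contained illustration that does not use MG4J or Tika. `FieldMappingSketch` and its `of` helper are hypothetical names, plain strings stand in for Tika fields, and the lazily built map and the range check in `fieldName` are assumptions of the sketch; the real class may build its mapping differently.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not the MG4J implementation.
abstract class FieldMappingSketch {
    // Name -> index map, built on first lookup (an assumption of this sketch).
    private Map<String, Integer> nameToIndex;

    // Subclasses supply the ordered field list; plain names stand in for Tika fields.
    protected abstract List<String> fields();

    public int numberOfFields() {
        return fields().size();
    }

    public String fieldName(int field) {
        // Valid indices run from 0 (inclusive) to numberOfFields() (exclusive).
        if (field < 0 || field >= numberOfFields()) {
            throw new IndexOutOfBoundsException("No such field: " + field);
        }
        return fields().get(field);
    }

    public int fieldIndex(String fieldName) {
        if (nameToIndex == null) {
            nameToIndex = new HashMap<>();
            List<String> list = fields();
            // A field's index is its position in the list.
            for (int i = 0; i < list.size(); i++) {
                nameToIndex.put(list.get(i), i);
            }
        }
        // -1 signals that no field has the given name.
        return nameToIndex.getOrDefault(fieldName, -1);
    }

    // Hypothetical helper that builds a concrete instance from a fixed list of names.
    static FieldMappingSketch of(String... names) {
        final List<String> list = Arrays.asList(names);
        return new FieldMappingSketch() {
            @Override
            protected List<String> fields() {
                return list;
            }
        };
    }

    public static void main(String[] args) {
        FieldMappingSketch factory = FieldMappingSketch.of("title", "author");
        System.out.println(factory.fieldIndex("author"));
        System.out.println(factory.fieldName(0));
    }
}
```

In this sketch a concrete subclass only has to return its list; the name and index lookups follow from list order.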
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class it.unimi.di.big.mg4j.document.PropertyBasedDocumentFactory
PropertyBasedDocumentFactory.MetadataKeys
-
Nested classes/interfaces inherited from interface it.unimi.di.big.mg4j.document.DocumentFactory
DocumentFactory.FieldType
-
-
Field Summary
-
Fields inherited from class it.unimi.di.big.mg4j.document.PropertyBasedDocumentFactory
defaultMetadata
-
-
Constructor Summary
Constructors
AbstractTikaDocumentFactory()
AbstractTikaDocumentFactory(Reference2ObjectMap<Enum<?>,Object> defaultMetadata)
AbstractTikaDocumentFactory(Properties properties)
AbstractTikaDocumentFactory(String[] property)
-
Method Summary
int fieldIndex(String fieldName)
    Returns the index of a field, given its symbolic name.
String fieldName(int field)
    Returns the symbolic name of a field.
protected abstract List<TikaField> fields()
    Returns the list of Tika fields (they will be mapped to MG4J fields whose index is their index in the list).
DocumentFactory.FieldType fieldType(int field)
    Returns the type of a field.
int numberOfFields()
    Returns the number of fields present in the documents produced by this factory.
Methods inherited from class it.unimi.di.big.mg4j.document.PropertyBasedDocumentFactory
ensureJustOne, getInstance, getInstance, getInstance, getInstance, parseProperties, parseProperties, parseProperty, resolve, resolve, resolveNotNull, sameKey, toString
-
Methods inherited from class it.unimi.di.big.mg4j.document.AbstractDocumentFactory
ensureFieldIndex
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface it.unimi.di.big.mg4j.document.DocumentFactory
copy, getDocument
-
Constructor Detail
-
AbstractTikaDocumentFactory
public AbstractTikaDocumentFactory(Properties properties) throws org.apache.commons.configuration.ConfigurationException
- Throws:
org.apache.commons.configuration.ConfigurationException
-
AbstractTikaDocumentFactory
public AbstractTikaDocumentFactory(Reference2ObjectMap<Enum<?>,Object> defaultMetadata)
-
AbstractTikaDocumentFactory
public AbstractTikaDocumentFactory(String[] property) throws org.apache.commons.configuration.ConfigurationException
- Throws:
org.apache.commons.configuration.ConfigurationException
-
AbstractTikaDocumentFactory
public AbstractTikaDocumentFactory()
-
-
Method Detail
-
numberOfFields
public int numberOfFields()
Description copied from interface: DocumentFactory
Returns the number of fields present in the documents produced by this factory.
Returns: the number of fields present in the documents produced by this factory.
-
fieldName
public String fieldName(int field)
Description copied from interface: DocumentFactory
Returns the symbolic name of a field.
Parameters: field - the index of a field (between 0 inclusive and DocumentFactory.numberOfFields() exclusive).
Returns: the symbolic name of the field-th field.
-
fieldIndex
public int fieldIndex(String fieldName)
Description copied from interface: DocumentFactory
Returns the index of a field, given its symbolic name.
Parameters: fieldName - the name of a field of this factory.
Returns: the corresponding index, or -1 if there is no field with name fieldName.
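Because an unknown name yields -1 rather than an exception, callers usually need to check the result before using it as an index. The following minimal sketch shows one way to wrap that check. `FieldLookupSketch` and `requireFieldIndex` are hypothetical names, not part of MG4J; the lookup is passed as a `ToIntFunction<String>` so the example compiles without the library.

```java
import java.util.function.ToIntFunction;

// Hypothetical helper; not part of MG4J.
final class FieldLookupSketch {
    private FieldLookupSketch() {}

    // Resolves a name through any lookup that follows the "-1 means absent" convention
    // and turns the sentinel into an exception.
    static int requireFieldIndex(ToIntFunction<String> fieldIndex, String fieldName) {
        int index = fieldIndex.applyAsInt(fieldName);
        if (index == -1) {
            throw new IllegalArgumentException("Unknown field: " + fieldName);
        }
        return index;
    }

    public static void main(String[] args) {
        // Stand-in lookup that knows a single field called "title".
        ToIntFunction<String> lookup = name -> name.equals("title") ? 0 : -1;
        System.out.println(requireFieldIndex(lookup, "title"));
    }
}
```

With an actual factory instance, a method reference such as `factory::fieldIndex` could be passed as the lookup, since `fieldIndex(String)` returns an `int`.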
-
fieldType
public DocumentFactory.FieldType fieldType(int field)
Description copied from interface: DocumentFactory
Returns the type of a field. The possible types are defined in DocumentFactory.FieldType.
Parameters: field - the index of a field (between 0 inclusive and DocumentFactory.numberOfFields() exclusive).
Returns: the type of the field-th field.
-
-