it.unimi.di.mg4j.document.tika
Class AutoDetectDocumentFactory
java.lang.Object
it.unimi.di.mg4j.document.AbstractDocumentFactory
it.unimi.di.mg4j.document.PropertyBasedDocumentFactory
it.unimi.di.mg4j.document.tika.AbstractTikaDocumentFactory
it.unimi.di.mg4j.document.tika.AbstractSimpleTikaDocumentFactory
it.unimi.di.mg4j.document.tika.AutoDetectDocumentFactory
- All Implemented Interfaces:
- DocumentFactory, FlyweightPrototype<DocumentFactory>, Serializable
public class AutoDetectDocumentFactory
- extends AbstractSimpleTikaDocumentFactory
A document factory that automatically detects the type of the document content.
The metadata fields that it tentatively parses are Metadata.TITLE and GreedyTikaField.NAME; the latter contains all Tika fields, each converted with Object.toString() and concatenated.
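The "greedy" field described above can be pictured with a minimal, standard-library-only sketch. It is not the MG4J or Tika implementation: the class `GreedyFieldSketch`, the method `concatenate`, the sample metadata keys, and the single-space separator are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Illustrative stand-in only; hypothetical names, not part of MG4J or Tika. */
final class GreedyFieldSketch {

    /**
     * Converts every non-null metadata value with toString() and joins the
     * results in map iteration order.
     */
    static String concatenate(Map<String, ?> metadata) {
        // Assumption: a single space as separator; the real separator is not documented here.
        StringJoiner joined = new StringJoiner(" ");
        for (Object value : metadata.values()) {
            if (value != null) {
                joined.add(value.toString());
            }
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // Hypothetical metadata, standing in for what a parser might extract.
        Map<String, Object> metadata = new LinkedHashMap<>();
        metadata.put("title", "Annual report");
        metadata.put("pageCount", 12);
        metadata.put("author", "J. Doe");
        System.out.println(concatenate(metadata));
    }
}
```

A field built this way trades structure for recall: every extracted value becomes searchable, but the origin of each value is lost.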
- Author:
- Salvatore Insalaco
- See Also:
- Serialized Form
Method Summary
protected org.apache.tika.parser.Parser getParser()
- The parser used to parse this kind of document; subclasses should always return the same instance, as Tika parsers are immutable and thread-safe.
protected List<? extends TikaField> metadataFields()
- The list of Tika fields (apart from content) that this factory provides; the implementation in AbstractSimpleTikaDocumentFactory returns the empty list, so most subclasses will want to override this method.
Methods inherited from class it.unimi.di.mg4j.document.PropertyBasedDocumentFactory
- ensureJustOne, getInstance, getInstance, getInstance, getInstance, parseProperties, parseProperties, resolve, resolve, resolveNotNull, sameKey
AutoDetectDocumentFactory
public AutoDetectDocumentFactory()
AutoDetectDocumentFactory
public AutoDetectDocumentFactory(Reference2ObjectMap<Enum<?>,Object> defaultMetadata)
AutoDetectDocumentFactory
public AutoDetectDocumentFactory(Properties properties)
throws ConfigurationException
- Throws:
ConfigurationException
AutoDetectDocumentFactory
public AutoDetectDocumentFactory(String[] property)
throws ConfigurationException
- Throws:
ConfigurationException
getParser
protected org.apache.tika.parser.Parser getParser()
- Description copied from class:
AbstractSimpleTikaDocumentFactory
- The parser used to parse this kind of document; subclasses should always return the same instance, as Tika parsers are immutable and thread-safe.
- Specified by:
getParser
in class AbstractSimpleTikaDocumentFactory
- Returns:
- the parser used to parse this kind of document.
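The "always return the same instance" contract can be met by holding the parser in a static final field. The sketch below shows that pattern with standard-library-only stand-ins: the nested `Parser` interface, `SimpleFactory`, and `AutoDetectFactory` are hypothetical placeholders for the Tika and MG4J types, and the trimming "parser" is purely illustrative.

```java
/** Illustrative stand-in only; hypothetical names, not part of MG4J or Tika. */
final class SharedParserSketch {

    /** Hypothetical placeholder for an immutable, thread-safe parser. */
    interface Parser {
        String parse(String raw);
    }

    /** Hypothetical placeholder for the abstract factory that declares getParser(). */
    abstract static class SimpleFactory {
        protected abstract Parser getParser();
    }

    /** Every factory instance hands out the one shared parser. */
    static final class AutoDetectFactory extends SimpleFactory {
        // Created once at class initialization; safe to share because it holds no state.
        private static final Parser PARSER = String::trim;

        @Override
        protected Parser getParser() {
            return PARSER;
        }
    }

    public static void main(String[] args) {
        Parser first = new AutoDetectFactory().getParser();
        Parser second = new AutoDetectFactory().getParser();
        // Identity comparison: both factories expose the very same object.
        System.out.println(first == second);
    }
}
```

Sharing one instance avoids constructing a parser per factory copy, which matters when factories are duplicated for use by several threads.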
metadataFields
protected List<? extends TikaField> metadataFields()
- Description copied from class:
AbstractSimpleTikaDocumentFactory
- The list of Tika fields (apart from content) that this factory provides; this implementation returns the empty list, so most subclasses will want to override this method.
- Overrides:
metadataFields
in class AbstractSimpleTikaDocumentFactory
- Returns:
- the list of Tika fields that this factory provides.
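The override relationship described here can be sketched as follows, again with standard-library-only stand-ins. `BaseFactory`, `PlainFactory`, `TitleAndGreedyFactory`, and the field names `"title"` and `"greedy"` are hypothetical; in MG4J the list elements are TikaField objects rather than strings.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative stand-in only; hypothetical names, not part of MG4J or Tika. */
final class MetadataFieldsSketch {

    /** The base class provides no metadata fields beyond the content. */
    abstract static class BaseFactory {
        protected List<String> metadataFields() {
            return Collections.emptyList();
        }
    }

    /** A subclass that keeps the default: content only. */
    static final class PlainFactory extends BaseFactory {
    }

    /** A subclass that overrides the default to expose two extra fields. */
    static final class TitleAndGreedyFactory extends BaseFactory {
        // Placeholder names; the actual field names are defined by the library.
        private static final List<String> FIELDS =
                Collections.unmodifiableList(Arrays.asList("title", "greedy"));

        @Override
        protected List<String> metadataFields() {
            return FIELDS;
        }
    }

    public static void main(String[] args) {
        System.out.println(new PlainFactory().metadataFields().size());
        System.out.println(new TitleAndGreedyFactory().metadataFields());
    }
}
```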