it.unimi.di.mg4j.document.tika
Class MSOfficeDocumentFactory
java.lang.Object
it.unimi.di.mg4j.document.AbstractDocumentFactory
it.unimi.di.mg4j.document.PropertyBasedDocumentFactory
it.unimi.di.mg4j.document.tika.AbstractTikaDocumentFactory
it.unimi.di.mg4j.document.tika.AbstractSimpleTikaDocumentFactory
it.unimi.di.mg4j.document.tika.MSOfficeDocumentFactory
- All Implemented Interfaces:
- DocumentFactory, FlyweightPrototype<DocumentFactory>, Serializable
public class MSOfficeDocumentFactory
- extends AbstractSimpleTikaDocumentFactory
A document factory for the Microsoft Office format.
The only metadata that will be parsed is GreedyTikaField.NAME.
- Author:
- Salvatore Insalaco
- See Also:
- Serialized Form
Method Summary |
protected org.apache.tika.parser.Parser |
getParser()
The parser to be used to parse this kind of document; subclasses should always return the same instance, as Tika parsers are immutable and thread-safe. |
protected List<? extends TikaField> |
metadataFields()
The list of Tika fields (apart from content) that this factory provides; the default implementation in AbstractSimpleTikaDocumentFactory returns the empty list, so most subclasses may want to override this method. |
Methods inherited from class it.unimi.di.mg4j.document.PropertyBasedDocumentFactory |
ensureJustOne, getInstance, getInstance, getInstance, getInstance, parseProperties, parseProperties, resolve, resolve, resolveNotNull, sameKey |
MSOfficeDocumentFactory
public MSOfficeDocumentFactory()
MSOfficeDocumentFactory
public MSOfficeDocumentFactory(Reference2ObjectMap<Enum<?>,Object> defaultMetadata)
MSOfficeDocumentFactory
public MSOfficeDocumentFactory(Properties properties)
throws ConfigurationException
- Throws:
ConfigurationException
MSOfficeDocumentFactory
public MSOfficeDocumentFactory(String[] property)
throws ConfigurationException
- Throws:
ConfigurationException
metadataFields
protected List<? extends TikaField> metadataFields()
- Description copied from class:
AbstractSimpleTikaDocumentFactory
- The list of Tika fields (apart from content) that this factory provides; it returns the empty list, so most subclasses may want to override this method.
- Overrides:
metadataFields
in class AbstractSimpleTikaDocumentFactory
- Returns:
- the list of Tika fields that this factory provides.
getParser
protected org.apache.tika.parser.Parser getParser()
- Description copied from class:
AbstractSimpleTikaDocumentFactory
- The parser to be used to parse this kind of document; subclasses should always return the same instance, as Tika parsers are immutable and thread-safe.
- Specified by:
getParser
in class AbstractSimpleTikaDocumentFactory
- Returns:
- the parser to be used to parse this kind of document.
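The contract described for getParser() and metadataFields() can be illustrated with a minimal, self-contained sketch. It does not use the real MG4J or Tika classes: StubParser, StubField, StubSimpleFactory and StubOfficeFactory are hypothetical stand-ins introduced only for illustration, and the actual MSOfficeDocumentFactory may be implemented differently. The sketch shows one plausible way for a subclass to return the same parser instance on every call and to override the empty default list of metadata fields.

```java
import java.util.Collections;
import java.util.List;

public class SharedParserSketch {
    /** Hypothetical stand-in for org.apache.tika.parser.Parser. */
    interface StubParser {
        String name();
    }

    /** Hypothetical stand-in for a Tika metadata field. */
    enum StubField { NAME }

    /** Hypothetical stand-in mirroring the two hooks documented above. */
    abstract static class StubSimpleFactory {
        /** Subclasses should always return the same instance. */
        protected abstract StubParser getParser();

        /** Default: no metadata fields apart from content. */
        protected List<StubField> metadataFields() {
            return Collections.emptyList();
        }
    }

    /** A subclass that shares one parser and exposes a single field. */
    static class StubOfficeFactory extends StubSimpleFactory {
        // One parser per class: safe only because the parser is assumed
        // to be immutable and thread-safe, as stated for Tika parsers.
        private static final StubParser PARSER = () -> "office";
        private static final List<StubField> FIELDS =
                Collections.singletonList(StubField.NAME);

        @Override
        protected StubParser getParser() {
            return PARSER;
        }

        @Override
        protected List<StubField> metadataFields() {
            return FIELDS;
        }
    }

    public static void main(String[] args) {
        StubOfficeFactory a = new StubOfficeFactory();
        StubOfficeFactory b = new StubOfficeFactory();
        System.out.println(a.getParser() == b.getParser()); // prints "true"
        System.out.println(a.metadataFields());             // prints "[NAME]"
    }
}
```

Holding the parser in a static final field means every factory instance, including copies of the same factory, reuses one object instead of allocating a parser per document.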