Class OOXMLDocumentFactory
- java.lang.Object
  - it.unimi.di.big.mg4j.document.AbstractDocumentFactory
    - it.unimi.di.big.mg4j.document.PropertyBasedDocumentFactory
      - it.unimi.di.big.mg4j.document.tika.AbstractTikaDocumentFactory
        - it.unimi.di.big.mg4j.document.tika.AbstractSimpleTikaDocumentFactory
          - it.unimi.di.big.mg4j.document.tika.OOXMLDocumentFactory
- All Implemented Interfaces:
DocumentFactory, FlyweightPrototype<DocumentFactory>, Serializable
public class OOXMLDocumentFactory extends AbstractSimpleTikaDocumentFactory
A document factory for the OOXML format. The only metadata that will be parsed is
GreedyTikaField.NAME.
- Author:
- Salvatore Insalaco
- See Also:
- Serialized Form
-
Nested Class Summary
-
Nested classes/interfaces inherited from class it.unimi.di.big.mg4j.document.PropertyBasedDocumentFactory
PropertyBasedDocumentFactory.MetadataKeys
-
Nested classes/interfaces inherited from interface it.unimi.di.big.mg4j.document.DocumentFactory
DocumentFactory.FieldType
-
-
Field Summary
-
Fields inherited from class it.unimi.di.big.mg4j.document.PropertyBasedDocumentFactory
defaultMetadata
-
-
Constructor Summary
Constructors
OOXMLDocumentFactory()
OOXMLDocumentFactory(Reference2ObjectMap<Enum<?>,Object> defaultMetadata)
OOXMLDocumentFactory(Properties properties)
OOXMLDocumentFactory(String[] property)
-
Method Summary
Modifier and Type, Method, Description
protected org.apache.tika.parser.Parser getParser()
The parser used to parse this kind of document; subclasses should always return the same instance, as Tika parsers are immutable and thread-safe.
protected List<? extends TikaField> metadataFields()
The list of Tika fields (apart from content) that this factory provides; the base implementation returns the empty list, so most subclasses may want to override this method.
-
Methods inherited from class it.unimi.di.big.mg4j.document.tika.AbstractSimpleTikaDocumentFactory
copy, fields, getDocument, parseProperty
-
Methods inherited from class it.unimi.di.big.mg4j.document.tika.AbstractTikaDocumentFactory
fieldIndex, fieldName, fieldType, numberOfFields
-
Methods inherited from class it.unimi.di.big.mg4j.document.PropertyBasedDocumentFactory
ensureJustOne, getInstance, getInstance, getInstance, getInstance, parseProperties, parseProperties, resolve, resolve, resolveNotNull, sameKey, toString
-
Methods inherited from class it.unimi.di.big.mg4j.document.AbstractDocumentFactory
ensureFieldIndex
-
Constructor Detail
-
OOXMLDocumentFactory
public OOXMLDocumentFactory()
-
OOXMLDocumentFactory
public OOXMLDocumentFactory(Reference2ObjectMap<Enum<?>,Object> defaultMetadata)
-
OOXMLDocumentFactory
public OOXMLDocumentFactory(Properties properties) throws org.apache.commons.configuration.ConfigurationException
- Throws:
org.apache.commons.configuration.ConfigurationException
-
OOXMLDocumentFactory
public OOXMLDocumentFactory(String[] property) throws org.apache.commons.configuration.ConfigurationException
- Throws:
org.apache.commons.configuration.ConfigurationException
-
-
Method Detail
-
metadataFields
protected List<? extends TikaField> metadataFields()
Description copied from class: AbstractSimpleTikaDocumentFactory
The list of Tika fields (apart from content) that this factory provides; the base implementation returns the empty list, so most subclasses may want to override this method.
- Overrides:
metadataFields in class AbstractSimpleTikaDocumentFactory
- Returns:
- the list of Tika fields that this factory provides.
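The override described above can be illustrated with a minimal, self-contained sketch. It does not use MG4J or Tika: `Field`, `BaseFactory`, `NameOnlyFactory`, and the rule that content counts as one extra field are all hypothetical stand-ins chosen for illustration, not the library's actual types or behaviour.

```java
import java.util.Collections;
import java.util.List;

final class MetadataFieldsSketch {
    /** Hypothetical stand-in for a Tika field descriptor (NOT MG4J's TikaField). */
    enum Field { NAME, TITLE }

    /** Hypothetical stand-in for the base factory: by default it exposes no metadata fields. */
    static class BaseFactory {
        protected List<? extends Field> metadataFields() {
            return Collections.emptyList();
        }

        /** Assumption of this sketch: content counts as one field, metadata fields are added to it. */
        final int numberOfFields() {
            return 1 + metadataFields().size();
        }
    }

    /** A subclass that overrides the hook to expose a single metadata field. */
    static final class NameOnlyFactory extends BaseFactory {
        // Immutable list built once and reused by every call.
        private static final List<Field> FIELDS = Collections.singletonList(Field.NAME);

        @Override
        protected List<? extends Field> metadataFields() {
            return FIELDS;
        }
    }

    public static void main(String[] args) {
        System.out.println(new BaseFactory().numberOfFields());
        System.out.println(new NameOnlyFactory().metadataFields());
    }
}
```

In this sketch the base class needs no change when a subclass adds fields, which is presumably why the hook is a protected method rather than a constructor argument.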
-
getParser
protected org.apache.tika.parser.Parser getParser()
Description copied from class: AbstractSimpleTikaDocumentFactory
The parser used to parse this kind of document; subclasses should always return the same instance, as Tika parsers are immutable and thread-safe.
- Specified by:
getParser in class AbstractSimpleTikaDocumentFactory
- Returns:
- the parser used to parse this kind of document.
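The advice to always return the same parser instance can be sketched as follows. This is an illustration only: the `Parser` interface, `SimpleFactory`, and `OoxmlLikeFactory` are hypothetical stand-ins, not `org.apache.tika.parser.Parser` or any MG4J class, and the "parsing" is just whitespace trimming.

```java
final class SharedParserSketch {
    /** Hypothetical stand-in for a stateless, thread-safe parser (NOT the Tika interface). */
    interface Parser {
        String parse(String raw);
    }

    /** Hypothetical stand-in for the abstract base class; only the parser hook is modelled. */
    abstract static class SimpleFactory {
        protected abstract Parser getParser();

        /** Delegates to whatever parser the subclass supplies. */
        final String content(String raw) {
            return getParser().parse(raw);
        }
    }

    static final class OoxmlLikeFactory extends SimpleFactory {
        // Created once per class and shared by every factory instance;
        // this is safe only because the parser holds no mutable state.
        private static final Parser PARSER = raw -> raw.trim();

        @Override
        protected Parser getParser() {
            return PARSER;
        }
    }

    public static void main(String[] args) {
        OoxmlLikeFactory a = new OoxmlLikeFactory();
        OoxmlLikeFactory b = new OoxmlLikeFactory();
        // Both factories hand out the very same parser object.
        System.out.println(a.getParser() == b.getParser());
        System.out.println(a.content("  hello  "));
    }
}
```

Holding the parser in a `static final` field means that copies of a factory share one parser instead of each allocating its own.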