Class EPUBDocumentFactory
All Implemented Interfaces:
DocumentFactory, FlyweightPrototype<DocumentFactory>, Serializable
public class EPUBDocumentFactory extends AbstractSimpleTikaDocumentFactory
A document factory for the EPUB format. The metadata that will be tentatively parsed are Metadata.TITLE, Metadata.SUBJECT, Metadata.CREATOR, Metadata.DESCRIPTION, Metadata.PUBLISHER, Metadata.CONTRIBUTOR, Metadata.DATE, Metadata.TYPE, Metadata.FORMAT, Metadata.IDENTIFIER, Metadata.LANGUAGE, and Metadata.RIGHTS.
Author:
Salvatore Insalaco
See Also:
Serialized Form
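As a rough illustration of "tentatively parsed", the sketch below keeps a metadata entry only when the parsed EPUB actually supplied a value for it. It assumes the extracted metadata arrives as a `Map<String, String>`; the class `EpubMetadataSketch` and its lowercase key names are hypothetical stand-ins for the `Metadata` constants, not MG4J or Tika code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not MG4J or Tika code.
final class EpubMetadataSketch {
    // Stand-in names for Metadata.TITLE ... Metadata.RIGHTS (assumed, not the real constant values).
    static final List<String> KEYS = List.of(
        "title", "subject", "creator", "description", "publisher", "contributor",
        "date", "type", "format", "identifier", "language", "rights");

    // Keeps only the known keys that are actually present, in the order of KEYS;
    // unknown keys are ignored and missing keys are simply absent from the result.
    static Map<String, String> select(Map<String, String> extracted) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String key : KEYS) {
            String value = extracted.get(key);
            if (value != null) {
                result.put(key, value);
            }
        }
        return result;
    }
}
```

A document that carries only a title and a language would thus yield a two-entry map, with the other ten fields left empty rather than causing an error.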
Nested Class Summary
Nested classes/interfaces inherited from class it.unimi.di.big.mg4j.document.PropertyBasedDocumentFactory
PropertyBasedDocumentFactory.MetadataKeys
Nested classes/interfaces inherited from interface it.unimi.di.big.mg4j.document.DocumentFactory
DocumentFactory.FieldType
Field Summary
Fields inherited from class it.unimi.di.big.mg4j.document.PropertyBasedDocumentFactory
defaultMetadata
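The inherited `defaultMetadata` field and the `resolve` methods suggest a fallback scheme: a value supplied with a single document is used if present, and the factory-wide default otherwise. The plain-Java sketch below models that assumed behaviour with an `EnumMap`; `SketchKey` and `DefaultMetadataSketch` are hypothetical, whereas the real factory uses a `Reference2ObjectMap<Enum<?>,Object>` keyed by constants such as those in `PropertyBasedDocumentFactory.MetadataKeys`.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical keys; MG4J's real ones live in PropertyBasedDocumentFactory.MetadataKeys.
enum SketchKey { TITLE, ENCODING, LOCALE }

// Hypothetical model of per-document metadata falling back to factory defaults.
final class DefaultMetadataSketch {
    private final Map<SketchKey, Object> defaultMetadata;

    DefaultMetadataSketch(Map<SketchKey, Object> defaultMetadata) {
        // Defensive copy so later changes by the caller do not affect the factory.
        this.defaultMetadata = new EnumMap<>(SketchKey.class);
        this.defaultMetadata.putAll(defaultMetadata);
    }

    // Assumed precedence: the per-document value wins; otherwise the default (possibly null).
    Object resolve(SketchKey key, Map<SketchKey, Object> documentMetadata) {
        Object value = documentMetadata.get(key);
        return value != null ? value : defaultMetadata.get(key);
    }
}
```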
Constructor Summary
EPUBDocumentFactory()
EPUBDocumentFactory(Reference2ObjectMap<Enum<?>,Object> defaultMetadata)
EPUBDocumentFactory(Properties properties)
EPUBDocumentFactory(String[] property)
Method Summary
protected org.apache.tika.parser.Parser getParser()
The parser to be used to parse this kind of document; subclasses should always return the same instance, as Tika parsers are immutable and thread-safe.
protected List<TikaField> metadataFields()
The list of Tika fields (apart from content) that this factory provides; it returns the empty list, so most subclasses may want to override this method.
Methods inherited from class it.unimi.di.big.mg4j.document.tika.AbstractSimpleTikaDocumentFactory
copy, fields, getDocument, parseProperty
Methods inherited from class it.unimi.di.big.mg4j.document.tika.AbstractTikaDocumentFactory
fieldIndex, fieldName, fieldType, numberOfFields
Methods inherited from class it.unimi.di.big.mg4j.document.PropertyBasedDocumentFactory
ensureJustOne, getInstance, getInstance, getInstance, getInstance, parseProperties, parseProperties, resolve, resolve, resolveNotNull, sameKey, toString
Methods inherited from class it.unimi.di.big.mg4j.document.AbstractDocumentFactory
ensureFieldIndex
Constructor Detail
EPUBDocumentFactory
public EPUBDocumentFactory()
EPUBDocumentFactory
public EPUBDocumentFactory(Reference2ObjectMap<Enum<?>,Object> defaultMetadata)
EPUBDocumentFactory
public EPUBDocumentFactory(Properties properties) throws org.apache.commons.configuration.ConfigurationException
Throws:
org.apache.commons.configuration.ConfigurationException
EPUBDocumentFactory
public EPUBDocumentFactory(String[] property) throws org.apache.commons.configuration.ConfigurationException
Throws:
org.apache.commons.configuration.ConfigurationException
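The `String[]` constructor takes its configuration as plain strings. Assuming the usual MG4J convention that each string has the form `key=value`, the sketch below shows how such an array could be turned into a `java.util.Properties`. `PropertyStringsSketch` and the property names used with it are hypothetical, and the sketch throws `IllegalArgumentException` where the real constructors declare `ConfigurationException`.

```java
import java.util.Properties;

// Hypothetical helper, not MG4J code.
final class PropertyStringsSketch {
    // Parses "key=value" strings, trimming whitespace around key and value.
    // The real constructors report bad input with
    // org.apache.commons.configuration.ConfigurationException instead.
    static Properties parse(String[] property) {
        Properties result = new Properties();
        for (String p : property) {
            int eq = p.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Malformed property: " + p);
            }
            result.setProperty(p.substring(0, eq).trim(), p.substring(eq + 1).trim());
        }
        return result;
    }
}
```

For example, `PropertyStringsSketch.parse(new String[] { "encoding=UTF-8" })` yields a `Properties` object whose `encoding` entry is `UTF-8`.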
Method Detail
getParser
protected org.apache.tika.parser.Parser getParser()
Description copied from class: AbstractSimpleTikaDocumentFactory
The parser to be used to parse this kind of document; subclasses should always return the same instance, as Tika parsers are immutable and thread-safe.
Specified by:
getParser in class AbstractSimpleTikaDocumentFactory
Returns:
the parser to be used to parse this kind of document.
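Because Tika parsers are immutable and thread-safe, one parser can be created once and handed out on every call. The sketch below shows that pattern with a hypothetical `SketchParser` interface standing in for `org.apache.tika.parser.Parser`; a real EPUB factory would presumably hold an instance of Tika's `org.apache.tika.parser.epub.EpubParser`, although this page does not say which parser is used.

```java
// Hypothetical stand-in for org.apache.tika.parser.Parser.
interface SketchParser {
    String parse(String raw);
}

// Hypothetical base class mirroring the getParser() contract.
abstract class SketchTikaFactory {
    // Subclasses should return the same instance on every call.
    protected abstract SketchParser getParser();
}

final class SketchEpubFactory extends SketchTikaFactory {
    // Created once and shared by all factory instances; this is safe only
    // because the parser holds no mutable state.
    private static final SketchParser PARSER = raw -> raw.trim();

    @Override
    protected SketchParser getParser() {
        return PARSER;
    }
}
```

Sharing a single static instance also means copies of the factory need not duplicate the parser.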
metadataFields
protected List<TikaField> metadataFields()
Description copied from class: AbstractSimpleTikaDocumentFactory
The list of Tika fields (apart from content) that this factory provides; it returns the empty list, so most subclasses may want to override this method.
Overrides:
metadataFields in class AbstractSimpleTikaDocumentFactory
Returns:
the list of Tika fields that this factory provides.
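`EPUBDocumentFactory` overrides this method instead of inheriting the empty default. The sketch below models how the metadata list could relate to the total field count, assuming (this page does not confirm it) that the content field takes index 0 and the metadata fields follow in list order. All class names are hypothetical, and plain `String` names stand in for `TikaField` objects.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical base class mirroring the documented default: no metadata fields.
class SketchBaseFactory {
    protected List<String> metadataFields() {
        return Collections.emptyList();
    }

    // Assumption: one content field plus one field per metadata entry.
    int numberOfFields() {
        return 1 + metadataFields().size();
    }

    // Assumption: content is field 0; metadata fields follow in list order.
    String fieldName(int field) {
        if (field < 0 || field >= numberOfFields()) {
            throw new IndexOutOfBoundsException("No such field: " + field);
        }
        return field == 0 ? "content" : metadataFields().get(field - 1);
    }
}

// Hypothetical subclass that, like EPUBDocumentFactory, overrides the empty default.
final class SketchEpubFieldsFactory extends SketchBaseFactory {
    // Illustrative names only; the real method returns TikaField objects.
    private static final List<String> FIELDS = List.of(
        "title", "subject", "creator", "description", "publisher", "contributor",
        "date", "type", "format", "identifier", "language", "rights");

    @Override
    protected List<String> metadataFields() {
        return FIELDS;
    }
}
```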