it.unimi.di.mg4j.document.tika
Class PdfDocumentFactory
java.lang.Object
it.unimi.di.mg4j.document.AbstractDocumentFactory
it.unimi.di.mg4j.document.PropertyBasedDocumentFactory
it.unimi.di.mg4j.document.tika.AbstractTikaDocumentFactory
it.unimi.di.mg4j.document.tika.AbstractSimpleTikaDocumentFactory
it.unimi.di.mg4j.document.tika.PdfDocumentFactory
- All Implemented Interfaces:
- DocumentFactory, FlyweightPrototype<DocumentFactory>, Serializable
public class PdfDocumentFactory
- extends AbstractSimpleTikaDocumentFactory
A document factory for the PDF format.
The metadata that will be tentatively parsed are Metadata.TITLE, MSOffice.AUTHOR, Metadata.CREATOR, MSOffice.KEYWORDS, Metadata.SUBJECT, producer, created, trapped, and HttpHeaders.LAST_MODIFIED.
- Author:
- Salvatore Insalaco
- See Also:
- Serialized Form
Method Summary
protected org.apache.tika.parser.Parser getParser()
The parser to be used to parse this kind of document; subclasses should always return the same instance, as Tika parsers are immutable and thread-safe.
protected List<TikaField> metadataFields()
The list of Tika fields (apart from content) that this factory provides; it returns the empty list, so most subclasses may want to override this method.
Methods inherited from class it.unimi.di.mg4j.document.PropertyBasedDocumentFactory
ensureJustOne, getInstance, getInstance, getInstance, getInstance, parseProperties, parseProperties, resolve, resolve, resolveNotNull, sameKey
PdfDocumentFactory
public PdfDocumentFactory()
PdfDocumentFactory
public PdfDocumentFactory(Properties properties)
throws ConfigurationException
- Throws:
ConfigurationException
PdfDocumentFactory
public PdfDocumentFactory(Reference2ObjectMap<Enum<?>,Object> defaultMetadata)
PdfDocumentFactory
public PdfDocumentFactory(String[] property)
throws ConfigurationException
- Throws:
ConfigurationException
getParser
protected org.apache.tika.parser.Parser getParser()
- Description copied from class: AbstractSimpleTikaDocumentFactory
- The parser to be used to parse this kind of document; subclasses should always return the same instance, as Tika parsers are immutable and thread-safe.
- Specified by:
getParser in class AbstractSimpleTikaDocumentFactory
- Returns:
- the parser to be used to parse this kind of document.
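The "always return the same instance" contract described above can be sketched as follows. This is a minimal illustration, not the MG4J source: SketchParser, SketchPdfParser, SketchSimpleFactory and SketchPdfFactory are hypothetical stand-ins for the Tika parser and the factory classes, used so that the example compiles without Tika or MG4J on the classpath.

```java
// Hypothetical stand-in for org.apache.tika.parser.Parser (assumption, not the Tika API).
interface SketchParser {
    String parse(String input);
}

// A stateless parser: it keeps no fields, so sharing one instance is safe.
final class SketchPdfParser implements SketchParser {
    @Override
    public String parse(String input) {
        return input.trim();
    }
}

// Hypothetical stand-in for the abstract factory that declares getParser().
abstract class SketchSimpleFactory {
    protected abstract SketchParser getParser();
}

class SketchPdfFactory extends SketchSimpleFactory {
    // Created once when the class is initialized and shared by every factory instance.
    private static final SketchParser PARSER = new SketchPdfParser();

    @Override
    protected SketchParser getParser() {
        // Always hands back the same object rather than allocating a new parser per call.
        return PARSER;
    }
}
```

Holding the parser in a static final field is one way to honor the contract; it relies on the parser being immutable and thread-safe, as the description states for Tika parsers.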
metadataFields
protected List<TikaField> metadataFields()
- Description copied from class: AbstractSimpleTikaDocumentFactory
- The list of Tika fields (apart from content) that this factory provides; it returns the empty list, so most subclasses may want to override this method.
- Overrides:
metadataFields in class AbstractSimpleTikaDocumentFactory
- Returns:
- the list of Tika fields that this factory provides.
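The override relationship described above (a superclass that returns the empty list, and a subclass that supplies its own fields) might look roughly like this. The sketch is an assumption-laden illustration: SketchFieldFactory and SketchPdfFieldFactory are hypothetical names, and plain strings stand in for TikaField objects; the strings are placeholder labels chosen for this example, not the actual keys used by Tika or MG4J.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the superclass: by default no metadata fields are provided.
abstract class SketchFieldFactory {
    protected List<String> metadataFields() {
        return Collections.emptyList();
    }
}

// Hypothetical stand-in for a format-specific factory that overrides the default.
class SketchPdfFieldFactory extends SketchFieldFactory {
    // Placeholder labels (assumption); built once and exposed as an unmodifiable list.
    private static final List<String> FIELDS = Collections.unmodifiableList(
        Arrays.asList("title", "author", "creator", "keywords", "subject",
                      "producer", "created", "trapped", "last-modified"));

    @Override
    protected List<String> metadataFields() {
        return FIELDS;
    }
}
```

Returning an unmodifiable list built once keeps the field set fixed for the lifetime of the factory, which is a design choice of this sketch rather than something the documentation above prescribes.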