it.unimi.di.mg4j.document.tika
Class TextDocumentFactory
java.lang.Object
it.unimi.di.mg4j.document.AbstractDocumentFactory
it.unimi.di.mg4j.document.PropertyBasedDocumentFactory
it.unimi.di.mg4j.document.tika.AbstractTikaDocumentFactory
it.unimi.di.mg4j.document.tika.AbstractSimpleTikaDocumentFactory
it.unimi.di.mg4j.document.tika.TextDocumentFactory
- All Implemented Interfaces:
- DocumentFactory, FlyweightPrototype<DocumentFactory>, Serializable
public class TextDocumentFactory
- extends AbstractSimpleTikaDocumentFactory
A document factory for the text format; the character set will be autodetected.
This factory has no metadata.
- Author:
- Salvatore Insalaco
- See Also:
- Serialized Form
Method Summary
protected org.apache.tika.parser.Parser
getParser()
The parser used to parse this kind of document; subclasses should always return the same instance, as Tika parsers are immutable and thread-safe.
Methods inherited from class it.unimi.di.mg4j.document.PropertyBasedDocumentFactory
ensureJustOne, getInstance, getInstance, getInstance, getInstance, parseProperties, parseProperties, resolve, resolve, resolveNotNull, sameKey
Constructor Detail
TextDocumentFactory
public TextDocumentFactory()
TextDocumentFactory
public TextDocumentFactory(Reference2ObjectMap<Enum<?>,Object> defaultMetadata)
TextDocumentFactory
public TextDocumentFactory(Properties properties)
throws ConfigurationException
- Throws:
ConfigurationException
TextDocumentFactory
public TextDocumentFactory(String[] property)
throws ConfigurationException
- Throws:
ConfigurationException
Method Detail
getParser
protected org.apache.tika.parser.Parser getParser()
- Description copied from class:
AbstractSimpleTikaDocumentFactory
- The parser used to parse this kind of document; subclasses should always return the same instance, as Tika parsers are immutable and thread-safe.
- Specified by:
getParser
in class AbstractSimpleTikaDocumentFactory
- Returns:
- the parser used to parse this kind of document.
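The contract described for getParser() (always return the same instance) can be illustrated with a minimal sketch. It deliberately avoids the real MG4J and Tika classes so that it compiles on its own: SketchParser, SketchFactory, SketchTextFactory and SharedParserSketch are hypothetical stand-ins, not part of either library, and the trimming "parser" is only a placeholder for real parsing work.

```java
// Hypothetical stand-in for org.apache.tika.parser.Parser (illustration only).
interface SketchParser {
    String parse(String input);
}

// Hypothetical stand-in for AbstractSimpleTikaDocumentFactory (illustration only).
abstract class SketchFactory {
    // Subclasses supply the parser; the contract is to return the same instance every time.
    protected abstract SketchParser getParser();

    // Delegates to whatever parser the subclass provides.
    public String extract(String input) {
        return getParser().parse(input);
    }
}

// Follows the documented pattern: one stateless parser shared by every factory instance.
final class SketchTextFactory extends SketchFactory {
    // Safe to share because this placeholder parser holds no mutable state.
    private static final SketchParser SHARED_PARSER = input -> input.trim();

    @Override
    protected SketchParser getParser() {
        return SHARED_PARSER;
    }
}

public class SharedParserSketch {
    public static void main(String[] args) {
        SketchTextFactory first = new SketchTextFactory();
        SketchTextFactory second = new SketchTextFactory();

        // Both factories hand out the very same parser object.
        System.out.println(first.getParser() == second.getParser()); // prints "true"
        System.out.println(first.extract("  plain text  "));         // prints "plain text"
    }
}
```

Holding the parser in a static final field means that copies of the factory do not allocate new parsers, which is reasonable only as long as the parser is immutable and thread-safe, as the description above states for Tika parsers.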