Class TextDocumentFactory
-
- All Implemented Interfaces:
DocumentFactory, FlyweightPrototype<DocumentFactory>, Serializable
public class TextDocumentFactory extends AbstractSimpleTikaDocumentFactory
A document factory for the plain-text format; the character set is autodetected. This factory has no metadata.
- Author:
- Salvatore Insalaco
- See Also:
- Serialized Form
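As a rough illustration of what character-set autodetection means, here is a minimal sketch that guesses an encoding from a leading byte-order mark. It is not the detection algorithm Tika or this factory uses (real detectors also inspect the content statistically); the class name BomSniffer, its methods, and the UTF-8 fallback are assumptions made for this example only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of MG4J or Tika.
public class BomSniffer {

    /** Returns true if data begins with the given unsigned byte values. */
    public static boolean startsWith(byte[] data, int... bom) {
        if (data.length < bom.length) return false;
        for (int i = 0; i < bom.length; i++) {
            if ((data[i] & 0xFF) != bom[i]) return false;
        }
        return true;
    }

    /**
     * Guesses a charset from a byte-order mark.
     * Falls back to UTF-8 when no mark is found (an assumption of this sketch).
     * UTF-32 marks are deliberately not handled here.
     */
    public static Charset guessCharset(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(head, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        if (startsWith(head, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        // Java's UTF_16 encoder prepends a big-endian byte-order mark.
        byte[] utf16 = "ciao".getBytes(StandardCharsets.UTF_16);
        System.out.println(guessCharset(utf16));

        // Plain ASCII has no mark, so the fallback applies.
        byte[] ascii = "ciao".getBytes(StandardCharsets.US_ASCII);
        System.out.println(guessCharset(ascii));
    }
}
```

With the real factory, none of this is written by the caller: detection happens inside the parser when a document is built.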
-
Nested Class Summary
-
Nested classes/interfaces inherited from class it.unimi.di.big.mg4j.document.PropertyBasedDocumentFactory
PropertyBasedDocumentFactory.MetadataKeys
-
Nested classes/interfaces inherited from interface it.unimi.di.big.mg4j.document.DocumentFactory
DocumentFactory.FieldType
-
Field Summary
-
Fields inherited from class it.unimi.di.big.mg4j.document.PropertyBasedDocumentFactory
defaultMetadata
-
Constructor Summary
Constructors
Constructor
TextDocumentFactory()
TextDocumentFactory(Reference2ObjectMap<Enum<?>,Object> defaultMetadata)
TextDocumentFactory(Properties properties)
TextDocumentFactory(String[] property)
-
Method Summary
Modifier and Type: protected org.apache.tika.parser.Parser
Method: getParser()
Description: The parser used to parse documents of this kind; subclasses should always return the same instance, as Tika parsers are immutable and thread-safe.
-
Methods inherited from class it.unimi.di.big.mg4j.document.tika.AbstractSimpleTikaDocumentFactory
copy, fields, getDocument, metadataFields, parseProperty
-
Methods inherited from class it.unimi.di.big.mg4j.document.tika.AbstractTikaDocumentFactory
fieldIndex, fieldName, fieldType, numberOfFields
-
Methods inherited from class it.unimi.di.big.mg4j.document.PropertyBasedDocumentFactory
ensureJustOne, getInstance, getInstance, getInstance, getInstance, parseProperties, parseProperties, resolve, resolve, resolveNotNull, sameKey, toString
-
Methods inherited from class it.unimi.di.big.mg4j.document.AbstractDocumentFactory
ensureFieldIndex
-
Constructor Detail
-
TextDocumentFactory
public TextDocumentFactory()
-
TextDocumentFactory
public TextDocumentFactory(Reference2ObjectMap<Enum<?>,Object> defaultMetadata)
-
TextDocumentFactory
public TextDocumentFactory(Properties properties) throws org.apache.commons.configuration.ConfigurationException
- Throws:
org.apache.commons.configuration.ConfigurationException
-
TextDocumentFactory
public TextDocumentFactory(String[] property) throws org.apache.commons.configuration.ConfigurationException
- Throws:
org.apache.commons.configuration.ConfigurationException
-
Method Detail
-
getParser
protected org.apache.tika.parser.Parser getParser()
Description copied from class: AbstractSimpleTikaDocumentFactory
The parser used to parse documents of this kind; subclasses should always return the same instance, as Tika parsers are immutable and thread-safe.
- Specified by:
getParser in class AbstractSimpleTikaDocumentFactory
- Returns:
- the parser used to parse documents of this kind.
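The "always return the same instance" contract can be sketched as follows. To keep the example compilable without Tika or MG4J on the classpath, Parser and SimpleFactory are hypothetical stand-ins for org.apache.tika.parser.Parser and AbstractSimpleTikaDocumentFactory, and PlainTextFactory is an invented subclass; only the pattern (one shared, stateless parser held in a static final field) reflects the description above.

```java
public class SharedParserSketch {

    /** Stand-in for org.apache.tika.parser.Parser (hypothetical, simplified). */
    interface Parser {
        String parse(String raw);
    }

    /** Stand-in for AbstractSimpleTikaDocumentFactory (hypothetical, simplified). */
    static abstract class SimpleFactory {
        /** Subclasses return the parser for their document format. */
        protected abstract Parser getParser();

        /** Uses whatever parser the subclass supplies. */
        String text(String raw) {
            return getParser().parse(raw);
        }
    }

    /** Invented subclass showing the shared-instance pattern. */
    static final class PlainTextFactory extends SimpleFactory {
        // One stateless parser, created once and shared by every
        // factory instance and every thread.
        private static final Parser PARSER = raw -> raw.trim();

        @Override
        protected Parser getParser() {
            return PARSER; // never allocate a new parser per call
        }
    }

    public static void main(String[] args) {
        PlainTextFactory a = new PlainTextFactory();
        PlainTextFactory b = new PlainTextFactory();
        // Both factories hand out the very same parser object.
        System.out.println(a.getParser() == b.getParser()); // prints "true"
    }
}
```

Because the parser is immutable, sharing it avoids repeated allocation and is safe when factory copies are used from several threads.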
-