public interface DocumentFactory
A factory that parses and builds documents of the same type.
Each document produced by the same factory has a number of fields,
which represent units of information that should be indexed
separately. The number of available fields can be retrieved by calling numberOfFields(),
their types by calling fieldType(int), and their symbolic names by calling fieldName(int).
Factories contain the parsing and document-level breaking logic. For instance,
a factory for HTML documents might extract the text into a title and a body, and
expose them as DocumentFactory.FieldType.TEXT fields. Additionally, the last modification
date might be exposed as a DocumentFactory.FieldType.DATE field, and so on.
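As an illustration of how a caller might enumerate the fields a factory exposes, here is a minimal, dependency-free sketch. The `FieldDescriptor` interface, the `FieldType` enum and `HtmlLikeFactory` are hypothetical stand-ins that only mirror the four field-related methods described on this page; they are not the library's classes, and returning -1 for an unknown name is an assumption of the sketch.

```java
public class FieldListingSketch {
    /** Stand-in for DocumentFactory.FieldType; only the two types mentioned above. */
    enum FieldType { TEXT, DATE }

    /** Hypothetical, simplified mirror of the field-related methods of a factory. */
    interface FieldDescriptor {
        int numberOfFields();
        String fieldName(int field);
        FieldType fieldType(int field);
        int fieldIndex(String fieldName);
    }

    /** A factory-like object exposing a title, a body and a modification date. */
    static final class HtmlLikeFactory implements FieldDescriptor {
        private static final String[] NAMES = { "title", "body", "date" };
        private static final FieldType[] TYPES =
            { FieldType.TEXT, FieldType.TEXT, FieldType.DATE };

        public int numberOfFields() { return NAMES.length; }
        public String fieldName(int field) { return NAMES[field]; }
        public FieldType fieldType(int field) { return TYPES[field]; }

        public int fieldIndex(String fieldName) {
            for (int i = 0; i < NAMES.length; i++)
                if (NAMES[i].equals(fieldName)) return i;
            return -1; // assumption of this sketch: -1 marks an unknown name
        }
    }

    /** Lists every field as index:name/type, walking 0..numberOfFields()-1. */
    static String describe(FieldDescriptor f) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < f.numberOfFields(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(i).append(':').append(f.fieldName(i))
              .append('/').append(f.fieldType(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "0:title/TEXT, 1:body/TEXT, 2:date/DATE"
        System.out.println(describe(new HtmlLikeFactory()));
    }
}
```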
Warning: implementations of this interface are not required
to be thread-safe, but they provide flyweight copies.
The copy() method is strengthened so as to return an instance of this interface.
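One way to work with a factory that is not thread-safe is to give each thread its own flyweight copy. The sketch below shows that pattern under stated assumptions: `SketchFactory` and its `parse` method are hypothetical stand-ins (not part of this API) for an object that shares immutable configuration between copies while keeping mutable scratch state per copy.

```java
import java.util.Arrays;
import java.util.List;

public class FlyweightCopySketch {
    /** Hypothetical stand-in for a factory that is not thread-safe. */
    static final class SketchFactory {
        private final String[] fieldNames;                          // immutable, shared by all copies
        private final StringBuilder scratch = new StringBuilder();  // mutable, private to each copy

        SketchFactory(String[] fieldNames) { this.fieldNames = fieldNames; }

        /** Returns a lightweight copy: shared configuration, fresh scratch state. */
        SketchFactory copy() { return new SketchFactory(fieldNames); }

        /** Uses the per-copy buffer, so it must not be called concurrently on one instance. */
        String parse(String raw) {
            scratch.setLength(0);
            scratch.append(fieldNames[0]).append('=').append(raw.trim());
            return scratch.toString();
        }
    }

    /** Parses each input on its own thread, each thread using its own copy. */
    static List<String> parseInParallel(SketchFactory prototype, String[] inputs) {
        String[] results = new String[inputs.length];
        Thread[] threads = new Thread[inputs.length];
        for (int i = 0; i < inputs.length; i++) {
            final int k = i;
            final SketchFactory own = prototype.copy(); // one copy per thread
            threads[i] = new Thread(() -> results[k] = own.parse(inputs[k]));
            threads[i].start();
        }
        try {
            for (Thread t : threads) t.join(); // join makes the results visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return Arrays.asList(results);
    }

    public static void main(String[] args) {
        SketchFactory prototype = new SketchFactory(new String[] { "title" });
        System.out.println(parseInParallel(prototype, new String[] { " a ", "b" }));
    }
}
```

The prototype itself is never used for parsing here; it only serves as the source of copies, which keeps the mutable state confined to a single thread.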
Nested Class Summary | |
---|---|
static class | DocumentFactory.FieldType: A field type. |
Method Summary | |
---|---|
DocumentFactory | copy() |
int | fieldIndex(String fieldName): Returns the index of a field, given its symbolic name. |
String | fieldName(int field): Returns the symbolic name of a field. |
DocumentFactory.FieldType | fieldType(int field): Returns the type of a field. |
Document | getDocument(InputStream rawContent, Reference2ObjectMap<Enum<?>,Object> metadata): Returns the document obtained by parsing the given byte stream. |
int | numberOfFields(): Returns the number of fields present in the documents produced by this factory. |
Method Detail |
---|
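int numberOfFields()
Returns the number of fields present in the documents produced by this factory.
String fieldName(int field)
Returns the symbolic name of a field.
Parameters: field - the index of a field (between 0 inclusive and numberOfFields() exclusive).
Returns: the symbolic name of the field-th field.
int fieldIndex(String fieldName)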
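Returns the index of a field, given its symbolic name.
Parameters: fieldName - the name of a field of this factory.
Returns: the index of the field named fieldName.
DocumentFactory.FieldType fieldType(int field)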
Returns the type of a field. The possible types are defined in DocumentFactory.FieldType.
Parameters: field - the index of a field (between 0 inclusive and numberOfFields() exclusive).
Returns: the type of the field-th field.
Document getDocument(InputStream rawContent, Reference2ObjectMap<Enum<?>,Object> metadata) throws IOException
The parameter metadata compensates for the lack of a simple keyword-based
parameter-passing system in Java. This method might take several different types of “suggestions”
that have been gathered by the collection: typically, the document title, a URI representing
the document, its MIME type, its encoding and so on. Some of this information might be
set by default (as happens, for instance, in a PropertyBasedDocumentFactory).
Implementations of this method must consult the metadata provided by the collection, possibly
complete it with default factory metadata, and proceed to the document construction.
Parameters: rawContent - the raw content from which the document should be extracted; it must not be closed, as
resource management is a responsibility of the DocumentCollection.
metadata - a map from enums (e.g., keys taken from PropertyBasedDocumentFactory) to various kinds of objects.
Throws: IOException
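The "consult the metadata, then fall back to factory defaults" step can be sketched as follows. This is only an illustration under stated assumptions: it uses a plain `java.util.Map` in place of `Reference2ObjectMap` to stay dependency-free, the `Key` enum is a hypothetical stand-in for the real metadata keys, and `getText` returns a string rather than a `Document`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class MetadataSketch {
    /** Hypothetical metadata keys; the real keys are defined by the library. */
    enum Key { TITLE, ENCODING }

    /** Default factory metadata, used when the collection supplies no value. */
    private final Map<Enum<?>, Object> defaults = new HashMap<>();

    MetadataSketch() { defaults.put(Key.ENCODING, "UTF-8"); }

    /** Prefers the value supplied by the collection, else the factory default. */
    Object resolve(Map<Enum<?>, Object> metadata, Enum<?> key) {
        Object value = metadata.get(key);
        return value != null ? value : defaults.get(key);
    }

    /** Decodes the stream using the resolved encoding; the stream is NOT closed here. */
    String getText(InputStream rawContent, Map<Enum<?>, Object> metadata) throws IOException {
        String encoding = (String) resolve(metadata, Key.ENCODING);
        Reader reader = new InputStreamReader(rawContent, encoding);
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[4096];
        int n;
        while ((n = reader.read(buffer)) != -1) sb.append(buffer, 0, n);
        Object title = resolve(metadata, Key.TITLE);
        return (title == null ? "" : title + ": ") + sb;
    }

    /** Convenience wrapper for the usage example; title may be null. */
    static String demo(String body, String title) {
        Map<Enum<?>, Object> metadata = new HashMap<>();
        if (title != null) metadata.put(Key.TITLE, title);
        InputStream in = new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8));
        try {
            return new MetadataSketch().getText(in, metadata);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("hello", "T"));  // prints "T: hello"
        System.out.println(demo("hello", null)); // prints "hello"
    }
}
```

In both calls no encoding is supplied by the caller, so the sketch falls back to its own default of UTF-8; leaving the stream open mirrors the requirement that resource management belongs to the collection.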
DocumentFactory copy()
Specified by: copy in interface FlyweightPrototype<DocumentFactory>