Interface DocumentFactory

    • Method Detail

      • numberOfFields

        int numberOfFields()
        Returns the number of fields present in the documents produced by this factory.
        Returns:
        the number of fields present in the documents produced by this factory.
      • fieldName

        String fieldName(int field)
        Returns the symbolic name of a field.
        Parameters:
        field - the index of a field (between 0 inclusive and numberOfFields() exclusive).
        Returns:
        the symbolic name of the field-th field.
      • fieldIndex

        int fieldIndex(String fieldName)
        Returns the index of a field, given its symbolic name.
        Parameters:
        fieldName - the name of a field of this factory.
        Returns:
        the corresponding index, or -1 if there is no field with name fieldName.
      • getDocument

        Document getDocument(InputStream rawContent,
                             Reference2ObjectMap<Enum<?>,Object> metadata)
                      throws IOException
        Returns the document obtained by parsing the given byte stream.

        The metadata parameter compensates for the lack of a simple keyword-based parameter-passing mechanism in Java. Through it, this method can receive several different types of “suggestions” gathered by the collection: typically the document title, a URI representing the document, its MIME type, its encoding, and so on. Some of this information might be set by default (as happens, for instance, in a PropertyBasedDocumentFactory). Implementations of this method must consult the metadata provided by the collection, possibly complete it with default factory metadata, and then proceed to construct the document.

        Parameters:
        rawContent - the raw content from which the document should be extracted; implementations must not close it, as resource management is the responsibility of the DocumentCollection.
        metadata - a map from enums (e.g., keys taken from PropertyBasedDocumentFactory) to various kinds of objects.
        Returns:
        the document obtained by parsing the given byte stream.
        Throws:
        IOException
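
        The contract described above (consult the metadata supplied by the collection, fall back to factory defaults, then build the document without closing the stream) can be illustrated with a minimal, library-free sketch. Everything in it is an assumption made for illustration: the class MetadataDefaultsSketch, the Key enum, the two-field layout, and the use of java.util.IdentityHashMap as a stand-in for Reference2ObjectMap are hypothetical and are not part of this interface. A "document" is reduced to an array holding one string per field.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.IdentityHashMap;
import java.util.Map;

/** Illustrative stand-in for a factory; not an implementation of DocumentFactory. */
class MetadataDefaultsSketch {
    /** Hypothetical metadata keys (the real keys are defined by the library). */
    enum Key { TITLE, ENCODING }

    /** Hypothetical field layout: position in the array is the field index. */
    static final String[] FIELD_NAMES = { "title", "text" };

    /** Factory-level defaults, used when the collection supplies no value. */
    static final Map<Enum<?>, Object> DEFAULTS = new IdentityHashMap<>();
    static {
        DEFAULTS.put(Key.TITLE, "(untitled)");
        DEFAULTS.put(Key.ENCODING, StandardCharsets.UTF_8);
    }

    static int numberOfFields() {
        return FIELD_NAMES.length;
    }

    static String fieldName(int field) {
        return FIELD_NAMES[field];
    }

    /** Returns the index of the named field, or -1 if there is no such field. */
    static int fieldIndex(String fieldName) {
        for (int i = 0; i < FIELD_NAMES.length; i++) {
            if (FIELD_NAMES[i].equals(fieldName)) return i;
        }
        return -1;
    }

    /** Prefers the value supplied by the collection, then the factory default. */
    static Object resolve(Map<Enum<?>, Object> metadata, Enum<?> key) {
        Object value = metadata.get(key);
        return value != null ? value : DEFAULTS.get(key);
    }

    /** Builds one string per field; the stream is read but deliberately not closed. */
    static String[] getDocument(InputStream rawContent, Map<Enum<?>, Object> metadata)
            throws IOException {
        Charset encoding = (Charset) resolve(metadata, Key.ENCODING);
        String title = (String) resolve(metadata, Key.TITLE);
        String text = new String(rawContent.readAllBytes(), encoding); // Java 9+
        return new String[] { title, text };
    }

    public static void main(String[] args) throws IOException {
        // The "collection" supplies a title but no encoding, so UTF-8 is used.
        Map<Enum<?>, Object> metadata = new IdentityHashMap<>();
        metadata.put(Key.TITLE, "A sample page");
        InputStream in = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        String[] doc = getDocument(in, metadata);
        for (int i = 0; i < numberOfFields(); i++) {
            System.out.println(fieldName(i) + ": " + doc[i]);
        }
    }
}
```

        In this sketch fieldIndex(fieldName(i)) returns i for every valid index, and a name that matches no field yields -1, mirroring the return conventions documented above. An identity-based map is a reasonable stand-in here because enum constants are singletons, so reference equality and equals() agree on the keys.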