Class TRECHeaderDocumentFactory

    • Constructor Detail

      • TRECHeaderDocumentFactory

        public TRECHeaderDocumentFactory()
    • Method Detail

      • numberOfFields

        public int numberOfFields()
        Description copied from interface: DocumentFactory
        Returns the number of fields present in the documents produced by this factory.
        Returns:
        the number of fields present in the documents produced by this factory.
      • fieldName

        public String fieldName(int fieldIndex)
        Description copied from interface: DocumentFactory
        Returns the symbolic name of a field.
        Parameters:
        fieldIndex - the index of a field (between 0 inclusive and DocumentFactory.numberOfFields() exclusive).
        Returns:
        the symbolic name of the fieldIndex-th field.
      • fieldIndex

        public int fieldIndex(String fieldName)
        Description copied from interface: DocumentFactory
        Returns the index of a field, given its symbolic name.
        Parameters:
        fieldName - the name of a field of this factory.
        Returns:
        the corresponding index, or -1 if there is no field with name fieldName.
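        The contract of numberOfFields(), fieldName() and fieldIndex() can be illustrated with a small stand-alone sketch. It does not use this class: FieldLookupSketch and the field names "title" and "text" are hypothetical, and throwing on an out-of-range index is an assumption, since the page does not say what happens in that case.

```java
import java.util.List;

// Hypothetical stand-in for a factory with two fields; not the real class.
class FieldLookupSketch {
    // Illustrative field names only; the real factory's fields are not listed on this page.
    private static final List<String> FIELD_NAMES = List.of("title", "text");

    static int numberOfFields() {
        return FIELD_NAMES.size();
    }

    // Valid indices run from 0 (inclusive) to numberOfFields() (exclusive).
    static String fieldName(int fieldIndex) {
        if (fieldIndex < 0 || fieldIndex >= numberOfFields()) {
            // Assumption: the documented contract does not specify this case.
            throw new IndexOutOfBoundsException("No field with index " + fieldIndex);
        }
        return FIELD_NAMES.get(fieldIndex);
    }

    // Returns -1 when no field has the given name, as documented for fieldIndex().
    static int fieldIndex(String fieldName) {
        return FIELD_NAMES.indexOf(fieldName);
    }

    public static void main(String[] args) {
        // Mapping an index to a name and back should return the same index.
        for (int i = 0; i < numberOfFields(); i++) {
            String name = fieldName(i);
            System.out.println(i + " -> " + name + " -> " + fieldIndex(name));
        }
        System.out.println(fieldIndex("no-such-field"));
    }
}
```

        For every valid index i, fieldIndex(fieldName(i)) is expected to equal i, and an unknown name maps to -1.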
      • startsWith

        protected static boolean startsWith(byte[] a,
                                            int l,
                                            byte[] b)
      • startsWithIgnoreCase

        protected static boolean startsWithIgnoreCase(byte[] a,
                                                      int l,
                                                      char[] b)
      • getDocument

        public Document getDocument(InputStream rawContent,
                                    Reference2ObjectMap<Enum<?>, Object> metadata)
                             throws IOException
        Description copied from interface: DocumentFactory
        Returns the document obtained by parsing the given byte stream.

        The metadata parameter makes up for the lack of a simple keyword-based parameter-passing mechanism in Java. It carries several kinds of “suggestions” gathered by the collection: typically the document title, a URI representing the document, its MIME type, its encoding, and so on. Some of this information might be set by default (as happens, for instance, in a PropertyBasedDocumentFactory). Implementations of this method must consult the metadata provided by the collection, possibly complete it with default factory metadata, and then proceed to construct the document.

        Parameters:
        rawContent - the raw content from which the document should be extracted; it must not be closed, as resource management is a responsibility of the DocumentCollection.
        metadata - a map from enums (e.g., keys defined in PropertyBasedDocumentFactory) to various kinds of objects.
        Returns:
        the document obtained by parsing the given byte stream.
        Throws:
        IOException
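        The metadata convention described for getDocument can be sketched without the library. This is an illustration under stated assumptions: MetadataSketch, its Key enum, and the resolve and parse helpers are hypothetical; java.util.Map stands in for Reference2ObjectMap; and a String stands in for the returned Document.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "consult the collection metadata, then fall back to factory defaults".
class MetadataSketch {
    // Hypothetical metadata keys; the real keys live in the library.
    enum Key { TITLE, ENCODING }

    // Factory-level defaults used when the collection supplies no value.
    static final Map<Enum<?>, Object> DEFAULTS = Map.<Enum<?>, Object>of(Key.ENCODING, "UTF-8");

    // Values supplied by the collection take precedence over the defaults.
    static Object resolve(Map<Enum<?>, Object> metadata, Enum<?> key) {
        Object value = metadata.get(key);
        return value != null ? value : DEFAULTS.get(key);
    }

    // Reads the stream using the resolved encoding. The stream is deliberately
    // not closed here: the caller (the collection) owns it.
    static String parse(InputStream rawContent, Map<Enum<?>, Object> metadata) throws IOException {
        Charset charset = Charset.forName((String) resolve(metadata, Key.ENCODING));
        String body = new String(rawContent.readAllBytes(), charset);
        return resolve(metadata, Key.TITLE) + ": " + body;
    }

    public static void main(String[] args) throws IOException {
        Map<Enum<?>, Object> metadata = new HashMap<>();
        metadata.put(Key.TITLE, "Sample"); // supplied by the collection; ENCODING is left to the default
        InputStream in = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(parse(in, metadata));
    }
}
```

        Keying the map by enum constants lets each caller pass only the hints it has, while the factory fills in the rest from its defaults.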