Class HtmlDocumentFactory.HtmlDocument

    • Field Detail

      • parsed

        protected boolean parsed
        Whether the document has already been parsed.
      • rawContent

        protected final InputStream rawContent
        The cached raw content.
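
        The two fields above suggest a parse-once pattern: the raw stream is kept until some accessor needs parsed data, and the flag prevents a second pass. The following is only a minimal sketch of that pattern under this assumption, not the actual HtmlDocument code. The class LazyDocumentSketch, its text() accessor, and the parseCount counter are hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in (not the MG4J class): consumes its raw content at most once. */
class LazyDocumentSketch {
    /** Whether the document has already been parsed. */
    protected boolean parsed;
    /** The cached raw content. */
    protected final InputStream rawContent;
    /** Hypothetical parse result; a real parser would fill in title, text, and so on. */
    private String text;
    /** Hypothetical counter, present only to show that parsing happens once. */
    int parseCount;

    LazyDocumentSketch(InputStream rawContent) {
        this.rawContent = rawContent;
    }

    /** Reads the stream on the first call only; later calls return immediately. */
    private void ensureParsed() {
        if (parsed) return;
        try {
            // "Parsing" here is just decoding the bytes; this is an assumption of the sketch.
            text = new String(rawContent.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        parseCount++;
        parsed = true;
    }

    /** Hypothetical accessor that triggers parsing on demand. */
    String text() {
        ensureParsed();
        return text;
    }
}
```

        Calling text() repeatedly reads the stream only once, which matters because an InputStream generally cannot be rewound.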
    • Method Detail

      • title

        public CharSequence title()
        Description copied from interface: Document
        The title of this document.
        Returns:
        the title to be used to refer to this document.
      • uri

        public CharSequence uri()
        Description copied from interface: Document
        A URI that is associated with this document.
        Returns:
        the URI associated with this document, or null.
      • content

        public Object content(int field)
                       throws IOException
        Description copied from interface: Document
        Returns the content of the given field.
        Parameters:
        field - the field index.
        Returns:
        the field content; the actual type depends on the field type, as specified by the DocumentFactory that built this document. For example, the returned object is a Reader if the field type is DocumentFactory.FieldType.TEXT.
        Throws:
        IOException
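
        Because content(int) returns Object, callers have to check and cast the result according to the field type. The sketch below illustrates this for a text field. DocumentSketch is a hypothetical, cut-down stand-in for the Document interface (only the content method is reproduced), and ContentSketch.readTextField is a helper name invented for this example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

/** Hypothetical, cut-down stand-in for the Document interface described above. */
interface DocumentSketch {
    Object content(int field) throws IOException;
}

class ContentSketch {
    /** Reads a whole field, assuming content(field) returns a Reader for text fields. */
    static String readTextField(DocumentSketch document, int field) {
        try {
            Object content = document.content(field);
            // The declared type is Object, so verify the runtime type before casting.
            if (!(content instanceof Reader)) {
                throw new IllegalArgumentException("Field " + field + " is not a text field");
            }
            Reader reader = (Reader) content;
            StringBuilder builder = new StringBuilder();
            char[] buffer = new char[1024];
            for (int n; (n = reader.read(buffer)) != -1; ) {
                builder.append(buffer, 0, n);
            }
            return builder.toString();
        } catch (IOException e) {
            // Wrapped so that callers of this sketch need not declare IOException.
            throw new UncheckedIOException(e);
        }
    }
}
```

        The instanceof check turns a mismatch between the expected and actual field type into an explicit error rather than a ClassCastException.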
      • wordReader

        public WordReader wordReader(int field)
        Description copied from interface: Document
        Returns a word reader for the given DocumentFactory.FieldType.TEXT field.
        Parameters:
        field - the field index.
        Returns:
        a word reader object that should be used to break the given field into words.
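
        To illustrate what breaking a text field into words can look like, here is a minimal sketch. SimpleWordReaderSketch is a hypothetical, simplified stand-in and does not reproduce the WordReader interface; it assumes that a word is a maximal run of letters and digits, which may differ from the rule used by the word reader this method actually returns.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified word breaker: words are maximal runs of letters and digits. */
class SimpleWordReaderSketch {
    private final Reader reader;

    SimpleWordReaderSketch(Reader reader) {
        this.reader = reader;
    }

    /** Returns the next word, or null once the input is exhausted. */
    String nextWord() throws IOException {
        StringBuilder word = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (Character.isLetterOrDigit(c)) {
                word.append((char) c);
            } else if (word.length() > 0) {
                break; // A separator after at least one word character ends the word.
            }
            // Separators seen before any word character are skipped.
        }
        return word.length() > 0 ? word.toString() : null;
    }

    /** Convenience helper for the example: collects all words of a string. */
    static List<String> words(String text) {
        try {
            SimpleWordReaderSketch wordReader = new SimpleWordReaderSketch(new StringReader(text));
            List<String> result = new ArrayList<>();
            for (String w; (w = wordReader.nextWord()) != null; ) {
                result.add(w);
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

        In this sketch the Reader passed to the constructor plays the role of the object returned by content(field) for a text field.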