it.unimi.di.mg4j.document
Interface Document

All Superinterfaces:
Closeable
All Known Implementing Classes:
AbstractDocument, CompositeDocumentFactory.CompositeDocument, HtmlDocumentFactory.HtmlDocument, ReplicatedDocumentFactory.ReplicatedDocument

public interface Document
extends Closeable

An indexable document.

Instances of this class represent a single document. Documents provide access to possibly several fields, which represent units of information that should be indexed separately.

Each field is accessible by a call to content(int). Note, however, that unless otherwise specified, field content must be accessed in increasing order of field index. You can skip some fields, but the contract of this class does not require that you can access fields in random order (although implementations may provide this feature). Moreover, the data provided by a call to content(int) (e.g., a Reader for TEXT fields) may become invalid at the next call (similarly to the behaviour of DocumentCollection.document(int)). The same holds for wordReader(int).
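
The access pattern described above can be sketched as follows. This is a minimal, self-contained illustration and does not use MG4J itself: FieldSource is a hypothetical stand-in for the content(int) part of Document, and ForwardOnlySource is a toy implementation that is assumed to reject requests for an earlier field. Real implementations may behave differently when fields are accessed out of order.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class SequentialFieldAccess {
    /** Hypothetical stand-in for the content(int) part of Document; not the MG4J interface. */
    interface FieldSource extends Closeable {
        Object content(int field) throws IOException;
    }

    /** Toy implementation (an assumption for illustration) that rejects backward access. */
    static final class ForwardOnlySource implements FieldSource {
        private final String[] fields;
        private int lastField = -1;

        ForwardOnlySource(String... fields) {
            this.fields = fields;
        }

        @Override
        public Object content(int field) throws IOException {
            if (field < lastField) {
                throw new IllegalStateException(
                    "Field " + field + " requested after field " + lastField);
            }
            lastField = field;
            return new StringReader(fields[field]);
        }

        @Override
        public void close() {
            // Nothing to release in this toy implementation.
        }
    }

    /** Reads a Reader to exhaustion and returns its contents. */
    static String drain(Reader reader) throws IOException {
        StringBuilder builder = new StringBuilder();
        char[] buffer = new char[1024];
        int read;
        while ((read = reader.read(buffer)) != -1) {
            builder.append(buffer, 0, read);
        }
        return builder.toString();
    }

    /** Visits the requested fields in increasing order; skipped indices are never touched. */
    static List<String> readFields(FieldSource source, int... increasingFields)
            throws IOException {
        List<String> result = new ArrayList<>();
        for (int field : increasingFields) {
            // Consume each Reader fully before the next content() call,
            // since the previous Reader may become invalid at that point.
            Reader reader = (Reader) source.content(field);
            result.add(drain(reader));
        }
        return result;
    }
}
```

The key design choice is to copy out whatever is needed from a field before asking for the next one, rather than holding on to several Readers at once.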

After obtaining a document, it is your responsibility to close it.
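
Because Document extends Closeable, one way to meet this responsibility is a try-with-resources statement. The sketch below is self-contained and does not use MG4J: TitledResource and TrackedResource are hypothetical stand-ins that only mimic title() and close(), so that the closing behaviour can be observed.

```java
import java.io.Closeable;
import java.io.IOException;

public class CloseDocumentExample {
    /** Hypothetical stand-in for the title()/close() part of Document; not the MG4J interface. */
    interface TitledResource extends Closeable {
        CharSequence title();
    }

    /** Toy implementation (an assumption for illustration) that records whether it was closed. */
    static final class TrackedResource implements TitledResource {
        boolean closed;

        @Override
        public CharSequence title() {
            return "example";
        }

        @Override
        public void close() {
            closed = true;
        }

        @Override
        public String toString() {
            // Follows the advice that toString() should equal the title.
            return title().toString();
        }
    }

    /** Uses the resource and closes it on every exit path, including exceptions. */
    static String describe(TrackedResource resource) throws IOException {
        try (TitledResource document = resource) {
            return "Indexing " + document.title();
        }
    }
}
```

With this pattern the caller does not rely on finalisers: close() is invoked as soon as the try block exits.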

It is advisable, although not strictly required, that documents have a toString() equal to their title.


Method Summary
 void close()
          Closes this document, releasing all resources.
 Object content(int field)
          Returns the content of the given field.
 CharSequence title()
          The title of this document.
 CharSequence uri()
          A URI that is associated with this document.
 WordReader wordReader(int field)
          Returns a word reader for the given DocumentFactory.FieldType.TEXT field.
 

Method Detail

title

CharSequence title()
The title of this document.

Returns:
the title to be used to refer to this document, or null.

uri

CharSequence uri()
A URI that is associated with this document.

Returns:
the URI associated with this document, or null.

content

Object content(int field)
               throws IOException
Returns the content of the given field.

Parameters:
field - the field index.
Returns:
the field content; the actual type depends on the field type, as specified by the DocumentFactory that built this document. For example, the returned object is going to be a Reader if the field type is DocumentFactory.FieldType.TEXT.
Throws:
IOException

wordReader

WordReader wordReader(int field)
Returns a word reader for the given DocumentFactory.FieldType.TEXT field.

Parameters:
field - the field index.
Returns:
a word reader object that should be used to break the given field into words.

close

void close()
           throws IOException
Closes this document, releasing all resources.

You should always call this method after manipulating a document. Implementations are invited to call this method in a finaliser as a safety net (even better, implement SafelyCloseable), but since there is no guarantee as to when finalisers are invoked, you should not depend on this behaviour.

Specified by:
close in interface Closeable
Throws:
IOException