public interface DocumentCollectionBuilder
An interface for classes that can build collections during the indexing process.
A builder is usually based on a basename. Many different collections can be built using the same builder, using open(CharSequence) to specify a suffix that will be added to the basename. Creating several collections is a simple way to make collection construction scalable: for instance, Scan creates several collections, one per batch, and then puts them together using a ConcatenatedDocumentCollection.
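The basename-plus-suffix scheme can be illustrated with a minimal sketch. The class BatchNamingSketch, its methods, and the "@" suffix format are hypothetical and are not part of the library; only the idea that the suffix passed to open(CharSequence) is added to the basename comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: not part of the library.
class BatchNamingSketch {
    /** Mimics what a builder might do with the suffix passed to open(CharSequence). */
    static String collectionName(String basename, CharSequence suffix) {
        return basename + suffix;
    }

    /** Returns one collection name per batch of at most batchSize documents. */
    static List<String> namesForBatches(String basename, int documents, int batchSize) {
        List<String> names = new ArrayList<>();
        int batches = (documents + batchSize - 1) / batchSize; // ceiling division
        for (int batch = 0; batch < batches; batch++) {
            // The "@" + batch suffix format is an assumption chosen for this example.
            names.add(collectionName(basename, "@" + batch));
        }
        return names;
    }

    public static void main(String[] args) {
        // 25 documents in batches of 10 give three collections.
        System.out.println(namesForBatches("corpus", 25, 10)); // [corpus@0, corpus@1, corpus@2]
    }
}
```

Each name would correspond to one small collection, so a batch can be built and closed before the next one starts.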
After creating an instance of an implementing class and opening a new collection, documents can be added incrementally. Each document must be started with startDocument(CharSequence, CharSequence) and ended with endDocument(); inside each document, each non-text field must be written by passing an object to nonTextField(Object), whereas each text field must be started with startTextField() and ended with endTextField(): in between, a call to add(MutableString, MutableString) must be made for each word/nonword pair retrieved
from the original collection. At the end, close() terminates the construction of the collection.
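The call sequence described above can be sketched as follows. SketchBuilder, RecordingBuilder and BuilderProtocolSketch are hypothetical stand-ins, not library types: they mirror the documented method names but use CharSequence in place of MutableString and omit the checked IOException, so that the example compiles on its own. The order of the fields within the document is also just an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that mirrors the documented method names; it replaces
// MutableString with CharSequence so the sketch compiles without the library.
interface SketchBuilder {
    void open(CharSequence suffix);
    void startDocument(CharSequence title, CharSequence uri);
    void endDocument();
    void startTextField();
    void endTextField();
    void nonTextField(Object o);
    void add(CharSequence word, CharSequence nonWord);
    void close();
}

// Records every call so that the order of the protocol can be inspected.
class RecordingBuilder implements SketchBuilder {
    final List<String> events = new ArrayList<>();
    private boolean inTextField;

    public void open(CharSequence suffix) { events.add("open(" + suffix + ")"); }
    public void startDocument(CharSequence title, CharSequence uri) { events.add("startDocument(" + title + ")"); }
    public void endDocument() { events.add("endDocument"); }
    public void startTextField() { inTextField = true; events.add("startTextField"); }
    public void endTextField() { inTextField = false; events.add("endTextField"); }
    public void nonTextField(Object o) { events.add("nonTextField(" + o + ")"); }
    public void add(CharSequence word, CharSequence nonWord) {
        // As in the method summary, a pair added outside a text field is ignored.
        if (inTextField) events.add("add(" + word + "|" + nonWord + ")");
    }
    public void close() { events.add("close"); }
}

class BuilderProtocolSketch {
    /** Writes one document with a text field followed by a non-text field. */
    static List<String> run() {
        RecordingBuilder builder = new RecordingBuilder();
        builder.open("-0");
        builder.startDocument("A title", "http://example.com/");
        builder.startTextField();
        builder.add("Hello", ", ");
        builder.add("world", "!");
        builder.endTextField();
        builder.add("ignored", " "); // outside a text field: no effect
        builder.nonTextField(42);
        builder.endDocument();
        builder.close();
        return builder.events;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Running the sketch prints the recorded calls in order; the pair added after endTextField() does not appear, which matches the documented behaviour of add outside a text field.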
Several collections (e.g., SimpleCompressedDocumentCollection, ZipDocumentCollection) can be exact or approximated: in the latter case, nonwords are not recorded to decrease space usage.
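The trade-off between exact and approximated collections can be shown with a small sketch. ExactnessSketch is a hypothetical class that keeps strings in a list; the real collections use their own compressed formats, and joining words with a single space when nonwords are missing is an assumption made only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of exact versus approximated storage.
class ExactnessSketch {
    private final boolean exact;
    private final List<String> stored = new ArrayList<>();

    ExactnessSketch(boolean exact) { this.exact = exact; }

    /** Stores the word, and the nonword only when the collection is exact. */
    void add(CharSequence word, CharSequence nonWord) {
        stored.add(word.toString());
        if (exact) stored.add(nonWord.toString());
    }

    /** Rebuilds the text; the approximated variant separates words with one space (an assumption). */
    String text() {
        return exact ? String.join("", stored) : String.join(" ", stored);
    }

    /** Number of characters kept, a rough proxy for space usage. */
    int storedCharacters() {
        int total = 0;
        for (String s : stored) total += s.length();
        return total;
    }

    /** Builds a two-pair sample for "Hello, world!". */
    static ExactnessSketch sample(boolean exact) {
        ExactnessSketch sketch = new ExactnessSketch(exact);
        sketch.add("Hello", ", ");
        sketch.add("world", "!");
        return sketch;
    }

    public static void main(String[] args) {
        ExactnessSketch exact = sample(true);
        ExactnessSketch approximated = sample(false);
        System.out.println(exact.text() + " / " + exact.storedCharacters());               // Hello, world! / 13
        System.out.println(approximated.text() + " / " + approximated.storedCharacters()); // Hello world / 10
    }
}
```

The exact variant reproduces the original text, whereas the approximated one stores fewer characters and loses the punctuation.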
Method Summary | |
---|---|
void | add(MutableString word, MutableString nonWord): Adds a word and a nonword to the current text field, provided that a text field has started but not yet ended; otherwise, does nothing. |
String | basename(): Returns the basename of this builder. |
void | close(): Terminates the construction of the collection. |
void | endDocument(): Ends a document entry. |
void | endTextField(): Ends the current text field. |
void | nonTextField(Object o): Adds a non-text field. |
void | open(CharSequence suffix): Opens a new collection. |
void | startDocument(CharSequence title, CharSequence uri): Starts a document entry. |
void | startTextField(): Starts a new text field. |
void | virtualField(ObjectList<Scan.VirtualDocumentFragment> fragments): Adds a virtual field. |
Method Detail |
---|
String basename()

void open(CharSequence suffix) throws IOException
Parameters:
suffix - a suffix that will be added to the basename provided at construction time.
Throws:
IOException

void startDocument(CharSequence title, CharSequence uri) throws IOException
Parameters:
title - the document title (usually, the result of Document.title()).
uri - the document URI (usually, the result of Document.uri()).
Throws:
IOException

void endDocument() throws IOException
Throws:
IOException

void startTextField()

void endTextField() throws IOException
Throws:
IOException

void nonTextField(Object o) throws IOException
Parameters:
o - the content of the non-text field.
Throws:
IOException

void virtualField(ObjectList<Scan.VirtualDocumentFragment> fragments) throws IOException
Parameters:
fragments - the virtual fragments to be added.
Throws:
IOException

void add(MutableString word, MutableString nonWord) throws IOException
Usually, word and nonWord are just the result of a call to WordReader.next(MutableString, MutableString).
Parameters:
word - a word.
nonWord - a nonword.
Throws:
IOException
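
To make the notion of a word/nonword pair concrete, here is a minimal sketch of a splitter that produces such pairs from a string. WordNonWordSketch is hypothetical and only stands in for a word reader; treating letters and digits as word characters and everything else as nonword characters is an assumption of this example, not a statement about how any particular WordReader tokenizes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a word reader: not part of the library.
class WordNonWordSketch {
    /**
     * Splits text into {word, nonword} pairs. A word is a maximal run of
     * letters or digits; a nonword is the maximal run of other characters
     * that follows it. Either element of a pair may be empty.
     */
    static List<String[]> pairs(String text) {
        List<String[]> result = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int wordEnd = i;
            while (wordEnd < text.length() && Character.isLetterOrDigit(text.charAt(wordEnd))) wordEnd++;
            int nonWordEnd = wordEnd;
            while (nonWordEnd < text.length() && !Character.isLetterOrDigit(text.charAt(nonWordEnd))) nonWordEnd++;
            result.add(new String[] { text.substring(i, wordEnd), text.substring(wordEnd, nonWordEnd) });
            i = nonWordEnd; // at least one character is consumed per iteration
        }
        return result;
    }

    public static void main(String[] args) {
        // Each pair printed here is what a caller would pass to add(word, nonWord).
        for (String[] pair : pairs("Hello, world!")) {
            System.out.println("[" + pair[0] + "] [" + pair[1] + "]");
        }
    }
}
```

Concatenating every word and nonword in order gives back the original text, which is why recording both elements of each pair allows an exact collection.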

void close() throws IOException
Throws:
IOException