it.unimi.di.mg4j.document
Class SimpleCompressedDocumentCollectionBuilder

java.lang.Object
  extended by it.unimi.di.mg4j.document.SimpleCompressedDocumentCollectionBuilder
All Implemented Interfaces:
DocumentCollectionBuilder

public class SimpleCompressedDocumentCollectionBuilder
extends Object
implements DocumentCollectionBuilder

A builder for simple compressed document collections.

Author:
Sebastiano Vigna

Constructor Summary
SimpleCompressedDocumentCollectionBuilder(IOFactory ioFactory, String basename, DocumentFactory documentFactory, boolean exact)
           
SimpleCompressedDocumentCollectionBuilder(String basename, DocumentFactory documentFactory, boolean exact)
           
 
Method Summary
 void add(MutableString word, MutableString nonWord)
          Adds a word and a nonword to the current text field, provided that a text field has started but not yet ended; otherwise, doesn't do anything.
 String basename()
          Returns the basename of this builder.
 void build(DocumentSequence inputSequence)
           
 void close()
          Terminates the construction of the collection.
 void endDocument()
          Ends a document entry.
 void endTextField()
          Ends the current text field.
 void nonTextField(Object o)
          Adds a non-text field.
 void open(CharSequence suffix)
          Opens a new collection.
 void startDocument(CharSequence title, CharSequence uri)
          Starts a document entry.
 void startTextField()
          Starts a new text field.
 void virtualField(ObjectList<Scan.VirtualDocumentFragment> fragments)
          Adds a virtual field.
static int writeSelfDelimitedUtf8String(OutputBitStream obs, CharSequence s)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SimpleCompressedDocumentCollectionBuilder

public SimpleCompressedDocumentCollectionBuilder(String basename,
                                                 DocumentFactory documentFactory,
                                                 boolean exact)

SimpleCompressedDocumentCollectionBuilder

public SimpleCompressedDocumentCollectionBuilder(IOFactory ioFactory,
                                                 String basename,
                                                 DocumentFactory documentFactory,
                                                 boolean exact)
Method Detail

basename

public String basename()
Description copied from interface: DocumentCollectionBuilder
Returns the basename of this builder.

Specified by:
basename in interface DocumentCollectionBuilder
Returns:
the basename

open

public void open(CharSequence suffix)
          throws IOException
Description copied from interface: DocumentCollectionBuilder
Opens a new collection.

Specified by:
open in interface DocumentCollectionBuilder
Parameters:
suffix - a suffix that will be added to the basename provided at construction time.
Throws:
IOException

add

public void add(MutableString word,
                MutableString nonWord)
         throws IOException
Description copied from interface: DocumentCollectionBuilder
Adds a word and a nonword to the current text field, provided that a text field has started but not yet ended; otherwise, doesn't do anything.

Usually, word and nonWord are just the result of a call to WordReader.next(MutableString, MutableString).

Specified by:
add in interface DocumentCollectionBuilder
Parameters:
word - a word.
nonWord - a nonword.
Throws:
IOException
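
The guard described above (pairs are accepted only between startTextField() and endTextField(), and silently ignored otherwise) can be illustrated with a minimal sketch. TextFieldGuardSketch is a hypothetical stand-in written for this example, not MG4J code: it uses plain CharSequence arguments instead of MutableString and keeps tokens in memory rather than writing a compressed collection.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in that mimics only the documented guard in add(); not MG4J code. */
public class TextFieldGuardSketch {
    // True only between startTextField() and endTextField().
    private boolean inTextField = false;
    // Accepted words and nonwords, in arrival order.
    private final List<String> tokens = new ArrayList<>();

    public void startTextField() {
        inTextField = true;
    }

    public void endTextField() {
        inTextField = false;
    }

    /** Records the pair only while a text field is open; otherwise does nothing. */
    public void add(CharSequence word, CharSequence nonWord) {
        if (!inTextField) return;
        tokens.add(word.toString());
        tokens.add(nonWord.toString());
    }

    public List<String> tokens() {
        return tokens;
    }

    public static void main(String[] args) {
        TextFieldGuardSketch b = new TextFieldGuardSketch();
        b.add("ignored", " ");      // no text field open yet: dropped
        b.startTextField();
        b.add("hello", " ");
        b.add("world", "\n");
        b.endTextField();
        b.add("ignored", " ");      // text field already ended: dropped
        System.out.println(b.tokens().size()); // prints 4
    }
}
```

In the sketch, the two calls made outside the text field leave no trace, which matches the "doesn't do anything" wording of the interface description.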

close

public void close()
           throws IOException
Description copied from interface: DocumentCollectionBuilder
Terminates the construction of the collection.

Specified by:
close in interface DocumentCollectionBuilder
Throws:
IOException
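
Taken together, the method descriptions suggest a call order: open, then for each document startDocument, one or more fields, endDocument, and finally close. The sketch below only illustrates that ordering. The CollectionBuilder interface and RecordingBuilder class are hypothetical, simplified mirrors of the documented method names (no IOException, no MutableString, no non-text or virtual fields); they are assumptions made for this example and not the MG4J types.

```java
import java.util.ArrayList;
import java.util.List;

public class BuilderLifecycleSketch {
    /** Hypothetical, simplified mirror of the documented method names; not the MG4J interface. */
    interface CollectionBuilder {
        void open(CharSequence suffix);
        void startDocument(CharSequence title, CharSequence uri);
        void startTextField();
        void add(CharSequence word, CharSequence nonWord);
        void endTextField();
        void endDocument();
        void close();
    }

    /** Records the name of each call so the order can be inspected. */
    static class RecordingBuilder implements CollectionBuilder {
        final List<String> calls = new ArrayList<>();
        public void open(CharSequence suffix) { calls.add("open"); }
        public void startDocument(CharSequence title, CharSequence uri) { calls.add("startDocument"); }
        public void startTextField() { calls.add("startTextField"); }
        public void add(CharSequence word, CharSequence nonWord) { calls.add("add"); }
        public void endTextField() { calls.add("endTextField"); }
        public void endDocument() { calls.add("endDocument"); }
        public void close() { calls.add("close"); }
    }

    /** Feeds one document with a single text field to the builder. */
    static void writeOneDocument(CollectionBuilder b, String title, String uri, String text) {
        b.startDocument(title, uri);
        b.startTextField();
        // Naive tokenization on single spaces; a real caller would use a word reader.
        for (String word : text.split(" ")) b.add(word, " ");
        b.endTextField();
        b.endDocument();
    }

    /** Runs the whole lifecycle once and returns the recorded call names. */
    public static List<String> run() {
        RecordingBuilder b = new RecordingBuilder();
        b.open("-demo");   // the suffix value is illustrative only
        writeOneDocument(b, "Title", "http://example.com/1", "hello world");
        b.close();
        return b.calls;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

A caller of the real class would additionally have to handle the IOException declared by most of these methods.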

endDocument

public void endDocument()
                 throws IOException
Description copied from interface: DocumentCollectionBuilder
Ends a document entry.

Specified by:
endDocument in interface DocumentCollectionBuilder
Throws:
IOException

endTextField

public void endTextField()
                  throws IOException
Description copied from interface: DocumentCollectionBuilder
Ends the current text field.

Specified by:
endTextField in interface DocumentCollectionBuilder
Throws:
IOException

nonTextField

public void nonTextField(Object o)
                  throws IOException
Description copied from interface: DocumentCollectionBuilder
Adds a non-text field.

Specified by:
nonTextField in interface DocumentCollectionBuilder
Parameters:
o - the content of the non-text field.
Throws:
IOException

writeSelfDelimitedUtf8String

public static int writeSelfDelimitedUtf8String(OutputBitStream obs,
                                               CharSequence s)
                                        throws IOException
Throws:
IOException

startDocument

public void startDocument(CharSequence title,
                          CharSequence uri)
                   throws IOException
Description copied from interface: DocumentCollectionBuilder
Starts a document entry.

Specified by:
startDocument in interface DocumentCollectionBuilder
Parameters:
title - the document title (usually, the result of Document.title()).
uri - the document URI (usually, the result of Document.uri()).
Throws:
IOException

startTextField

public void startTextField()
Description copied from interface: DocumentCollectionBuilder
Starts a new text field.

Specified by:
startTextField in interface DocumentCollectionBuilder

virtualField

public void virtualField(ObjectList<Scan.VirtualDocumentFragment> fragments)
                  throws IOException
Description copied from interface: DocumentCollectionBuilder
Adds a virtual field.

Specified by:
virtualField in interface DocumentCollectionBuilder
Parameters:
fragments - the virtual fragments to be added.
Throws:
IOException

build

public void build(DocumentSequence inputSequence)
           throws IOException
Throws:
IOException