it.unimi.di.mg4j.document
Class ZipDocumentCollectionBuilder

java.lang.Object
  extended by it.unimi.di.mg4j.document.ZipDocumentCollectionBuilder
All Implemented Interfaces:
DocumentCollectionBuilder

public class ZipDocumentCollectionBuilder
extends Object
implements DocumentCollectionBuilder

A builder for zipped document collections.
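This page does not describe the file layout of a zipped collection. As a rough picture of the general idea only, the sketch below stores each document as one entry of a zip archive using the JDK's java.util.zip classes, then reads the entries back. The class name ZipSketch, the entry names (document indices) and the plain UTF-8 payload are assumptions made for illustration. They are not taken from MG4J.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Illustration only: each document becomes one zip entry named by its index.
// The naming scheme and plain UTF-8 payload are assumptions, not MG4J's format.
class ZipSketch {
    static byte[] write(List<String> documents) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (int i = 0; i < documents.size(); i++) {
                zip.putNextEntry(new ZipEntry(Integer.toString(i)));
                zip.write(documents.get(i).getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the ZipOutputStream above wrote the archive's central directory.
        return bytes.toByteArray();
    }

    static List<String> read(byte[] archive) {
        List<String> documents = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            // Entries come back in the order they were written; readAllBytes()
            // stops at the end of the current entry.
            while (zip.getNextEntry() != null) {
                documents.add(new String(zip.readAllBytes(), StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return documents;
    }
}
```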


Constructor Summary
ZipDocumentCollectionBuilder(String basename, DocumentFactory factory, boolean exact)
          Creates a new zipped collection builder.
 
Method Summary
 void add(MutableString word, MutableString nonWord)
          Adds a word and a nonword to the current text field, provided that a text field has started but not yet ended; otherwise, doesn't do anything.
 String basename()
          Returns the basename of this builder.
 void build(DocumentSequence inputSequence)
           
 void close()
          Terminates the construction of the collection.
 void endDocument()
          Ends a document entry.
 void endTextField()
          Ends the current text field.
static void main(String[] arg)
           
 void nonTextField(Object o)
          Adds a non-text field.
 void open(CharSequence suffix)
          Opens a new collection.
 void startDocument(CharSequence title, CharSequence uri)
          Starts a document entry.
 void startTextField()
          Starts a new text field.
 void virtualField(ObjectList<Scan.VirtualDocumentFragment> fragments)
          Adds a virtual field.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ZipDocumentCollectionBuilder

public ZipDocumentCollectionBuilder(String basename,
                                    DocumentFactory factory,
                                    boolean exact)
Creates a new zipped collection builder.

Parameters:
basename - the basename of the collection; open(CharSequence) appends its suffix to it.
factory - the factory of the base document sequence.
exact - true iff non-words should also be preserved.
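A minimal sketch of what the exact flag controls, assuming a hypothetical FieldAccumulator class that is not part of MG4J: with exact set, each nonword (spacing, punctuation) is kept next to its word, so the original text can be reproduced; without it, only the words are kept. CharSequence stands in for MutableString to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical accumulator illustrating the "exact" flag: in exact mode every
// nonword is stored after its word; otherwise only the words are recorded.
class FieldAccumulator {
    private final boolean exact;
    private final List<String> tokens = new ArrayList<>();

    FieldAccumulator(boolean exact) {
        this.exact = exact;
    }

    void add(CharSequence word, CharSequence nonWord) {
        tokens.add(word.toString());
        if (exact) tokens.add(nonWord.toString());
    }

    List<String> tokens() {
        return tokens;
    }

    // Concatenates what was stored; only in exact mode is this the original text.
    String text() {
        return String.join("", tokens);
    }
}
```

For example, adding ("Hello", ", ") and ("world", "!") yields the text "Hello, world!" in exact mode, and just the words "Hello" and "world" otherwise.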
Method Detail

open

public void open(CharSequence suffix)
          throws FileNotFoundException
Description copied from interface: DocumentCollectionBuilder
Opens a new collection.

Specified by:
open in interface DocumentCollectionBuilder
Parameters:
suffix - a suffix that will be added to the basename provided at construction time.
Throws:
FileNotFoundException
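open(CharSequence) adds its suffix to the basename given at construction time. The hypothetical SuffixNaming class below models only that concatenation, plus one possible use: opening one collection per batch with a numeric suffix. Whether the real builder appends file extensions to the result is not stated on this page, so none are added here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the suffix convention documented for open(CharSequence):
// the collection name is the construction-time basename followed by the suffix.
class SuffixNaming {
    private final String basename;

    SuffixNaming(String basename) {
        this.basename = basename;
    }

    String open(CharSequence suffix) {
        return basename + suffix;
    }

    // Illustrative use: one collection per batch, told apart by a numeric suffix.
    List<String> batchNames(int batches) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < batches; i++) names.add(open("-" + i));
        return names;
    }
}
```

With this sketch, new SuffixNaming("wiki").batchNames(2) gives the names wiki-0 and wiki-1.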

basename

public String basename()
Description copied from interface: DocumentCollectionBuilder
Returns the basename of this builder.

Specified by:
basename in interface DocumentCollectionBuilder
Returns:
the basename

startDocument

public void startDocument(CharSequence title,
                          CharSequence uri)
                   throws IOException
Description copied from interface: DocumentCollectionBuilder
Starts a document entry.

Specified by:
startDocument in interface DocumentCollectionBuilder
Parameters:
title - the document title (usually, the result of Document.title()).
uri - the document uri (usually, the result of Document.uri()).
Throws:
IOException

endDocument

public void endDocument()
                 throws IOException
Description copied from interface: DocumentCollectionBuilder
Ends a document entry.

Specified by:
endDocument in interface DocumentCollectionBuilder
Throws:
IOException

startTextField

public void startTextField()
Description copied from interface: DocumentCollectionBuilder
Starts a new text field.

Specified by:
startTextField in interface DocumentCollectionBuilder

nonTextField

public void nonTextField(Object o)
                  throws IOException
Description copied from interface: DocumentCollectionBuilder
Adds a non-text field.

Specified by:
nonTextField in interface DocumentCollectionBuilder
Parameters:
o - the content of the non-text field.
Throws:
IOException

virtualField

public void virtualField(ObjectList<Scan.VirtualDocumentFragment> fragments)
                  throws IOException
Description copied from interface: DocumentCollectionBuilder
Adds a virtual field.

Specified by:
virtualField in interface DocumentCollectionBuilder
Parameters:
fragments - the virtual fragments to be added.
Throws:
IOException

endTextField

public void endTextField()
                  throws IOException
Description copied from interface: DocumentCollectionBuilder
Ends the current text field.

Specified by:
endTextField in interface DocumentCollectionBuilder
Throws:
IOException

add

public void add(MutableString word,
                MutableString nonWord)
         throws IOException
Description copied from interface: DocumentCollectionBuilder
Adds a word and a nonword to the current text field, provided that a text field has started but not yet ended; otherwise, doesn't do anything.

Usually, word and nonWord are just the result of a call to WordReader.next(MutableString, MutableString).

Specified by:
add in interface DocumentCollectionBuilder
Parameters:
word - a word.
nonWord - a nonword.
Throws:
IOException
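The contract of add(MutableString, MutableString) can be modeled as a small state machine. The sketch below uses a hypothetical TextFieldGuard class (not part of MG4J), with CharSequence parameters in place of MutableString: pairs are recorded only between startTextField() and endTextField(), and calls made outside a text field are silently ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the documented contract of add(word, nonWord):
// pairs are recorded only while a text field is open; otherwise nothing happens.
class TextFieldGuard {
    private boolean inTextField;
    private final List<String> recorded = new ArrayList<>();

    void startTextField() {
        inTextField = true;
    }

    void endTextField() {
        inTextField = false;
    }

    void add(CharSequence word, CharSequence nonWord) {
        if (!inTextField) return; // no text field open: do nothing
        recorded.add(word + "|" + nonWord);
    }

    List<String> recorded() {
        return recorded;
    }
}
```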

close

public void close()
           throws IOException
Description copied from interface: DocumentCollectionBuilder
Terminates the construction of the collection.

Specified by:
close in interface DocumentCollectionBuilder
Throws:
IOException
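Because close() terminates construction, a caller will usually want it to run even when writing a document fails. The sketch below shows a try/finally pattern against SimpleBuilder, a hypothetical trimmed-down interface declared only for this example (it is not DocumentCollectionBuilder); RecordingBuilder, also hypothetical, logs the calls so their order can be checked and can simulate a failure.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a few builder methods; not the MG4J interface.
interface SimpleBuilder {
    void open(CharSequence suffix) throws IOException;
    void startDocument(CharSequence title, CharSequence uri) throws IOException;
    void endDocument() throws IOException;
    void close() throws IOException;
}

// Records every call; optionally fails in startDocument to exercise the finally.
class RecordingBuilder implements SimpleBuilder {
    final List<String> calls = new ArrayList<>();
    private final boolean failOnStart;

    RecordingBuilder(boolean failOnStart) {
        this.failOnStart = failOnStart;
    }

    public void open(CharSequence suffix) { calls.add("open"); }

    public void startDocument(CharSequence title, CharSequence uri) throws IOException {
        calls.add("startDocument");
        if (failOnStart) throw new IOException("simulated failure");
    }

    public void endDocument() { calls.add("endDocument"); }

    public void close() { calls.add("close"); }
}

class CloseDemo {
    // Writes one document and makes sure close() runs even if writing fails.
    static void writeOne(SimpleBuilder builder, String title, String uri) throws IOException {
        builder.open("");
        try {
            builder.startDocument(title, uri);
            builder.endDocument();
        } finally {
            builder.close(); // terminates construction on every path
        }
    }
}
```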

build

public void build(DocumentSequence inputSequence)
           throws IOException
Throws:
IOException
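This page gives no description of build(DocumentSequence). One plausible reading, given the other methods, is that it replays every document of the sequence through the open, startDocument, text-field, endDocument and close calls described above; the sketch below models that reading only. SimpleDoc, ProtocolBuilder, BuildSketch and CallLog are hypothetical types, non-text and virtual fields are omitted, and the empty suffix passed to open is an assumption. It is not MG4J's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified document: a title, a URI and text fields, where each
// text field is a list of {word, nonWord} pairs.
record SimpleDoc(String title, String uri, List<List<String[]>> textFields) {}

// Hypothetical subset of the builder protocol (not the MG4J interface).
interface ProtocolBuilder {
    void open(CharSequence suffix);
    void startDocument(CharSequence title, CharSequence uri);
    void startTextField();
    void add(CharSequence word, CharSequence nonWord);
    void endTextField();
    void endDocument();
    void close();
}

class BuildSketch {
    // Replays each document in order: open, then startDocument ... endDocument
    // per document (one startTextField/endTextField pair per field), then close.
    static void build(List<SimpleDoc> sequence, ProtocolBuilder builder) {
        builder.open("");
        for (SimpleDoc doc : sequence) {
            builder.startDocument(doc.title(), doc.uri());
            for (List<String[]> field : doc.textFields()) {
                builder.startTextField();
                for (String[] pair : field) builder.add(pair[0], pair[1]);
                builder.endTextField();
            }
            builder.endDocument();
        }
        builder.close();
    }
}

// Records the calls it receives so the replay order can be inspected.
class CallLog implements ProtocolBuilder {
    final List<String> calls = new ArrayList<>();
    public void open(CharSequence suffix) { calls.add("open"); }
    public void startDocument(CharSequence title, CharSequence uri) { calls.add("startDocument " + title); }
    public void startTextField() { calls.add("startTextField"); }
    public void add(CharSequence word, CharSequence nonWord) { calls.add("add " + word); }
    public void endTextField() { calls.add("endTextField"); }
    public void endDocument() { calls.add("endDocument"); }
    public void close() { calls.add("close"); }
}
```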

main

public static void main(String[] arg)
                 throws com.martiansoftware.jsap.JSAPException,
                        IOException,
                        ClassNotFoundException,
                        InvocationTargetException,
                        NoSuchMethodException,
                        IllegalAccessException,
                        InstantiationException,
                        IllegalArgumentException,
                        SecurityException
Throws:
com.martiansoftware.jsap.JSAPException
IOException
ClassNotFoundException
InvocationTargetException
NoSuchMethodException
IllegalAccessException
InstantiationException
IllegalArgumentException
SecurityException