Package it.unimi.di.big.mg4j.document
Class SimpleCompressedDocumentCollectionBuilder
java.lang.Object
  it.unimi.di.big.mg4j.document.SimpleCompressedDocumentCollectionBuilder
All Implemented Interfaces: DocumentCollectionBuilder
public class SimpleCompressedDocumentCollectionBuilder extends Object implements DocumentCollectionBuilder
A builder for simple compressed document collections.
Author: Sebastiano Vigna
-
Constructor Summary
Constructors
- SimpleCompressedDocumentCollectionBuilder(IOFactory ioFactory, String basename, DocumentFactory documentFactory, boolean exact)
- SimpleCompressedDocumentCollectionBuilder(IOFactory ioFactory, String basename, DocumentFactory documentFactory, boolean exact, boolean relative)
- SimpleCompressedDocumentCollectionBuilder(String basename, DocumentFactory documentFactory, boolean exact)
-
Method Summary
- void add(MutableString word, MutableString nonWord)
- String basename(): Returns the basename of this builder.
- void build(DocumentSequence inputSequence)
- void close(): Terminates the construction of the collection.
- void endDocument(): Ends a document entry.
- void endTextField(): Ends the current text field.
- void nonTextField(Object o): Adds a non-text field.
- void open(CharSequence suffix): Opens a new collection.
- void startDocument(CharSequence title, CharSequence uri): Starts a document entry.
- void startTextField(): Starts a new text field.
- void virtualField(List<Scan.VirtualDocumentFragment> fragments): Adds a virtual field.
- static int writeSelfDelimitedUtf8String(OutputBitStream obs, CharSequence s)
-
Constructor Detail
-
SimpleCompressedDocumentCollectionBuilder
public SimpleCompressedDocumentCollectionBuilder(String basename, DocumentFactory documentFactory, boolean exact)
-
SimpleCompressedDocumentCollectionBuilder
public SimpleCompressedDocumentCollectionBuilder(IOFactory ioFactory, String basename, DocumentFactory documentFactory, boolean exact)
-
SimpleCompressedDocumentCollectionBuilder
public SimpleCompressedDocumentCollectionBuilder(IOFactory ioFactory, String basename, DocumentFactory documentFactory, boolean exact, boolean relative)
-
Method Detail
-
basename
public String basename()
Description copied from interface: DocumentCollectionBuilder
Returns the basename of this builder.
Specified by: basename in interface DocumentCollectionBuilder
Returns: the basename
-
open
public void open(CharSequence suffix) throws IOException
Description copied from interface: DocumentCollectionBuilder
Opens a new collection.
Specified by: open in interface DocumentCollectionBuilder
Parameters:
suffix - a suffix that will be added to the basename provided at construction time.
Throws: IOException
-
add
public void add(MutableString word, MutableString nonWord) throws IOException
Description copied from interface: DocumentCollectionBuilder
Adds a word and a nonword to the current text field, provided that a text field has started but not yet ended; otherwise, does nothing.
Usually, word and nonWord are just the result of a call to WordReader.next(MutableString, MutableString).
Specified by: add in interface DocumentCollectionBuilder
Parameters:
word - a word.
nonWord - a nonword.
Throws: IOException
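The "only inside an open text field" rule for add can be illustrated with a small stand-alone sketch. TextFieldGuardSketch is a hypothetical toy class, not the MG4J implementation: it uses CharSequence instead of MutableString, keeps tokens in memory instead of writing compressed output, and assumes the rule is enforced with a simple boolean flag.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy class (not part of MG4J) mimicking the guard described for add(). */
final class TextFieldGuardSketch {
    private final List<String> tokens = new ArrayList<>();
    private boolean inTextField; // true between startTextField() and endTextField()

    void startTextField() { inTextField = true; }

    void endTextField() { inTextField = false; }

    /** Records the word/nonword pair only while a text field is open; otherwise does nothing. */
    void add(CharSequence word, CharSequence nonWord) {
        if (!inTextField) return;
        tokens.add(word.toString());
        tokens.add(nonWord.toString());
    }

    List<String> tokens() { return tokens; }

    /** Runs a small scenario and returns the tokens that were actually recorded. */
    static List<String> demo() {
        TextFieldGuardSketch b = new TextFieldGuardSketch();
        b.add("ignored", " "); // no text field open yet: dropped
        b.startTextField();
        b.add("hello", " ");
        b.add("world", "");
        b.endTextField();
        b.add("late", " ");    // text field already ended: dropped
        return b.tokens();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch, only the two pairs added between startTextField() and endTextField() survive; the calls made before and after are silently discarded, matching the documented behaviour.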
-
close
public void close() throws IOException
Description copied from interface: DocumentCollectionBuilder
Terminates the construction of the collection.
Specified by: close in interface DocumentCollectionBuilder
Throws: IOException
-
endDocument
public void endDocument() throws IOException
Description copied from interface: DocumentCollectionBuilder
Ends a document entry.
Specified by: endDocument in interface DocumentCollectionBuilder
Throws: IOException
-
endTextField
public void endTextField() throws IOException
Description copied from interface: DocumentCollectionBuilder
Ends the current text field.
Specified by: endTextField in interface DocumentCollectionBuilder
Throws: IOException
-
nonTextField
public void nonTextField(Object o) throws IOException
Description copied from interface: DocumentCollectionBuilder
Adds a non-text field.
Specified by: nonTextField in interface DocumentCollectionBuilder
Parameters:
o - the content of the non-text field.
Throws: IOException
-
writeSelfDelimitedUtf8String
public static int writeSelfDelimitedUtf8String(OutputBitStream obs, CharSequence s) throws IOException
Throws: IOException
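The source gives no description for this method. As a generic illustration of what "self-delimited" usually means, the sketch below writes a string as a length prefix followed by its UTF-8 bytes, so a reader knows where the string ends without a terminator. Everything here is an assumption for illustration: SelfDelimitedUtf8Sketch is a hypothetical class, it uses java.io.DataOutputStream with a fixed 4-byte length rather than OutputBitStream, and the actual encoding and return value used by MG4J may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a length-prefixed UTF-8 string; not the MG4J encoding. */
final class SelfDelimitedUtf8Sketch {
    /** Writes a 4-byte length followed by the UTF-8 bytes; returns the number of bytes written. */
    static int write(DataOutputStream out, CharSequence s) throws IOException {
        byte[] utf8 = s.toString().getBytes(StandardCharsets.UTF_8);
        out.writeInt(utf8.length);
        out.write(utf8);
        return Integer.BYTES + utf8.length;
    }

    /** Reads back one string written by write(). */
    static String read(DataInputStream in) throws IOException {
        byte[] utf8 = new byte[in.readInt()];
        in.readFully(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    /** Writes two strings back to back, then reads them back from the same byte array. */
    static String[] roundTrip(String first, String second) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                write(out, first);
                write(out, second);
            }
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return new String[] { read(in), read(in) };
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] result = roundTrip("A title", "http://example.org/doc");
        System.out.println(result[0] + " | " + result[1]);
    }
}
```

Because each string carries its own length, two strings such as a title and a URI can be stored consecutively and recovered without ambiguity.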
-
startDocument
public void startDocument(CharSequence title, CharSequence uri) throws IOException
Description copied from interface: DocumentCollectionBuilder
Starts a document entry.
Specified by: startDocument in interface DocumentCollectionBuilder
Parameters:
title - the document title (usually, the result of Document.title()).
uri - the document URI (usually, the result of Document.uri()).
Throws: IOException
-
startTextField
public void startTextField()
Description copied from interface: DocumentCollectionBuilder
Starts a new text field.
Specified by: startTextField in interface DocumentCollectionBuilder
-
virtualField
public void virtualField(List<Scan.VirtualDocumentFragment> fragments) throws IOException
Description copied from interface: DocumentCollectionBuilder
Adds a virtual field.
Specified by: virtualField in interface DocumentCollectionBuilder
Parameters:
fragments - the virtual fragments to be added.
Throws: IOException
-
build
public void build(DocumentSequence inputSequence) throws IOException
Throws: IOException
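The source gives no description for build. Given the callback methods documented above, a plausible reading is that it opens the collection, drives the start/add/end callbacks for every document in inputSequence, and then closes the collection. The sketch below shows that call order only. It is an assumption, not the MG4J code: BuildLoopSketch, Doc and RecordingBuilder are hypothetical toy classes, each document has a single text field, and words are split on spaces instead of coming from a WordReader.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the callback order a build loop might follow; not the MG4J code. */
final class BuildLoopSketch {
    /** Simplified stand-in for a document: a title, a URI and one text field. */
    static final class Doc {
        final String title, uri, text;

        Doc(String title, String uri, String text) {
            this.title = title;
            this.uri = uri;
            this.text = text;
        }
    }

    /** Records each callback in order instead of writing a compressed collection. */
    static final class RecordingBuilder {
        final List<String> calls = new ArrayList<>();

        void open(CharSequence suffix) { calls.add("open"); }
        void startDocument(CharSequence title, CharSequence uri) { calls.add("startDocument " + title); }
        void startTextField() { calls.add("startTextField"); }
        void add(CharSequence word, CharSequence nonWord) { calls.add("add " + word); }
        void endTextField() { calls.add("endTextField"); }
        void endDocument() { calls.add("endDocument"); }
        void close() { calls.add("close"); }
    }

    /** Drives the builder over all documents and returns the recorded call sequence. */
    static List<String> build(List<Doc> docs) {
        RecordingBuilder b = new RecordingBuilder();
        b.open(""); // empty suffix: the collection name is just the basename
        for (Doc d : docs) {
            b.startDocument(d.title, d.uri);
            b.startTextField();
            for (String word : d.text.split(" ")) b.add(word, " ");
            b.endTextField();
            b.endDocument();
        }
        b.close();
        return b.calls;
    }

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc("First", "http://example.org/1", "hello world"));
        build(docs).forEach(System.out::println);
    }
}
```

The same order applies when calling the builder by hand rather than through build: open once, then one startDocument/endDocument pair per document with its fields in between, then close once.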