it.unimi.di.mg4j.document
Class JdbcDocumentCollection

java.lang.Object
  extended by it.unimi.di.mg4j.document.AbstractDocumentSequence
      extended by it.unimi.di.mg4j.document.AbstractDocumentCollection
          extended by it.unimi.di.mg4j.document.JdbcDocumentCollection
All Implemented Interfaces:
DocumentCollection, DocumentSequence, SafelyCloseable, FlyweightPrototype<DocumentCollection>, Closeable, Serializable

public class JdbcDocumentCollection
extends AbstractDocumentCollection
implements Serializable

A DocumentCollection corresponding to the result of a query in a relational database.

An instance of this class is based on a query. The query should produce two fixed columns: the first, named id, must be an increasing integer that acts as an identifier (i.e., as a key); the second, named title, must be a text field and will be used as the document title. The remaining columns will be indexed, and the name of each corresponding field will be the name of the column (use AS judiciously to choose field names).

In complex queries, the specification id for the first column could be ambiguous; in that case, you can provide an alternate (and hopefully more precise) specification.

At construction time, the query is executed, obtaining a bijection between the values of the identifier and document indices. The bijection is exposed by the methods id2doc(int) and doc2id(int). The class tolerates additions to the database (rows added after construction are skipped), but deletions will cause errors.
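
The bijection described above can be illustrated with a minimal, self-contained sketch. It does not use MG4J: the class name IdIndexBijection is hypothetical, a plain HashMap stands in for the Int2IntMap field, and returning -1 for an unknown identifier is a choice made for this sketch rather than documented behaviour. It also assumes that "increasing" means strictly increasing, since the id column acts as a key.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the id/index bijection; not part of MG4J. */
final class IdIndexBijection {
    /** Document index -> database identifier. */
    private final int[] doc2id;
    /** Database identifier -> document index. */
    private final Map<Integer, Integer> id2doc = new HashMap<>();

    /** Builds the bijection from identifiers listed in query order. */
    IdIndexBijection(int[] ids) {
        // Assumption of this sketch: identifiers are strictly increasing.
        for (int i = 1; i < ids.length; i++)
            if (ids[i] <= ids[i - 1])
                throw new IllegalArgumentException("ids must be strictly increasing");
        doc2id = ids.clone();
        for (int doc = 0; doc < doc2id.length; doc++) id2doc.put(doc2id[doc], doc);
    }

    /** Returns the database identifier of the given document index. */
    int doc2id(int doc) { return doc2id[doc]; }

    /** Returns the document index of the given identifier, or -1 if unknown (sketch-only convention). */
    int id2doc(int id) { return id2doc.getOrDefault(id, -1); }

    /** Returns the number of documents. */
    int size() { return doc2id.length; }
}
```

With identifiers 3, 7 and 42, document 0 maps to identifier 3 and identifier 42 maps to document 2; an identifier inserted into the database later has no entry in either map, which is consistent with additions being skipped.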

This class provides a main method with a flexible syntax that serialises a query into a document collection.

See Also:
Serialized Form

Nested Class Summary
protected  class JdbcDocumentCollection.JdbcDocumentIterator
          An iterator over the whole collection that performs a single DBMS transaction.
 
Nested classes/interfaces inherited from class it.unimi.di.mg4j.document.AbstractDocumentCollection
AbstractDocumentCollection.PropertyKeys
 
Field Summary
protected  Connection connection
          The currently open connection, if any.
protected  String dbUri
          The URI pointing at the database.
protected  int[] doc2id
          The map (as an array) from documents to database identifiers.
protected  DocumentFactory factory
          The factory to be used by this collection.
protected  Int2IntMap id2doc
          The map from database identifiers to documents.
protected  String idSpec
          The spec for the id field; by default it is id, but in complex queries it could be ambiguous.
protected  String select
          The query generating the collection (without the SELECT keyword).
protected  String where
          The WHERE part of the query generating the collection (without the WHERE keyword), or null.
 
Fields inherited from interface it.unimi.di.mg4j.document.DocumentCollection
DEFAULT_EXTENSION
 
Constructor Summary
JdbcDocumentCollection(String dbUri, String jdbcDriverName, String select, String where, DocumentFactory factory)
          Creates a document collection based on the result set of an SQL query using id as id specifier.
JdbcDocumentCollection(String dbUri, String jdbcDriverName, String select, String idSpec, String where, DocumentFactory factory)
          Creates a document collection based on the result set of an SQL query.
 
Method Summary
 void close()
          Closes this document sequence, releasing all resources.
 JdbcDocumentCollection copy()
           
 int doc2id(int doc)
          Returns the database identifier associated with a given document.
 Document document(int index)
          Returns the document given its index.
protected  void ensureConnection()
           
 DocumentFactory factory()
          Returns the factory used by this sequence.
 int id2doc(int id)
          Returns the document associated with a given database identifier.
 DocumentIterator iterator()
          Returns an iterator over the sequence of documents.
static void main(String[] arg)
           
 Reference2ObjectMap<Enum<?>,Object> metadata(int index)
          Returns the metadata map for a document.
protected  Reference2ObjectMap<Enum<?>,Object> metadata(int index, CharSequence title)
          Creates metadata with the given title; if the title is not available, it is fetched from the database.
 int size()
          Returns the number of documents in this collection.
 InputStream stream(int index)
          Returns an input stream for the raw content of a document.
 
Methods inherited from class it.unimi.di.mg4j.document.AbstractDocumentCollection
ensureDocumentIndex, printAllDocuments, toString
 
Methods inherited from class it.unimi.di.mg4j.document.AbstractDocumentSequence
filename, finalize, load
 
Methods inherited from class java.lang.Object
clone, equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface it.unimi.di.mg4j.document.DocumentSequence
filename
 

Field Detail

id2doc

protected final Int2IntMap id2doc
The map from database identifiers to documents.


doc2id

protected final int[] doc2id
The map (as an array) from documents to database identifiers.


dbUri

protected final String dbUri
The URI pointing at the database.


factory

protected final DocumentFactory factory
The factory to be used by this collection.


select

protected final String select
The query generating the collection (without the SELECT keyword).


idSpec

protected final String idSpec
The spec for the id field; by default it is id, but in complex queries it could be ambiguous.


where

protected final String where
The WHERE part of the query generating the collection (without the WHERE keyword), or null.


connection

protected transient Connection connection
The currently open connection, if any.

Constructor Detail

JdbcDocumentCollection

public JdbcDocumentCollection(String dbUri,
                              String jdbcDriverName,
                              String select,
                              String where,
                              DocumentFactory factory)
                       throws SQLException,
                              ClassNotFoundException
Creates a document collection based on the result set of an SQL query using id as id specifier.

Beware. This class is not guaranteed to work if the database is deleted or modified after creation!

Parameters:
dbUri - a JDBC URI pointing at the database.
jdbcDriverName - the name of a JDBC driver, or null if you do not want to load a driver.
select - the SQL query generating the collection (without the SELECT keyword), except for the WHERE part.
where - the WHERE part (without the WHERE keyword) of the SQL query generating the collection, or null.
factory - the factory that will be used to create documents.
Throws:
SQLException
ClassNotFoundException

JdbcDocumentCollection

public JdbcDocumentCollection(String dbUri,
                              String jdbcDriverName,
                              String select,
                              String idSpec,
                              String where,
                              DocumentFactory factory)
                       throws SQLException,
                              ClassNotFoundException
Creates a document collection based on the result set of an SQL query.

Beware. This class is not guaranteed to work if the database is deleted or modified after creation!

Parameters:
dbUri - a JDBC URI pointing at the database.
jdbcDriverName - the name of a JDBC driver, or null if you do not want to load a driver.
select - the SQL query generating the collection (without the SELECT keyword), except for the WHERE part.
idSpec - the complete SQL spec for the id (necessary for complex queries with multiple tables).
where - the WHERE part (without the WHERE keyword) of the SQL query generating the collection, or null.
factory - the factory that will be used to create documents.
Throws:
SQLException
ClassNotFoundException
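
The select and where parameters are both given without their SQL keywords, and where may be null. The following sketch shows one plausible way the two parts combine into a full statement. It is an assumption about the shape of the final query, not the class's actual implementation: the helper QueryParts.assemble is hypothetical, as are the table and column names in the usage note.

```java
/** Hypothetical helper showing how the select and where parts might be joined; not part of MG4J. */
final class QueryParts {
    private QueryParts() {}

    /**
     * Joins the two parts into a single SQL statement.
     *
     * @param select the query without the SELECT keyword and without the WHERE part.
     * @param where the WHERE part without the WHERE keyword, or null.
     */
    static String assemble(String select, String where) {
        StringBuilder sql = new StringBuilder("SELECT ").append(select);
        // A null where part means the query has no WHERE clause at all.
        if (where != null) sql.append(" WHERE ").append(where);
        return sql.toString();
    }
}
```

For example, assemble("id, title, body AS text FROM papers", "year > 2000") yields SELECT id, title, body AS text FROM papers WHERE year > 2000. Under the column convention described for this class, the indexed field would then be named text.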
Method Detail

ensureConnection

protected void ensureConnection()
                         throws SQLException
Throws:
SQLException

close

public void close()
           throws IOException
Description copied from interface: DocumentSequence
Closes this document sequence, releasing all resources.

You should always call this method after having finished with this document sequence. Implementations are invited to call this method in a finaliser as a safety net (even better, implement SafelyCloseable), but since there is no guarantee as to when finalisers are invoked, you should not depend on this behaviour.

Specified by:
close in interface DocumentSequence
Specified by:
close in interface Closeable
Overrides:
close in class AbstractDocumentSequence
Throws:
IOException
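
Since this class implements Closeable, the advice to always call close() can be followed with a try-with-resources statement, which closes the resource even when the body throws. The sketch below is self-contained and uses a hypothetical TrackedResource in place of a real collection, so that it runs without a database.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical demonstration of deterministic closing; not part of MG4J. */
final class CloseDemo {
    private CloseDemo() {}

    /** Stand-in for a document sequence that records whether it was closed. */
    static final class TrackedResource implements Closeable {
        boolean closed;
        int reads;

        @Override
        public void close() throws IOException {
            closed = true;
        }
    }

    /** Uses the resource once and reports whether it ended up closed. */
    static boolean useAndClose() {
        TrackedResource tracked = new TrackedResource();
        try (Closeable resource = tracked) {
            // Work with the resource here; close() runs when this block exits.
            tracked.reads++;
        } catch (IOException e) {
            // close() is declared to throw IOException, so it must be handled.
            throw new UncheckedIOException(e);
        }
        return tracked.closed;
    }
}
```

This way the code does not depend on a finaliser ever running.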

copy

public JdbcDocumentCollection copy()
Specified by:
copy in interface DocumentCollection
Specified by:
copy in interface FlyweightPrototype<DocumentCollection>

factory

public DocumentFactory factory()
Description copied from interface: DocumentSequence
Returns the factory used by this sequence.

Every document sequence is based on a document factory that transforms raw bytes into a sequence of characters. The factory contains useful information such as the number of fields.

Specified by:
factory in interface DocumentSequence
Returns:
the factory used by this sequence.

size

public int size()
Description copied from interface: DocumentCollection
Returns the number of documents in this collection.

Specified by:
size in interface DocumentCollection
Returns:
the number of documents in this collection.

document

public Document document(int index)
                  throws IOException
Description copied from interface: DocumentCollection
Returns the document given its index.

Specified by:
document in interface DocumentCollection
Parameters:
index - an index between 0 (inclusive) and DocumentCollection.size() (exclusive).
Returns:
the index-th document.
Throws:
IOException

id2doc

public int id2doc(int id)
Returns the document associated with a given database identifier.

Parameters:
id - a database identifier.
Returns:
the associated document.

doc2id

public int doc2id(int doc)
Returns the database identifier associated with a given document.

Parameters:
doc - a document index.
Returns:
the associated database identifier.

metadata

protected Reference2ObjectMap<Enum<?>,Object> metadata(int index,
                                                       CharSequence title)
Creates metadata with the given title; if the title is not available, it is fetched from the database.

Parameters:
index - a document index.
title - a suggested title, or null.
Returns:
the metadata for the document index.

metadata

public Reference2ObjectMap<Enum<?>,Object> metadata(int index)
Description copied from interface: DocumentCollection
Returns the metadata map for a document.

Specified by:
metadata in interface DocumentCollection
Parameters:
index - an index between 0 (inclusive) and DocumentCollection.size() (exclusive).
Returns:
the metadata map for the document.

stream

public InputStream stream(int index)
                   throws IOException
Description copied from interface: DocumentCollection
Returns an input stream for the raw content of a document.

Specified by:
stream in interface DocumentCollection
Parameters:
index - an index between 0 (inclusive) and DocumentCollection.size() (exclusive).
Returns:
the raw content of the document as an input stream.
Throws:
IOException

iterator

public DocumentIterator iterator()
                          throws IOException
Description copied from interface: DocumentSequence
Returns an iterator over the sequence of documents.

Warning: this method can be safely called just one time. For instance, implementations based on standard input will usually throw an exception if this method is called twice.

Implementations may decide to override this restriction (in particular, if they implement DocumentCollection). Usually, however, it is not possible to obtain two iterators at the same time on a collection.

Specified by:
iterator in interface DocumentSequence
Overrides:
iterator in class AbstractDocumentCollection
Returns:
an iterator over the sequence of documents.
Throws:
IOException
See Also:
DocumentCollection

main

public static void main(String[] arg)
                 throws com.martiansoftware.jsap.JSAPException,
                        InvocationTargetException,
                        NoSuchMethodException,
                        IllegalAccessException,
                        IOException,
                        SQLException,
                        ClassNotFoundException,
                        InstantiationException
Throws:
com.martiansoftware.jsap.JSAPException
InvocationTargetException
NoSuchMethodException
IllegalAccessException
IOException
SQLException
ClassNotFoundException
InstantiationException