Class JdbcDocumentCollection
- java.lang.Object
-
- it.unimi.di.big.mg4j.document.AbstractDocumentSequence
-
- it.unimi.di.big.mg4j.document.AbstractDocumentCollection
-
- it.unimi.di.big.mg4j.document.JdbcDocumentCollection
-
- All Implemented Interfaces:
DocumentCollection, DocumentSequence, SafelyCloseable, FlyweightPrototype<DocumentCollection>, Closeable, Serializable, AutoCloseable
public class JdbcDocumentCollection extends AbstractDocumentCollection implements Serializable
A DocumentCollection corresponding to the result of a query in a relational database.
An instance of this class is based on a query. The query should produce two fixed columns: the first, named id, must be an increasing integer that acts as an identifier (i.e., as a key); the second, named title, must be a text field and will be used as the title. The remaining columns will be indexed, and the name of each corresponding field will be the name of its column (use AS judiciously).
In complex queries, the specification id for the first column could be ambiguous; in that case, you can provide an alternate (and hopefully more precise) specification.
At construction time, the query is executed, obtaining a bijection between the values of the identifier and document indices. The bijection is exposed by the methods id2doc(int) and doc2id(int). The class tolerates additions to the database (they will be skipped), but deletions will cause errors.
This class provides a main method with a flexible syntax that serialises a query into a document collection.
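The bijection between database identifiers and document indices can be pictured with a small, self-contained sketch. It is not MG4J code: the class name IdIndexBijection, the use of a HashMap in place of Int2IntMap, the strict-increase check, and the -1 return value for unknown identifiers are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the identifier/index bijection; not part of MG4J. */
class IdIndexBijection {
    /** Document index -> database identifier. */
    private final int[] doc2id;
    /** Database identifier -> document index. */
    private final Map<Integer, Integer> id2doc = new HashMap<>();

    /** @param sortedIds the values of the id column, in increasing order. */
    IdIndexBijection(int[] sortedIds) {
        doc2id = sortedIds.clone();
        for (int doc = 0; doc < doc2id.length; doc++) {
            // Assumption of this sketch: identifiers are strictly increasing (they act as a key).
            if (doc > 0 && doc2id[doc] <= doc2id[doc - 1])
                throw new IllegalArgumentException("Identifiers must be strictly increasing");
            id2doc.put(doc2id[doc], doc);
        }
    }

    /** Returns the database identifier of the given document index. */
    int doc2id(int doc) { return doc2id[doc]; }

    /** Returns the document index of the given identifier, or -1 if unknown (sketch-only convention). */
    int id2doc(int id) { return id2doc.getOrDefault(id, -1); }

    /** Returns the number of documents seen at construction time. */
    int size() { return doc2id.length; }
}
```

Because the mapping is frozen at construction time, rows added to the database later have no document index, which is consistent with additions being skipped.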
- See Also:
- Serialized Form
-
-
Nested Class Summary
Nested Classes (modifier and type, class, description):
- protected class JdbcDocumentCollection.JdbcDocumentIterator: An iterator over the whole collection that performs a single DBMS transaction.
Nested classes/interfaces inherited from class it.unimi.di.big.mg4j.document.AbstractDocumentCollection
AbstractDocumentCollection.PropertyKeys
-
-
Field Summary
Fields (modifier and type, field, description):
- protected Connection connection: The currently open connection, if any.
- protected String dbUri: The URI pointing at the database.
- protected int[] doc2id: The map (as an array) from documents to database identifiers.
- protected DocumentFactory factory: The factory to be used by this collection.
- protected Int2IntMap id2doc: The map from database identifiers to documents.
- protected String idSpec: The specification for the id field; by default it is id, but in complex queries that could be ambiguous.
- protected String select: The query generating the collection (without the SELECT keyword).
- protected String where: The WHERE part of the query generating the collection (without the WHERE keyword), or null.
Fields inherited from interface it.unimi.di.big.mg4j.document.DocumentCollection
DEFAULT_EXTENSION
-
-
Constructor Summary
Constructors (constructor, description):
- JdbcDocumentCollection(String dbUri, String jdbcDriverName, String select, String where, DocumentFactory factory): Creates a document collection based on the result set of an SQL query, using id as the id specifier.
- JdbcDocumentCollection(String dbUri, String jdbcDriverName, String select, String idSpec, String where, DocumentFactory factory): Creates a document collection based on the result set of an SQL query.
-
Method Summary
Methods (modifier and type, method, description):
- void close(): Closes this document sequence, releasing all resources.
- JdbcDocumentCollection copy()
- int doc2id(int doc): Returns the database identifier associated with a given document.
- Document document(long index): Returns the document given its index.
- protected void ensureConnection()
- DocumentFactory factory(): Returns the factory used by this sequence.
- int id2doc(int id): Returns the document associated with a given database identifier.
- DocumentIterator iterator(): Returns an iterator over the sequence of documents.
- static void main(String[] arg)
- Reference2ObjectMap<Enum<?>,Object> metadata(long index): Returns the metadata map for a document.
- protected Reference2ObjectMap<Enum<?>,Object> metadata(long index, CharSequence title): Creates metadata with the given title; if the title is not available, it is fetched from the database.
- long size(): Returns the number of documents in this collection.
- InputStream stream(long index): Returns an input stream for the raw content of a document.
Methods inherited from class it.unimi.di.big.mg4j.document.AbstractDocumentCollection
ensureDocumentIndex, printAllDocuments, toString
-
Methods inherited from class it.unimi.di.big.mg4j.document.AbstractDocumentSequence
filename, finalize, load
-
Methods inherited from class java.lang.Object
clone, equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface it.unimi.di.big.mg4j.document.DocumentSequence
filename
-
-
-
-
Field Detail
-
id2doc
protected final Int2IntMap id2doc
The map from database identifiers to documents.
-
doc2id
protected final int[] doc2id
The map (as an array) from documents to database identifiers.
-
dbUri
protected final String dbUri
The URI pointing at the database.
-
factory
protected final DocumentFactory factory
The factory to be used by this collection.
-
select
protected final String select
The query generating the collection (without the SELECT keyword).
-
idSpec
protected final String idSpec
The specification for the id field; by default it is id, but in complex queries that could be ambiguous.
-
where
protected final String where
The WHERE part of the query generating the collection (without the WHERE keyword), or null.
-
connection
protected transient Connection connection
The currently open connection, if any.
-
-
Constructor Detail
-
JdbcDocumentCollection
public JdbcDocumentCollection(String dbUri, String jdbcDriverName, String select, String where, DocumentFactory factory) throws SQLException, ClassNotFoundException
Creates a document collection based on the result set of an SQL query, using id as the id specifier.
Beware. This class is not guaranteed to work if the database is deleted or modified after creation!
- Parameters:
dbUri - a JDBC URI pointing at the database.
jdbcDriverName - the name of a JDBC driver, or null if you do not want to load a driver.
select - the SQL query generating the collection (without the SELECT keyword), except for the WHERE part.
where - the WHERE part (without the WHERE keyword) of the SQL query generating the collection, or null.
factory - the factory that will be used to create documents.
- Throws:
SQLException
ClassNotFoundException
-
JdbcDocumentCollection
public JdbcDocumentCollection(String dbUri, String jdbcDriverName, String select, String idSpec, String where, DocumentFactory factory) throws SQLException, ClassNotFoundException
Creates a document collection based on the result set of an SQL query.
Beware. This class is not guaranteed to work if the database is deleted or modified after creation!
- Parameters:
dbUri - a JDBC URI pointing at the database.
jdbcDriverName - the name of a JDBC driver, or null if you do not want to load a driver.
select - the SQL query generating the collection (without the SELECT keyword), except for the WHERE part.
idSpec - the complete SQL specification for the id (necessary for complex queries with multiple tables).
where - the WHERE part (without the WHERE keyword) of the SQL query generating the collection, or null.
factory - the factory that will be used to create documents.
- Throws:
SQLException
ClassNotFoundException
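Since the select and where arguments omit their keywords, it may help to see how such fragments could combine into a complete statement. The following is a minimal sketch under stated assumptions: QuerySketch and buildQuery are hypothetical names, the table and column names are invented, and how the class actually assembles its SQL internally is not documented here.

```java
/** Hypothetical helper showing one way keyword-less fragments could be joined; not MG4J code. */
class QuerySketch {
    /**
     * @param select the query without the SELECT keyword and without the WHERE part.
     * @param where the WHERE part without the WHERE keyword, or null.
     * @return a complete SQL statement (illustrative only).
     */
    static String buildQuery(String select, String where) {
        // Append the WHERE clause only when a condition was supplied.
        return "SELECT " + select + (where != null ? " WHERE " + where : "");
    }
}
```

For example, passing "id, title, body FROM articles" as select and "lang = 'en'" as where would describe a collection whose only indexed field is body, because id and title are the two fixed columns.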
-
-
Method Detail
-
ensureConnection
protected void ensureConnection() throws SQLException
- Throws:
SQLException
-
close
public void close() throws IOException
Description copied from interface: DocumentSequence
Closes this document sequence, releasing all resources.
You should always call this method after having finished with this document sequence. Implementations are invited to call this method in a finaliser as a safety net (even better, implement SafelyCloseable), but since there is no guarantee as to when finalisers are invoked, you should not depend on this behaviour.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Specified by:
close in interface DocumentSequence
- Overrides:
close in class AbstractDocumentSequence
- Throws:
IOException
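Because the class implements AutoCloseable, a try-with-resources statement is one way to guarantee that close() runs without relying on finalisers. The sketch below uses a hypothetical stand-in, FakeSequence, rather than a real collection, so that it runs without a database; only the closing pattern is the point.

```java
import java.io.Closeable;

/** Illustrates deterministic closing with try-with-resources; FakeSequence is a hypothetical stand-in. */
class CloseSketch {
    /** Minimal Closeable that records whether close() was called. */
    static final class FakeSequence implements Closeable {
        boolean closed;

        @Override
        public void close() { closed = true; }
    }

    /** Opens a resource, uses it, and returns it after the try block has closed it. */
    static FakeSequence useAndClose() {
        FakeSequence sequence = new FakeSequence();
        try (FakeSequence s = sequence) {
            // Work with the resource here; it is still open inside the block.
            if (s.closed) throw new IllegalStateException("Closed too early");
        }
        // close() has been invoked by the time control reaches this point.
        return sequence;
    }
}
```

A real close() on this class declares IOException, so the surrounding method would need to handle or declare it.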
-
copy
public JdbcDocumentCollection copy()
- Specified by:
copy in interface DocumentCollection
- Specified by:
copy in interface FlyweightPrototype<DocumentCollection>
-
factory
public DocumentFactory factory()
Description copied from interface: DocumentSequence
Returns the factory used by this sequence.
Every document sequence is based on a document factory that transforms raw bytes into a sequence of characters. The factory contains useful information such as the number of fields.
- Specified by:
factory in interface DocumentSequence
- Returns:
the factory used by this sequence.
-
size
public long size()
Description copied from interface: DocumentCollection
Returns the number of documents in this collection.
- Specified by:
size in interface DocumentCollection
- Returns:
the number of documents in this collection.
-
document
public Document document(long index) throws IOException
Description copied from interface: DocumentCollection
Returns the document given its index.
- Specified by:
document in interface DocumentCollection
- Parameters:
index - an index between 0 (inclusive) and DocumentCollection.size() (exclusive).
- Returns:
the index-th document.
- Throws:
IOException
-
id2doc
public int id2doc(int id)
Returns the document associated with a given database identifier.
- Parameters:
id - a database identifier.
- Returns:
the associated document.
-
doc2id
public int doc2id(int doc)
Returns the database identifier associated with a given document.
- Parameters:
doc - a document index.
- Returns:
the associated database identifier.
-
metadata
protected Reference2ObjectMap<Enum<?>,Object> metadata(long index, CharSequence title)
Creates metadata with the given title; if the title is not available, it is fetched from the database.
- Parameters:
index - a document index.
title - a suggested title, or null.
- Returns:
the metadata for the document index.
-
metadata
public Reference2ObjectMap<Enum<?>,Object> metadata(long index)
Description copied from interface: DocumentCollection
Returns the metadata map for a document.
- Specified by:
metadata in interface DocumentCollection
- Parameters:
index - an index between 0 (inclusive) and DocumentCollection.size() (exclusive).
- Returns:
the metadata map for the document.
-
stream
public InputStream stream(long index) throws IOException
Description copied from interface: DocumentCollection
Returns an input stream for the raw content of a document.
- Specified by:
stream in interface DocumentCollection
- Parameters:
index - an index between 0 (inclusive) and DocumentCollection.size() (exclusive).
- Returns:
the raw content of the document as an input stream.
- Throws:
IOException
-
iterator
public DocumentIterator iterator() throws IOException
Description copied from interface: DocumentSequence
Returns an iterator over the sequence of documents.
Warning: this method can be safely called only once. For instance, implementations based on standard input will usually throw an exception if this method is called twice.
Implementations may decide to override this restriction (in particular, if they implement DocumentCollection). Usually, however, it is not possible to obtain two iterators at the same time on a collection.
- Specified by:
iterator in interface DocumentSequence
- Overrides:
iterator in class AbstractDocumentCollection
- Returns:
an iterator over the sequence of documents.
- Throws:
IOException
- See Also:
DocumentCollection
-
main
public static void main(String[] arg) throws com.martiansoftware.jsap.JSAPException, InvocationTargetException, NoSuchMethodException, IllegalAccessException, IOException, SQLException, ClassNotFoundException, InstantiationException
- Throws:
com.martiansoftware.jsap.JSAPException
InvocationTargetException
NoSuchMethodException
IllegalAccessException
IOException
SQLException
ClassNotFoundException
InstantiationException
-
-