Hive Connections are Transaction Scoped
posted in: documentation, java, programming
HiveDB Connections have the same scope as a database transaction.
When writing an application that uses HiveDB you should consider the connections that you obtain from the Hive to have the same scope as a single database transaction. Further you should obtain a new connection from the Hive for each transaction. You don’t need to worry about pooling or the overhead involved in opening a new connection each time. The Hive takes care of pooling connections behind the scenes.
Holding onto Hive connections is bad.
The reason that holding onto connections is bad is that HiveDB implements partition key level locking. A partition key and all its dependent records can be writing. The primary use for this feature is to prevent updates while you are migrating a set of records from one data node to another, say if a node gets full. The promise made by the getConnection and getJdbcDaoSupport cache methods is that the connection will have permissions appropriate to the lock state at the time it was issued. So, if you hold onto a connection too long a record could become locked during that time. If you happened to be migrating the record your update hit the original node the record was on and now your data is out of sync.

One Comment
I’m interested to hear some reader feedback on this design decision.
Rather than asking clients to favor short-lived connections, HiveDB could have opted to throw exceptions on all JDBC operations that were capable of generating writes (for instance, preparedStatement.executeUpdate(…)).
Doing so would have provided a guarantee that nodes marked “read only” will not receive new records, however at the expense of overloading the existing SQLException semantics.
Thoughts? How do readers feel about catching SQLException as a mechanism for detecting whether an instance of a JDBC Connection has been marked read only?