A Solid Foundation
posted in: why hivedb?
Today on the mailing list we got a question that I think highlights one of the big advantages of HiveDB: it builds on a well-established technology, MySQL. As a result, there is a large pool of experience and tools at your disposal for the operational tasks, like replication and monitoring, that we software developers don’t necessarily want to deal with.
[If] we want to run MySQL queries on the partitioned MySQL database, does HiveDB expose any interface for that? I understand that it would not support join operations on the partitioned database, but will it support simple select and update queries?
The short answer is:
Yes, absolutely. Anything you can do with MySQL you can do with HiveDB.
In essence, HiveDB is just a coordinator that sits atop multiple MySQL databases and lets you access them as a single data set. We think this is one of its great strengths. Anything you can do with MySQL you can do with HiveDB, as long as you are aware of a few constraints. The first is that you can’t join across shards.
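To make the "coordinator" idea concrete, here is a toy, in-memory model in Java. It is not HiveDB's implementation or API: the class `ShardedUsers`, its methods, and the shard names are hypothetical. Each inner map stands in for a separate MySQL database, and a small directory map records which shard owns each user id. A lookup by partition key touches exactly one shard, while anything that spans users on different shards has to be fanned out and merged in application code, which is why a single SQL JOIN across shards isn't possible.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Toy model of a coordinator over several shards (hypothetical; not HiveDB's API).
// Each inner map stands in for one MySQL database holding user rows keyed by user id.
class ShardedUsers {
    private final Map<String, Map<Long, String>> shards = new HashMap<>();
    // Directory: partition key (user id) -> name of the shard that owns it.
    private final Map<Long, String> directory = new HashMap<>();

    ShardedUsers(String... shardNames) {
        for (String name : shardNames) {
            shards.put(name, new HashMap<>());
        }
    }

    // Write the row to the chosen shard and record its location in the directory.
    void insert(long userId, String shard, String row) {
        Map<Long, String> target = shards.get(shard);
        if (target == null) {
            throw new IllegalArgumentException("unknown shard: " + shard);
        }
        target.put(userId, row);
        directory.put(userId, shard);
    }

    // Single-key read: one directory lookup, then exactly one shard is touched.
    String get(long userId) {
        String shard = directory.get(userId);
        return shard == null ? null : shards.get(shard).get(userId);
    }

    // Query without a partition key: every shard must be visited and the
    // partial results merged here, in the application, not in SQL.
    List<String> findAll(Predicate<String> filter) {
        List<String> results = new ArrayList<>();
        for (Map<Long, String> shard : shards.values()) {
            for (String row : shard.values()) {
                if (filter.test(row)) {
                    results.add(row);
                }
            }
        }
        return results;
    }
}
```

Because `findAll` has to touch every shard, in this model a query that lacks the partition key gets more expensive as shards are added, while `get` stays a single-shard operation.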
Another constraint is that the directory and the data nodes must be kept in sync. HiveDB indexes certain values in a directory database so that it can use them to locate records. For example, say you are sharding your data by user id. When someone logs in you need to fetch their user data, but you may not know their user id. However, you do know their email. HiveDB can keep an index of email -> user id -> shard so that you can locate the record. So if the user’s email address changes, you need to make sure that you update the directory entry as well.
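The email example can be sketched the same way. The Java model below is hypothetical (`UserDirectory`, `register`, `locateShard`, and `changeEmail` are invented for illustration and are not HiveDB's API): it keeps a secondary index from email to user id and a primary index from user id to shard. The interesting part is `changeEmail`: when the email stored in a user's record changes, the directory entry has to change with it, or later lookups by email will fail.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a directory database
// (email -> user id -> shard). Not HiveDB's actual API.
class UserDirectory {
    private final Map<String, Long> emailToUserId = new HashMap<>();
    private final Map<Long, String> userIdToShard = new HashMap<>();

    void register(long userId, String email, String shard) {
        userIdToShard.put(userId, shard);
        emailToUserId.put(email, userId);
    }

    // Resolve email -> user id -> shard; returns null if the email is not indexed.
    String locateShard(String email) {
        Long userId = emailToUserId.get(email);
        return userId == null ? null : userIdToShard.get(userId);
    }

    Long userIdFor(String email) {
        return emailToUserId.get(email);
    }

    // Must be called whenever the email stored on the data node changes,
    // so that the secondary index stays in sync with the record.
    void changeEmail(String oldEmail, String newEmail) {
        Long userId = emailToUserId.remove(oldEmail);
        if (userId == null) {
            throw new IllegalArgumentException("unknown email: " + oldEmail);
        }
        emailToUserId.put(newEmail, userId);
    }
}
```

In a real deployment the data-node write and the directory write go to different databases, so application code working with raw connections has to perform both.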
HiveDB provides two ways to access your data. The first is via a standard JDBC Connection: HiveDB can act as a connection provider for your sharded data. You say, “I need a read/write connection to joey@fakestreet.com’s data,” and HiveDB hands back an ordinary JDBC connection to the shard that the data resides on. If you use it this way, you need to maintain directory synchronization yourself using the hive.directory().* methods. The second is via a Hibernate API. If you are using Hibernate ORM, you can add some special annotations to your entity classes, swap in the HiveDB SessionFactory for Hibernate’s, and use Hibernate just as you did before. Our Hibernate implementation takes care of all of the indexing behind the scenes.
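For the JDBC route, the pattern is: resolve the key to a shard through the directory, then open an ordinary connection to that shard. The sketch below assumes a directory that maps an email straight to a shard name, plus a table of per-shard JDBC URLs; `ShardConnectionProvider`, `jdbcUrlFor`, `connectionFor`, and the URLs are hypothetical and are not HiveDB's interface. Only `DriverManager.getConnection` and `Connection.setReadOnly` are standard `java.sql` calls, and actually connecting requires a MySQL JDBC driver on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical connection provider: key -> shard -> ordinary JDBC connection.
// Illustrates the access pattern only; not HiveDB's actual classes.
class ShardConnectionProvider {
    private final Map<String, String> emailToShard = new HashMap<>();
    private final Map<String, String> shardToUrl = new HashMap<>();

    void addShard(String shard, String jdbcUrl) {
        shardToUrl.put(shard, jdbcUrl);
    }

    void index(String email, String shard) {
        emailToShard.put(email, shard);
    }

    // Directory lookup: which shard's URL serves this key?
    String jdbcUrlFor(String email) {
        String shard = emailToShard.get(email);
        if (shard == null) {
            throw new IllegalArgumentException("no directory entry for " + email);
        }
        String url = shardToUrl.get(shard);
        if (url == null) {
            throw new IllegalStateException("no URL configured for " + shard);
        }
        return url;
    }

    // Hands back a plain JDBC connection to the owning shard. setReadOnly is only
    // a hint to the driver. Writes made through this connection do not touch the
    // directory; indexed columns must be updated there separately.
    Connection connectionFor(String email, boolean readWrite, String user, String password)
            throws SQLException {
        Connection conn = DriverManager.getConnection(jdbcUrlFor(email), user, password);
        conn.setReadOnly(!readWrite);
        return conn;
    }
}
```

Typical use would be `try (Connection c = provider.connectionFor("joey@fakestreet.com", true, user, pw)) { ... }` followed by ordinary SQL. Any write that changes an indexed column, such as the email, still needs a matching directory update, which is what the hive.directory().* methods are for.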
Finally, if you need to do maintenance, you can connect directly to the shards or the directory with a MySQL client. Again, you just have to be mindful of keeping the directory and data nodes in sync if you mutate any of the data.
We think one of the great advantages of HiveDB is that its guts are just MySQL. You can take advantage of all the administration tools already available for MySQL, you can use MySQL’s native replication, and there’s a large pool of expertise and wisdom on how to tune and configure MySQL. You can actually go out and hire someone to operate it.
Tags: mysql, operations, ops, replication