A Solid Foundation

Today on the mailing list we got a question that I think highlights one of the big advantages of HiveDB: it builds on a well-established technology, MySQL. Hence, there is a large pool of experience and tools at your disposal for handling operational tasks like replication and monitoring that we software developers don’t necessarily want to deal with.

[If] we want to run MySQL queries on the partitioned MySQL database, does HiveDB expose any interface for that? I understand that it would not support join operations on the partitioned database, but will it support simple select and update queries?

The short answer is:

Yes, absolutely. Anything you can do with MySQL you can do with HiveDB.

In essence, HiveDB is just a coordinator that sits atop multiple MySQL databases and allows you to access them as a single data set. We think this is one of its great strengths. Anything you can do with MySQL you can do with HiveDB, as long as you are aware of a few constraints. The first is that you can’t join across shards.

Another constraint is that the directory and the data nodes must be kept in sync. HiveDB indexes certain values in a directory database so that it can use them to locate records. For example, say you are sharding your data by user id. When someone logs in you need to fetch their user data, but you may not know their user id. However, you do know their email. HiveDB can keep an index of email -> user id -> shard so that you can locate the record. So, if the user’s email address changes, you need to make sure that you update the directory entry.
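To make the two-hop lookup and the synchronization requirement concrete, here is a minimal in-memory sketch of a directory. It is not HiveDB’s actual directory code: the class ToyDirectory and its methods are hypothetical names chosen for illustration, and plain HashMaps stand in for the directory database tables.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical, in-memory model of a HiveDB-style directory (not HiveDB's classes).
 * Primary index: user id -> shard. Secondary index: email -> user id.
 */
public class ToyDirectory {
    private final Map<Long, Integer> shardByUserId = new HashMap<>();
    private final Map<String, Long> userIdByEmail = new HashMap<>();

    /** Records where a new user lives and indexes their email. */
    public void insertUser(long userId, int shard, String email) {
        shardByUserId.put(userId, shard);
        userIdByEmail.put(email, userId);
    }

    /** Two hops: email -> user id -> shard. Empty if either entry is missing. */
    public Optional<Integer> shardForEmail(String email) {
        Long userId = userIdByEmail.get(email);
        return userId == null ? Optional.empty() : Optional.ofNullable(shardByUserId.get(userId));
    }

    /** Must be called whenever the email changes on the data node, or lookups by the new email fail. */
    public void updateEmail(String oldEmail, String newEmail) {
        Long userId = userIdByEmail.remove(oldEmail);
        if (userId == null) {
            throw new IllegalArgumentException("unknown email: " + oldEmail);
        }
        userIdByEmail.put(newEmail, userId);
    }
}
```

If the row on the shard is updated but updateEmail is never called, shardForEmail still resolves the old address and returns empty for the new one, even though the user’s data is sitting on its shard. That is exactly the drift the paragraph above warns about.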

HiveDB provides you with two ways to access your data. The first is via a standard JDBC Connection. HiveDB can act as a connection provider to your sharded data. You say, “I need a read/write connection to joey@fakestreet.com’s data,” and HiveDB hands back an ordinary JDBC connection to the shard that the data resides on. If you use it this way, you need to maintain directory synchronization yourself using the hive.directory().* methods. The second type of access is via a Hibernate API. If you are using Hibernate ORM, you can add some special annotations to your entity classes, swap in the HiveDB SessionFactory for Hibernate’s, and use Hibernate just as you did before. Our Hibernate implementation will take care of all of the indexing behind the scenes.
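The plain JDBC path can be sketched as follows: route to the shard, run an ordinary statement, then update the directory yourself. This is a sketch under stated assumptions, not HiveDB’s API. ShardLocator and DirectoryUpdater are hypothetical stand-ins for “ask HiveDB for a connection” and for whichever hive.directory() call you would actually use, and the users table with an email column is assumed for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

/** Sketch of the plain-JDBC access path with manual directory synchronization. */
public class EmailChange {
    /** Hypothetical: resolves a partition key (here, an email) to the DataSource of its shard. */
    public interface ShardLocator {
        DataSource dataSourceFor(String email);
    }

    /** Hypothetical placeholder for the hive.directory() call that re-points the email index. */
    public interface DirectoryUpdater {
        void reindexEmail(String oldEmail, String newEmail);
    }

    /** Returns the number of rows updated on the shard; the directory is touched only if a row changed. */
    public static int changeEmail(ShardLocator locator, DirectoryUpdater directory,
                                  String oldEmail, String newEmail) throws SQLException {
        // Route with the old email while the directory still knows it.
        DataSource shard = locator.dataSourceFor(oldEmail);
        int updated;
        try (Connection conn = shard.getConnection();
             PreparedStatement ps = conn.prepareStatement(
                     "UPDATE users SET email = ? WHERE email = ?")) { // table/column names are assumptions
            ps.setString(1, newEmail);
            ps.setString(2, oldEmail);
            updated = ps.executeUpdate();
        }
        // The data node no longer has oldEmail, so the directory must follow.
        if (updated > 0) {
            directory.reindexEmail(oldEmail, newEmail);
        }
        return updated;
    }
}
```

Note that the shard write and the directory write are two separate operations, not one transaction; if the directory update fails, the caller has to retry it or otherwise reconcile the two. The Hibernate path described above exists to take this bookkeeping off your hands.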

Finally, if you need to do maintenance, you can connect directly to the shards or the directory with a MySQL client. Again, you just have to be mindful of keeping the directory and data nodes in sync if you mutate any of the data.

We think one of the great advantages of HiveDB is that the guts of it are just MySQL. You can take advantage of all of the administration tools already available for MySQL, you can use MySQL native replication, and there’s a large pool of expertise and wisdom on how to tune and configure MySQL. You can actually go out and hire someone to operate it.

Updating to Java 1.6

As of changeset 4d9e2702f31b478ef070b966961fbae2407cadcb, the HEAD revision of HiveDB requires Java 6 (1.6). We tried to maintain compatibility with both Java 5 and Java 6 but were unable to do so because of changes to the DataSource interface.

We have changed our default connection pooling implementation from DBCP to C3P0. C3P0 has better failure characteristics, and we think it will make HiveDB more stable and more fault tolerant. However, it also forces us to update HiveBasicDataSource in a way that breaks Java 5 compatibility.
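For a sense of what “changes to the DataSource interface” means in practice: starting with Java 6 (JDBC 4.0), javax.sql.DataSource extends java.sql.Wrapper, so every implementation must supply unwrap and isWrapperFor, and Java 7 later added getParentLogger. The sketch below is a hypothetical delegating DataSource, not HiveBasicDataSource’s actual code; it only shows the set of methods a wrapper around a pool such as C3P0 has to implement to compile on a current JDK.

```java
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Objects;
import java.util.logging.Logger;
import javax.sql.DataSource;

/** Hypothetical DataSource that forwards everything to a wrapped pool (not HiveDB's class). */
public class DelegatingDataSource implements DataSource {
    private final DataSource target;

    public DelegatingDataSource(DataSource target) {
        this.target = Objects.requireNonNull(target);
    }

    @Override public Connection getConnection() throws SQLException {
        return target.getConnection();
    }

    @Override public Connection getConnection(String user, String password) throws SQLException {
        return target.getConnection(user, password);
    }

    // CommonDataSource methods, forwarded to the wrapped pool.
    @Override public PrintWriter getLogWriter() throws SQLException { return target.getLogWriter(); }
    @Override public void setLogWriter(PrintWriter out) throws SQLException { target.setLogWriter(out); }
    @Override public void setLoginTimeout(int seconds) throws SQLException { target.setLoginTimeout(seconds); }
    @Override public int getLoginTimeout() throws SQLException { return target.getLoginTimeout(); }

    // Added in Java 7; required to compile against current JDKs.
    @Override public Logger getParentLogger() throws SQLFeatureNotSupportedException {
        return target.getParentLogger();
    }

    // java.sql.Wrapper methods, required since Java 6 (JDBC 4.0).
    @Override public <T> T unwrap(Class<T> iface) throws SQLException {
        if (iface.isInstance(this)) return iface.cast(this);
        if (iface.isInstance(target)) return iface.cast(target);
        return target.unwrap(iface);
    }

    @Override public boolean isWrapperFor(Class<?> iface) throws SQLException {
        return iface.isInstance(this) || iface.isInstance(target) || target.isWrapperFor(iface);
    }
}
```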

Presentation Slides

Hello from the MySQL Conference and Expo. I just thought I’d post a preview of the presentation that we are going to give in 20 minutes.

About

HiveDB is an open source framework for building scalable, high-performance, partitioned MySQL systems created and maintained by:


Join us at the HiveDB-Dev Google group.
