First Impressions of VoltDB

VoltDB is an intriguing competitor in the quickly developing field of data store alternatives. Answering the charge of NoSQL supporters that “OldSQL” is simply too slow to scale effectively to meet the demands of high transaction applications, VoltDB seeks to demonstrate that it is neither SQL nor the Relational Model that is at fault, but the antiquated disk-centric architecture of the venerable relational database management systems (RDBMS).

VoltDB is a natively partitioned (sharded), single threaded, memory resident data store. Other RDBMS implement sharding to allow for distributed data storage across commodity hardware. However, VoltDB was written with this principle as the basis of the architecture. Sharding in other RDBMS often must be implemented as part of the application stack and presents onerous maintenance hazards (such as managing backups and restores). In recent versions, MySQL has added partitioning abstraction to the product and many organizations have experienced success with MySQL sharding. I can’t speak to how VoltDB compares, but certainly MySQL was not designed with partitioning as a fundamental principle. Regardless, the VoltDB implementation effectively and cleanly abstracts the details of partitioning from the application tiers.

But the most important differentiator is that VoltDB is entirely memory resident. All of the mechanisms that traditional RDBMS rely on to function in a disk-centric architecture have been discarded in the name of performance. This simple and instructive graphic demonstrates how VoltDB rethinks storing relational data.

In addition, VoltDB rejects concurrency in favor of single threaded transactions, thereby eliminating costly locking. As a result VoltDB can measure throughput in hundreds of thousands of transactions per second, a truly astonishing accomplishment.

Even though common performance issues of relational data stores are addressed by VoltDB, NoSQL solutions can still rightly claim that it does nothing to solve the impedance mismatch that software developers have spent so much time trying to work around. Certainly, in time, ORMs will be developed which will greatly ease the pain developers experience. Until someone is able to make the translation as seamless as, say, working with the excellent MongoDB, this will remain a legitimate complaint and an impediment to adoption. However, VoltDB’s response to this should be, “yes, some NoSQL stores speak your language better than VoltDB, but look at what you are giving up!” Losing ad-hoc querying, transactional consistency (ACID), a standardized data access convention (ANSI SQL) and the data integrity promoted by the relational model are a huge price to pay. And as Dr. David DeWitt rightly asked in his presentation to the PASS Summit 2010, “why would you want to write your own query optimizer?” On-Demand Recording (signup required) .

This brings us to what I see are the greatest drawbacks to VoltDB. Currently, only a very limited subset of SQL is actually supported, which takes some of the effectiveness out of the ad-hoc query argument. No doubt this will improve over time, but remains a challenge at this point. Additionally, due to architectural constraints, VoltDB can only loosely call itself ‘relational.’ The lack of foreign key and check constraints are worrisome. The responsibility for maintaining data integrity in this way is pushed to the app, just like in NoSQL solutions. I see no reason why check constraints, at least in limited form, could not be supported, but enforcing foreign key constraints across partitions is a key architectural problem.

However, the most disconcerting problem is the lack of uniqueness constraints. Although VoltDB enforces uniqueness for keys, it can do so only on a partition basis. That is, for any given relation the key is only guaranteed to be unique within the partition in which it resides. If your table is partitioned across n nodes, a key may exist in n partitions. To my mind, this calls into question the claim that VoltDB makes to be a relational data store, but this is perhaps debatable. Nevertheless, uniqueness constraints, along with foreign keys and check constraints, are essential to the data integrity promises of the relational model. ACID is fantastic and certainly something that NoSQL advocates are too quick to dismiss. And it seems that VoltDB is mounting a powerful challenge to the CAP theorem. But, my conclusion is that VoltDB is not a truly relational data store, yet it presents the vital correction to NoSQL to not assume that transactional integrity must be thrown out in the name of performance. For this reason, if for nothing else, it deserves to be taken very seriously.

There is much more that can be said about VoltDB and I encourage you to read their excellent whitepaper. VoltDB, and others with a similar approach, are extraordinarily promising and welcome rewrites of relational data storage. The established players in the relational market are going to be forced to take notice, as they certainly have of NoSQL. And when the NoSQL crowd realizes that they are now dealing with the same issues that Codd was trying to solve forty years ago, VoltDB will be there with part of the solution. But, much more work must be done to find creative solutions to the lack of some essential relational features before we can truly say that it can carry both the data integrity and performance torches. In the meantime, VoltDB can rightfully claim to be one of the best options out there for what it claims to do well: high throughput OLTP workloads.

A special thanks to Seb Coursol of VoltDB for his openness and enthusiasm for his product.