I was recently reading the news of the Apache Software Foundation’s release of version 1.2 of the NoSQL database Cassandra, and I ran across a description of the Cassandra Query Language (CQL), which Apache has also updated to version 3. I’ve always been amused by the NoSQL query languages, since they all do their best to look like SQL even though they call themselves NoSQL, much as a Japanese restaurant might promote its artificial crab sushi. But I digress…
In any case, CQL3 contains a handful of SQL-like commands, including the predictable INSERT, UPDATE, and SELECT, as well as a few that are more specific to Cassandra’s NoSQL nature, like CREATE KEYSPACE. A keyspace is analogous to a schema in a relational database; it’s the outermost grouping of data for an application.
Here’s where the story gets interesting. When you create a keyspace, you must declare its replication strategy. There are a few options for replication strategy in Cassandra, the simplest being a straightforward round-robin distribution of replicas of your data across nodes. In plain English, this strategy means that Cassandra automatically makes identical copies of your data (the replicas) and automatically distributes them across either virtual servers or physical servers (the nodes), so that if any one node fails, Cassandra can recover that node’s data from the copies held on the others. Furthermore, you can select and configure your replication strategy to tune the scalability, elasticity, and fault tolerance of your database.
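To make the idea concrete, here is a rough Python sketch of that simplest strategy. It is an illustration under my own assumptions, not Cassandra’s actual code: the function names (`token`, `replicas_for`, `surviving_replicas`), the MD5-based hash, and the node names are all hypothetical. The idea is that nodes sit on a ring, each piece of data lands on the first node at or after its own position, and the remaining copies go to the next nodes around the ring.

```python
import hashlib
from bisect import bisect_left


def token(value: str) -> int:
    """Map a string to a position on the ring (hash choice is an assumption)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def replicas_for(key: str, nodes: list[str], replication_factor: int) -> list[str]:
    """Return the nodes that would hold a copy of `key`.

    The first replica goes to the first node whose token is at or after the
    key's token; the rest go to the following nodes, wrapping around the ring.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    ring = sorted(set(nodes), key=token)          # nodes ordered by ring position
    tokens = [token(node) for node in ring]
    start = bisect_left(tokens, token(key)) % len(ring)
    count = min(replication_factor, len(ring))    # cannot have more copies than nodes
    return [ring[(start + i) % len(ring)] for i in range(count)]


def surviving_replicas(key: str, nodes: list[str], replication_factor: int,
                       failed: set[str]) -> list[str]:
    """Return the replicas of `key` that are still reachable after failures."""
    return [n for n in replicas_for(key, nodes, replication_factor) if n not in failed]


if __name__ == "__main__":
    cluster = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names
    owners = replicas_for("user:42", cluster, replication_factor=3)
    print("replicas:", owners)
    print("after one failure:", surviving_replicas("user:42", cluster, 3, {owners[0]}))
```

With three copies spread over four nodes, losing any single node still leaves two copies of every key, which is the fault tolerance the replication strategy is buying you.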
In other words, Cassandra is built from the ground up for the Cloud. Sure, you’d expect that from Cassandra, but it’s important to understand that any application that runs in the Cloud should behave similarly. If your app isn’t designed to break into multiple pieces that can go on multiple nodes, where the number of nodes can change dynamically and individual nodes can fail without sinking your app, then your app isn’t Cloud-ready.
So, next time you’re talking to a software vendor who’s trumpeting that their software will run in the Cloud, ask them what their replication strategy is. If they don’t have one — or worse, if they don’t even understand the question — then run the other direction. Quickly.