Twitter currently runs a couple hundred Cassandra nodes across a half dozen clusters. These span a variety of workloads, from time series data to low-latency, high-throughput key/value. Each workload has led the team to new techniques for operating Cassandra at scale. Chris Goffinet, an engineer at Twitter and Cassandra committer, will be sharing some of the most interesting ones. For those of you interested in Cassandra and operations, this is a must-attend talk.
by Ed Anuff
Cassandra provides a wide set of mechanisms for indexing and searching data. Cassandra’s built-in secondary indexes make it easy to get started, but it also provides unique and powerful capabilities for building custom indexes that can be used for sophisticated queries of object and document data. Cassandra 0.8 introduces Composite columns, which are a key building block of custom indexes. Find out which indexing techniques are best suited to your application, and see what’s possible using some of the advanced indexing techniques.
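As a rough illustration of the idea behind a composite-column index (not code from the talk), the sketch below models a single wide index row in plain Python. It assumes column names are (indexed value, row key) pairs kept in sorted order, so a range query becomes a slice over adjacent columns. The class name `CompositeIndexRow` and its methods are hypothetical and no Cassandra client API is used.

```python
import bisect


class CompositeIndexRow:
    """Toy model of one wide index row (hypothetical, in-memory only).

    Each column name is a composite (indexed_value, row_key) tuple, and the
    columns are kept sorted, mimicking how a wide row orders its columns.
    """

    def __init__(self):
        self._columns = []  # sorted list of (indexed_value, row_key)

    def add(self, indexed_value, row_key):
        """Insert an index entry, keeping the columns sorted and unique."""
        entry = (indexed_value, row_key)
        pos = bisect.bisect_left(self._columns, entry)
        if pos == len(self._columns) or self._columns[pos] != entry:
            self._columns.insert(pos, entry)

    def remove(self, indexed_value, row_key):
        """Delete an index entry if it is present."""
        entry = (indexed_value, row_key)
        pos = bisect.bisect_left(self._columns, entry)
        if pos < len(self._columns) and self._columns[pos] == entry:
            del self._columns[pos]

    def range(self, start, end):
        """Return row keys whose indexed value lies in [start, end]."""
        # A one-element tuple sorts before any composite sharing that prefix,
        # so this finds the first column with indexed_value >= start.
        pos = bisect.bisect_left(self._columns, (start,))
        keys = []
        while pos < len(self._columns) and self._columns[pos][0] <= end:
            keys.append(self._columns[pos][1])
            pos += 1
        return keys


# Example: index users by city, then query by exact value and by range.
index = CompositeIndexRow()
index.add("boston", "user2")
index.add("austin", "user3")
index.add("chicago", "user4")
index.add("boston", "user1")

boston_users = index.range("boston", "boston")
a_to_b_users = index.range("a", "bz")
```

Because the row key is part of the column name, several rows can share the same indexed value without overwriting each other, which is the property that makes composite names useful as an index building block.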
by Eric Evans
For years SQL has provided a stable and (nearly) compatible interface to relational databases, regardless of platform or development environment. Contrast this with the NoSQL ecosystem, where each project has implemented its own query interface, with a specialized tool chain and a unique set of idiosyncrasies.
With Cassandra 0.8, we have the release of the Cassandra Query Language (CQL). CQL functions similarly to SQL and contains most of the SQL core keywords – CREATE, DROP, INSERT, UPDATE, SELECT, USE are all there and function as one would expect.
by Erik Onnen
by Jake Luciani
by Yewei Zhang
In our initial move of the Netflix streaming service to the cloud, we made copies of data and synchronized changes back to Oracle in the datacenter. This year Netflix is making the cloud the master copy, and phasing out Oracle as a data store. At the same time we are moving from our initial memcached / SimpleDB-based back-end to Apache Cassandra with data replicated to AWS availability zones, and supporting a global business with asynchronous cross-region replication. We have implemented full and incremental backup and restore to S3, and integration with our existing Business Intelligence back-end. We already use Hadoop, and are currently investigating Brisk. Netflix chose Apache Cassandra because its flexible operational model fits our need for highly available and globally distributed data sources. We have also engaged in the Apache development process, contributing fixes and new features such as off-heap row cache and incremental backup hooks, and find that it integrates well with our primarily Java-based development environment.