In our initial move of the Netflix streaming service to the cloud, we made copies of data and synchronized changes back to Oracle in the datacenter. This year Netflix is making the cloud the master copy and phasing out Oracle as a data store. At the same time we are moving from our initial memcached/SimpleDB-based back-end to Apache Cassandra, with data replicated across AWS availability zones and asynchronous cross-region replication to support a global business. We have implemented full and incremental backup and restore to S3, and integration with our existing Business Intelligence back-end. We already use Hadoop, and are currently investigating Brisk. Netflix chose Apache Cassandra because its flexible operational model fits our need for highly available and globally distributed data sources. We have also engaged in the Apache development process, contributing fixes and new features such as off-heap row cache and incremental backup hooks, and we find that Cassandra integrates well with our primarily Java-based development environment.
11th July 2011