Over the last year, Netflix has migrated its website and streaming service from a conventional datacenter implementation to the Amazon public cloud. Along the way, we rewrote most of our code base, built a completely new data source backend based on SimpleDB and Cassandra, and retooled our processes with high levels of automation. As a result, despite high and accelerating growth in Netflix subscriber counts, the growth of Netflix's datacenter footprint has been halted, and all capacity expansion now takes place on AWS. Because data is modified both in the datacenter and in the cloud, bidirectional replication has been implemented, and Netflix has had to learn “Roman riding” (look it up), with one foot in each environment. In this talk, Netflix's Cloud Architect Adrian Cockcroft will discuss the datacenter anti-patterns that motivated a new code architecture, data architecture and deployment model. The Netflix cloud architecture takes advantage of almost every feature of AWS and is optimized for running in a highly automated environment with ephemeral instances, non-deterministic performance, and agile deployment processes.
by John Allspaw
Getting tight operationally means strengthening the resiliency of both your stack _and_ your organization's response to issues that arise. Outage postmortem meetings done wrong can be stress-filled blamefests; done right, they can be collaborative illustrations of Resilience Engineering. I'll use Etsy.com examples to illustrate sticky topics such as Root Cause Analysis and Human Error.
14th–16th June 2011