Over the last year, Netflix has migrated its website and streaming service from a conventional datacenter implementation to the Amazon public cloud. Along the way, we rewrote most of our code base, built a completely new data source backend based on SimpleDB and Cassandra, and re-tooled our processes with high levels of automation. As a result, despite high and accelerating growth in Netflix subscriber counts, the growth of Netflix’s datacenter footprint has been halted, and all capacity expansion now uses AWS. Since data is modified both in the datacenter and in the cloud, bidirectional replication has been implemented, and Netflix has had to learn “roman riding” (look it up) with one foot in each environment. In this talk, Netflix’s Cloud Architect Adrian Cockcroft will discuss the datacenter anti-patterns that motivated a new code architecture, data architecture and deployment model. The Netflix cloud architecture takes advantage of almost every feature of AWS, and is optimized for running in a highly automated environment with ephemeral instances, non-deterministic performance, and agile deployment processes.
by John Allspaw
Getting tight operationally means strengthening the resiliency of both your stack _and_ your organization's response to issues that arise. Outage postmortem meetings done wrong can be stress-filled blamefests, and done right can be collaborative illustrations of Resilience Engineering. I'll use Etsy.com examples to illustrate sticky topics such as Root Cause Analysis and Human Error.
by Yehuda Katz
The SproutCore framework has evolved over the past five years to be an extremely high-performance framework that focuses on making it possible to build native-like applications in the browser.
This means handling problems like working with extremely large data-sets, inconsistent connectivity, and complex DOMs. Lately, it has meant figuring out how to properly use new browser features that can make a big difference to perceived performance, like hardware acceleration.
In this talk, Yehuda will cover some of the techniques that SproutCore has used historically to enable extremely complex applications to perform well in the browser, as well as which new browser technologies the team is evaluating for building compelling content for the web.
by Lew Cirne
New Relic’s multitenant, SaaS web application monitoring service collects and persists over 90,000 metrics every second on a sustained basis, while still delivering an average page load time of 1.5 seconds. In this session I will discuss how good architecture and good tools can help you handle an extremely large amount of data while still providing extremely fast service. I’ll show you how we scale to support customer growth, how we monitor our system, and what traps to look out for.
You should come away from this session with an understanding of how to:
by Jon Jenkins
Operations can play a critical role in driving revenue for the business. This talk will explore some ways in which the ops team at Amazon is thinking outside the box to drive profitability. JJ will also issue a challenge for next year’s Velocity Conference.
by John Resig
Working on the development of jQuery, one tends to learn about all the performance implications of a particular change to a JavaScript code base (whether it comes from an API change or a larger rewrite of the internals). Performance is an ever-present concern for every single commit and for every release, and performance implications must be well-defended and well-tested. In this talk we’re going to look at all the different performance concerns that the project deals with (processor, memory, network) and the tools that are used to make sure development continues to move smoothly.
Sixty-five engineers at Etsy deploy code to our production servers more than 30 times a day. We keep this process safe with a suite of unit tests, integration tests, and a large number of application-centric dashboards written by engineers. We capture metrics in Ganglia, Cacti, and Graphite, and these metrics range from technical aspects like outgoing bandwidth and web server requests per second to business aspects like new registrations and gross sales.
I plan to present an overview of the tools we use for collecting metrics and the code we use to quickly build one-page dashboards for different aspects of our site (e.g. general health, image storage, search infrastructure). The underlying theme is that these tools are not difficult to use, but typically lie in the “operations” domain. At Etsy, we’ve gone to great lengths to get engineers excited about contributing to metrics and dashboards, and to make it dead simple to do these things quickly so that it doesn’t impact their ability to meet deadlines.
Between now and the summer, we’ll be releasing some of the tools we are using for metrics collection and dashboard building on GitHub. I will be going into some technical detail (read: real code!) on how we integrate these tools.