Over the last year, Netflix has migrated its website and streaming service from a conventional datacenter implementation to the Amazon public cloud. Along the way, we re-wrote most of our code base, built a completely new data source backend based on SimpleDB and Cassandra, and re-tooled our processes with high levels of automation. As a result, despite high and accelerating growth rates in Netflix subscriber counts, the growth rate of Netflix’s datacenter footprint has been halted, and all capacity expansion is now leveraging AWS. Since data is modified in the datacenter and in the cloud, bidirectional replication has been implemented, and Netflix has had to learn “roman riding” (look it up) with one foot in each environment. In this talk, Netflix’s Cloud Architect Adrian Cockcroft will discuss the datacenter anti-patterns that motivated a new code architecture, data architecture and deployment model. The Netflix cloud architecture takes advantage of almost every feature of AWS, and is optimized for running in a highly automated environment with ephemeral instances, non-deterministic performance, and agile deployment processes.
Infrastructure is code. The separation between how you manage infrastructure and applications is disappearing.
System administrators love Chef because it gives them flexibility to integrate all aspects of their infrastructure such as monitoring and trending tools with applications. Software developers love Chef because it helps them take care of the muck so they can focus on writing great applications.
This workshop will cover:
- How system integration goes beyond just configuration management
- Chef’s architecture and design, including tools and capabilities
- Anatomy of a Chef run
- Chef concepts, such as Roles, Recipes, Clients, and Nodes
- How to download, customize, and use existing cookbooks
- How to write data-driven cookbooks that use the Chef Server search indexes and data storage
by John Allspaw
Getting tight operationally means strengthening the resiliency of both your stack _and_ your organization's response to issues that arise. Outage postmortem meetings done wrong can be stress-filled blamefests, and done right can be collaborative illustrations of Resilience Engineering. I'll use Etsy.com examples to illustrate sticky topics such as Root Cause Analysis and Human Error.
by Yehuda Katz
The SproutCore framework has evolved over the past five years to be an extremely high-performance framework that focuses on making it possible to build native-like applications in the browser.
This means handling problems like working with extremely large data-sets, inconsistent connectivity, and complex DOMs. Lately, it has meant figuring out how to properly use new browser features that can make a big difference to perceived performance, like hardware acceleration.
In this talk, Yehuda will cover some of the techniques that SproutCore has used historically to enable extremely complex applications to perform well in the browser, as well as what new technologies the team is looking at to leverage the latest browser technologies in building compelling content for the web.
This keynote will be a whimsical whirlwind tour through the evolution of a “career” in web operations.
by Douglas Crockford
There are lies, damned lies, and benchmarks. Tuning language processor performance to benchmarks can have the unintended consequence of encouraging bad programming practices. This is the true story of developing a new benchmark with the intention of encouraging good practices.
by John Rauser
Modern monitoring software makes it easy to plot a statistic like average latency every minute—too easy. Fancy dashboards of time series plots often lull us into a false sense of security. Underneath every point on those plots is a distribution, and underneath that distribution is a series of individuals: your customers. If you don’t take the time to look deeply at your data, you don’t truly understand your business.
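The point about averages hiding the distribution can be made concrete with a small sketch (the data and function names here are illustrative, not from the talk): a handful of very slow requests barely move the mean, while a percentile exposes them.

```javascript
// Sketch: why a per-minute average hides the tail of a latency distribution.
// Latencies are in milliseconds; the dataset is invented for illustration.
function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function percentile(xs, p) {
  // Nearest-rank percentile over a sorted copy of the data.
  const sorted = [...xs].sort((a, b) => a - b);
  return sorted[Math.ceil((p / 100) * sorted.length) - 1];
}

// 95 fast requests — and 5 customers who waited two full seconds.
const latencies = Array(95).fill(20).concat(Array(5).fill(2000));

console.log(mean(latencies));           // 119 — the average looks tolerable
console.log(percentile(latencies, 99)); // 2000 — the tail tells the real story
```

Every point on a dashboard's "average latency" line is the first number; the customers who churn are in the second.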
by Manny Gonzalez and Vik Chaudhary
Keynote demonstrates how you can improve the end-user experience of your latest smartphone apps. This year at Velocity, Keynote debuts Mobile Device Perspective 5.0 (MDP), a cloud-based testing and monitoring platform for ensuring the end-to-end quality of iPhone, Android, and BlackBerry mobile apps accessing online content, streaming video, music, and games. Learn how MDP’s mobile monitoring capabilities are designed for testing and optimizing smartphone access with gestures and touch events, using real mobile devices connected to the latest 3G and 4G networks in multiple mobile markets across the globe.
by Mark Burgess
The key challenges for infrastructure designers and maintainers today are scale, speed and complexity. Mark Burgess was one of the first people to look for ways of managing these issues based on theoretical analysis. Much of his work has gone into the highly successful software Cfengine, which is still very much a leading light in the industry. In this session, Mark will ask if we have yet learned the lessons of infrastructure management, and, either way, what must come next.
by Tim O'Reilly
Tim O’Reilly shares his insights into the world of emerging technology, presenting his take on what matters most – and what will be most disruptive – to the tech community.
As web applications continue to become more interactive and sophisticated, real-time messaging and updates are becoming increasingly prevalent. One of the hottest new APIs in HTML5 is WebSocket, which enables true duplex communication without the overhead, complexity, and extraneous latency of HTTP-based solutions. In this talk, we will see how WebSocket removes these barriers to create optimal real-time delivery of messages from servers to desktop and mobile web browsers. Although WebSocket is an exciting new API, we will see how we can easily fall back to HTTP-based techniques when WebSocket is not available, using Dojo’s Socket API. The server side is equally important, and real-time messaging has pushed the need for asynchronous I/O in the server. We look at how we can create scalable real-time applications on the Node.js platform, which is so perfectly suited for Comet, using the Tunguska library. The presentation will cover the use of streaming abstractions to minimize buffering. We will also consider the performance implications of topic-based publish-subscribe distribution versus filtering techniques.
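The graceful-degradation pattern the talk describes — use WebSocket where the browser supports it, fall back to HTTP otherwise — can be sketched in a few lines. This is an illustrative stub, not Dojo’s actual Socket API; the long-poll fallback here is deliberately minimal.

```javascript
// Map an http(s) page URL to the matching ws(s) endpoint.
function toWsUrl(url) {
  return url.replace(/^http/, "ws");
}

// Open a real-time channel: WebSocket if available, long-polling otherwise.
function openSocket(url, onMessage) {
  if (typeof WebSocket !== "undefined") {
    const ws = new WebSocket(toWsUrl(url));
    ws.onmessage = (e) => onMessage(e.data);
    return ws;
  }
  // Fallback: re-issue a request whenever the previous one completes,
  // so the server can hold each request open until it has data to push.
  let stopped = false;
  (function poll() {
    if (stopped) return;
    fetch(url)
      .then((r) => r.text())
      .then((data) => {
        onMessage(data);
        poll();
      });
  })();
  return { close: () => { stopped = true; } };
}
```

A library like Dojo’s Socket API hides this branching behind one interface, so application code never has to know which transport it got.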
SPDY was proposed by Google back in November 2009 to reduce the latency and load time of web pages. It was provided as part of the Chromium open-source project and is enabled in Chrome by default.
We at Cotendo took on the challenge, implemented the server side, and extended our proxies to support SPDY, providing SPDY-to-HTTP “translation”. Guess what? It really speeds things up. But like all new good things, there is still work to do. We will share insights from our implementation and our optimization of SSL-based traffic, and present performance data from both Google’s and our customers’ deployments.
We believe the introduction of SPDY as a new application layer presents a unique opportunity to rethink web design concepts and front-end-optimization (FEO) techniques. We will discuss some optimizations we developed and suggest some guidelines on how you can approach these new types of optimizations.
by Lew Cirne
New Relic’s multitenant, SaaS web application monitoring service collects and persists over 90,000 metrics every second on a sustained basis, while still delivering an average page load time of 1.5 seconds. In this session I will discuss how good architecture and good tools can help you handle an extremely large amount of data while still providing extremely fast service. I’ll show you how we scale to support customer growth, how we monitor our system, and what traps to look out for.
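One common way to sustain that kind of ingest rate — sketched here as an illustration, not as New Relic’s actual implementation — is to aggregate each metric into per-minute “timeslices” of (count, sum, min, max), so that storage cost scales with the number of distinct metrics rather than the number of requests.

```javascript
// Illustrative sketch: per-minute timeslice aggregation of metrics.
function newSlice() {
  return { count: 0, sum: 0, min: Infinity, max: -Infinity };
}

// Fold one observed value into the current minute's slice for a metric.
function record(slices, metric, value) {
  const s = slices[metric] || (slices[metric] = newSlice());
  s.count += 1;
  s.sum += value;
  if (value < s.min) s.min = value;
  if (value > s.max) s.max = value;
}

// At each minute boundary, flush `slices` to storage and start fresh.
const slices = {};
record(slices, "HttpDispatcher", 120); // two requests, 120 ms and 80 ms,
record(slices, "HttpDispatcher", 80);  // collapse into one stored record
console.log(slices.HttpDispatcher);    // { count: 2, sum: 200, min: 80, max: 120 }
```

The average (sum/count) can be recomputed at query time, and slices from many collectors merge cheaply by adding counts and sums and taking min/max.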
You should come away from this session with an understanding of how to:
ImmobilienScout24 is Germany’s leading real estate listing portal. We run >700 VMs hosting >100 services for operations, quality assurance, and development, based on Red Hat Linux, Java, Tomcat, Oracle/MySQL, and all the other usual open source web solutions.
Currently we are in the process of packaging our entire software stack in RPM packages, and we even deploy configurations through RPMs.

We found that this approach helped us a great deal in working together with our developers to build our services. Unlike before, developers are now fully involved in all operational decisions about their applications and actually build their software RPMs themselves through automated build tools.
The integration between configuration, application stack, and base operating system brought many additional benefits for provisioning, automated testing, auditing, and more.
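The idea can be illustrated with a minimal, hypothetical RPM spec fragment (all names here are invented for the example): the application artifact and its configuration ship together as one versioned, auditable package.

```spec
Name:    example-webapp
Version: 1.0
Release: 1
Summary: Example service packaged together with its configuration
License: Proprietary

%description
Ships the application WAR and its Tomcat context configuration
as a single versioned unit, installable and auditable via rpm.

%files
/opt/example-webapp/webapp.war
%config(noreplace) /etc/example-webapp/context.xml
```

Because the package manager tracks every file, `rpm -V` can audit drift and `yum update` rolls the service and its configuration forward (or back) atomically.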
Having learned about other organizations that use packages for deployment, we would like to use the Velocity Conference as an opportunity to talk about package deployment and give those who do not choose recipe-based deployment tools like Chef and Puppet a place to talk to others doing the same.
A possible result of this BoF could be a collection of best practice approaches to package deployment.
HTML5: it’s new, it’s awesome, and it’s powerful. But can it take down the champ of video distribution, Flash? Which technology has what it takes to bring cat video to the next level? This talk will cover the many challenges HTML5 and YouTube face in adapting to change video distribution on the web.
by Jon Jenkins
Operations can play a critical role in driving revenue for the business. This talk will explore some ways in which the ops team at Amazon is thinking outside the box to drive profitability. JJ will also issue a challenge for next year’s Velocity Conference.
Artur Bergman on SSDs.
It’s easy to forget that the story of infrastructure is a human story. In this session, we will take a step back and trace the history of automation and virtualization, touching on specific pioneers and the use cases they developed for both technologies, looking at the emergence of the devops role and its significance as organizations move towards the cloud.
by John Resig
Sixty-five engineers at Etsy deploy code to our production servers more than 30 times a day. We keep this process safe with a suite of unit tests, integration tests, and a large number of application-centric dashboards written by engineers. We capture metrics in Ganglia, Cacti, and Graphite, and these metrics range from technical aspects like outgoing bandwidth and web server requests per second to business aspects like new registrations and gross sales.
I plan to present an overview of the tools we use for collecting metrics and the code we use to quickly build one-page dashboards for different aspects of our site (e.g. general health, image storage, search infrastructure). The underlying theme is that these tools are not difficult to use, but typically lie in the “operations” domain. At Etsy, we’ve gone to great lengths to get engineers excited about contributing to metrics and dashboards, and to make it dead simple to do these things quickly so that it doesn’t impact their ability to meet deadlines.
Between now and the summer, we’ll be releasing some of the tools we are using for metrics collection and dashboard building on GitHub. I will be going into some technical detail (read: real code!) on how we integrate these tools.
Adam Jacob returns with Choose Your Own Adventure 2: Electric Boogaloo ;-)
14th–16th June 2011