Operating At Scale

A session at PuppetConf

With over 750 million active users worldwide, half of whom log in daily, Facebook is now the most popular website. The infrastructure that powers Facebook is made up of many back-end services that work together to provide a coherent user experience. This complex infrastructure has a cache holding over 2 trillion objects, accessed 100 million times a second across multiple datacenters and geographies. People interact with over 900 million objects, and more than 30 billion items of content are uploaded every month.

Reliability is important to Facebook, but failures will occur, and the Operations and Infrastructure Engineering teams need to respond to them quickly. Facebook is a fast-paced environment, and the principle of moving fast applies not only to its engineering practices but also to how things are fixed. Systems, processes and culture all work together to make this happen.

This talk will highlight some of the systems and practices that Facebook employs to manage systems and software at scale. I will use a few case studies to describe how these are built, and provide guidelines for how others can build their own systems and operations teams that scale with infrastructure growth. The talk will touch on concepts like automation, communication, monitoring, incident management, infrastructure design and code releases.

About the speaker

Pedro Canahuati

Manager of Site Reliability Engineering Department, Facebook

PuppetConf

Portland, United States

22nd–23rd September 2011

When

Time 12:00pm–12:45pm PST

Date Thu 22nd September 2011

Short URL

lanyrd.com/shtmm

Official event site

www.puppetconf.com
