by Jos Boumans
This talk takes you through setting up your infrastructure as code on EC2 with Puppet, Cloudkick, Ubuntu, boto, a few more tools and a handful of conventions, to take you from a small scrappy startup to a 200+ node deployment as a one-man ops team without waking you up at 4 am.
Find out the need-to-knows, best practices, pitfalls and common failure scenarios, and how you can configure Puppet to protect yourself from them.
by John Willis and Damon Edwards
As IT continues its tectonic shift from cost center to profit center, the importance of IT operations is also being redefined. If you are primarily worried about uptime and server-to-admin ratios, you are already behind the times. Today's top-performing operations professionals are obsessed with how they can enable the business to innovate quicker, react quicker to market and competitive demands, and scale up for the win. With healthy doses of DevOps, Lean thinking, infrastructure as code, and open source tools like Puppet, these top performers are turning their operations from a necessary cost center into a strategic weapon. Are you and your organization ready to compete against them?
With over 750 million active users worldwide, half of whom log in daily, Facebook is now the #1 most popular website. The infrastructure that powers Facebook is made up of many back-end services that all work together to provide a coherent user experience. The complex infrastructure has a cache with over 2 trillion objects in it, accessed 100 million times a second across multiple datacenters and geographies. There are over 900 million objects that people interact with and more than 30 billion items of content are uploaded every month.
Reliability is important to Facebook, but failures will occur, and the Operations and Infrastructure Engineering teams need to respond to these failures quickly. Facebook is a fast-paced environment, and the principle of moving fast is applied not only to its engineering practices but also to how things are fixed. Systems, processes and culture all work together to make this happen.
This talk will highlight some of the systems and practices that are employed at Facebook to manage systems and software at scale. I will use a few case studies to describe how these are built and provide guidelines for how others can build their own systems and operations teams that can scale with infrastructure growth. It will touch on concepts like automation, communication, monitoring, incident management, infrastructure design & code releases.
22nd–23rd September 2011