Sessions at Surge 2010 on Friday 1st October

  • Enterprise solutions from commodity components: The Promise and the Peril

    by Bryan Cantrill

    The economics of commodity components are undeniable, but they also can suffer from acute reliability problems that introduce new (and often unanticipatable) failure modes. Even in a thoughtful architecture that is putatively designed around unreliable components, these failure modes can have dire consequences, potentially cascading into systemic failure. This talk will dissect some examples of these failures, exploring how the original failing component was able to induce broader failure, how the problem was ultimately understood, and what larger lessons can be drawn from the experience.

    At 9:00am to 10:00am, Friday 1st October

  • Go with the flow - Meditations on network infrastructure analysis

    by Benjamin Black

    Highly scaled distributed web applications are predicated on a functional network, yet organizations rarely have detailed information about the consumption and expense of network resources. This data is essential for effective denial of service detection, intrusion detection, troubleshooting, capacity planning, and traffic engineering, but the time, cost and knowledge required to acquire and analyze the data can be a prohibitive barrier. Most organizations default to reactively analyzing this information after the fact, if at all. The dynamic nature of modern infrastructures can make these challenges even more acute.

    This presentation will investigate representative scenarios that would benefit from detailed understanding of network traffic while outlining principles and tools for gathering and evaluating the data.

    At 9:00am to 10:00am, Friday 1st October

    Coverage video

  • Scaling - Lessons Learned From Rapid Growth

    by Gavin M. Roy

    The site is one of the top 25 most trafficked websites in the United States, experiencing large-scale growth over a very short period of time. Employing technologies such as PHP, PostgreSQL, and memcached, as well as newer cutting-edge technologies, it has been able to achieve operational stability in the face of large volumes of traffic. In this talk, Gavin will review the growing pains and the methodologies used to handle the consistent growth and demand while affording the rapid development cycles required by the product development team.

    At 10:00am to 11:00am, Friday 1st October

  • Don't bet the farm on your cache

    What happens when the one part of your infrastructure that should never go down misbehaves? This is a case study of all the chaos that ensued when the unthinkable happened -- the cache layer went down. The site endures several DDoS attempts every day and, on that particular day, someone got lucky. We will discuss several key factors that inevitably caused an outage of one of the world's most popular new sites from a relatively minor DoS attack, including:

    • Bugs in code
    • Overzealous configurations
    • Lack of real world testing
    • Insufficient monitoring

    We will also discuss the immediate solutions we used to get the site back on the air, as well as the long term fixes to the underlying issues.

    At 11:00am to 12:00pm, Friday 1st October

  • The "Go or No-Go": Operability and Contingency at Etsy

    by John Allspaw

    You've been working on the wicked new feature for a long time. Design is done, the product people love it, and the code's about as polished as it can be. Launching new public-facing features is different than making small changes to existing functionality. I'll talk about the process we have at Etsy (influenced by Flickr's) for making sure that new awesome thing is *operable* and the right attention has been given to contingency planning, on both the technical and human sides.

    At 11:00am to 12:00pm, Friday 1st October

  • Top 10 Lessons Learned from Deploying Hadoop in a Private Cloud

    by Rod Cope

    Hadoop, HBase, and friends are built from the ground up to support Big Data/NoSQL, but that doesn't make them easy. Just like with any other relatively new and complex technologies, there are some rough edges and growing pains to manage. I've learned some hard lessons while deploying HBase tables containing billions of rows and dozens of terabytes on OpenLogic's Hadoop infrastructure. Come to this session to learn about some of the "gotchas" you might run into when deploying Hadoop and HBase in your own private cloud and how to avoid them.

    Here are some general areas we'll explore:

    • Hard-to-find configuration problems and debugging techniques
    • Under-documented yet critical features
    • Deployment recommendations for particular use cases
    • Advice on how to import Big Data
    • Using JRuby/Ruby to make life with Hadoop and HBase easier

    At 11:00am to 12:00pm, Friday 1st October

    Coverage video

  • Design for Scale - Patterns, Anti-Patterns, Successes and Failures

    by Christopher Brown

    This isn't your "Gang of Four". Christopher will discuss his experiences building Amazon's EC2 and the Opscode Platform, and the experiences of others designing large-scale online services. From API to access control, to deployment and configuration, we'll explore the techniques that work, and some that don't, with a critical eye toward your next design.

    At 1:30pm to 2:30pm, Friday 1st October

    Coverage video

  • Quantifying Scalability FTW

    by Neil Gunther

    You probably already collect performance data, but data ain't information. Successful scalability requires transforming your data to quantify the cost-benefit of any architectural decisions. In other words:

    information = measurement + method

    So, measurement alone is only half the story; you need a method to transform your data. In this presentation I will show you a method that I have developed and applied successfully to large-scale web sites and stack applications to quantify the benefits of proposed scaling strategies. To the degree that you don't quantify your scalability, you run the risk of ending up with WTF rather than FTW.
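
    The abstract does not name the method. As one illustration of turning measurements into a scalability estimate, the sketch below assumes the Universal Scalability Law, a model Gunther has published. The coefficients are hypothetical placeholders, not values from the talk; in practice they would be fitted to measured throughput.

    ```python
    import math


    def usl_capacity(n, alpha, beta):
        """Relative capacity at load n under the Universal Scalability Law.

        alpha models contention (serialized work); beta models coherency
        (crosstalk) cost. Both are assumed to come from fitting real data.
        """
        return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


    def peak_load(alpha, beta):
        """Load at which the modeled capacity reaches its maximum."""
        return math.sqrt((1.0 - alpha) / beta)


    # Hypothetical coefficients, chosen only for illustration.
    ALPHA = 0.05
    BETA = 0.0005

    if __name__ == "__main__":
        for n in (1, 8, 32, 64, 128):
            print(n, round(usl_capacity(n, ALPHA, BETA), 2))
        print("modeled peak near n =", round(peak_load(ALPHA, BETA)))
    ```

    With a nonzero coherency term the modeled capacity peaks and then declines, which is the kind of cost-benefit signal the abstract describes: adding capacity beyond the peak makes throughput worse.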

    At 1:30pm to 2:30pm, Friday 1st October

    Coverage video

  • Anycast Routing: Local Delivery

    by Tom Daly

    Anycast Routing is used on the Internet to provide many services, including NTP and DNS, but very few know that you can locally deliver websites and content over HTTP/TCP/Anycast. There are many factors that go into designing an anycasted network, including:

    • Site and Carrier Selection — why both are important
    • Routing Protocol Design and BGP Policy
    • Load Balancing in Datacenters without Load Balancers
    • Application Design, State Management, specifics for TCP applications
    • Statistics Collection, Reporting, and Monitoring (Internally and Externally)
    • Distributed Denial of Service Attacks and Anycast Benefits and Risks

    We'll discuss a real world event on Dyn Inc's network which caused a severe service degradation for one of our nameservers due to uncontrolled anycast route propagation, where global traffic landed in our Tokyo datacenter. (failure)

    We'll also depict how live DDoS attacks are contained to their source region based upon anycast routing. (success)

    At 2:30pm to 3:30pm, Friday 1st October

    Coverage video

  • From disaster to stability: scaling challenges of

    by Cosimo Streppone

    My Opera started around 2002 as a hacked version of phpBB. By 2007, it was slowly heading for disaster, with severely overloaded databases and backends. Our (back then) million users were just as frustrated as we were.

    Today, we have 5M+ users and growing, a lot more features, APIs, browser integration services, and the site is stable and fast. This talk tells the story of these last 3 years: our successes, our failures, and what remains to be done.

    At 2:30pm to 3:30pm, Friday 1st October

    Coverage video

  • Availability, the Cloud and Everything

    by Joe Williams

    The talk will focus on how I (with the help of the entire Cloudant team) built our database service based on CouchDB on top of EC2. Specifically, it covers how we use Erlang, Chef, EC2 and other tools to build highly available and performant database clusters. This includes using Chef and Erlang's hot code upgrades to automate cluster-wide upgrades without restarting any services.

    At 4:00pm to 5:00pm, Friday 1st October

    Coverage video

  • Why Some Architects Almost Never Shard Their Applications

    by Baron Schwartz

    "Shard early, shard often" is common advice -- and it's often wrong. In reality, many systems don't have to be sharded. Sharding is a strategy that should be understood in its context: as one of the many legitimate choices. This session covers a spectrum of strategies for scaling an application. It gives special coverage to topics that typically force sharding, such as write workload, choice of database technology, and choice of deployment platform. You'll learn the pros and cons of various strategies, and how to avoid the pitfalls and capitalize on the upsides.

    At 4:00pm to 5:00pm, Friday 1st October

    Coverage video

  • Plenary Keynote - A Scalability Call to Action

    by Theo Schlossnagle

    At 5:00pm to 5:30pm, Friday 1st October

    Coverage video