Sessions at PyCon US 2012 about Stream Processing

Your current filters are…

Saturday 10th March 2012

  • Storm: the Hadoop of Realtime Stream Processing

    by Gabriel Grant

    Twitter's new scalable, fault-tolerant, and simple(ish) stream programming system... with Python!

    Storm is a high-volume, continuous, reliable stream processing system developed at BackType and recently open-sourced by Twitter. Though most of the system (and it's documentation) is written in Java-based languages, it is possible to use in a Python environment with Python-based analysis code. At DotCloud (our application-platform-as-a-service) we're doing just that, and we'll be showing how you can too.

    We collect a lot of data: we have tens of thousands of customers, many of whom have dozens of services running on our platform, each of which in turn produces dozens of metrics every second. All in all, we're dealing with millions of datapoints per minute. Storm will be the third iteration of our metrics system, an attempt at standardizing a number of previously-distinct pieces of our infrastructural software, to enable automated, real-time reactions to changes in the platform's state.

    We'll start by touching on what problems Storm is (and isn't) trying to solve and why it's model is so powerful, informed by our previous attempts to solve the stream processing problem. We'll then move on to a deep dive into how to get Storm up and running with the most Python and least Java-enduced-pain possible and finish up with tips to solve some of the challenges we've encountered while adopting Storm into our Python-based development process.

    The Problem:

    What is stream processing?
    High volume, Continuous, Reliable data analysis
    How do people solve this today?
    The Solution:

    Storm's overall model
    Automatic parts
    Why is this solution better?
    The Hard Part, Made Simple:

    Build a topology
    Code your processors
    The Simple Part, Made Hard (made simple):

    Java (ugh)
    Clojure (less ugh)

    At 2:15pm to 2:55pm, Saturday 10th March

    In E2, Santa Clara Convention Center