Sessions at Strange Loop 2011 about Hadoop

Sunday 18th September 2011

  • Cascalog

    by Nathan Marz

    Cascalog is a data processing library for Clojure, used to process Big Data on top of Hadoop or to analyze data on a local computer from the REPL. Cascalog combines the conciseness of a DSL with the power of a general-purpose programming language. It is inspired by Datalog and blends logic programming with functional programming.

    In this workshop, we’ll learn the basics of Cascalog. We’ll cover the API; inner and outer joins; functions, aggregators, and negations; how the query planner works; how to create custom operations for queries; and how to read from diverse data sources such as HDFS and MySQL databases. (A short sketch of the API follows the session details below.)

    The workshop consists of short lectures followed by interactive problem-solving sessions in which you’ll work through problems that apply the concepts from the lectures.

    At 3:00pm to 6:00pm, Sunday 18th September

    In Gateway 4, Hilton St. Louis at the Ballpark
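
    For a taste of the API ahead of the workshop, here is a minimal sketch in the spirit of the Cascalog tutorial. It assumes the sample datasets (age, gender, sentence) bundled in Cascalog’s cascalog.playground namespace; split is an illustrative custom operation, not part of the library.

        (use 'cascalog.api)
        (use 'cascalog.playground)       ; bundled sample datasets: age, gender, sentence, ...
        (require '[cascalog.ops :as c])  ; built-in aggregators

        ;; Basic query: every person younger than 30. [?person ?a] names the
        ;; output fields, (age ?person ?a) is a generator binding logic
        ;; variables from the dataset, and (< ?a 30) acts as a filter.
        (?<- (stdout)
             [?person ?a]
             (age ?person ?a)
             (< ?a 30))

        ;; Implicit inner join: ?person appears in both generators, so the
        ;; query planner joins the age and gender datasets on it.
        (?<- (stdout)
             [?person ?a ?g]
             (age ?person ?a)
             (gender ?person ?g))

        ;; Custom operation plus aggregator: word count over the sample
        ;; sentences. defmapcatop defines an operation that emits zero or
        ;; more tuples per input tuple.
        (defmapcatop split [s]
          (seq (.split s "\\s+")))

        (?<- (stdout)
             [?word ?count]
             (sentence ?s)
             (split ?s :> ?word)
             (c/count ?count))

    To run against real HDFS data, you would swap (stdout) and the playground datasets for taps such as hfs-textline.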

Tuesday 20th September 2011

  • Hadoop and Cassandra sitting in a tree...

    by Jake Luciani

    This talk will cover the open source Brisk project: a tight integration of the Hadoop stack with Cassandra.

    We will look at the operational and performance benefits of running Hadoop in a peer-to-peer, masterless architecture. We will cover the design and tradeoffs of the system and look at the use cases people are addressing with Brisk. Finally, we will run some live demos!

    At 8:30am to 9:20am, Tuesday 20th September

    In Gateway 4/5, Hilton St. Louis at the Ballpark

  • Distributed Data Analysis with Hadoop and R

    by Jonathan Seidman and Ramesh Venkataramaiah

    The R programming language has become a standard environment for statistical computing, but out of the box R is restricted to analyses of data sets that fit in memory. Hadoop has become a popular platform for storing and analyzing data sets that are too large to fit on a single machine. Not surprisingly, there’s significant interest in bringing the two platforms together to perform sophisticated analysis on data that’s too large to fit in memory on a single machine. Although the R community is developing several systems to support this, such as Ricardo and RHIPE, along with newer interfaces such as Segue and Hadoop InteractiVE, there’s still considerable confusion about how to use the two systems together effectively.

    This talk will survey the available R/Hadoop interfaces and work through an example use case to compare them. We’ll also discuss which problems are a good fit for distributed analysis with R, and which aren’t.

    At 9:30am to 10:20am, Tuesday 20th September

    In Lindbergh, Hilton St. Louis at the Ballpark
