by Aaron Bedra
Our software collects data every day. It fills our databases, logs, and various other crevices. The question is, how do we use it? Most of us simply collect it in case we need it some day. The data your software collects can be the secret to unlocking new potential in your market. It can tell you things your users aren’t. It’s important to know how to make your data work for you.
Join Clojure/core’s Aaron Bedra as he takes you through the beginning of what will turn out to be a wonderful relationship. Aaron will introduce Incanter, a statistical programming package for Clojure. He will take you from raw data to raw power in just a few short lines of code.
The R programming language has become a standard environment for statistical computing, but out of the box R is restricted to analyzing data sets that fit in memory. Hadoop has become a popular platform for storing and analyzing data sets that are too large to fit on a single machine. Not surprisingly, there's significant interest in bringing these two platforms together to perform sophisticated analysis on data that is too large to fit in memory on a single machine. Although several systems are being developed to support this, such as Ricardo and RHIPE, as well as newer interfaces such as Segue and Hadoop InteractiVE, there is still considerable confusion about how to use the two systems together effectively. This talk will survey the available R/Hadoop interfaces and compare them using an example use case. We'll also discuss which problems are a good fit for distributed analysis with R, and which aren't.
18th–20th September 2011