Sessions at Strata 2012 about Hadoop on Wednesday 29th February


  • The Apache Hadoop Ecosystem

    by Doug Cutting

    Apache Hadoop forms the kernel of an operating system for Big Data. This ecosystem of interdependent projects enables institutions to affordably explore ever vaster quantities of data. The platform is young, but it is strong and vibrant, built to evolve.

    At 8:50am to 9:00am, Wednesday 29th February

    In Mission City Ballroom, Santa Clara Convention Center

    Coverage video

  • Guns, Drugs and Oil: Attacking Big Problems with Big Data

    by Mike Olson

    Tools for attacking big data problems originated at consumer internet companies, but the number and variety of big data problems have spread across industries and around the world. I’ll present a brief summary of some of the critical social and business problems that we’re attacking with the open source Apache Hadoop platform.

    At 9:20am to 9:30am, Wednesday 29th February

    In Mission City Ballroom, Santa Clara Convention Center

    Coverage video

  • RHadoop, R meets Hadoop

    by Antonio Piccolboni

    RHadoop is an open source project, spearheaded by Revolution Analytics, that grants data scientists access to Hadoop’s scalability from their favorite language, R. RHadoop consists of three packages:

    • rhdfs provides file-level manipulation for HDFS, the Hadoop file system
    • rhbase provides access to HBase, the Hadoop database
    • rmr lets you write MapReduce programs in R

    rmr allows R developers to program in the MapReduce framework, and offers all developers an alternative way to implement MapReduce programs that strikes a delicate compromise between power and usability. It lets you write general MapReduce programs, offering the full power and ecosystem of an existing, established programming language. It doesn’t force you to replace the R interpreter with a special run-time; it is just a library. You can write logistic regression in half a page and even understand it. It feels and behaves almost like the usual R iteration and aggregation primitives. It consists of a handful of functions with a modest number of arguments and sensible defaults that combine in many useful ways. But there is no way to prove that an API works: one can only show examples of what it makes possible, and we will do that, covering a few from machine learning and statistics. Finally, we will discuss how to get involved.
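    rmr's own functions are not reproduced here. Instead, the following is a minimal plain-Python sketch (not R, and not the rmr API) of why logistic regression fits the MapReduce pattern the abstract mentions: each map task computes the log-loss gradient over its chunk of records, the reduce step sums those partial gradients, and a driver loop reruns the job once per gradient-descent iteration. The chunking, the learning rate `alpha`, and the iteration count are illustrative assumptions, and the "cluster" is simulated in-process.

```python
import math
from functools import reduce
from typing import Iterable, List, Tuple

# A record is (feature vector, label in {0, 1}); include a constant 1.0 feature for the bias.
Record = Tuple[List[float], int]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def map_gradient(chunk: Iterable[Record], w: List[float]) -> List[float]:
    """Map step: partial gradient of the log-loss over one chunk of records."""
    g = [0.0] * len(w)
    for x, y in chunk:
        err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
        for j, xj in enumerate(x):
            g[j] += err * xj
    return g


def reduce_gradient(a: List[float], b: List[float]) -> List[float]:
    """Reduce step: partial gradients are combined by element-wise addition."""
    return [ai + bi for ai, bi in zip(a, b)]


def logistic_regression(chunks: List[List[Record]], dim: int,
                        iterations: int = 100, alpha: float = 0.1) -> List[float]:
    """Driver: one simulated MapReduce job per gradient-descent step."""
    w = [0.0] * dim
    for _ in range(iterations):
        partials = [map_gradient(c, w) for c in chunks]  # one "map task" per chunk
        grad = reduce(reduce_gradient, partials)
        w = [wi - alpha * gi for wi, gi in zip(w, grad)]
    return w
```

    Because addition is associative and commutative, the summed partial gradients equal the full-data gradient no matter how the records are split across map tasks, which is what makes this algorithm a natural fit for the model.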

    At 10:40am to 11:20am, Wednesday 29th February

    In GA J, Santa Clara Convention Center

  • The Future of Hadoop: Becoming an Enterprise Standard

    by Eric Baldeschwieler

    During the last 12 months, Apache Hadoop has received an enormous amount of attention for its ability to transform the way organizations capitalize on their data in a cost-effective manner. The technology has evolved to a point where organizations of all sizes and industries are testing its power as a potential solution to their own data management challenges.

    However, there are still technology and knowledge gaps hindering adoption of Apache Hadoop as an enterprise standard. Among these gaps are the complexity of the system, the lack of technical content to assist with its use, and the intensive developer and data scientist skills it requires to be used properly. With virtually every Fortune 500 company constructing its Hadoop strategy today, many in the IT community are wondering what the future of Hadoop will look like.

    In this session, Hortonworks CEO Eric Baldeschwieler will look at the current state of Apache Hadoop and at how the ecosystem is working together to close the existing technological and knowledge gaps, and will present a roadmap for the future of the project.

    At 10:40am to 11:20am, Wednesday 29th February

    In Ballroom CD, Santa Clara Convention Center

  • Hadoop + JavaScript: what we learned

    by Asad Khan

    In this session we will discuss two key aspects of using JavaScript in the Hadoop environment. The first is how we can reach a much broader set of developers by enabling JavaScript support on Hadoop. The JavaScript fluent API, which works on top of other languages like Pig Latin, lets developers define MapReduce jobs in a style that is much more natural, even for those who are unfamiliar with the Hadoop environment.

    The second is how to enable simple experiences directly through an HTML5-based interface. The lightweight web interface gives developers the same experience they would get on the server, with zero installation across all client platforms. It also allowed us to use the browsers’ HTML5 support to provide basic data visualization for quick data analysis and charting.

    During the session we will also share how we used other open source projects like Rhino to enable JavaScript on top of Hadoop.
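    The fluent-API idea described in this abstract can be illustrated with a small sketch. The talk's API was in JavaScript and is not documented here, so the class below is a hypothetical Python stand-in: `PigQuery`, its method names, and the `rN` alias scheme are all invented for illustration. Each chained call appends one Pig Latin statement, and `store` returns the generated script, which Pig would then compile into MapReduce jobs.

```python
from typing import List


class PigQuery:
    """Hypothetical fluent builder: each chained call emits one Pig Latin statement."""

    def __init__(self) -> None:
        self._stmts: List[str] = []
        self._alias = ""  # alias of the most recent relation
        self._n = 0

    def _next(self) -> str:
        self._n += 1
        return f"r{self._n}"

    def load(self, path: str, schema: str) -> "PigQuery":
        alias = self._next()
        # '\\t' renders as \t in the script, i.e. tab-delimited input for PigStorage
        self._stmts.append(f"{alias} = LOAD '{path}' USING PigStorage('\\t') AS ({schema});")
        self._alias = alias
        return self

    def filter(self, condition: str) -> "PigQuery":
        alias = self._next()
        self._stmts.append(f"{alias} = FILTER {self._alias} BY {condition};")
        self._alias = alias
        return self

    def count_by(self, field: str) -> "PigQuery":
        grouped, counted = self._next(), self._next()
        self._stmts.append(f"{grouped} = GROUP {self._alias} BY {field};")
        # After GROUP, the bag of input tuples is named after the input relation.
        self._stmts.append(f"{counted} = FOREACH {grouped} GENERATE group, COUNT({self._alias});")
        self._alias = counted
        return self

    def store(self, path: str) -> str:
        self._stmts.append(f"STORE {self._alias} INTO '{path}';")
        return "\n".join(self._stmts)
```

    For example, `PigQuery().load("logs", "user:chararray, bytes:int").filter("bytes > 0").count_by("user").store("out")` yields a five-statement script that loads, filters, groups, counts, and stores, without the caller writing any Pig Latin directly.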

    At 2:20pm to 3:00pm, Wednesday 29th February

    In Ballroom CD, Santa Clara Convention Center

  • Getting the Most from Your Hadoop Big Data Cluster

    by Rohit Valia

    The Hadoop framework is an established solution for big data management and analysis. In practice, Hadoop applications vary significantly. Your data center infrastructure is used by multiple lines of business and multiple differing workloads.

    This session looks at the requirements for a multi-tenant big data cluster: one where different lines of business, different projects, and multiple applications can be run with assured SLAs, resulting in higher utilization and ROI for these clusters.

    This session is sponsored by Platform Computing.

    At 4:00pm to 4:40pm, Wednesday 29th February

    In Ballroom G, Santa Clara Convention Center

  • Hadoop Plugin for MongoDB: The Elephant in the Room

    by Steve Francia

    Learn how to integrate MongoDB with Hadoop for large-scale distributed data processing. Using tools like MapReduce, Pig, and Streaming, you will learn how to do analytics and ETL on large datasets with the ability to load and save data against MongoDB. With Hadoop MapReduce, Java and Scala programmers will find a native solution for using MapReduce to process their data with MongoDB. Programmers of all kinds will find a new way to work with ETL, using Pig to extract and analyze large datasets and persist the results to MongoDB. Python and Ruby programmers can also rejoice in a new way to write native Mongo MapReduce jobs using the Hadoop Streaming interface.
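    As a rough illustration of the Hadoop Streaming side, the sketch below is a word-count mapper and reducer in Python. It assumes Streaming's default text protocol (one tab-separated key/value pair per line on stdin and stdout), not the BSON-based streaming support of the MongoDB plugin, whose API is not shown here; reading from or writing to MongoDB would be configured separately.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Map step: emit 'word<TAB>1' for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Reduce step: sum counts per key; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(v) for _, v in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Invoked by Hadoop Streaming as "<script> map" or "<script> reduce".
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

    Hadoop sorts the mapper output by key before the reduce phase, which is why the reducer can group adjacent lines with `itertools.groupby`. Saved as a file (the name `wc.py` is hypothetical), the script would be passed to the streaming jar with options such as `-mapper "python wc.py map"` and `-reducer "python wc.py reduce"`; the jar's location depends on the installation.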

    At 4:00pm to 4:40pm, Wednesday 29th February

    In GA J, Santa Clara Convention Center

  • Analyzing Hadoop Source Code with Hadoop

    by Stefan Groschupf

    Using Hadoop-based business intelligence analytics, this session looks at the Hadoop source code and its development over time, and shares some interesting and fun facts with the audience. This talk will illustrate text and related analytics with Hadoop, on Hadoop, to reveal the true hidden secrets of the elephant.

    This entertaining session highlights the value of data correlation across multiple datasets and the visualization of those correlations to reveal hidden data relationships.

    At 4:50pm to 5:30pm, Wednesday 29th February

    In GA J, Santa Clara Convention Center