Sessions at Hadoop Summit 2012 about realtime

Wednesday 13th June 2012

  • Realtime analytics with Storm and Hadoop

    by Nathan Marz

    Storm is a distributed and fault-tolerant realtime computation system, doing for realtime computation what Hadoop did for batch computation. Storm can be used together with Hadoop to make a potent realtime analytics stack. Although Hadoop is not a realtime system, it can be used to support a realtime system. Building a batch/realtime stack requires solving a lot of sub-problems:

    – Getting data to both Hadoop and Storm
    – Exporting views of the Hadoop data into a readable index
    – Using an appropriate queuing broker to feed Storm
    – Choosing an appropriate database to serve the realtime indexes updated via Storm
    – Syncing the views produced independently by Hadoop and Storm when doing queries

    Come learn how we've solved these problems at Twitter to do complex analytics in realtime (see the sketch after Wednesday's listings).

    At 10:30am to 11:10am, Wednesday 13th June

  • Big Data Real-Time Applications: A Reference Architecture for Search, Discovery and Analytics

    by Justin Makeig

    In this time of change, Federal, State, and Local governments are turning to Hadoop and related big data projects both to provide new services to customers and to cut costs. This presentation will discuss several high-profile projects using Hadoop in the public sector. The U.S. Naval Air Systems Command uses Hadoop for aircraft maintenance information. The Tennessee Valley Authority uses Hadoop to store massive amounts of power utility sensor data. Pacific Northwest National Laboratory, a Department of Energy national lab, has applied Hadoop to bioinformatics analysis. These and other public sector examples will be discussed.

    At 4:30pm to 5:10pm, Wednesday 13th June
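
For a sense of the kind of realtime computation Storm performs in a stack like the one Nathan Marz describes, here is a minimal sketch of a Storm bolt (using the 2012-era backtype.storm API) that keeps a running count of brand mentions in a stream of tweet text. The spout feeding it, the "text" field name, and the tracked keyword are assumptions for illustration, not the actual Twitter topology covered in the talk.

    import java.util.HashMap;
    import java.util.Map;

    import backtype.storm.topology.BasicOutputCollector;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseBasicBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    // Keeps a running count of mentions of tracked brands in a tweet stream.
    // Expects tuples with a "text" field from an upstream spout (assumed here).
    public class BrandCountBolt extends BaseBasicBolt {
        private static final String[] BRANDS = {"citibank"}; // tracked keywords (assumption)
        private final Map<String, Long> counts = new HashMap<String, Long>();

        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String text = tuple.getStringByField("text").toLowerCase();
            for (String brand : BRANDS) {
                if (text.contains(brand)) {
                    Long previous = counts.get(brand);
                    long updated = (previous == null) ? 1L : previous + 1L;
                    counts.put(brand, updated);
                    // A downstream bolt would persist this running count to the
                    // database serving the realtime views.
                    collector.emit(new Values(brand, updated));
                }
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("brand", "count"));
        }
    }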

Thursday 14th June 2012

  • A Real-Time Sentiment Analysis Application using Hadoop and HBase in the Cloud

    by Jagane Sundar

    This application serves as a tutorial for the use of Big Data in the cloud. We start with the commoncrawl.org crawl of approximately 5 billion web pages. We use a MapReduce program to scour the Common Crawl corpus for web pages that mention a brand or keyword of interest, say "Citibank", and additionally have a "Follow me on Twitter" link. We harvest this Twitter handle and store it in HBase. Once we have harvested about 5,000 Twitter handles, we write and run a program that subscribes to the Twitter Streaming API for the public status updates of these folks. As the status updates pour in, we use a natural language processing library to evaluate the sentiment of each tweet and store the sentiment score back in HBase. Finally, we use a program written in R with the rhbase connector to do a real-time statistical evaluation of the sentiment expressed by the Twitterverse towards this brand or keyword. This presentation includes full details on installing and operating all the necessary software in the cloud (see the sketch after Thursday's listings for the HBase write step).

    At 2:25pm to 3:05pm, Thursday 14th June
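
To illustrate the scoring-and-storage step this abstract describes, here is a minimal sketch of writing a per-tweet sentiment score into HBase with the 2012-era HBase Java client. The table name, column family, row-key scheme, and the crude keyword scorer are assumptions for the example; the talk uses a proper natural language processing library and its own schema.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    // Writes one sentiment score per tweet into HBase.
    // Table and column names below are assumptions for illustration.
    public class SentimentWriter {
        private static final byte[] FAMILY = Bytes.toBytes("s");
        private static final byte[] QUALIFIER = Bytes.toBytes("score");

        private final HTable table;

        public SentimentWriter() throws Exception {
            Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
            table = new HTable(conf, "tweet_sentiment");
        }

        // Crude keyword scorer used as a stand-in; the talk evaluates
        // sentiment with a natural language processing library instead.
        static double score(String tweet) {
            String t = tweet.toLowerCase();
            double s = 0.0;
            if (t.contains("love") || t.contains("great")) s += 1.0;
            if (t.contains("hate") || t.contains("awful")) s -= 1.0;
            return s;
        }

        // Row key is handle:tweetId so all of a handle's tweets scan together.
        public void write(String handle, String tweetId, String text) throws Exception {
            Put put = new Put(Bytes.toBytes(handle + ":" + tweetId));
            put.add(FAMILY, QUALIFIER, Bytes.toBytes(score(text)));
            table.put(put);
        }
    }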