by Nathan Marz
Storm is a distributed, fault-tolerant realtime computation system, doing for realtime computation what Hadoop did for batch computation. Storm can be used together with Hadoop to make a potent realtime analytics stack. Although Hadoop is not a realtime system, it can be used to support one. Building a batch/realtime stack requires solving a lot of sub-problems:
– Getting data to both Hadoop and Storm
– Exporting views of the Hadoop data into a readable index
– Using an appropriate queuing broker to feed Storm
– Choosing an appropriate database to serve the realtime indexes updated via Storm
– Syncing the views produced independently by Hadoop and Storm when doing queries
Come learn how we've solved these problems at Twitter to do complex analytics in realtime.
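The last sub-problem, syncing the batch and realtime views at query time, can be sketched roughly as follows. This is a minimal illustration in Python, not Twitter's actual implementation: the hourly buckets, counts, and the `pageviews` function are invented for the example, and the realtime view is assumed to cover only the window the last batch run has not yet absorbed.

```python
# Hypothetical sketch: a query merges a batch view (precomputed by Hadoop)
# with a realtime view (incrementally updated by Storm). The batch view is
# complete up to the last batch run; the realtime view covers only the data
# that arrived since then, so the two can simply be summed per bucket.
batch_view    = {"2012-06-13T10": 1500, "2012-06-13T11": 2300}
realtime_view = {"2012-06-13T11": 120,  "2012-06-13T12": 480}

def pageviews(bucket):
    """Answer a query for one hour bucket by combining both views."""
    return batch_view.get(bucket, 0) + realtime_view.get(bucket, 0)

print(pageviews("2012-06-13T11"))  # 2300 from batch + 120 from realtime = 2420
```

Because the realtime view holds only the recent delta, the merge is a plain sum; once the next batch run absorbs that data, the corresponding realtime entries can be discarded.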
by Justin Makeig
In this time of change, federal, state, and local governments are turning to Hadoop and related big data projects both to provide new services to customers and to cut costs. This presentation will discuss several high-profile projects using Hadoop in the public sector. The U.S. Naval Air Systems Command uses Hadoop for aircraft maintenance information. The Tennessee Valley Authority uses Hadoop to store massive amounts of power utility sensor data. Pacific Northwest National Laboratory, a Department of Energy national lab, has applied Hadoop to bioinformatics analysis. These and other public sector examples will be discussed.
This application serves as a tutorial for the use of Big Data in the cloud. We start with the commoncrawl.org crawl of approximately 5 billion web pages. We use a MapReduce program to scour the Common Crawl corpus for web pages that mention a brand or keyword of interest, say, "Citibank", and additionally contain a "Follow me on Twitter" link. We harvest each Twitter handle and store it in HBase. Once we have harvested about 5,000 Twitter handles, we write and run a program that subscribes to the Twitter streaming API for public status updates from these users. As the status updates pour in, we use a natural language processing library to evaluate the sentiment of each tweet and store the sentiment score back in HBase. Finally, we use a program written in R, with the rhbase connector, to do a real-time statistical evaluation of the sentiment expressed by the twitterverse toward this brand or keyword. This presentation includes full details on installing and operating all the necessary software in the cloud.
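The first step of the pipeline, filtering crawled pages for a brand mention plus a Twitter link and harvesting the handle, can be sketched as below. This is an illustrative Python sketch of the per-page logic only, not the tutorial's actual MapReduce job: the brand constant, the regex, and the sample page text are assumptions made for the example.

```python
import re

# Hypothetical per-page filter: keep a page only if it mentions the brand
# AND links to a Twitter profile, then extract the handle for storage.
BRAND = "Citibank"  # the brand/keyword of interest (example value)
# Twitter handles are 1-15 word characters after "twitter.com/".
HANDLE_RE = re.compile(r"twitter\.com/(\w{1,15})", re.IGNORECASE)

def extract_handle(page_text):
    """Return the Twitter handle if the page mentions the brand and
    links to a Twitter profile; otherwise return None."""
    if BRAND.lower() not in page_text.lower():
        return None
    match = HANDLE_RE.search(page_text)
    return match.group(1) if match else None

page = "We love Citibank! Follow me on twitter: http://twitter.com/acme_fan"
print(extract_handle(page))  # acme_fan
```

In the actual job this predicate would run in the map phase over each Common Crawl record, with the harvested handles written out to HBase rather than printed.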
13th–14th June 2012