A Real-Time Sentiment Analysis Application using Hadoop and HBase in the Cloud

A session at Hadoop Summit 2012

Thursday 14th June, 2012

2:25pm to 3:05pm (PST)

This application serves as a tutorial for the use of Big Data in the cloud. We start with the commoncrawl.org crawl of approximately 5 billion web pages. We use a MapReduce program to scour the Common Crawl corpus for web pages that mention a brand or keyword of interest, say, `Citibank`, and that also have a `Follow me on twitter` link. We harvest the Twitter handle from that link and store it in HBase. Once we have harvested about 5,000 Twitter handles, we write and run a program that subscribes to the Twitter streaming API for the public status updates of these accounts. As the status updates pour in, we use a natural language processing library to evaluate the sentiment of each tweet, and store the sentiment score back in HBase. Finally, we use a program written in R, together with the rhbase connector, to do a real-time statistical evaluation of the sentiment expressed by the twitterverse towards this brand or keyword. This presentation includes full details on installing and operating all necessary software in the cloud.
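
The handle-harvesting step can be sketched in plain Python, as an illustration of the map and reduce logic rather than the speaker's actual code. The function names `map_page` and `reduce_handles`, the link pattern, and the `RESERVED` list are all assumptions made for this sketch; it also simplifies the abstract's criterion by accepting any link to a Twitter profile instead of checking for the `Follow me on twitter` text, and it leaves out reading the crawl files and writing to HBase.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

# Assumed pattern: an href pointing at a Twitter profile; captures the handle.
# Handles are limited to 15 characters from [A-Za-z0-9_].
TWITTER_LINK = re.compile(
    r'href=["\']https?://(?:www\.)?twitter\.com/(?:#!/)?@?([A-Za-z0-9_]{1,15})["\'/?]',
    re.IGNORECASE,
)

# Assumed list of twitter.com paths that are not user handles.
RESERVED = {"share", "intent", "home", "search"}


def map_page(url: str, html: str, keyword: str) -> Iterator[Tuple[str, str]]:
    """Map step: emit (handle, page_url) for pages that mention the keyword."""
    if keyword.lower() not in html.lower():
        return  # page does not mention the brand, emit nothing
    seen = set()
    for match in TWITTER_LINK.finditer(html):
        handle = match.group(1).lower()
        if handle in RESERVED or handle in seen:
            continue  # skip non-profile paths and repeats within one page
        seen.add(handle)
        yield handle, url


def reduce_handles(pairs: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Reduce step: count how many matching pages link to each handle."""
    counts: Dict[str, int] = {}
    for handle, _url in pairs:
        counts[handle] = counts.get(handle, 0) + 1
    return counts


if __name__ == "__main__":
    page = (
        '<p>Citibank opens a new branch.</p>'
        '<a href="http://twitter.com/acme_bank">Follow me on twitter</a>'
    )
    pairs = list(map_page("http://example.com/post", page, "Citibank"))
    print(reduce_handles(pairs))
```

In a real job the reduce output would be written to an HBase table keyed by handle, and the page count could serve as a rough signal for choosing which handles to follow.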

About the speaker

Jagane Sundar

Principal — Independent Software Professional

Jagane Sundar is a cloud computing and Hadoop expert. He has developed Livebackup and peer-to-peer disk synchronization technology for KVM. He has a great deal of expertise in cloud storage, which he applies to making Hadoop work well in the cloud. Earlier in his career, he developed NFS client software for Sun Microsystems' JavaOS and a TCP stack for Packet Design.
