It’s easy to find and create data. But what are you going to do with it? Can you ask the world complex questions, such as what the local crime rate is, how far it is to the metro, or how your local school is rated? Can you combine all of these to rate houses you may want to buy? And how do you then connect back to your government and local businesses to engage in collaborative decision making?
This talk will discuss how you should consider users and their personal interactions with data and information. We’ll also peel back the covers on how open source tools such as HBase, Cascading, Geos and Polymaps analyze realtime data and stream it to maps and visualizations, both on the web and on mobile devices.
To illustrate what’s possible, we’ll dive into GeoCommons, a large online community for data sharing and community analytics that uses open source mapping visualization, Hadoop analysis, and mobile interfaces to make these capabilities available to the world. Users can even build and socialize their own analysis methods to share their expert knowledge with other users. We’ll also review how global organizations like the World Bank and United Nations are using these tools to connect with citizens in developing countries, empowering them to make decisions on building investment and to understand how climate science may affect their areas.
Adding security to an existing product is never easy, but our team at Yahoo added strong authentication to Apache Hadoop by integrating it with Kerberos. This project was delivered on time and is currently deployed on all of Yahoo's 40,000 Hadoop computers. Come learn how we added security to Hadoop and why it matters.
YARN is the next generation of Hadoop Map-Reduce, designed to scale out much further while running applications other than pure Map-Reduce in a highly fault-tolerant manner.
This talk introduces an open-source, SQL-based system for continuous or ad-hoc analysis of streaming data, built on top of Flume-based data collection for Hadoop. Attendees will learn how to use this new tool to extend their Hadoop data collection pipeline with real-time streaming analytics.
25th–27th July 2011