GraphBuilder - Scalable Graph Construction using Hadoop

A session at Strata New York 2012

  • Nilesh Jain

Wednesday 24th October, 2012

5:00pm to 5:50pm (EST)

The exponential growth in the study of graph-based data dependencies is fueling the need for large-scale machine learning frameworks and techniques. These computations are iterative and compute-intensive. Frameworks such as Apache Giraph, Apache Hama, and CMU’s GraphLab have recently emerged to perform them in a distributed manner at commercial scale, but feeding data to these frameworks is a major challenge in itself. Because graph construction is a data-parallel problem, Hadoop is well suited to the task, yet it lacks some elements that would make the work easier for MapReduce programmers.

In this talk, Nilesh will introduce GraphBuilder, a graph construction library for Apache Hadoop. GraphBuilder simplifies the job by providing services for transforming unstructured data into graphs, cleaning graphs, formatting output, and partitioning graphs ahead of cluster ingress.

Nilesh will review emerging frameworks for graph-based machine learning and explain the benefits of GraphBuilder by sharing end-to-end case studies for complex machine learning applications, such as sentiment analysis and perceptual computing. Finally, he will explain how his work is evolving to accommodate more frameworks and more complex ingress structures.

About the speaker

Nilesh Jain

Sr. Research Scientist, Intel Corp
