Dynamic Community Detection for Large-scale e-Commerce data with Spark Streaming and GraphX

A session at Spark Summit 2015

  • Ming Huang

Monday 15th June, 2015

1:30pm to 2:00pm

Most of Taobao's production graph datasets are very large. An increasing challenge in network analysis is the efficient detection of communities in dynamic networks. A model needs to update communities dynamically as real-time data streams arrive, so that we can better predict users' behavior. In our work, we propose Hybrid Community Detection, a hybrid processing model that takes full advantage of Spark by combining online incremental community detection using Spark Streaming with offline community detection using Spark GraphX. Results on real-world network data demonstrate that Hybrid Community Detection continuously provides stable, high-quality results. Hybrid Community Detection is also much faster than other offline algorithms and shows great potential in many areas, such as fraud detection and marketing strategy.
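
The split between a full offline pass and cheap online updates can be illustrated with a small, hypothetical sketch. It does not reproduce the speaker's Hybrid Community Detection algorithm, which the abstract does not specify. Instead, it swaps in plain label propagation for the offline pass (GraphX ships a `LabelPropagation` implementation in `org.apache.spark.graphx.lib`) and, for the online pass, re-evaluates only the nodes touched by each new batch of edges. The names `offline_communities` and `apply_batch` are illustrative. A single in-memory Python process stands in for GraphX and Spark Streaming.

```python
from collections import Counter, defaultdict


def _best_label(node, adj, labels):
    """Most frequent label among node's neighbors; ties go to the smallest label."""
    counts = Counter(labels[n] for n in adj[node])
    if not counts:
        return labels[node]  # isolated node keeps its own label
    top = max(counts.values())
    return min(lbl for lbl, c in counts.items() if c == top)


def _propagate(adj, labels, frontier, max_iter=20):
    """Asynchronous label propagation restricted to a frontier of active nodes."""
    for _ in range(max_iter):
        changed = set()
        for node in sorted(frontier):  # fixed order keeps runs deterministic
            new = _best_label(node, adj, labels)
            if new != labels[node]:
                labels[node] = new
                changed.add(node)
        if not changed:
            break
        # Only changed nodes and their neighbors can be affected next round.
        frontier = changed | {n for c in changed for n in adj[c]}
    return labels


def offline_communities(edges):
    """Hypothetical offline pass: every node starts alone and all nodes are active."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    labels = {n: n for n in adj}
    _propagate(adj, labels, set(adj))
    return adj, labels


def apply_batch(adj, labels, new_edges):
    """Hypothetical online pass: add one micro-batch of edges, update only that region."""
    frontier = set()
    for u, v in new_edges:
        adj[u].add(v)
        adj[v].add(u)
        labels.setdefault(u, u)  # unseen nodes start as singleton communities
        labels.setdefault(v, v)
        frontier.update((u, v))
    return _propagate(adj, labels, frontier)
```

For example, two disjoint triangles `(1, 2, 3)` and `(4, 5, 6)` settle into two communities labeled `2` and `5`. A later batch that links a new node `7` to `4`, `5` and `6` moves only node `7` into community `5`, without revisiting the first triangle. In a deployment along the lines the abstract describes, something like `apply_batch` might run once per streaming micro-batch while the offline pass periodically recomputes labels from scratch; how the speaker actually schedules and reconciles the two is not stated here.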

About the speaker

Ming Huang

Time 1:30pm to 2:00pm PST

Date Mon 15th June 2015
