Diagnosing Open-Source Community Health with Spark

A session at Spark Summit 2015

Monday 15th June, 2015

4:00pm to 4:30pm

Successful companies use analytic measures to identify and reward their best projects and contributors. Successful open-source developers often make similar decisions when they evaluate whether to reward a project or community by investing their time. This talk will show how Spark enables a data-driven understanding of the dynamics of open-source communities, using operational data from the Fedora Project as an example. With thousands of contributors and millions of users, Fedora is one of the world's largest open-source communities. Notably, Fedora also has completely open infrastructure: every event related to the project's daily operation is logged to a public messaging bus, and historical event data are available in bulk. We'll demonstrate best practices for using Spark SQL to ingest bulk data with rich, nested structure, using ML pipelines to make sense of software community data, and keeping insights current by processing streaming updates.

About the speaker

Will Benton

Former computer scientist, husband, dad, confessional Lutheran; spare-time activities include rotating cogs, capturing light, and synthesizing sound.

Time 4:00pm to 4:30pm PST

Date Mon 15th June 2015
