Analyzing Millions of GitHub Commits: What Makes Developers Happy, Angry, and Everything in Between?

A session at Strata New York 2012

Wednesday 24th October, 2012

11:40am to 12:20pm (EST)

Open source developers all over the world contribute to millions of projects every day on GitHub: writing and reviewing code, filing and discussing bug reports, updating documentation and project wikis, and so forth. The data generated from this activity can reveal interesting trends across many industries, including the popularity of programming languages over time, defect rates, contribution metrics, and the popularity of specific frameworks and libraries.

To help extract insights from the public GitHub timeline, which generates hundreds of thousands of events every day, we imported the entire dataset into Google BigQuery. This makes data about tens of millions of open source commits and discussions accessible to everyone for quick, interactive analysis. With the data in place, we can start asking questions:

Who are the most productive developers using GitHub? Which languages are growing in popularity and why? Which language features result in the most angst and developer pain? What makes open source developers happy?

In this session, we will answer these questions and more. We will also discuss our experience using BigQuery, how we modeled the GitHub event data, and the lessons learned from importing the data and making it available.

About the speakers

Brian Doll

Technology & Business Hacker / Marketing at GitHub / http://about.me/briandoll

Ilya Grigorik

Web performance engineer at Google; co-chair of W3C Webperf WG. In short, an internet plumber.
