In our first-ever guest post for the Lanyrd blog, Matt Biddulph shares an interactive visualisation of Lanyrd's conference topic data and explains how he created it.
A few months ago, when the SXSW 2012 Panel Picker was opened up to the public, I made a visualisation of the topics of all the proposed talks. It summarises the relationships between topics such as social media, blogging and UX by placing tags close together when they are often used to tag the same talk. Conversely, tags that rarely occur together end up far apart in the visualisation.
This week the guys at Lanyrd kindly provided me with the data I needed to do the same thing for the whole world of conferences. The resulting interactive visualisation is a fascinating snapshot of the most interesting topics of discussion across many industries.
As regular attendees of internet conferences would expect, there are large clusters of core expertise such as design, development, mobile and social media. Zoom in closer and you find clusters of special interests: media, journalism and newspapers; aviation and aerospace; healthcare, pharmaceutical and biotech. At the edge of a large area of government, open data and public sector topics is a small offshoot of city planning and public transport. An archipelago of librarians and book publishers extends from a broad business/startups/entrepreneurship island, and just offshore from there lie the shoals of taxonomy, morphology and cladistics (the classification of organisms by their shared evolutionary ancestry). Spend some time zooming in deep and you'll find plenty more like this.
For the curious, here's how I made the visualisation:
Most conferences on Lanyrd carry one or more tags summarising the topics of the event. I used Gephi, a great piece of open source graph-visualisation software, to explore the relationships between them. Simon provided me with a data dump that looked a bit like this:
Strata New York 2011: analytics,big-data,data-science,hadoop,oreilly ...
I wrote some Ruby code to load this file and build a graph structure. Every time a tag like "hadoop" appears on the same conference as another tag such as "analytics", that suggests the two are related, so my code makes a connection between them. Every time the same two tags are used together again, the connection is strengthened.
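As a rough illustration of that counting step, here is a minimal Ruby sketch; it is not the original code. It assumes each line of the dump has the form shown above (conference name, a colon, then comma-separated tags), and the method names cooccurrence_weights and gephi_edges_csv are hypothetical. The second method writes the graph as an undirected edge list with Source, Target, Type and Weight columns, which assumes Gephi's spreadsheet (CSV) importer as the route into Gephi.

```ruby
# Sketch: turn "<conference>: tag,tag,..." lines into a weighted tag
# co-occurrence graph, then emit it as a CSV edge list for Gephi.
# Assumptions (not from the original code): the line format, the method
# names, and the Source/Target/Type/Weight column layout.

# Returns a Hash mapping a sorted [tag_a, tag_b] pair to the number of
# conferences tagged with both.
def cooccurrence_weights(lines)
  weights = Hash.new(0)
  lines.each do |line|
    # Split on the last ": " so colons inside conference names are tolerated.
    _name, separator, tag_list = line.rpartition(": ")
    next if separator.empty?

    # Sorting means "hadoop,analytics" and "analytics,hadoop" hit the same key.
    tags = tag_list.split(",").map(&:strip).reject(&:empty?).uniq.sort
    # Each unordered pair on the same conference strengthens that edge by 1.
    tags.combination(2).each { |pair| weights[pair] += 1 }
  end
  weights
end

# Undirected edge list: one row per tag pair, with the count as Weight.
def gephi_edges_csv(weights)
  rows = ["Source,Target,Type,Weight"]
  weights.each { |(a, b), count| rows << [a, b, "Undirected", count].join(",") }
  rows.join("\n") + "\n"
end
```

Keying the Hash on sorted pairs keeps the graph undirected: a pair of tags is a single edge no matter which order they were listed in.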
I loaded the resulting graph into Gephi and ran further analysis on it. Gephi's implementation of an algorithm called HITS calculates which nodes in the graph are "hubs": the most central, best-connected ones. I used this score to decide the size of each tag, which is why core tags like social media, entrepreneurship and open source are visible even when you zoom out. I coloured the circles according to how many times each tag is used anywhere on Lanyrd, from blue (little use) to red (lots of use).
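Gephi computes HITS itself, but for the curious, here is a small power-iteration sketch of the idea in Ruby. It takes the weighted graph as a Hash from a [tag, tag] pair to a co-occurrence count. Treating every edge as a link in both directions and letting the count scale each contribution are assumptions, not a description of Gephi's implementation. The usage_colour helper is likewise hypothetical: a simple linear blue-to-red mapping, not Gephi's own palette.

```ruby
# Sketch of HITS (hubs and authorities) by power iteration on the tag graph.
# Assumptions: edges are undirected, so each one counts as a link in both
# directions, and co-occurrence counts weight the contributions.

# Scale a score Hash to unit length so values stay bounded between rounds.
def normalise!(scores)
  norm = Math.sqrt(scores.values.sum { |v| v * v })
  scores.transform_values! { |v| v / norm } if norm > 0
  scores
end

# weights: { [tag_a, tag_b] => count }. Returns { tag => hub score }.
def hits(weights, iterations: 50)
  # Symmetric adjacency: tag => { neighbour => weight }.
  adj = Hash.new { |hash, key| hash[key] = {} }
  weights.each do |(a, b), w|
    adj[a][b] = w
    adj[b][a] = w
  end
  nodes = adj.keys
  hub = nodes.to_h { |n| [n, 1.0] }

  iterations.times do
    # Authority: weighted sum of the hub scores of neighbours.
    auth = normalise!(nodes.to_h { |n| [n, adj[n].sum { |m, w| hub[m] * w }] })
    # Hub: weighted sum of the authority scores of neighbours.
    hub = normalise!(nodes.to_h { |n| [n, adj[n].sum { |m, w| auth[m] * w }] })
  end
  hub
end

# Hypothetical blue-to-red colour for a tag's usage count, as "#rrggbb",
# interpolating linearly between the smallest and largest counts.
def usage_colour(count, min, max)
  t = max > min ? (count - min).to_f / (max - min) : 0.0
  format("#%02x00%02x", (255 * t).round, (255 * (1 - t)).round)
end
```

The hub score could then drive circle size and usage_colour the fill, mirroring the sizing and colouring choices described above.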
Finally I used an algorithm called OpenOrd to decide the layout. It does a great job of using the relationships between the tags to pull related tags together and push unrelated ones apart, revealing clusters of tags as it does so. Before exporting the final visualisation, I switched off the lines between nodes: the topics are so heavily inter-related that the lines made the picture very cluttered without adding much information for the viewer.
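OpenOrd is considerably more elaborate than can be shown here, so the sketch below swaps in a much simpler Fruchterman-Reingold-style force-directed layout to illustrate the underlying idea: every pair of tags repels, tags that co-occur attract in proportion to their count, and a cooling "temperature" caps how far a tag can move each round so the layout settles. The method name, force formulas and default parameters are illustrative assumptions; the input is a Hash from a [tag, tag] pair to a co-occurrence count.

```ruby
# Simplified Fruchterman-Reingold-style spring layout (NOT OpenOrd).
# weights: { [tag_a, tag_b] => count }. Returns { tag => [x, y] }.
def spring_layout(weights, iterations: 200, k: 1.0, seed: 42)
  rng = Random.new(seed)
  nodes = weights.keys.flatten.uniq
  pos = nodes.to_h { |n| [n, [rng.rand, rng.rand]] }
  temperature = 0.5

  iterations.times do
    disp = nodes.to_h { |n| [n, [0.0, 0.0]] }

    # Every pair of tags repels with force k^2 / distance.
    nodes.combination(2).each do |u, v|
      dx = pos[u][0] - pos[v][0]
      dy = pos[u][1] - pos[v][1]
      dist = [Math.hypot(dx, dy), 1e-6].max
      f = k * k / dist
      disp[u][0] += dx / dist * f
      disp[u][1] += dy / dist * f
      disp[v][0] -= dx / dist * f
      disp[v][1] -= dy / dist * f
    end

    # Co-occurring tags attract with force count * distance^2 / k.
    weights.each do |(u, v), count|
      dx = pos[u][0] - pos[v][0]
      dy = pos[u][1] - pos[v][1]
      dist = [Math.hypot(dx, dy), 1e-6].max
      f = count * dist * dist / k
      disp[u][0] -= dx / dist * f
      disp[u][1] -= dy / dist * f
      disp[v][0] += dx / dist * f
      disp[v][1] += dy / dist * f
    end

    # Move each tag along its net force, never further than the temperature.
    nodes.each do |n|
      dx, dy = disp[n]
      len = Math.hypot(dx, dy)
      next if len.zero?

      step = [len, temperature].min
      pos[n] = [pos[n][0] + dx / len * step, pos[n][1] + dy / len * step]
    end
    temperature *= 0.98 # cool down so the layout settles
  end
  pos
end
```

Given two tightly linked pairs of tags with no link between the pairs, each pair should settle close together while the two pairs drift apart: the clustering effect in miniature.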