Thursday 11th April, 2013
2:00pm to 2:30pm
The use of Twitter at professional conferences as part of social media “amplification” is now widespread. This practice generates large amounts of data which have the potential to reveal professional practices, connections, and meaning making around such events. Taking the Open Repositories 2012 conference’s #OR2012 hashtag as an example data set, we apply computational methods for analysing Twitter conversations, developed by the Analysing Social Media Collaboration research group for the ongoing JISC Twitter Analysis Workbench project, and reflect on the process and issues raised.
Tweets are clustered into groups using features extracted from the language of each tweet. Keywords extracted with TF-IDF (term frequency-inverse document frequency) weighting identify the most significant terms, and tweets are split into groups according to the cosine similarity of these textual features. Several parameters can be adjusted manually to suit a specific data set, such as the threshold for inclusion in a cluster and the life span of a tweet (how long it remains in a cluster before being removed). The Workbench offers several ways of visualising this information, including a time-sliced animation, in which clusters appear as the topics they contain are discussed, and a graphical figure that indicates, amongst other things, the quality, persistence and growth of the clusters over the entire time period.
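The clustering step described above can be illustrated with a short Python sketch. This is a minimal, hypothetical example under stated assumptions, not the Twitter Analysis Workbench's own code: it assumes a single greedy pass over time-ordered tweets, computes IDF once over the whole batch rather than incrementally, and compares each new tweet with the summed TF-IDF vector (centroid) of every cluster that still has live tweets. The names `cluster_tweets`, `threshold` and `lifespan` are illustrative stand-ins for the cluster inclusion threshold and tweet life span parameters, and the sample tweets are invented.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: words, #hashtags and @mentions, after lowercasing.
TOKEN_RE = re.compile(r"[a-z0-9#@]+")


def tokenize(text):
    """Lowercase a tweet and split it into word, hashtag and mention tokens."""
    return TOKEN_RE.findall(text.lower())


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (dict of term -> weight) per text.

    Assumption: IDF is computed once over the whole batch for simplicity.
    A term that occurs in every tweet (e.g. the event hashtag) gets weight 0.
    """
    docs = [Counter(tokenize(t)) for t in texts]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)
    return [{term: tf * math.log(n / df[term]) for term, tf in doc.items()}
            for doc in docs]


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Sum of sparse vectors; same direction as the mean, so cosine is unchanged."""
    total = {}
    for v in vectors:
        for term, w in v.items():
            total[term] = total.get(term, 0.0) + w
    return total


def cluster_tweets(tweets, threshold=0.1, lifespan=3600):
    """Greedy single-pass clustering of (timestamp_seconds, text) pairs.

    tweets must be sorted by timestamp. threshold is the minimum cosine
    similarity needed to join a cluster; lifespan is how many seconds a
    tweet stays in its cluster. Returns every cluster ever formed, as
    lists of tweet indices, in order of creation.
    """
    vectors = tfidf_vectors([text for _, text in tweets])
    history = []  # full membership of every cluster, never expired
    active = []   # (cluster index, indices of tweets still within lifespan)
    for i, (ts, _) in enumerate(tweets):
        # Remove expired tweets, then drop clusters that are now empty.
        active = [(cid, [j for j in members if ts - tweets[j][0] <= lifespan])
                  for cid, members in active]
        active = [(cid, members) for cid, members in active if members]
        # Find the live cluster whose centroid is most similar to this tweet.
        best, best_sim = None, 0.0
        for cid, members in active:
            sim = cosine(vectors[i], centroid([vectors[j] for j in members]))
            if sim > best_sim:
                best, best_sim = (cid, members), sim
        if best is not None and best_sim >= threshold:
            cid, members = best
            members.append(i)
            history[cid].append(i)
        else:
            # No cluster is similar enough: start a new one.
            history.append([i])
            active.append((len(history) - 1, [i]))
    return history


# Invented sample tweets: (seconds since start, text).
tweets = [
    (0, "Keynote on repository metadata #or2012"),
    (60, "Repository metadata keynote was great #or2012"),
    (120, "Dinner plans tonight? #or2012"),
    (180, "Who is up for dinner tonight #or2012"),
    (7200, "Repository metadata keynote slides #or2012"),
]

print(cluster_tweets(tweets, threshold=0.1, lifespan=3600))
# prints [[0, 1], [2, 3], [4]]
```

With a one-hour `lifespan`, the late tweet about the keynote slides starts a new cluster because its earlier neighbours have already expired; with a very long lifespan it joins the original keynote cluster instead. The two parameters therefore interact, which is one reason they may need tuning per data set.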
We find that the scale of participation in a hashtag significantly affects how such methods (originally intended for the Twitter “firehose” of public tweets) may be applied, and the type and flow of conversations they detect.
This analysis method captures and clusters many tweets but can exclude those using non-standard terms or abbreviations. This is a challenge for any computational analysis of Twitter, where the 140-character limit makes abbreviation common practice. Hybrid human-computer methods for clustering such Twitter conversations are therefore identified as an area for further investigation.
We also consider further challenges in analysing a hashtag with multiple strands: identifying the different threads from the content and relevance of tweets; properly interpreting cross-pollination and backchat between threads; and excluding noise (and spam). Whilst some advocate the use of unique identifiers for individual event sessions, such practices have not been widely adopted and risk creating silos in the Twitter back channel rather than encouraging the kind of serendipitous discovery for which Twitter is well regarded.
Whilst key conference themes do emerge, the analysis finds that travel plans, locations, meals and social plans all feature prominently in the tweets. This raises questions both for those analysing Twitter discussions and for those organising events. Are such tweets relevant to the analysis of activity around a hashtag? Should these more informal discussions between professionals be removed or normalised? Can social connections between participants be mapped or inferred from these Twitter interactions and from following/follower relationships, and is such analysis ethical despite the public nature of these conversations?