Tuesday 23rd October, 2012
9:00am to 12:30pm
Though getting the right data is perhaps the most essential part of any data journalism project, often one of the most difficult aspects of this type of project is cleaning and auditing the data so that it is usable – or even intelligible. In fact, one may not even know whether a particular data set is the right one for the story until it has been cleaned. Data problems can take many forms – from misspellings to mixed data types and everything in between. What's more, there are a wide variety of tools that can be used to handle these cleaning tasks, and completing them efficiently sometimes requires applying several.
This 3-hour tutorial will provide novice users with an overview of a range of common tools used for data cleaning and analysis – including Microsoft Excel, Google Refine, Python and R – along with their relative strengths and weaknesses. In addition to demos of how the more advanced tools like Python and R can be used for text parsing and statistical analysis, attendees will receive hands-on training with Excel and Refine using concrete data examples. By the end, attendees will not only have learned useful new skills in Excel and Refine, they will have a roadmap for the kind of expertise to look for when they face a more complex task.
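To give a flavor of the cleaning tasks the abstract describes – misspellings and mixed data types – here is a minimal Python sketch. The field names, variant spellings, and sample values are hypothetical, invented purely for illustration; they are not drawn from the tutorial's materials.

```python
# Hypothetical messy records: inconsistent capitalization, stray whitespace,
# misspelled category names, and amounts stored as both strings and numbers.
raw_rows = [
    {"borough": " Brookyln ", "amount": "1,200"},
    {"borough": "BROOKLYN",   "amount": 950},
    {"borough": "manhatan",   "amount": "3,400.50"},
]

# Map known misspellings (lowercased) to a canonical spelling.
SPELLING_FIXES = {
    "brookyln": "Brooklyn", "brooklyn": "Brooklyn",
    "manhatan": "Manhattan", "manhattan": "Manhattan",
}

def clean_row(row):
    name = row["borough"].strip().lower()
    amount = row["amount"]
    if isinstance(amount, str):                  # mixed types: str or number
        amount = float(amount.replace(",", ""))  # drop thousands separators
    return {
        "borough": SPELLING_FIXES.get(name, name.title()),
        "amount": float(amount),
    }

cleaned = [clean_row(r) for r in raw_rows]
```

The same normalization can be done interactively in Excel or with Google Refine's clustering features; the point of the script is that once the rules are written down in code, they can be re-run unchanged on the next, larger data set.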
Journalist, The New York World
Deputy Editor, @TheNYWorld; formerly an interactive designer at @WSJGraphics.
Assistant Professor, Columbia Journalism School & Tow Center for Digital Journalism. Data journalism, information visualization, web development, education.