Dealing with Dirty Data - Finding the Right Tool for the Job

A session at Strata New York 2012

Tuesday 23rd October, 2012

9:00am to 12:30pm (EST)

Though getting the right data is perhaps the most essential part of any data journalism project, one of the most difficult aspects is often cleaning and auditing the data so that it is usable – or even intelligible. In fact, one may not even know whether a particular data set is the right one for the story until it has been cleaned. Data problems can take many forms – from misspellings to mixed data types and everything in between. What’s more, there is a wide variety of tools that can be used to handle these cleaning tasks, and completing them efficiently sometimes requires applying several.

This 3-hour tutorial will provide novice users with an overview of a range of common tools used for data cleaning and analysis – including Microsoft Excel, Google Refine, Python and R – along with their relative strengths and weaknesses. In addition to demos of how the more advanced tools like Python and R can be used for text parsing and statistical analysis, the session will offer hands-on training in Excel and Refine using concrete data examples. By the end, attendees will not only have learned useful new skills in Excel and Refine; they will also have a roadmap for what kind of expertise to look for when facing a more complex task.

About the speakers

Alice Brennan

Journalist, The New York World

Mike Sullivan

Deputy Editor, @TheNYWorld, formerly @WSJGraphics interactive designer. bio from Twitter

Susan E. McGregor

Assistant Professor, Columbia J School & Tow Center for Digital Journalism. Data journalism, information visualization, web development, education. bio from Twitter
