Dealing with Messy Data

A session at Strata 2012

  • Q Ethan McCallum

Wednesday 29th February, 2012

10:40am to 11:20am (PST)

Welcome to data science’s dirty little secret: data is messy, and it’s your problem.

It’s bad enough that data comes from myriad sources and in a dizzying variety of formats. Malformed files, missing values, inconsistent and arcane formats, and a host of other issues all conspire to keep you away from your intended purpose: getting meaningful insight out of your data. Before you can touch any algorithms, before you feed any regressions, you’re going to have to roll up your sleeves and whip that data into shape.

Q Ethan McCallum, technology consultant and author of Parallel R (O’Reilly), will explore common pitfalls of this data munging and share solutions from his personal playbook. Most of all, he’ll show you how to do this quickly and effectively, so you can get back to the real work of analyzing your data.

About the speaker

Q Ethan McCallum

Next session in Mission City B1

11:30am Street Fighting Data Science by Peter Skomoroch

Strata 2012

United States, Santa Clara

28th February to 1st March 2012


Time 10:40am to 11:20am PST

Date Wed 29th February 2012


Mission City B1, Santa Clara Convention Center
