Sessions at Strata 2012 about Data Science on Wednesday 29th February

  • Dealing with Messy Data

    by Q Ethan McCallum

    Welcome to data science’s dirty little secret: data is messy. And it’s your problem.

    It’s bad enough that data comes from myriad sources and in a dizzying variety of formats. Malformed files, missing values, inconsistent and arcane formats, and a host of other issues all conspire to keep you away from your intended purpose: getting meaningful insight out of your data. Before you can touch any algorithms, before you feed any regressions, you’re going to have to roll up your sleeves and whip that data into shape.

    Q Ethan McCallum, technology consultant and author of Parallel R (O’Reilly), will explore common pitfalls of this data munging and share solutions from his personal playbook. Most of all, he’ll show you how to do this quickly and effectively, so you can get back to the real work of analyzing your data.

    At 10:40am to 11:20am, Wednesday 29th February

    In Mission City B1, Santa Clara Convention Center

  • Understanding Social Contagion

    by Marcel Salathé

    Who influences whom? Social contagion, the spread of sentiments and behaviors, is the dominant force shaping human dynamics.

    Businesses care about social contagion because they want to understand how their products can go viral. Politicians care about social contagion because the spread of hope or fear can win an election. Public health officials care about contagion because the spread of unhealthy behaviors will overwhelm our health care system.

    Measuring social contagion, however, is hard, and presents us with considerable data science challenges. I will present our research on social contagion in the context of health behaviors, and how we address the phenomenon of social contagion with data science approaches. I will take the audience on a journey starting with mining open data from online social media services, to supervised machine learning algorithms, to data analysis using novel methods from social network statistics, all the while using only open source tools. The goal of the talk is a) to introduce the audience to the basic concepts of social contagion and b) to demonstrate a real-world example of social contagion using open data science tools.

    At 11:30am to 12:10pm, Wednesday 29th February

    In Ballroom E, Santa Clara Convention Center

  • Data Science in Product Development

    by Joris Poort

    Big data science and cloud computing are changing how engineering-driven companies develop highly complex products. Using a novel cloud platform based on Hadoop, big data analytics, and applied mathematics tools, the traditional product development cycle can be drastically sped up and can provide new, unique insights into highly complex products, improving their final designs. Data science on the cloud can serve as a platform for collaboration between disciplinary silos within engineering organizations, providing new opportunities to apply advanced machine learning and optimization tools. These tools are demonstrating drastic improvements in aerospace, automotive, and other high-tech industries.

    An airplane wing case study will be shown to illustrate the ideas and methods presented. The case study will show how complex engineering disciplines such as aerodynamics and structural analysis can be run simultaneously on the cloud and coupled, not only to increase the speed of product development but also to develop better final product designs. Several tools described in the case study will be shown through a live demonstration.

    At 4:50pm to 5:30pm, Wednesday 29th February

    In Mission City B1, Santa Clara Convention Center

  • Sketching with Data

    by Fabien Girardin

    Since the early days of the data deluge, Lift Lab has been helping many actors of the ‘smart city’ transform the accumulation of network data (e.g. cellular network activity, aggregated credit card transactions, real-time traffic information, user-generated content) into products or services. Due to their innovative and transversal nature, our projects generally involve a wide variety of professionals, from physicists and engineers to lawyers, decision makers and strategists.

    Our innovation methods bring these different stakeholders on board with rapidly prototyped tools that promote the processing, recompilation, interpretation, and reinterpretation of insights. For instance, our experience shows that the multiple perspectives extracted from the use of exploratory data visualizations are crucial to quickly answering some basic questions and provoking many better ones. Moreover, the ability to quickly sketch an interactive system or dashboard is a way to develop a common language among varied stakeholders. It allows them to focus on tangible opportunities for products or services that are hidden within their data. In this form of rapid visual business intelligence, an analysis and its visualization are not the results, but rather the supporting elements of a co-creation process to extract value from data.

    We will exemplify our methods with tools that help engage a wide spectrum of professionals in the innovation path in data science. These tools are based on a flexible data platform and visual programming environment that make it possible to go beyond the limited design possibilities of industry standards. Additionally, they reduce the prototyping time necessary to sketch interactive visualizations that allow the different stakeholders of an organization to take an active part in the design of services or products.

    At 4:50pm to 5:30pm, Wednesday 29th February

    In Ballroom AB, Santa Clara Convention Center