by Q Ethan McCallum
Welcome to data science’s dirty little secret: data is messy. And it’s your problem.
It’s bad enough that data comes from myriad sources and in a dizzying variety of formats. Malformed files, missing values, inconsistent and arcane formats, and a host of other issues all conspire to keep you away from your intended purpose: getting meaningful insight out of your data. Before you can touch any algorithms, before you fit any regressions, you’re going to have to roll up your sleeves and whip that data into shape.
Q Ethan McCallum, technology consultant and author of Parallel R (O’Reilly), will explore common pitfalls of this data munging and share solutions from his personal playbook. Most of all, he’ll show you how to do this quickly and effectively, so you can get back to the real work of analyzing your data.
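The kind of munging the talk describes can be sketched in a few lines. The sample records, date formats, and missing-value sentinels below are invented for illustration, not taken from the speaker’s playbook; they simply show three common pitfalls (inconsistent dates, inconsistent casing, coded missing values) being normalized before any analysis happens.

```python
import csv
import io
from datetime import datetime

# Hypothetical messy input: three date formats, inconsistent casing,
# and two flavors of missing value in a single small file.
RAW = """date,city,revenue
2012-02-28,Chicago,1200
02/29/2012,New York,
Feb 28 2012,chicago,N/A
"""

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%b %d %Y")
MISSING = {"", "N/A", "NA", "null"}

def parse_date(text):
    """Try each known format until one fits; fail loudly otherwise."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError("unrecognized date: %r" % text)

def clean(raw):
    rows = []
    for rec in csv.DictReader(io.StringIO(raw)):
        rows.append({
            "date": parse_date(rec["date"]),
            "city": rec["city"].strip().title(),        # normalize casing
            "revenue": None if rec["revenue"].strip() in MISSING
                       else float(rec["revenue"]),      # sentinel -> None
        })
    return rows

rows = clean(RAW)
```

Failing loudly on an unrecognized date, rather than silently dropping the row, is the point: surprises in the raw data should surface during munging, not during analysis.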
Who influences whom? Social contagion, the spread of sentiments and behaviors, is the dominant force shaping human dynamics.
Businesses care about social contagion because they want to understand how their products can go viral. Politicians care about social contagion because the spread of hope or fear can win an election. Public health officials care about contagion because the spread of unhealthy behaviors will overwhelm our health care system.
Measuring social contagion, however, is hard, and presents us with considerable data science challenges. I will present our research on social contagion in the context of health behaviors, and show how we address the phenomenon with data science approaches. I will take the audience on a journey that starts with mining open data from online social media services, moves to supervised machine learning algorithms, and ends with data analysis using novel methods from social network statistics, all while using only open source tools. The goal of the talk is (a) to introduce the audience to the basic concepts of social contagion and (b) to demonstrate a real-world example of social contagion using open data science tools.
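To make the idea of contagion concrete, here is a toy independent-cascade simulation on a tiny friendship graph. The graph, names, and transmission probability are all invented for illustration; the speaker’s actual work uses mined social-media data and social-network statistics, not this simplified model.

```python
import random

# Hypothetical four-person friendship graph (adjacency lists).
GRAPH = {
    "ana": ["bob", "cem"],
    "bob": ["ana", "dia"],
    "cem": ["ana", "dia"],
    "dia": ["bob", "cem"],
}

def cascade(graph, seeds, p, rng):
    """Independent-cascade model: each newly 'infected' node gets one
    chance to pass the behavior to each neighbor with probability p."""
    adopted = set(seeds)
    frontier = list(seeds)
    while frontier:
        nxt = []
        for node in frontier:
            for nb in graph[node]:
                if nb not in adopted and rng.random() < p:
                    adopted.add(nb)
                    nxt.append(nb)
        frontier = nxt
    return adopted

# Estimate expected spread from a single seed over many simulated runs.
rng = random.Random(42)
runs = [len(cascade(GRAPH, ["ana"], 0.5, rng)) for _ in range(1000)]
avg_spread = sum(runs) / len(runs)
```

Comparing such simulated spread against observed adoption is one simple way to reason about whether a behavior looks contagious or merely clustered.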
by Joris Poort
Big data science and cloud computing are changing how engineering-driven companies develop highly complex products. Using a novel cloud platform based on Hadoop, big data analytics, and applied mathematics tools, the traditional product development cycle can be drastically sped up while yielding new, unique insights that improve final designs. Data science in the cloud can also serve as a platform for collaboration across disciplinary silos within engineering organizations, opening new opportunities for advanced machine learning and optimization tools. These tools are already demonstrating drastic improvements in aerospace, automotive, and other high-tech industries.
An airplane wing case study will illustrate the ideas and methods presented. It will show how complex engineering disciplines such as aerodynamics and structural analysis can be run simultaneously on the cloud and coupled, not only to increase the speed of product development but also to produce better final designs. Several of the tools described in the case study will be shown in a live demonstration.
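The coupling of aerodynamics and structures mentioned above can be sketched as a fixed-point iteration between two disciplines: load depends on deflection, and deflection depends on load. The functions and constants below are illustrative stand-ins, not real engineering models or anything from the speaker’s platform.

```python
# Toy multidisciplinary coupling: aerodynamic load depends on wing
# deflection, and structural deflection depends on load. Constants are
# made up for illustration only.

def aero_load(deflection):
    # Hypothetical: load drops slightly as the wing deflects.
    return 100.0 - 5.0 * deflection

def structural_deflection(load):
    # Hypothetical linear stiffness.
    return load / 50.0

def coupled_solve(tol=1e-8, max_iter=100):
    """Gauss-Seidel fixed-point iteration between the two disciplines,
    a minimal stand-in for running both analyses together."""
    deflection = 0.0
    for _ in range(max_iter):
        load = aero_load(deflection)
        new_deflection = structural_deflection(load)
        if abs(new_deflection - deflection) < tol:
            return load, new_deflection
        deflection = new_deflection
    raise RuntimeError("coupling did not converge")

load, deflection = coupled_solve()
```

In real multidisciplinary design optimization each “discipline” is an expensive simulation, which is exactly why running them concurrently on cloud infrastructure shortens the development cycle.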
Since the early days of the data deluge, Lift Lab has been helping many actors in the ‘smart city’ space transform accumulated network data (e.g. cellular network activity, aggregated credit card transactions, real-time traffic information, user-generated content) into products or services. Because of their innovative and cross-disciplinary bent, our projects generally involve a wide variety of professionals, from physicists and engineers to lawyers, decision makers, and strategists.
Our innovation methods bring these different stakeholders on board with rapidly prototyped tools that support the processing, recompilation, interpretation, and reinterpretation of insights. For instance, our experience shows that the multiple perspectives that emerge from exploratory data visualizations are crucial for quickly answering some basic questions and provoking many better ones. Moreover, the ability to quickly sketch an interactive system or dashboard helps develop a common language among varied stakeholders, allowing them to focus on the tangible product or service opportunities hidden within their data. In this form of rapid visual business intelligence, an analysis and its visualization are not the results, but rather the supporting elements of a co-creation process to extract value from data.
We will illustrate our methods with tools that help engage a wide spectrum of professionals in the data science innovation path. These tools are based on a flexible data platform and visual programming environment that make it possible to go beyond the limited design possibilities of industry standards. They also reduce the prototyping time needed to sketch interactive visualizations, allowing the different stakeholders of an organization to take an active part in the design of services or products.
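In the spirit of the “quick visual sketch” described above, here is a minimal text-based sketch of hourly network activity, the sort of throwaway view that lets stakeholders eyeball a pattern in seconds before any polished dashboard exists. The data and labels are invented for illustration and have nothing to do with Lift Lab’s actual tools.

```python
# Hypothetical hourly cellular-network activity counts.
ACTIVITY = {"06h": 120, "09h": 480, "12h": 310, "18h": 520, "23h": 90}

def sketch(series, width=40):
    """Render a dict of label -> value as a quick text bar chart,
    scaled so the peak value fills the full width."""
    peak = max(series.values())
    lines = []
    for label, value in series.items():
        bar = "#" * round(width * value / peak)
        lines.append(f"{label} |{bar} {value}")
    return "\n".join(lines)

chart = sketch(ACTIVITY)
print(chart)
```

The point is not the chart itself but the turnaround time: a view cheap enough to throw away invites the reinterpretation and better questions that the abstract describes.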
The advent of crowdsourcing has wildly expanded the ways we think of incorporating human judgments into computational workflows. Computer scientists, economists, and sociologists have explored how to effectively and efficiently distribute microwork tasks to crowds and use their work as inputs to create or improve data products. Simultaneously, crowdsourcing providers are exploring the bounds of mechanical QA flows, worker interfaces, and workforce management systems.
But what tasks should be performed by humans rather than algorithms? And what makes a set of human judgments robust? Quantity? Consensus? Quality or trustworthiness of the workers? Moreover, the robustness of judgments depends not only on the workers, but on the task design. Effective crowdsourcing is a cooperative endeavor.
In this talk, we will analyze various dimensions of microwork that characterize applications, tasks, and crowds. Drawing on our experience at companies that have pioneered the use of microwork (Samasource) and data science (LinkedIn), we will offer practical advice to help you design crowdsourcing workflows to meet your data product needs.
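The consensus question raised above, how many judgments, weighted how, can be sketched as a weighted majority vote. The tasks, workers, and quality scores below are hypothetical; real pipelines at the companies named also use gold-standard tasks and QA flows, and this illustrates only the aggregation step.

```python
from collections import defaultdict

# Hypothetical redundant judgments: (task_id, worker_id, label).
judgments = [
    ("t1", "w1", "spam"), ("t1", "w2", "spam"), ("t1", "w3", "ham"),
    ("t2", "w1", "ham"),  ("t2", "w2", "spam"), ("t2", "w3", "ham"),
]
# Hypothetical per-worker accuracy estimates used as vote weights.
worker_quality = {"w1": 0.9, "w2": 0.6, "w3": 0.8}

def aggregate(judgments, quality):
    """Weighted majority vote: each worker's vote counts in proportion
    to an estimate of that worker's accuracy."""
    votes = defaultdict(lambda: defaultdict(float))
    for task, worker, label in judgments:
        votes[task][label] += quality[worker]
    return {task: max(tally, key=tally.get) for task, tally in votes.items()}

labels = aggregate(judgments, worker_quality)
```

Note how weighting changes the outcome relative to raw head counts: a high-quality minority can outvote a low-quality majority, which is one concrete way task design and worker trust interact.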
28th February to 1st March 2012