A lesson we have learned from our work on the data portal DataMarket.com and custom projects we’ve done for a wide variety of customers is: Regardless of how interesting the underlying data or how ground-breaking the analysis is, most people only realize the value and see the potential once the data has been properly visualized.
Put another way: Visualization is where normal people fall in love with data, and – when done right – where they can understand the data at a glance.
We are by no means alone in realizing this. Data visualization has become a hot field, and a lot of statisticians, designers and computer professionals are taking their first steps, learning by example from things they’ve seen elsewhere. Some of these examples are colorful, pretty and praised but still don’t communicate the data properly – the real stories may even be obscured or distorted with badly parsed data or gratuitous visual fluff. Other examples are breaking new ground and advancing the field. But which is which?
Visually communicating data is not a new field. People have been honing data visualization skills since the 19th century, learning a lot about what works – and what doesn’t. It is possible to do things both “right” and beautiful at the same time. In this presentation we hope to explain how by showing the audience some of the very best examples of such work from the leaders in this field – and others that have not done as well.
After providing this background we will walk the audience step-by-step through one particular data visualization project we have worked on (possibly our Earthquake and Eruptions video), explaining the methods, tools and process involved in putting that together and the decisions that led to those particular choices.
Nowadays, major news events prompt millions of responses online. Every message passing through the internet has a voice. Aggregate analysis and visualization help us see the roar of the crowd.
The Guardian first explored this last year with an award-winning graphic that replays World Cup games, condensing 90 minutes of tweets into 90 seconds of interactive animation. By juxtaposing match events with surges in word popularity, viewers can relive the ripples of human reaction passing through Twitter.
Asked to apply similar techniques to the News International saga, we partnered with Datasift to capture and display public responses during key events in the story. This talk steps through the process of recording, processing and displaying a large volume of tweets which enabled a small team to build complex pieces of interactive content at newsroom speeds.
Above all, the presentation will aim to portray the delicate balance of design, data and storytelling at the heart of interactive news content.
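As a rough illustration of the "surges in word popularity" idea described above, the following sketch groups tweets by minute and flags the minutes in which a word is unusually frequent. It is not the Guardian's pipeline: the function names, the `factor` threshold and the sample tweets are all made up for this example.

```python
from collections import Counter, defaultdict


def words_per_minute(tweets):
    """Group (minute, text) pairs by minute and count the words in each minute."""
    counts = defaultdict(Counter)
    for minute, text in tweets:
        counts[minute].update(text.lower().split())
    return counts


def surges(counts, word, factor=2.0):
    """Return the minutes where `word` occurs at least `factor` times
    its average per-minute count (a hypothetical surge rule)."""
    per_minute = {minute: c[word] for minute, c in counts.items()}
    average = sum(per_minute.values()) / len(per_minute)
    return sorted(
        minute for minute, n in per_minute.items()
        if n > 0 and n >= factor * average
    )


# Made-up sample data: (match minute, tweet text).
tweets = [
    (10, "come on england"),
    (11, "goal goal goal"),
    (11, "what a goal"),
    (12, "great goal that"),
    (13, "half time soon"),
]

counts = words_per_minute(tweets)
goal_surges = surges(counts, "goal")
print(goal_surges)
```

Lining up the flagged minutes against a list of match events is then what lets a viewer see the crowd's reaction to each event.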
Good graphs are extremely powerful tools for communicating quantitative information clearly and accurately. Unfortunately, many of the graphs we see today are poor graphs that confuse, mislead or deceive the reader. These poor graphs often occur because the graph designer is not familiar with principles of effective graphs or because the software used has a poor choice of default settings. We point out some of these graphical mistakes including using unnecessary dimensions, not making the data stand out, making mistakes with scales, showing changes in one dimension by area or volume, and not making your message clear. In most cases very simple changes make the resulting graphs easier for the reader to understand. In addition, we show some common mistakes with tables. We end with some useful little-known graph forms that communicate the data more clearly than the everyday graphs that are more commonly used.
by Irene Ros
Data visualization is an important communication medium in personal and public conversation spheres. Its wide use in entertainment and business settings alike has encouraged the creation of tools and frameworks that allow anyone to create visualizations and share them with their audience. While these tools offer tried and true visualization metaphors, they also pose risks such as missing important data points or creating meaningless visuals.
This talk will introduce the concept of “responsible data visualization” in the context of two distinct uses: exploration and narrative. Using personal and industry examples to show best and worst practices in each approach, this talk will offer practical suggestions for bringing data visualization into one’s data workflow.
Economists utilize a data analysis toolkit and intuition that can be very helpful to Data Scientists. In particular, econometric methods are quite useful in disentangling correlation and causation, a use case not well-handled by standard machine learning and statistical techniques. This session will cover examples of econometric methods in action, as well as other economics-related insights. Think of it as a crash-course in basic econometric intuition that one receives during a PhD in Economics (I received my PhD from Stanford in 2008).
Why econometrics? The difference between econometrics and statistics is that statistical modeling is more concerned with fit, and econometric modeling is more concerned with properly estimating the coefficients in a regression. Getting the “right” (consistent & unbiased) estimates means that the analyst can more effectively measure how a change in one variable can strongly predict (or cause) a change in the dependent variable. These techniques can help solve problems in social/web data that previously were only solvable using future data collection from randomized multivariate experiments.
To do this, the analyst first develops an intuition for whether or not there is a source of “endogeneity” in the regression. This is largely determined by the relationship between the predictors and the error term in the regression. Once the source of the endogeneity is understood, econometric techniques like fixed/random effects and instrumental variables can be quite useful. How far these techniques can be pushed depends on the type of data that is collected and available. [I might also go into some other techniques, but these are the most useful]
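To make the endogeneity point concrete, here is a minimal sketch of instrumental variables via two-stage least squares on simulated data. It is an illustration under stated assumptions, not material from the session: the variable names, the data-generating process and the true coefficient of 2.0 are all invented, and the instrument `z` is assumed to affect `x` while being unrelated to the unobserved confounder `u`.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients for y ~ X (X must include a constant column)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def two_stage_least_squares(x, y, z):
    """Estimate the effect of x on y, using z as an instrument for x."""
    const = np.ones_like(x)
    Z = np.column_stack([const, z])
    # Stage 1: keep only the part of x that the instrument explains.
    x_hat = Z @ ols(Z, x)
    # Stage 2: regress y on the fitted values from stage 1.
    X_hat = np.column_stack([const, x_hat])
    return ols(X_hat, y)[1]


# Simulated data (illustrative only). The confounder u drives both x and y,
# so x is correlated with the regression's error term.
rng = np.random.default_rng(0)
n = 20_000
z = rng.normal(size=n)                       # instrument
u = rng.normal(size=n)                       # unobserved confounder
x = z + u + rng.normal(size=n)
y = 2.0 * x + 3.0 * u + rng.normal(size=n)   # true effect of x is 2.0

naive = ols(np.column_stack([np.ones(n), x]), y)[1]
iv = two_stage_least_squares(x, y, z)
print(f"naive OLS slope: {naive:.2f}, IV slope: {iv:.2f}")
```

In this setup the naive regression overstates the effect because it attributes the confounder's influence to `x`, while the instrumented estimate lands close to the true coefficient. Note that the standard errors from a hand-rolled second stage are not valid; a dedicated econometrics package would be needed for inference.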
The methods will be presented in a way so that a non-technical person can understand the basic intuition, and also so that a practitioner can apply the methods in the future. Examples will be provided. For panel data econometrics, we will discuss the example of how to identify actions taken early on by a LinkedIn member that are predictive of their future engagement with the product, a problem that is difficult due to the confounding of correlation and causation. For instrumental variables techniques, we will discuss how to use random variation in the weather to say cool things about politics, economics, and web usage.
In addition to the discussion of applied econometric techniques, there may also be time for economics-related data insights. Currently we are developing unemployment rate prediction models using time-series econometrics as well as indexes to measure changes in the supply/demand for talent across regions and industries.
How do data infrastructure, insights and products change when your user base grows by orders of magnitude? When should you move your user-facing data product off your laptop? (hint: now!) Does your data offer insights about the world at large, or is it just mirroring your early adopters?
In this talk, I will share some of the data scaling lessons we’ve learned at LinkedIn, recount war stories (and close calls!) and document the evolution of the data scientist.
by Alex Lundry
Political campaigns and causes have added another powerful weapon to their messaging arsenal: graphs, charts, infographics and other forms of data visualization. Over just the last year, Barack Obama urged voters to distribute and share a bar graph of job losses, a line graph of labor costs by a New York Times columnist prompted an official graphical response from the government of Spain, and an organizational chart of a health care reform bill became the subject of a Congressional investigation in the United States. To be sure, a good graph has been used as an advocacy tool for years, but only recently, with the rise of the Internet, blogs, hardware and software advances, and freely available machine-readable data, have political data visualizations exploded into political discourse. Conveying objective authority, yet the product of dozens of subjective design decisions, political infographics imply hard truths despite their inherently editorial nature.
This talk, given by a political data scientist who has built persuasive data visualizations for political organizations, will dissect some of the most extraordinary and powerful examples of political data visualization used over the last election cycle, focusing upon the methods that make them work so well.
This is a talk aimed at people who know their data, and want to learn how to visualize it most effectively. If you have data, a need for answers, and a blank page, this is a great place to start.
We’ll start by briefly addressing the value of visualization, and discuss the differences between visualization for analysis and presentation.
From there we’ll figure out what story to tell with your visualization by examining the holy visualization trinity:
Once the story has been selected, we need to construct it. We’ll discuss key considerations to make good choices about:
We’ll end with a brief discussion of some current tools, and look at some classic and innovative visualization examples.
by Peter Sirota
By pairing the elasticity and pay-as-you-go nature of the cloud with the flexibility and scalability of Hadoop, Amazon Elastic MapReduce has brought Big Data analytics to an even wider array of companies looking to maximize the value of their data. Each day, thousands of Hadoop clusters are run on the Amazon Elastic MapReduce infrastructure by users of every size—from University students to Fortune 50 companies—exposing the Elastic MapReduce team to an unparalleled number of use cases. In this session, we will contrast how three of these users, Amazon.com, Yelp, and Etsy, leverage the marriage of Hadoop and the cloud to drive their businesses in the face of explosive growth, including generating customer insights, powering recommendations, and managing core operations.
Google is a data business: over the past few years, many of the tools Google created to store, query, analyze and visualize its data have been exposed to developers as services.
This talk will give you an overview of Google services for Data Crunchers:
Data fuels 21st century business and society. Thanks to the rapid pace of innovation and widespread adoption of information technologies, data has become both a strategic asset and a potentially crippling liability. As consumers grow increasingly concerned about the stewardship of their data, policymakers, academics and advocates around the world are questioning boundaries and considering risks:
These questions are urgent in the current business climate where trust in our most basic institutions has been eroded. As organizations cope with growing tension between innovation, privacy and security, they are discovering that appropriate use and protection of data has broad impact on their reputations and bottom lines—a new, holistic ethos of data environmentalism is necessary.
by Solon Barocas, Betsy Masiello and Jane Yakowitz
Analytics can push the frontier of knowledge well beyond the useful facts that already reside in big data, revealing latent correlations that empower organizations to make statistically motivated guesses—inferences—about the character, attributes, and future actions of their stakeholders and the groups to which they belong.
This is cause for both celebration and caution. Analytic insights can add to the stock of scientific and social scientific knowledge, significantly improve decision-making in both the public and private sector, and greatly enhance individual self-knowledge and understanding. They can even lead to entirely new classes of goods and services, providing value to institutions and individuals alike. But they also invite new applications of data that involve serious hazards.
This panel considers these hazards, asking how analytics implicate:
The panel will also debate the appropriate response to these issues, reviewing the place of norms, policies, legal frameworks, regulation, and technology.
22nd–23rd September 2011