Sessions at Strata New York 2011 in Murray Hill Suite A

Thursday 22nd September 2011

  • Data Visualization - where normal people fall in love with data

    by Hjalmar Gislason

    A lesson we have learned from our work on the data portal DataMarket.com and custom projects we’ve done for a wide variety of customers is: Regardless of how interesting the underlying data or how ground-breaking the analysis is, most people only realize the value and see the potential once the data has been properly visualized.

    Put another way: Visualization is where normal people fall in love with data, and – when done right – where they can understand the data at a glance.

    We are by no means alone in realizing this. Data visualization has become a hot field, and a lot of statisticians, designers and computer professionals are taking their first steps, learning by example from things they’ve seen elsewhere. Some of these examples are colorful, pretty and praised but still don’t communicate the data properly – the real stories may even be obscured or distorted with badly parsed data or gratuitous visual fluff. Other examples are breaking new ground and advancing the field. But which is which?

    Visually communicating data is not a new field. People have been honing data visualization skills since the 19th century, learning a lot about what works – and what doesn’t. It is possible to do things both “right” and beautiful at the same time. In this presentation we hope to explain how by showing the audience some of the very best examples of such work from the leaders in this field – and others that have not done as well.

    After providing this background we will walk the audience step-by-step through one particular data visualization project we have worked on (possibly our Earthquake and Eruptions video), explaining the methods, tools and process involved in putting that together and the decisions that led to those particular choices.

    At 10:40am to 11:20am, Thursday 22nd September

    In Murray Hill Suite A, New York Hilton Midtown

  • Humble pie: helping the Guardian chart big stories through small details

    by Alastair Dant

    Nowadays, major news events prompt millions of responses online. Every message passing through the internet has a voice. Aggregate analysis and visualization help us see the roar of the crowd.

    The Guardian first explored this last year with an award-winning graphic that replays World Cup games, condensing 90 minutes of tweets into 90 seconds of interactive animation. By juxtaposing match events with surges in word popularity, viewers can relive the ripples of human reaction passing through Twitter.

    Asked to apply similar techniques to the News International saga, we partnered with Datasift to capture and display public responses during key events in the story. This talk steps through the process of recording, processing and displaying a large volume of tweets which enabled a small team to build complex pieces of interactive content at newsroom speeds.

    Above all, the presentation will aim to portray the delicate balance of design, data and storytelling at the heart of interactive news content.

    At 11:30am to 12:10pm, Thursday 22nd September

    In Murray Hill Suite A, New York Hilton Midtown

  • How to Avoid Some Common Graphical Mistakes

    by Naomi B Robbins

    Good graphs are extremely powerful tools for communicating quantitative information clearly and accurately. Unfortunately, many of the graphs we see today are poor graphs that confuse, mislead or deceive the reader. These poor graphs often occur because the graph designer is not familiar with principles of effective graphs or because the software used has a poor choice of default settings. We point out some of these graphical mistakes including using unnecessary dimensions, not making the data stand out, making mistakes with scales, showing changes in one dimension by area or volume, and not making your message clear. In most cases very simple changes make the resulting graphs easier for the reader to understand. In addition, we show some common mistakes with tables. We end with some useful little-known graph forms that communicate the data more clearly than the everyday graphs that are more commonly used.
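    One way the area mistake named above can arise, sketched here as an illustration rather than as the speaker's own example: if a circle's radius is scaled in proportion to a value, its area grows with the square of that value, so a doubling looks like a quadrupling. The function names and the reference radius below are hypothetical.

```python
import math


def area_ratio_when_radius_scaled(value, reference_value):
    """Area ratio a reader sees if the RADIUS is scaled linearly with the value."""
    return (value / reference_value) ** 2


def radius_for_proportional_area(value, reference_value, reference_radius):
    """Radius that makes the circle's AREA proportional to the value."""
    return reference_radius * math.sqrt(value / reference_value)


# A value that doubles, drawn against a reference circle of radius 10.
exaggerated = area_ratio_when_radius_scaled(2.0, 1.0)
corrected_radius = radius_for_proportional_area(2.0, 1.0, 10.0)
corrected_ratio = (corrected_radius / 10.0) ** 2

print("area ratio with linear radius scaling:", exaggerated)
print("radius that keeps area proportional:", corrected_radius)
print("area ratio after correction:", corrected_ratio)
```

    Even the corrected version relies on readers judging areas, which is the encoding the abstract cautions against; a plain length-based encoding such as a bar or dot avoids the issue altogether.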

    At 1:40pm to 2:20pm, Thursday 22nd September

    In Murray Hill Suite A, New York Hilton Midtown

  • The Charts You Want Might Not Be the Charts You Need

    by Irene Ros

    Data visualization is an important communication medium in personal and public conversation spheres. Its wide use in entertainment and business settings alike has encouraged the creation of tools and frameworks that allow anyone to create visualizations and share them with their audience. While these tools offer tried-and-true visualization metaphors, they also pose risks such as missing important data points or creating meaningless visuals.

    This talk will introduce the concept of “responsible data visualization” in the context of two distinct uses: exploration and narrative. Using personal and industry examples to show best and worst practices in each approach, this talk will offer practical suggestions for bringing data visualization into one’s data workflow.

    At 2:30pm to 3:10pm, Thursday 22nd September

    In Murray Hill Suite A, New York Hilton Midtown

  • Data Science from the Perspective of an Applied Economist

    by Scott Nicholson

    Economists utilize a data analysis toolkit and intuition that can be very helpful to Data Scientists. In particular, econometric methods are quite useful in disentangling correlation and causation, a use case not well-handled by standard machine learning and statistical techniques. This session will cover examples of econometric methods in action, as well as other economics-related insights. Think of it as a crash-course in basic econometric intuition that one receives during a PhD in Economics (I received my PhD from Stanford in 2008).

    Why econometrics? The difference between econometrics and statistics is that statistical modeling is more concerned with fit, and econometric modeling is more concerned with properly estimating the coefficients in a regression. Getting the “right” (consistent & unbiased) estimates means that the analyst can more effectively measure how a change in one variable can strongly predict (or cause) a change in the dependent variable. These techniques can help solve problems in social/web data that previously were only solvable using future data collection from randomized multivariate experiments.

    To do this, the analyst first develops an intuition for whether or not there is a source of “endogeneity” in the regression. This largely is determined by the relationship between the predictors and the error term in the regression. Once the source of the endogeneity is understood, econometric techniques like fixed/random effects and instrumental variables can be quite useful. The type of data that is collected and available is key to the extent to which the power of these techniques can be used. [I might also go into some other techniques, but these are the most useful]

    The methods will be presented in a way so that a non-technical person can understand the basic intuition, and also so that a practitioner can apply the methods in the future. Examples will be provided. For panel data econometrics, we will discuss the example of how to identify actions taken early on by a LinkedIn member that are predictive of their future engagement with the product, a problem that is difficult due to the confounding of correlation and causation. For instrumental variables techniques, we will discuss how to use random variation in the weather to say cool things about politics, economics, and web usage.
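    The instrumental variables idea described above can be sketched on simulated data. This is a minimal illustration under stated assumptions, not the speaker's analysis: the data-generating process, the coefficients, and the function names are all hypothetical. An unobserved confounder drives both the predictor and the outcome, so the ordinary regression slope is biased, while a randomly varying instrument (standing in for the weather) affects the outcome only through the predictor.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def ols_slope(x, y):
    """Ordinary least squares slope of y on x (with an intercept)."""
    return covariance(x, y) / covariance(x, x)


def iv_slope(z, x, y):
    """Single-instrument IV estimate: cov(z, y) / cov(z, x)."""
    return covariance(z, y) / covariance(z, x)


def simulate(n=20000, true_effect=2.0, seed=0):
    """Hypothetical data: z is the instrument, x the predictor, y the outcome."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        instrument = rng.gauss(0, 1)   # random, unrelated to the confounder
        confounder = rng.gauss(0, 1)   # unobserved; drives both x and y
        xi = 0.8 * instrument + confounder + rng.gauss(0, 1)
        yi = true_effect * xi + 3.0 * confounder + rng.gauss(0, 1)
        z.append(instrument)
        x.append(xi)
        y.append(yi)
    return z, x, y


z, x, y = simulate()
print("true effect: 2.0")
print("OLS slope (biased by the confounder):", round(ols_slope(x, y), 3))
print("IV slope (uses only instrument-driven variation):", round(iv_slope(z, x, y), 3))
```

    The sketch depends on the instrument being unrelated to the confounder and moving the predictor strongly enough; with real data, both conditions have to be argued rather than assumed.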

    In addition to the discussion of applied econometric techniques, there may also be time for economics-related data insights. Currently we are developing unemployment rate prediction models using time-series econometrics as well as indexes to measure changes in the supply/demand for talent across regions and industries.

    At 4:10pm to 4:50pm, Thursday 22nd September

    In Murray Hill Suite A, New York Hilton Midtown

  • 1M. 10M. 100M. Data!

    by Monica Rogati

    How do data infrastructure, insights and products change when your user base grows by orders of magnitude? When should you move your user-facing data product off your laptop? (hint: now!) Does your data offer insights about the world at large, or is it just mirroring your early adopters?

    In this talk, I will share some of the data scaling lessons we’ve learned at LinkedIn, recount war stories (and close calls!) and document the evolution of the data scientist.

    At 5:00pm to 5:40pm, Thursday 22nd September

    In Murray Hill Suite A, New York Hilton Midtown

Friday 23rd September 2011

  • Chart Wars: The Political Power of Data Visualization

    by Alex Lundry

    Political campaigns and causes have added another powerful weapon to their messaging arsenal: graphs, charts, infographics and other forms of data visualization. Over just the last year, Barack Obama urged voters to distribute and share a bar graph of job losses, a line graph of labor costs by a New York Times columnist prompted an official graphical response from the government of Spain, and an organizational chart of a health care reform bill became the subject of a Congressional investigation in the United States. To be sure, good graphs have been used as advocacy tools for years, but only recently, with the rise of the Internet, blogs, hardware and software advances, and freely available machine-readable data, have political data visualizations exploded into political discourse. Conveying objective authority, yet the product of dozens of subjective design decisions, political infographics imply hard truths despite their inherently editorial nature.

    This talk, given by a political data scientist who has built persuasive data visualizations for political organizations, will dissect some of the most extraordinary and powerful examples of political data visualization used over the last election cycle, focusing upon the methods that make them work so well.

    At 10:40am to 11:20am, Friday 23rd September

    In Murray Hill Suite A, New York Hilton Midtown

  • Designing Data Visualizations: Telling Stories With Data

    by Noah Iliinsky

    This is a talk aimed at people who know their data, and want to learn how to visualize it most effectively. If you have data, a need for answers, and a blank page, this is a great place to start.

    We’ll start by briefly addressing the value of visualization, and discuss the differences between visualization for analysis and presentation.

    From there we’ll figure out what story to tell with your visualization by examining the holy visualization trinity:

    • your goals
    • your customer’s needs
    • the shape of your data

    Once the story has been selected, we need to construct it. We’ll discuss key considerations to make good choices about:

    • selecting appropriate data
    • selecting appropriate axes
    • visually encoding the data

    We’ll end with a brief discussion of some current tools, and look at some classic and innovative visualization examples.

    At 11:30am to 12:10pm, Friday 23rd September

    In Murray Hill Suite A, New York Hilton Midtown

  • Big Data Use Cases in the Cloud

    by Peter Sirota

    By pairing the elasticity and pay-as-you-go nature of the cloud with the flexibility and scalability of Hadoop, Amazon Elastic MapReduce has brought Big Data analytics to an even wider array of companies looking to maximize the value of their data. Each day, thousands of Hadoop clusters are run on the Amazon Elastic MapReduce infrastructure by users of every size—from University students to Fortune 50 companies—exposing the Elastic MapReduce team to an unparalleled number of use cases. In this session, we will contrast how three of these users, Amazon.com, Yelp, and Etsy, leverage the marriage of Hadoop and the cloud to drive their businesses in the face of explosive growth, including generating customer insights, powering recommendations, and managing core operations.

    At 1:40pm to 2:20pm, Friday 23rd September

    In Murray Hill Suite A, New York Hilton Midtown

  • Google Cloud for Data Crunchers

    by Chris Schalk and Ryan Boyd

    Google is a Data business: over the past few years, many of the tools Google created to store, query, analyze and visualize its data have been exposed to developers as services.

    This talk will give you an overview of Google services for Data Crunchers:

    • Google Storage for developers: get your data in Google Cloud
    • BigQuery, fast interactive queries on Terabytes of data
    • Prediction API: Machine Learning made easy
    • Google App Engine: platform as a service to build web apps or expose APIs
    • Visualization API: many cool visualization components
    • Fusion Tables: collaborate and visualize your data on a Map
    • Google Public Data Explorer, to expose and visualize public data
    • Services that have not been announced as of the writing of this proposal but may be available when the conference happens :-)

    At 2:30pm to 3:10pm, Friday 23rd September

    In Murray Hill Suite A, New York Hilton Midtown

  • Data Environmentalism

    by Trevor Hughes

    Data fuels 21st century business and society. Thanks to the rapid pace of innovation and widespread adoption of information technologies, data has become both a strategic asset and a potentially crippling liability. As consumers grow increasingly concerned about the stewardship of their data, policymakers, academics and advocates around the world are questioning boundaries and considering risks:

    • What is private and what is not?
    • How should organizations explain what they’re doing with data?
    • What should happen when data is stolen or misused?
    • And, in an era of globalization, how do we manage the diverse social and legal expectations?

    These questions are urgent in the current business climate where trust in our most basic institutions has been eroded. As organizations cope with growing tension between innovation, privacy and security, they are discovering that appropriate use and protection of data has broad impact on their reputations and bottom lines—a new, holistic ethos of data environmentalism is necessary.

    At 4:10pm to 4:50pm, Friday 23rd September

    In Murray Hill Suite A, New York Hilton Midtown

  • Hazarding a Guess: ethical, legal, and policy issues in analytics and big data applications

    by Betsy Masiello, Jane Yakowitz and Solon Barocas

    Analytics can push the frontier of knowledge well beyond the useful facts that already reside in big data, revealing latent correlations that empower organizations to make statistically motivated guesses—inferences—about the character, attributes, and future actions of their stakeholders and the groups to which they belong.

    This is cause for both celebration and caution. Analytic insights can add to the stock of scientific and social scientific knowledge, significantly improve decision-making in both the public and private sector, and greatly enhance individual self-knowledge and understanding. They can even lead to entirely new classes of goods and services, providing value to institutions and individuals alike. But they also invite new applications of data that involve serious hazards.

    This panel considers these hazards, asking how analytics implicate:

    • Privacy — What are the privacy concerns involved in the kinds of inferences and applications that analytics enable? Are these concerns sufficiently well understood and accounted for?
    • Autonomy — What are the ethical stakes of applications that draw on analytic findings to selectively (and perhaps inadvertently) influence or limit individuals’ choices or decision-making?
    • Fairness — If organizations rely on certain discoveries to set criteria for unequal treatment or access, do analytics implicate questions of fairness and due process? More specifically, what if organizations draw on analytics to individualize risks or engage in adverse selection or cream skimming?
    • Fragmentation — Do attempts to personalize and customize goods and services (including media content) to individuals on the basis of inferred preferences shield individuals from certain views and issues and thus undermine social belonging and the functioning of the public sphere?

    The panel will also debate the appropriate response to these issues, reviewing the place of norms, policies, legal frameworks, regulation, and technology.

    At 5:00pm to 5:40pm, Friday 23rd September

    In Murray Hill Suite A, New York Hilton Midtown