Sessions at SXSW Interactive 2011 about Data

Saturday 12th March 2011

  • Dawn of the Data: Future of Consumer Lending

    by Jessica Jackley, Ryan Gilbert, Paul Leonard and Douglas Merrill

    Technology and mathematics are transforming consumer lending. Historically, it has been nearly impossible for people with bad credit to get loans. Yet, these are often the people who need it most - to buy groceries or pay bills.

    Until now, lenders determined who should get loans through a simple underwriting function based on a small amount of credit data. When this data is missing or wrong, banks deny the loan, leaving people to payday loans or pawn shops - very expensive options that put people further in debt.

    Millions of people are being denied credit because underwriting hasn’t evolved. Why use only a handful of variables when we have vast amounts of data provided by the customer, the Internet, and social media? All data is credit data and we should use it all to make better underwriting decisions.

    Analyzing vast amounts of data, however, requires complex machine learning more akin to search engines than your corner bank. The future of financial services is to become more like a recommendation engine, and less like a place where you stand in line to deposit checks.

    The panelists will discuss how to use large-scale data analysis to re-invent underwriting and replace today’s antiquated methods. Better underwriting will open up good credit to people who don't have a lot of good options and materially improve the financial lives of the people who need it most.

    LEVEL: Intermediate

    At 9:30am to 10:30am, Saturday 12th March

    In Salon F/G, Hilton Austin Downtown

  • Why Visualizing Government Data Makes Taxpayers Happy

    by Michael Castellon and Jeremiah Akin

    The expectation of transparency is creating demand for government agencies to develop new ways to communicate complex data and trends to the public in easy-to-access and easy-to-understand formats.

    Some agencies are turning to Google Maps and KML data to visualize raw information online and on mobile devices. Delivering data in more easily understandable formats not only boosts trust and confidence between government agencies and their publics, but also streamlines workloads among Data, Web, Editorial, and Customer Service teams.

    The Texas Comptroller is the state’s chief revenue officer, tax collector, and treasurer. The agency uses public-facing maps to communicate data and economic trends across the state, to support editorial coverage, and to promote initiatives such as its Unclaimed Property initiative, which works to reunite taxpayers with about $2 billion in unclaimed money and property.

    This discussion will focus on how agencies and other organizations can use free or inexpensive tools to deliver data to the public in both traditional online formats and mobile platforms, and how workflows can be arranged so that data visualization can be managed and administered by non-technical staff. We will also discuss how maps can be used internally to enhance strategic efforts.

    LEVEL: Intermediate

    At 11:00am to 12:00pm, Saturday 12th March

    In Room 9ABC, Austin Convention Center

  • Time Traveling: Interfaces for Geotemporal Visualization

    by Irene Ros, Ryan Shaw, Ana Boa-Ventura, Adam Rabinowitz and Nicholas Rabinowitz

    Displaying geography alone is easy: interactive maps are more and more a part of our everyday lives. Displaying time alone is easy: we are all familiar with charts and animations that show the passage of time. It is increasingly common to display time and space together in a single visual interface as well, but this combination has raised a number of new questions. There are few conventions or standards for geotemporal visualization, and we are still discovering which approaches are most effective for which datasets. Focusing particularly on historical data, this panel will explore issues in the modeling and visualization of geotemporal information, presenting existing approaches and discussing new trends.

    LEVEL: Intermediate

    At 12:30pm to 1:30pm, Saturday 12th March

    In Ballroom B, Austin Convention Center

Sunday 13th March 2011

  • Data-Driven Government: Improving the Citizen Experience [Cancelled]

    by Julie Germany, Aneesh Chopra and Matt Lira

    Over the past several years, there have been many discussions regarding how interactive technology can drive change in our nation’s politics – but of perhaps greater importance is how technology can improve the daily functioning of our nation’s government.

    The discussion should not be a partisan one – this panel will bring together leading innovators from both parties to engage in a post-partisan discussion about how technology can improve the public’s interactions with their government.

    This discussion should be about specifics – we can all agree on the broad principles that technology drives change – but we have all heard that conversation before. This panel will focus on the specific progress that has been made, the specific opportunities that exist in the near future, and the specific challenges that need to be addressed.

    As citizens increasingly become on-demand consumers in their daily lives, it is clear that government needs to better utilize interactive technology or it will only be more radically disconnected from the public.

    This is not a political conference, which is precisely why it should be where this conversation takes place – how can the innovations from the creative, marketing and interactive communities be applied to improving our nation?

    Our government needs to modernize. We need to move forward and debate new ideas, focusing on how we can collectively make our government work smarter, faster and better for all citizens.

    LEVEL: Intermediate

    At 9:30am to 10:30am, Sunday 13th March

    In Room 9ABC, Austin Convention Center

  • Finding Music With Pictures: Data Visualization for Discovery

    by Paul Lamere

    With so much music available, finding new music that you like can be like finding a needle in a haystack. We need new tools to help us to explore the world of music, tools that can help us separate the wheat from the chaff.

    In this panel we will look at how visualizations can be used to help people explore the music space and discover new, interesting music that they will like. We will look at a wide range of visualizations, from hand drawn artist maps, to highly interactive, immersive 3D environments. We'll explore a number of different visualization techniques including graphs, trees, maps, timelines and flow diagrams and we'll examine different types of music data that can contribute to a visualization.

    Using numerous examples drawn from commercial and research systems we'll show how visualizations are being used now to enhance music discovery and we'll demonstrate some new visualization techniques coming out of the labs that we'll find in tomorrow's music discovery applications.

    LEVEL: Advanced

    At 11:00am to 12:00pm, Sunday 13th March

    In Salon H, Hilton Austin Downtown

  • Health Data Everywhere: Not a Drop to Link?

    by Roni Zeiger, Indu Subaiya, Gilles Frydman, Aman Bhandari and Jamie Heywood

    The Health 2.0 and Open Gov movements have helped unlock large repositories of data - from user-generated data in hundreds of online communities to mobile devices to federal quality indicators to medical record data within provider organizations. But much remains to be done to connect these disconnected islands of data to generate information that's meaningful and actionable by end users. And what happens when you link informed patient communities with their health data? As Clay Shirky says, it gets weird. And interesting.

    A number of communities have cropped up to promote access to medical data and the integration of user-reported and behavioral data within the clinical decision stream including healthdatarights.org, #healthapps, #health2dev, #73cents, #getupandmove and #WhyPM. With the opening up of health datasets, platform APIs and increasingly sophisticated analytic engines to make user-generated health data clinically relevant, we can finally unleash the wider developer community to build robust and integrated tools to improve health and healthcare.

    This session brings together some of the leading voices in the Health 2.0 movement to discuss and demo technologies that help access, mine, display and distribute control of health information across a wide variety of interfaces and devices. We will also hear how government is opening healthcare datasets for access by the developer community and how patients are increasingly becoming "n of 1" platforms.

    LEVEL: Intermediate

    At 11:00am to 12:00pm, Sunday 13th March

    In Rio, Hilton Garden Inn Austin Downtown

    Coverage slide deck

  • Embracing NoSQL - Your First Cassandra Project

    by Ryan King, Jared Carroll, Rudy Jahchan, Michael Wynholds and Rob Pak

    What is Cassandra? What is NoSQL? Why are sites like Facebook, Twitter, Google and Digg all using these new technologies? And what does that mean to me?

    The popularity of the NoSQL movement has exploded in the last year or two, as a number of these non-traditional data storage systems have gone from experimental curiosities to powerful production-ready engines that power the largest real-time social networking sites on the Web.

    Born out of Facebook, Cassandra is one of the super-hot players in this new movement. We recently had an opportunity to build a new social networking site using it for the first time, and we want to share what we learned.

    In this presentation:

    • You will discover the NoSQL movement and the big players who lead it.
    • You will learn both *how* and *why* you should build your site using Cassandra.
    • You will understand what Cassandra offers, and how it differs from traditional databases as well as other NoSQL competitors like CouchDB and MongoDB.
    • You will walk through real code examples that can help you bootstrap your own Cassandra project.
    • You will see where we stumbled along the way, so you can avoid making the same mistakes.

    Code samples are in Ruby on Rails.

    LEVEL: Intermediate

    At 3:30pm to 6:00pm, Sunday 13th March

    In Capitol E-H, Sheraton Austin Hotel at the Capitol

  • How Open Health Data Can Improve America's Health

    by Todd Park

    LEVEL: Intermediate

    At 3:30pm to 4:30pm, Sunday 13th March

    In Rio, Hilton Garden Inn Austin Downtown

    Coverage slide deck

  • Machines Trading Stocks on News

    by Adam Honore, Jacob Sisk, Armando Gonzalez and John Kittrell

    Trading on news is not new. Terminals have had news readers attached from the time trading went electronic. What is new is who, or what, is trading on news. Born from a hybrid of technological capability, electronification of the markets, algorithmic trading, and a little influence from the intelligence community, black box trading systems are now applying semantic analysis to trade on news items without a single human ever reading the story. While only 2% of trading firms were doing this two years ago, roughly one-third are exploring it today. This session looks at the data, drivers, and technology behind trading on unstructured content.

    LEVEL: Advanced

    At 3:30pm to 4:30pm, Sunday 13th March

    In Town Lake Ballroom, Radisson Hotel & Suites Austin-Town Lake

  • Solr Power FTW: Make NoSQL Your Bitch!

    by RC Johnson and Grant Ingersoll

    Solr is an open source, Lucene-based search platform originally developed by CNET and used by the likes of Netflix, Yelp, and StubHub. It has been rapidly growing in popularity and features during the last few years. Learn how Solr can be used as a Not Only SQL (NoSQL) database along the lines of Cassandra, Memcached, and Redis. NoSQL data stores are regularly described as non-relational, distributed, and internet-scalable, and are used at both Facebook and Digg.

    This presentation will quickly cover the fundamentals of NoSQL data stores, the basics of Lucene, and what Solr brings to the table. Following that we will dive into the technical details of making Solr your primary query engine on large scale web applications, thus relegating your traditional relational database to little more than a simple key store.

    Real solutions to problems like handling four billion requests per month will be presented. We'll talk about sizing and configuring the Solr instances to maintain rapid response times under heavy load. We'll show you how to change the schema on a live system with tens of millions of documents indexed while supporting real-time results. And finally, we'll answer your questions about ways to work around the lack of transactions in Solr and how you can do all of this in a highly available solution.

    LEVEL: Advanced

    At 3:30pm to 4:30pm, Sunday 13th March

    In Ballroom B, Austin Convention Center

    Coverage slide deck

  • Shopping as a Revolutionary Act?

    by Christopher Carfi, Tara Hunt and Adriana Lukas

    We've been called "consumers" for decades, but we are also producers - of data. Lots of it. Everywhere we take our business, we leave a trail of data, little of which we manage, and almost none of which we control. Most of that data, however, is produced without our knowledge or control.

    What happens when we take control of that activity? What happens when "the marketplace" is not a collection of customer fish in a sellers' barrel, but a truly open space where relationships are genuine, meaningful, and mutual?

    This is the start of a revolution. The Shopping as a Revolutionary Act? panel will look at some of the leading developments involved, where they are likely to head, and what changes these will bring to our economic, social and personal lives.

    LEVEL: Intermediate

    At 5:00pm to 6:00pm, Sunday 13th March

    In TX Ballroom 1, Hyatt Regency Austin

Monday 14th March 2011

  • Machine Learning and Social Media

    by Bruce Smith

    Social media applications encounter messy user-generated data in blog posts, status updates, tweets, user profiles, etc. These documents contain free-form text that obeys no particular rules of grammar, punctuation or spelling.

    If the data is so messy, how can a computer program recognize adult content or hate speech or spam? How can a computer program tell the difference between an advertisement and a product review? How can a computer program distinguish between a positive and a negative product review?

    Machine learning offers some solutions. For example, given sample tweets labeled (by people) as spam or non-spam, machine learning tools can generate a program (or model) that attempts to duplicate the human judgments. You could use this kind of model in your application to filter out tweet spam.
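
    The idea of learning a spam filter from labeled tweets can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses a hand-rolled naive Bayes classifier (one common technique, not necessarily the one covered in the talk), and the training tweets, labels, and function names are all made up for the sketch.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; real tweets would need more careful handling.
    return text.lower().split()


def train(samples):
    """Count words per label from (text, label) pairs labeled by people."""
    word_counts = {}          # label -> Counter of word frequencies
    label_counts = Counter()  # label -> number of training documents
    vocab = set()
    for text, label in samples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab


def classify(model, text):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are skipped
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-labeled training data, invented for this sketch.
TRAINING = [
    ("win a free ipad click here", "spam"),
    ("free followers click this link now", "spam"),
    ("heading to the data panel at noon", "non-spam"),
    ("great talk on visualization this morning", "non-spam"),
]

model = train(TRAINING)
print(classify(model, "click here for free followers"))
print(classify(model, "great panel on data this morning"))
```

    With only four training tweets this is a toy; a usable filter would need far more labeled data and some measurement of how often the model disagrees with human judgments.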

    In this talk we will describe
    • Some common machine learning algorithms
    • Machine learning tools – free and commercial
    • Acquiring and managing training data
    • Extracting useful features from your documents
    • Choosing the right technique for a problem
    • Measuring quality and improving your model over time
    • Integrating a machine learned model with your application

    Coming out of this session, you will know where you might use machine learning in your applications, and you will know how to get started.

    LEVEL: Intermediate

    At 9:30am to 10:30am, Monday 14th March

    In Salon J, Hilton Austin Downtown

  • Big Data and APIs for PHP Developers

    by Dennis Yang, Laura Thomson, Bradley Holt, EliW, David Zülke and Julie Steele

    Big Data creates problems and opportunities that do not exist when dealing with smaller datasets. You will learn how to scale, utilize, and visualize Big Data as well as create and integrate Big Data related APIs. We will talk about how to scale your data, expose your data through APIs, integrate existing data from the data marketplace, and communicate your data through visualization. You will find out what techniques and strategies work best when working with Big Data. Many developers have learned how to scale their systems for high levels of concurrency. However, scaling for Big Data has its own unique challenges. Sometimes strategies that would make no sense for smaller systems work great when dealing with larger datasets. This Workshop is geared towards PHP developers, but all are welcome.

    LEVEL: Intermediate

    At 11:00am to 1:30pm, Monday 14th March

    In Capitol E-H, Sheraton Austin Hotel at the Capitol

    Coverage slide deck

  • Love, Music & APIs

    by Matthew Ogle and Dave Haynes

    In the old days it was DJs, A&R folks, labels and record store owners that were the gatekeepers to music. Today, we are seeing a new music gatekeeper emerge... the developer. Using open APIs, developers are creating new apps that change how people explore, discover, create and interact with music. But developers can't do it alone. They need data like gig listings, lyrics, recommendation tools and, of course, music! And they need it from reliable, structured and legitimate sources.

    In this presentation we'll discuss and explore what is happening right now in the thriving music developer ecosystem. We'll describe some of the novel APIs that are making this happen and what sort of building blocks are being put into place from a variety of different sources. We'll demonstrate how companies within this ecosystem are working closely together in a spirit of co-operation, each providing its own pieces to an expanding pool of resources from which developers can play, develop and create new music apps across different mediums - web, mobile, software and hardware. We'll highlight some of the next generation of music apps that are being created in this thriving ecosystem.

    Finally we'll take a look at how music developers are coming together at events like Music Hack Day, where participants have just 24 hours to build the next generation of music apps. Someone once said, "APIs are the sex organs of software. Data is the DNA." If this is true, then Music Hack Days are orgies.

    LEVEL: Intermediate

    At 11:00am to 12:00pm, Monday 14th March

    In Room 18ABCD, Austin Convention Center

  • The Cassandra Database: When Performance Meets Scalability

    by Jonathan Ellis

    Faced with the costs of vertically scaling their relational database systems, developers are increasingly turning to Apache Cassandra as an alternative. Cassandra solves the scaling problem by partitioning data, expanding horizontally and promising replication consistency. Effectively utilizing Cassandra requires that developers take different approaches to the ways they model data used in their applications. This presentation will explain how Cassandra achieves scale and reliability, and give an example of porting a SQL schema to Cassandra.

    LEVEL: Advanced

    At 3:30pm to 4:30pm, Monday 14th March

    In Town Lake Ballroom, Radisson Hotel & Suites Austin-Town Lake

  • Big Data for Everyone (No Data Scientists Required)

    by Eric Sammer, Steve Watt, Stu Hood, ɹǝɯoɹʞ (dılɟ) dılıɥd and Matt Pfeil

    Big Data solutions, such as Apache Hadoop and Apache Cassandra, are growing up and are in the process of moving out of a grassroots movement to widespread adoption. Unfortunately, the majority of the technical expertise still lies in the hands of the open source project contributors and most solutions are tackled from the bottom up, starting with the technical problems. The collateral that is presently available is largely from the social media giants that tout solutions built using 10,000 node clusters that process petabytes of data a day. The reality? The average person just cannot relate or intuitively draw parallels to their own business problems.

    While Big Data solutions are worthwhile far before you reach petabyte scale data, just getting started can be a challenge in itself. New open source projects are being regularly released that tackle a variety of issues related to Big Data, some of which are just slightly different to existing technologies. Just how does one navigate the plethora of technologies to design workable solutions to business problems? What if you only have gigabytes or terabytes of "medium" data on a small cluster? This panel features Solution Architects from a variety of key companies in the Big Data space who will provide deep dive technical discussions on real solutions we've employed for our customers, across a variety of industries, starting with the business problems.

    LEVEL: Intermediate

    At 5:00pm to 6:00pm, Monday 14th March

    In Town Lake Ballroom, Radisson Hotel & Suites Austin-Town Lake

    Coverage slide deck

  • Our Media: Building An API For Public Media

    by John Bracken, Jake Shapiro, Kinsey Wilson, Robert Bole and Kavita Pillay

    Open APIs are sweeping through public media, just like the rest of the world, but folks at NPR, PBS and others are thinking even bigger. Public media is engaged in an unprecedented project to build an open API called the Public Media Platform (PMP) that will help developers create applications that bring personalized public media content to new platforms. Come learn from the leaders of the PMP on how this project is rolling out, where it is headed and how it can benefit you. We will be discussing how public media is creating the right technology layer, as well as balancing business rules to build new opportunities for our media to be For, By and Of the People.

    LEVEL: Intermediate

    At 5:00pm to 6:00pm, Monday 14th March

    In Creekside, Sheraton Austin Hotel at the Capitol

Tuesday 15th March 2011

  • Innovating & Developing with Libraries, Archives & Museums

    by Jon Voss, Michael Edson, Deborah Boyer and Danielle Plumer

    For centuries, libraries, archives, and museums have been creating structured data, organizing information, and managing metadata in order to organize and share cultural artifacts and knowledge with the public. Unfortunately, the bulk of these systems have evolved in isolation, long before the advent of the World Wide Web. However, the convergence of developments in culture and technology is resulting in exciting new ways for individuals and developers alike to interact directly with unprecedented amounts of structured data, historical photos and archives, and more.

    Expert developers and project managers in this field will lead a discussion focused on the question: How can developers leverage open data from libraries, archives and museums being made available to the public? Panelists will review new developments and highlight examples, considering use cases with Linked Data, Flickr Commons, Smithsonian Commons, mobile apps, and scalability.

    LEVEL: Intermediate

    At 9:30am to 10:30am, Tuesday 15th March

    In Room 10AB, Austin Convention Center

    Coverage slide deck

  • Reid Hoffman Presentation: Data as Web 3.0

    by Reid Hoffman

    LEVEL: Intermediate

    At 3:30pm to 4:30pm, Tuesday 15th March

    In Ballroom D, Austin Convention Center