by Jessica Jackley, Ryan Gilbert, Paul Leonard and Douglas Merrill
Technology and mathematics are transforming consumer lending. Historically, it has been nearly impossible for people with bad credit to get loans. Yet these are often the people who need credit most - to buy groceries or pay bills.
Until now, lenders determined who should get loans through a simple underwriting function based on a small amount of credit data. When this data is missing or wrong, banks deny the loan, leaving people to payday loans or pawn shops - very expensive options that put people further in debt.
Millions of people are being denied credit because underwriting hasn’t evolved. Why use only a handful of variables when we have vast amounts of data provided by the customer, the Internet, and social media? All data is credit data and we should use it all to make better underwriting decisions.
Analyzing vast amounts of data, however, requires complex machine learning more akin to search engines than your corner bank. The future of financial services is to become more like a recommendation engine, and less like a place where you stand in line to deposit checks.
The panelists will discuss how to use large-scale data analysis to re-invent underwriting and replace today’s antiquated methods. Better underwriting will open up good credit to people who don't have a lot of good options and materially improve the financial lives of the people who need it most.
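As a minimal sketch of the idea, underwriting on many variables amounts to combining weighted signals into a single score, as in a logistic model. The feature names and weights below are hypothetical; a real model would be trained on historical repayment outcomes, not hand-set like this:

```python
import math

def sigmoid(x):
    """Squash a weighted sum into a 0-1 score."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical feature weights -- illustrative only, not a trained model.
WEIGHTS = {
    "on_time_utility_payments": 0.8,
    "months_at_current_address": 0.05,
    "prior_defaults": -1.5,
}
BIAS = -1.0

def approval_score(applicant):
    """Combine many signals into a single probability-like score."""
    z = BIAS + sum(WEIGHTS[k] * applicant.get(k, 0) for k in WEIGHTS)
    return sigmoid(z)

applicant = {
    "on_time_utility_payments": 3,
    "months_at_current_address": 24,
    "prior_defaults": 0,
}
score = approval_score(applicant)
```

The point of the panel's argument is that the dictionary of weights can grow to thousands of learned features instead of the handful a traditional scorecard uses.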
The expectation of transparency is creating demand for government agencies to develop new ways to communicate complex data and trends to the public in easy-to-access and easy-to-understand formats.
Some agencies are turning to Google Maps and KML data to visualize raw information online and on mobile devices. Delivering data in more easily understandable formats not only boosts trust and confidence between government agencies and their publics, but also streamlines workloads among Data, Web, Editorial, and Customer Service teams.
The Texas Comptroller is the state’s chief revenue officer, tax collector, and treasurer. The agency uses public-facing maps to communicate data and economic trends across the state, to support its editorial coverage, and to promote initiatives such as its Unclaimed Property program, which works to reunite taxpayers with about $2 billion in unclaimed money and property.
This discussion will focus on how agencies and other organizations can use free or inexpensive tools to deliver data to the public in both traditional online formats and mobile platforms, and how workflows can be arranged so that data visualization can be managed and administered by non-technical staff. We will also discuss how maps can be used internally to enhance strategic efforts.
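As a sketch of the kind of free-tool workflow described here, a few lines of standard-library Python can turn a spreadsheet export into KML that Google Maps or Google Earth can display. The CSV column names below are assumptions for illustration:

```python
import csv
import io
from xml.sax.saxutils import escape

def rows_to_kml(csv_text):
    """Convert a simple CSV (name, lon, lat, description) into a
    minimal KML document of Placemark points."""
    placemarks = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        placemarks.append(
            "<Placemark>"
            f"<name>{escape(row['name'])}</name>"
            f"<description>{escape(row['description'])}</description>"
            # KML coordinates are lon,lat,altitude
            f"<Point><coordinates>{row['lon']},{row['lat']},0</coordinates></Point>"
            "</Placemark>"
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
        + "".join(placemarks)
        + "</Document></kml>"
    )

csv_text = "name,lon,lat,description\nAustin Office,-97.74,30.27,Unclaimed property event\n"
kml = rows_to_kml(csv_text)
```

Because the input is an ordinary CSV, non-technical staff can maintain the data in a spreadsheet while a script like this regenerates the map layer.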
Displaying geography alone is easy: interactive maps are more and more a part of our everyday lives. Displaying time alone is easy: we are all familiar with charts and animations that show the passage of time. It is increasingly common to display time and space together in a single visual interface as well, but this combination has raised a number of new questions. There are few conventions or standards for geotemporal visualization, and we are still discovering which approaches are most effective for which datasets. Focusing particularly on historical data, this panel will explore issues in the modeling and visualization of geotemporal information, presenting existing approaches and discussing new trends.
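One modeling issue behind geotemporal visualization can be sketched simply: before a combined view can draw anything, events must be aggregated by both time and place. The historical events below are invented for illustration:

```python
from collections import defaultdict

# Hypothetical historical events as (year, lat, lon, label) tuples.
events = [
    (1845, 30.27, -97.74, "settlement founded"),
    (1852, 30.27, -97.74, "courthouse built"),
    (1871, 29.76, -95.37, "railroad arrives"),
]

def bin_by_decade_and_place(events):
    """Group events into (decade, rounded lat/lon cell) buckets -- the
    kind of aggregation a geotemporal view needs before rendering."""
    buckets = defaultdict(list)
    for year, lat, lon, label in events:
        decade = (year // 10) * 10
        cell = (round(lat), round(lon))
        buckets[(decade, cell)].append(label)
    return buckets

buckets = bin_by_decade_and_place(events)
```

Choices like bucket granularity (decade vs. year, one-degree cells vs. finer grids) are exactly the kind of unsettled convention the panel describes.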
Over the past several years, there have been many discussions regarding how interactive technology can drive change in our nation’s politics – but of perhaps greater importance is how technology can improve the daily functioning of our nation’s government.
The discussion should not be a partisan one – this panel will bring together leading innovators from both parties to engage in a post-partisan discussion about how technology can improve the public’s interactions with their government.
This discussion should be about specifics – we can all agree on the broad principles that technology drives change – but we have all heard that conversation before. This panel will focus on the specific progress that has been made, the specific opportunities that exist in the near future, and the specific challenges that need to be addressed.
As citizens increasingly become on-demand consumers in their daily lives, it is clear that government needs to make better use of interactive technology or it will become only more radically disconnected from the public.
This is not a political conference, which is precisely why it should be where this conversation takes place – how can the innovations from the creative, marketing and interactive communities be applied to improving our nation?
Our government needs to modernize. We need to move forward and debate new ideas, focusing on how we can collectively make our government work smarter, faster and better for all citizens.
by Paul Lamere
With so much music available, finding new music that you like can be like finding a needle in a haystack. We need new tools to help us explore the world of music, tools that can help us separate the wheat from the chaff.

In this panel we will look at how visualizations can be used to help people explore the music space and discover new, interesting music that they will like. We will look at a wide range of visualizations, from hand-drawn artist maps to highly interactive, immersive 3D environments. We'll explore a number of different visualization techniques, including graphs, trees, maps, timelines and flow diagrams, and we'll examine different types of music data that can contribute to a visualization.

Using numerous examples drawn from commercial and research systems, we'll show how visualizations are being used now to enhance music discovery, and we'll demonstrate some new visualization techniques coming out of the labs that we'll find in tomorrow's music applications.
The Health 2.0 and Open Gov movements have helped unlock large repositories of data - from user-generated data in hundreds of online communities to mobile devices to federal quality indicators to medical record data within provider organizations. But much remains to be done to connect these disconnected islands of data to generate information that's meaningful and actionable by end users. And what happens when you link informed patient communities with their health data? As Clay Shirky says, it gets weird. And interesting.
A number of communities have cropped up to promote access to medical data and the integration of user-reported and behavioral data within the clinical decision stream including healthdatarights.org, #healthapps, #health2dev, #73cents, #getupandmove and #WhyPM. With the opening up of health datasets, platform APIs and increasingly sophisticated analytic engines to make user-generated health data clinically relevant, we can finally unleash the wider developer community to build robust and integrated tools to improve health and healthcare.
This session brings together some of the leading voices in the Health 2.0 movement to discuss and demo technologies that help access, mine, display and distribute control of health information across a wide variety of interfaces and devices. We will also hear how government is opening healthcare datasets for access by the developer community and how patients are increasingly becoming "n of 1" platforms.
What is Cassandra? What is NoSQL? Why are sites like Facebook, Twitter, Google and Digg all using these new technologies? And what does that mean to me?
The popularity of the NoSQL movement has exploded in the last year or two, as a number of these non-traditional data storage systems have gone from experimental curiosities to powerful production-ready engines that power the largest real-time social networking sites on the Web.
Born out of Facebook, Cassandra is one of the super-hot players in this new movement. We recently had an opportunity to build a new social networking site using it for the first time, and we want to share what we learned.
In this presentation, code samples will be in Ruby on Rails.
by Todd Park
by Adam Honore, Jacob Sisk, Armando Gonzalez and John Kittrell
Trading on news is not new. Terminals have had news readers attached from the time trading went electronic. What is new is who, or what, is trading on news. Born from a hybrid of technological capability, electronification of the markets, algorithmic trading, and a little influence from the intelligence community, black box trading systems are now applying semantic analysis to trade on news items without a single human ever reading the story. While only 2% of trading firms were doing this two years ago, roughly one-third are exploring it today. This session looks at the data, drivers, and technology behind trading on unstructured content.
Solr is an open source, Lucene-based search platform originally developed at CNET and used by the likes of Netflix, Yelp, and StubHub. It has been rapidly growing in popularity and features over the last few years. Learn how Solr can be used as a Not Only SQL (NoSQL) database along the lines of Cassandra, Memcached, and Redis.
NoSQL data stores are regularly described as non-relational, distributed, and internet-scalable, and are used at both Facebook and Digg.
This presentation will quickly cover the fundamentals of NoSQL data stores, the basics of Lucene, and what Solr brings to the table. Following that we will dive into the technical details of making Solr your primary query engine on large scale web applications, thus relegating your traditional relational database to little more than a simple key store.
Real solutions to problems like handling four billion requests per month will be presented. We'll talk about sizing and configuring the Solr instances to maintain rapid response times under heavy load. We'll show you how to change the schema on a live system with tens of millions of documents indexed while supporting real-time results. And finally, we'll answer your questions about ways to work around the lack of transactions in Solr and how you can do all of this in a highly available solution.
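As a concrete sketch of what "Solr as your primary query engine" looks like from application code, the standard select handler takes parameters such as q, fq, rows, start and wt. The host and core name below are hypothetical:

```python
from urllib.parse import urlencode

# Hypothetical Solr host and core; adjust for your deployment.
SOLR_BASE = "http://localhost:8983/solr/products/select"

def solr_query_url(text, rows=10, start=0, filters=None):
    """Build a Solr select URL: full-text query (q) plus cacheable
    filter queries (fq), paginated with rows/start, JSON response."""
    params = [
        ("q", text),
        ("rows", rows),
        ("start", start),
        ("wt", "json"),
    ]
    for fq in (filters or []):
        params.append(("fq", fq))
    return SOLR_BASE + "?" + urlencode(params)

url = solr_query_url(
    "wireless headphones",
    rows=20,
    filters=["in_stock:true", "price:[10 TO 100]"],
)
```

Fetching the URL (e.g. with urllib.request) returns JSON documents directly from the index, which is how the relational database gets relegated to a simple key store for writes.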
We've been called "consumers" for decades, but we are also producers - of data. Lots of it. Everywhere we take our business, we leave a trail of data, little of which we manage and almost none of which we control. Most of it, in fact, is produced without our knowledge.
What happens when we take control of that activity? What happens when "the marketplace" is not a collection of customer fish in a sellers' barrel, but a truly open space where relationships are genuine, meaningful, and mutual?
This is the start of a revolution. The "Shopping as a Revolutionary Act?" panel will look at some of the leading developments involved, where they are likely to head, and what changes these will bring to our economic, social and personal lives.
by Bruce Smith
Social media applications encounter messy user-generated data in blog posts, status updates, tweets, user profiles, etc. These documents contain free-form text that obeys no particular rules of grammar, punctuation or spelling.
If the data is so messy, how can a computer program recognize adult content or hate speech or spam? How can a computer program tell the difference between an advertisement and a product review? How can a computer program distinguish between a positive and a negative product review?
Machine learning offers some solutions. For example, given sample tweets labeled (by people) as spam or non-spam, machine learning tools can generate a program (or model) that attempts to duplicate the human judgments. You could use this kind of model in your application to filter out tweet spam.
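That tweet-spam example can be sketched with a tiny naive Bayes classifier in pure Python. The training tweets below are made up, and a production model would need far more data and better feature extraction:

```python
import math
from collections import Counter

def train_naive_bayes(labeled_tweets):
    """labeled_tweets: list of (text, label) pairs, label in {"spam", "ham"}.
    Returns per-label word counts and label counts for scoring."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in labeled_tweets:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Pick the label with the highest log-probability,
    using add-one smoothing for unseen words."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / sum(label_counts.values()))
        total = sum(word_counts[label].values())
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

training = [
    ("win a free iphone click here", "spam"),
    ("free prize click now", "spam"),
    ("lunch at noon with the team", "ham"),
    ("see you at the meeting tomorrow", "ham"),
]
word_counts, label_counts = train_naive_bayes(training)
label = classify("click here for a free prize", word_counts, label_counts)
```

The human-labeled examples are the expensive part; the model itself is just counting.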
In this talk we will describe
•Some common machine learning algorithms
•Machine learning tools – free and commercial
•Acquiring and managing training data
•Extracting useful features from your documents
•Choosing the right technique for a problem
•Measuring quality and improving your model over time
•Integrating a machine learned model with your application
Coming out of this session, you will know where you might use machine learning in your applications, and you will know how to get started.
Big Data creates problems and opportunities that do not exist when dealing with smaller datasets. Many developers have learned how to scale their systems for high levels of concurrency, but scaling for Big Data has its own unique challenges, and strategies that would make no sense for smaller systems sometimes work great when dealing with larger datasets. You will learn how to scale, utilize, and visualize Big Data, as well as create and integrate Big Data related APIs. We will talk about how to scale your data, expose your data through APIs, integrate existing data from the data marketplace, and communicate your data through visualization. You will find out what techniques and strategies work best when working with Big Data. This workshop is geared towards PHP developers, but all are welcome.
In the old days it was DJs, A&R folks, labels and record store owners that were the gatekeepers to music. Today, we are seeing a new music gatekeeper emerge... the developer. Using open APIs, developers are creating new apps that change how people explore, discover, create and interact with music. But developers can't do it alone. They need data like gig listings, lyrics, recommendation tools and, of course, music! And they need it from reliable, structured and legitimate sources.
In this presentation we'll discuss and explore what is happening right now in the thriving music developer ecosystem. We'll describe some of the novel APIs that are making this happen and what sort of building blocks are being put into place from a variety of different sources. We'll demonstrate how companies within this ecosystem are working closely together in a spirit of co-operation, each providing their own pieces to an expanding pool of resources from which developers can play, develop and create new music apps across different mediums - web, mobile, software and hardware. We'll highlight some of the next generation of music apps being created in this thriving ecosystem.
Finally we'll take a look at how music developers are coming together at events like Music Hack Day, where participants have just 24 hours to build the next generation of music apps. Someone once said, "APIs are the sex organs of software. Data is the DNA." If this is true, then Music Hack Days are orgies.
Faced with the costs of vertically scaling their relational database systems, developers are increasingly turning to Apache Cassandra as an alternative. Cassandra solves the scaling problem by partitioning data, expanding horizontally and promising replication consistency. Effectively utilizing Cassandra requires that developers take different approaches to the ways they model data used in their applications. This presentation will explain how Cassandra achieves scale and reliability, and give an example of porting a SQL schema to Cassandra.
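As a conceptual sketch of that SQL-to-Cassandra shift: because Cassandra has no joins, you model around your queries and denormalize, keeping one table (column family) per query pattern. Plain Python dicts stand in for column families below, and the key and table names are hypothetical:

```python
# In SQL you might normalize users(id, name) and posts(id, user_id, body),
# then JOIN at read time. In Cassandra you instead precompute each read.

# Query 1: look up a user by id.
users_by_id = {
    "u1": {"name": "alice"},
}

# Query 2: fetch a user's timeline, newest first. The row key is the
# user id and the columns are keyed by a sortable timestamp, so the
# read is a single ordered row scan -- no join required.
timeline_by_user = {
    "u1": {
        "2011-03-12T09:00": "hello sxsw",
        "2011-03-12T10:30": "cassandra talk was great",
    },
}

def get_timeline(user_id, limit=10):
    """Read the precomputed timeline row, newest entries first."""
    row = timeline_by_user.get(user_id, {})
    return [body for _, body in sorted(row.items(), reverse=True)[:limit]]

posts = get_timeline("u1")
```

The trade-off is that writes fan out (a new post is written into every timeline row that needs it), which is cheap in Cassandra's write-optimized storage model.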
Big Data solutions, such as Apache Hadoop and Apache Cassandra, are growing up, moving from a grassroots movement toward widespread adoption. Unfortunately, the majority of the technical expertise still lies in the hands of the open source project contributors, and most solutions are tackled from the bottom up, starting with the technical problems. The collateral that is presently available comes largely from the social media giants, which tout solutions built on 10,000-node clusters that process petabytes of data a day. The reality? The average person just cannot relate or intuitively draw parallels to their own business problems.
While Big Data solutions are worthwhile long before you reach petabyte-scale data, just getting started can be a challenge in itself. New open source projects are regularly released that tackle a variety of issues related to Big Data, some only slightly different from existing technologies. Just how does one navigate the plethora of technologies to design workable solutions to business problems? What if you only have gigabytes or terabytes of "medium" data on a small cluster? This panel features Solution Architects from a variety of key companies in the Big Data space who will provide deep-dive technical discussions of real solutions we've employed for our customers across a variety of industries, starting with the business problems.
Open APIs are sweeping through public media, just like the rest of the world, but folks at NPR, PBS and others are thinking even bigger. Public media is embarking on an unprecedented project to build an open API called the Public Media Platform (PMP) that will help developers create applications that bring personalized public media content to new platforms. Come learn from the leaders of the PMP how this project is rolling out, where it is headed and how it can benefit you. We will discuss how public media is creating the right technology layer, as well as balancing business rules, to build new opportunities for our media to be For, By and Of the People.
For centuries, libraries, archives, and museums have been creating structured data, organizing information, and managing metadata in order to organize and share cultural artifacts and knowledge with the public. Unfortunately, the bulk of these systems have evolved in isolation, long before the advent of the World Wide Web. However, the convergence of developments in culture and technology is resulting in exciting new ways for individuals and developers alike to interact directly with unprecedented amounts of structured data, historical photos and archives, and more.
Expert developers and project managers in this field will lead a discussion focused on the question: How can developers leverage open data from libraries, archives and museums being made available to the public? Panelists will review new developments and highlight examples, considering use cases with Linked Data, Flickr Commons, Smithsonian Commons, mobile apps, and scalability.
by Reid Hoffman
11th–15th March 2011