Edd Dumbill and Alistair Croll welcome you to Strata.
by Hilary Mason
Data science is evolving rapidly. I'll talk about our current and slightly future technical and philosophical challenges, including realtime vs non-realtime analysis, streams of data vs traditional databases, and some of the opportunities we have to learn amazing things about the world through our data and what this means for those of us who are immersed in working with it.
by James Powell
Ours is a new era of big behavioral data. Unprecedented business model experimentation is rapidly eroding individual privacy despite rising consumer concerns. Successfully managing privacy is a key differentiator for service providers. In the B2B space, the stakes to get privacy right are even higher. This talk will discuss the implications of privacy in order to succeed in the B2B space.
by Mark Madsen
Big data and analytics have developed a mythology rooted in underlying assumptions. We need to ignore these myths and think clearly about how organizations use data, which means understanding how people use information and make decisions.
The new data centricity means we have to rethink how we collect, store, manage, analyze, and share our data, as all these processes now require limitless resources. This talk will focus on the changes in infrastructure requirements to support the new world and how innovations are removing barriers for companies to be successful.
by Zane Adam
by Abhishek Mehta, Mike Olson and Rod Smith
The tools we use play a key role in how we use and respond to big data. Hear about the changes being led by key architects of future big data systems.
by Ed Boyajian
The move to cloud infrastructure and the need to handle big data have created the perfect catalysts for organizations to introduce new infrastructure software and break ties from their expensive incumbent vendors. Ed will share a detailed strategy on how to leverage open source database solutions like PostgreSQL to contain database cost and free budget for other, more valuable initiatives.
Topics for any discipline that focuses on quantitative or technical data have always depended on the datasets that were available at the time. Crowdsourcing has changed that — democratizing the data-collection process and cutting researchers’ reliance on stagnant, overused datasets. Tools like Amazon Mechanical Turk allow anyone to gather data overnight, rather than waiting years.
Learn how to leverage data exhaust, the digital byproduct of our online activities, to solve problems and discover insights about the world around you. We will walk through a real world example which combines several datasets and statistical techniques to discover insights and make predictions about attendees at O'Reilly Strata.
by Jock Mackinlay
Interactive visualizations have become the new media for telling stories online. This session will focus on going from a good visualization to a great visualization by focusing on organization, user interface, and formatting. You should expect to leave this session confident in your ability to consistently create excellent interactive visuals.
by Dustin Kirk
When faced with endless data and the need to manage it, there are a variety of proven design patterns that will help designers create usable, efficient, and effective interfaces. From distributing workload to reducing sensory overload, we’ll review the techniques that are enabling the highly scalable user interfaces of today and tomorrow.
Much useful business data is in "semi-structured" form: government filings, insurance claims, customer comment forms, etc. Although most search tools don't take advantage of it, knowing a little structure goes a long way. This talk will show how semi-structured data can be interpreted, summarized, and applied to produce business value in several real-life examples.
by Tim Estes
Developing a social network map is fundamental to comprehensively understanding a person. Social networks are dynamic and better derived from real-world data than static configurations. However, the vast majority of this real-world data is unstructured. This presentation will show how Synthesys uses very large scale unstructured data to create social network maps for reporting and further analysis.
A discussion of Big Data approaches to analysis problems in marketing, forecasting, academia and enterprise computing. We focus on practices to enhance collaboration and employ rich statistical methods: a Magnetic, Agile and Deep (MAD) approach to analytics. While the approach is language-agnostic, we show that sophisticated statistics can be easily scaled in traditional environments like SQL.
Birds of a Feather (BoF) sessions provide face to face exposure to those interested in the same projects and concepts. BoFs can be organized for individual projects or broader topics (best practices, open data, standards). BoF topics are entirely up to you. Wednesday's Lunchtime BoF sessions will happen on the hotel side of the Hyatt Regency, Mezzanine Level.
by Peter Jackson
Our talk summarizes some recent thinking in the field of vertical search and illustrates it in the context of a new version of Westlaw, called WestlawNext. We argue that getting the right allocation of function between person and machine is the key to making specialist content more findable and search results more understandable.
Data modeling competitions allow companies and researchers to post a problem and have it scrutinised by the world's best data scientists. By exposing a problem to a wide audience, competitions are a great way to get the most out of a dataset. In just a few months, Kaggle's competitions have helped to progress the state of the art in chess ratings and HIV research.
If you're a new startup looking for investment, or a team at a large company seeking the green light for a new product, nothing convinces like real running code. But how do you solve the chicken-and-egg problem of filling your early prototype with real data? We'll discuss how to use open datasets and public web APIs as a proxy for the final product while you're still in the development stage.
by Kim Rees
While the majority of charts were designed to handle a variety of data, there is a certain novelty in presenting data in a very succinct way. By designing a presentation method restricted to specific data points, we can realize an economy of space and interface.
How do you build a crack team of data scientists on a shoestring budget? In this 40-minute presentation from the co-founder of Infochimps, Flip Kromer will draw from his experiences as a teacher and his vast programming and data experience to share lessons learned in building a team of smart, enthusiastic hires.
by Pete Soderling and Pete Forde
The state of open data today is a real mess. It's very difficult to find the data you need and be confident that it's timely and accurate. There is a growing list of companies now vying to become the key destinations for people to gather around new datasets and be excited together. What projects, partnerships and even ventures would be created if there was a marketplace for data?
Information is changing healthcare forever. From the study of epidemics, to machine learning that can improve diagnosis, to the sequencing of the human genome, we're doing the math of life itself. This panel of practitioners will show us what they're doing in healthcare, pharmaceuticals, and genomics, and how it will change the way we discover, treat, and eliminate disease.
by Davin Potts
This session explores how to get more done, faster with high-performance Map/Reduce and expand the universe of Hadoop possibilities with tools to speed and simplify development and deployment of analytic applications.
by Sudhir Hasbe and Bruno Aziza
Windows Azure Marketplace includes data, imagery, and real-time web services from leading commercial data providers and authoritative public data sources. Customers have access to datasets such as demographic, environmental, financial, retail, weather and sports.
Certain recent academic developments in large data have immediate and sweeping applications in industry. They offer forward-thinking businesses the opportunity to achieve technical competitive advantages. However, these little-known techniques have not been discussed outside academia, until now. What if you knew about important new large data techniques that your competition doesn't yet know about?
Can machines help us make better decisions? In this panel, real-world practitioners from the travel, finance, and energy industries give us an inside look at how they’re applying machine learning in their fields, optimizing the use of resources and helping with decision support.
"Many hands make light work", as the saying goes. That's true when thousands of people can collaborate on a data set. In this session, we'll look at collective interfaces that allow many distributed users to examine and share data with one another, and how that's changing traditional desktop visualization tools.
Many of the tools Google created to store, query, analyze, and visualize data are exposed to external developers. This talk will give you an overview of Google services for data crunchers: Google Storage for Developers, BigQuery, Machine Learning API, App Engine, and Visualization API.
Join practitioners from a range of industries to learn how they're putting new tools and massive data sets to work. We'll hear how music, geophysics, and the legal system are all changing by putting huge, rich information into the hands of business.
1st–3rd February 2011