by Michael Rys
Contrary to popular belief, SQL and NoSQL are not at odds with each other; they are duals – in fact, NoSQL should really be called coSQL. Recognizing this duality can change the way we think about which technology to use when, and what we need to invest in next.
by Claudia Perlich
With the collection of almost every piece of information about your customers comes the ability to start asking your data the right question: why do they do what they do? And, even more, what would they do if I could interact with them? Using the case of online display advertising, we show how causal analysis gives interesting new answers about the right (and wrong) ways of spending your money.
Getting training data for a recommender system is easy: if users clicked it, it’s a positive – if they didn’t, it’s a negative. … Or is it? In this talk, we use examples from production recommender systems to bring training data to the forefront: from overcoming presentation bias to the art of crowdsourcing subjective judgments to creative data exhaust exploitation and feature creation.
Learn various ways to bootstrap a custom corpus for training highly accurate natural language processing models. Real world examples will be presented with Python code samples using NLTK. Each example will show you how, starting from scratch, you can rapidly produce a highly accurate custom corpus for training the kinds of natural language processing models you need.
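As a flavor of the bootstrapping idea – the talk's examples use NLTK; this is a library-free sketch with invented seed words and documents – one common starting point is to label raw text with high-precision seed patterns to build an initial training corpus:

```python
# Bootstrapping a labeled corpus from seed patterns: a minimal,
# library-free sketch of the idea (the talk itself uses NLTK).
# The seed words and documents below are invented for illustration.

SEED_WORDS = {
    "positive": {"great", "excellent", "love"},
    "negative": {"terrible", "awful", "hate"},
}

raw_documents = [
    "I love this product, it is excellent",
    "This is terrible, I hate it",
    "An awful experience overall",
    "Great value, would buy again",
]

def auto_label(doc):
    """Assign a label when a seed word appears; otherwise return None."""
    tokens = set(doc.lower().replace(",", "").split())
    for label, seeds in SEED_WORDS.items():
        if tokens & seeds:
            return label
    return None

# Keep only the documents the seed patterns can label confidently;
# these become the starting corpus for training a real classifier.
corpus = [(doc, auto_label(doc)) for doc in raw_documents]
corpus = [(doc, label) for doc, label in corpus if label is not None]

for doc, label in corpus:
    print(label, "|", doc)
```

The resulting labeled pairs would then be converted to feature sets and fed to something like NLTK's `NaiveBayesClassifier`, iterating on the seed patterns as misclassifications surface.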
by Ben Gimpert
Twenty-first-century big data is being used to train predictive models of emotional sentiment, customer churn, patient health, and other behavioral complexities. Variable importance and feature selection reduce the dimensionality of our models, so an infeasibly complex problem may become somewhat more predictable.
The tools of social network analysis – centrality measures, clustering, graph-traversal algorithms, community detection and so forth – are largely based on mathematical network theory. There is very little in these techniques that actually requires that the data represents social activity. This presentation will show how these techniques can be applied to data from areas such as geo, the Wikipedia link graph and linguistics.
We’ll show how to take tabular or textual data and derive graph representations from it that can be used to apply these techniques. We’ll discuss practical applications of these techniques in delivering new features for web applications. We’ll also show how the powerful visualisation tool Gephi can be used to explore the data once it’s in graph form.
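As an illustration of the idea – not the presenters' actual code – here is a plain-Python sketch of deriving a graph from tabular data and computing a simple centrality measure over it (the table is invented):

```python
# Deriving a graph from tabular data: rows that share a value in some
# column become linked nodes; degree centrality is then computed
# directly. The (page, category) rows below are invented, in the
# spirit of a simplified Wikipedia link table.
from collections import defaultdict
from itertools import combinations

rows = [
    ("Alan Turing", "Computer science"),
    ("Alonzo Church", "Computer science"),
    ("Alan Turing", "Cryptography"),
    ("Claude Shannon", "Cryptography"),
    ("Claude Shannon", "Information theory"),
]

# Build an undirected co-membership graph: pages sharing a category
# get an edge between them.
members = defaultdict(set)
for page, category in rows:
    members[category].add(page)

edges = set()
for pages in members.values():
    for a, b in combinations(sorted(pages), 2):
        edges.add((a, b))

# Degree centrality: degree / (n - 1).
nodes = {p for e in edges for p in e}
degree = {n: 0 for n in nodes}
for a, b in edges:
    degree[a] += 1
    degree[b] += 1
centrality = {n: d / (len(nodes) - 1) for n, d in degree.items()}
print(centrality)
```

An edge list in this form can be exported as CSV and loaded straight into Gephi for the kind of visual exploration the session describes.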
This talk will be partly based on content from an Ignite talk given at Strata NYC 2011: http://slideshare.net/mattb/plac...
Relational databases are based on set theory, which insists that the order of items does not matter. For many (most?) data problems, however, order does matter. By using array theory, a relational-style database gains a considerable advantage over set-theory-based engines.
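A tiny illustration of the kind of query that depends on order – the delta from the previous observation – which array semantics express naturally but pure set semantics make awkward (sample prices invented):

```python
# Why order matters: a window-style computation that is trivial over
# an ordered array but awkward in pure set semantics.
prices = [100.0, 101.5, 101.0, 103.2]  # already in time order

# Delta from the previous observation -- directly expressible when the
# engine guarantees positional order.
deltas = [b - a for a, b in zip(prices, prices[1:])]
print(deltas)
```

In a set-based engine the same computation requires a self-join or window function keyed on an explicit ordering column; an array-based engine gets the ordering for free.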
We examine the effectiveness of a statistical technique known as survival analysis to optimize the cache time-to-live for hotel rates in a hotel rate cache. We describe how we collect and prepare nearly a billion records per day utilizing MongoDB and Hadoop. Finally, we show how this analysis is improving the operation of our hotel rate cache.
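The core idea – estimate the probability that a cached rate is still valid after t minutes, treating rates still fresh at the last observation as right-censored – can be sketched as a minimal Kaplan-Meier estimator (the observations below are invented, not the speakers' data):

```python
# Survival analysis for cache TTL: a minimal Kaplan-Meier sketch.
# Each observation is (age_in_minutes, changed): changed=True means
# the cached hotel rate was observed to go stale at that age;
# changed=False means it was still valid when we stopped watching
# (right-censored).

observations = [
    (5, True), (10, True), (10, False), (20, True),
    (30, False), (30, True), (45, False), (60, True),
]

def kaplan_meier(observations):
    """Return [(time, survival_probability)] at each event time."""
    observations = sorted(observations)
    n_at_risk = len(observations)
    survival, curve = 1.0, []
    i = 0
    while i < len(observations):
        t = observations[i][0]
        deaths = at_this_time = 0
        while i < len(observations) and observations[i][0] == t:
            at_this_time += 1
            deaths += observations[i][1]  # True counts as 1
            i += 1
        if deaths:
            survival *= (n_at_risk - deaths) / n_at_risk
            curve.append((t, survival))
        n_at_risk -= at_this_time
    return curve

curve = kaplan_meier(observations)
# Pick a TTL where at least, say, 80% of cached rates are still fresh.
ttl = max((t for t, s in curve if s >= 0.8), default=0)
print(curve)
print("TTL:", ttl)
```

In production one would use a library estimator rather than hand-rolling this, but the TTL-selection step – reading the largest age at which the survival curve stays above a freshness target – is the same.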
End the day by joining leading data scientists in debating the hot issues in the profession.
by Jock Mackinlay
Visual analysis is an iterative process for working with data that exploits the power of the human visual system. The formal core of visual analysis is the mapping of data to appropriate visual representations.
In this talk, you’ll learn:
- What years of research by psychologists, statisticians and others have taught us about designing great visualizations
- Fundamental principles for designing effective data views for yourself and others
- How to systematically analyze data using your visual system
Data visualization is often where people realize the real value in underlying data. Good data visualization tools are therefore vital for many data projects to reach their full potential.
Many companies have realized this and are looking for the best solutions to address their data visualization needs. There are plenty of tools to choose from, but even for relatively simple charting, many have found their choices limited. As the requirements pile up – cross-browser compatibility, server-side rendering, iOS support, interactivity, full control of branding, look and feel – the options narrow further, and you’ll find yourself compromising or, worse yet, building your own visualization library!
Building our data publishing platform – DataMarket.com – we’ve certainly been faced with the aforementioned challenges. In this session we’ll share our findings and approach for others to avoid our mistakes and learn from our – sometimes hard – lessons learned.
We’ll also share what we see the future of online data visualization holding: the technologies we’re betting on and how things will become easier, visualizations more effective, code easier to maintain and applications more user friendly as these technologies mature and develop.
Data isn’t just for supporting decisions and creating actionable interfaces. Data can create nuance, giving new understandings that lead to further questioning rather than just actionable decisions. In particular, curiosity and creative thinking can be driven by combining different data sets and techniques to develop a narrative that tells the story of a place – the emotions, history, and change embedded in the experience of the place.
In this session, we’ll see how far we can go in exploring one street in San Francisco, Haight Street, and see how well we can understand its geography, ebbs and flows, and behavior by combining as many data sources as possible. We’ll integrate basic public data from the city, street and mapping data from OpenStreetMap, real estate and rental listings data, data from social services like Foursquare, Yelp and Instagram, and analyze photographs of streets from mapping services to create a holistic view of one street and see what we can understand from this. We’ll show how you can summarize this data numerically, textually, and visually, using a number of simple techniques.
We’ll cover how traditional data analysis tools like R and NumPy can be combined with tools more often associated with robotics like OpenCV (computer-vision) to create a more complete data set. We’ll also cover how traditional data visualization techniques can be combined with mapping and augmented reality to present a more complete picture of any place, including Haight Street.
by Bitsy Hansen
I am frequently asked for advice about using data visualization to solve communication problems that are better served through improved information architecture. A nicely formatted bar chart won’t rescue you from a poorly planned user interface. When designing meaningful data experiences it’s essential to understand the problems your users are trying to solve.
In this case, I was asked to take a look at a global data-delivery platform with a number of issues. How do we appeal to a broad cross-section of business users? How do we surface information to our clients in a useful way? How do we facilitate action, beyond information sharing? How do we measure success?
A user-centered approach allowed us to weave together a more meaningful experience for our business users and usability testing revealed helpful insights about how information sharing and data analysis flows within large organizations.
Data visualization is a powerful tool for revealing simple answers to complex questions, but context is key. User-centered design methods ensure that your audience receives the information they need in a usable and actionable way. Data visualization and user experience practices are not mutually exclusive. They work best when they work together.
Many options exist when choosing a framework to build a custom data explorer on top of your company’s stack. With a brief nod to out-of-the-box business intelligence solutions, the presenters will offer an overview of the creative coding frameworks that lend themselves to data visualization on and across web browsers and native apps written for Mac OS X, iOS, Windows, and Android. Evaluation of the strengths and weaknesses of libraries such as Processing, OpenFrameworks, Cinder, Polycode, Nodebox, d3.js, PhiloGL, Raphael.js, Protovis, and WebGL will be explored through visual examples and code. The audience should come away with a sense of what investments into education will return a high value product that serves unique design goals.
Since the early days of the data deluge, Lift Lab has been helping many actors in the ‘smart city’ transform the accumulation of network data (e.g. cellular network activity, aggregated credit card transactions, real-time traffic information, user-generated content) into products or services. Because of their innovative and cross-disciplinary nature, our projects generally involve a wide variety of professionals, from physicists and engineers to lawyers, decision makers and strategists.
Our innovation methods engage these different stakeholders with rapidly prototyped tools that promote the processing, recompilation, interpretation, and reinterpretation of insights. For instance, our experience shows that the multiple perspectives extracted from exploratory data visualizations are crucial to quickly answering some basic questions and provoking many better ones. Moreover, the ability to quickly sketch an interactive system or dashboard is a way to develop a common language among varied stakeholders. It allows them to focus on tangible product or service opportunities that are hidden within their data. In this form of rapid visual business intelligence, an analysis and its visualization are not the results, but rather the supporting elements of a co-creation process to extract value from data.
We will illustrate our methods with tools that help engage a wide spectrum of professionals on the innovation path in data science. These tools are based on a flexible data platform and a visual programming environment that make it possible to go beyond the limited design possibilities of industry standards. Additionally, they reduce the prototyping time necessary to sketch interactive visualizations that allow the different stakeholders of an organization to take an active part in the design of services or products.
by Max Gadney
Videographics meet the two most important criteria of the visualizer: they engage attention and they inform.
I am currently working with the BBC to define a new format – that of the ‘Video Data Graphic’. Some of these exist online with varying degrees of success, but we are codifying best practice, auditing current activity, and can show our work in the market context.
I will discuss how video is an information-rich medium – drawing on a survey of data resolution across media – and how these videos can complement the BBC’s online offering as a whole.
Some subjects to cover will be:
- storytelling principles – what actually works in 2 minutes
- scripting and storyboarding – drafting a plan
- timescales, costs and resources
- designing for cognition – how video needs to understand how we perceive
I’ll be showing many examples in addition to our work.
This is a fast-paced session with lots to look at and an excellent mix of storytelling and information design ideas, balancing theory and practical advice.
by Ryan Ismert
Our presentation will cover the nascent fusion of automatically-collected live Digital Records of sports Events (DREs) with Augmented Reality (AR), primarily for television broadcast.
AR has long been used in broadcast sports to show elements of the event that are otherwise difficult to see – the canonical examples are the virtual yellow “1st and 10” line for American Football and ESPN’s KZone™ strike zone graphics. Similarly, sports leagues and teams have historically collected large amounts of data on events, often expending huge amounts of manual effort to do so. Our talk will discuss the evolution of data-driven AR graphics and the systems that make them possible. We’ll focus on systems for automating the collection of huge amounts of event data and metadata, such as the race car tracking technology used by NASCAR and MLB’s PitchFX™ ball tracking system. We provide a rubric for thinking about classes of sports event data that encompasses scoring, event and action semantics metadata, and participant motion.
We’ll briefly discuss the history of these sports data collection technologies, and then take a deeper look at how the current first generation of automated systems are being leveraged for increasingly sophisticated analyses and visualizations, often via AR, but also through virtual worlds renderings from viewpoints unavailable or impossible from broadcast cameras. The remainder of the talk will examine two case studies highlighting the interplay between rich, live sports data and augmented reality visualization.
The first case study will describe one of the first of the next-gen digital records systems to come online and track players – Sportvision’s FieldFX™ system for baseball. Although exceedingly difficult to collect, the availability of robust player motion data promises to revolutionize areas such as coaching and scouting performance analysis, fantasy sports and wagering, broadcast TV graphics and commentary, and sports medicine. We’ll show examples of some potential applications, and also cover data quality challenges in some detail, in order to examine the impact that these challenges have on the applications using the data.
The second case study will examine the rise of automated DRE collection as an answer to that nagging question about AR – ‘what sort of things do people want to see that way?’ Many of the latest wave of AR startups are betting huge amounts of venture money that the answer is in user-generated or crowd-sourced content. While this may end up being true for some consumer-focused mobile applications, our experience in the notoriously tight-fisted rights and monetization environment of sports has led directly to the requirement to create owned, curated data sources. This came about from four realizations that we think are more generally applicable to AR businesses:
- Cool looking isn’t a business, even in sports.
- It must be best shown in context, over video, or it won’t be shown at all.
- The ability to technically execute AR is no longer a barrier to entry. Cutting-edge visualization will only seem amazing for the next six seconds.
- We established impossibly high quality expectations, and now the whole industry has to live with them.
In an increasingly mobile world, we are each generating tons of geo-tagged data. Photo uploads to Instagram, tweets, Foursquare check-ins, local searches, and even real-time public-transportation feeds are commonplace. The companies that gather this data make a lot of it freely available. The people who work for these companies have many opportunities to learn from this data. But in order to learn, we must first figure out what questions to ask. Visualization is a tool that helps us think of questions and begin to answer them.
There are three major ways to think about geodata:
Additionally, creating tools that allow users to explore data on multiple scales (i.e. zoom) is important, but adds complexity: you have to find a tile source and perhaps even render your data to tiles.
Choice of projection is key. Most of us grew up with the Mercator projection, but an equal-area projection is often a better choice.
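To see why, here is a quick sketch, using the standard Mercator scale-factor formula, of how much Mercator inflates areas away from the equator (place names chosen for illustration):

```python
# Mercator's area distortion: under Mercator, the local linear scale
# factor is 1 / cos(latitude), so areas are inflated by roughly its
# square. This is why equal-area projections are often the better
# choice for data maps such as choropleths.
import math

def mercator_area_inflation(lat_degrees):
    """Approximate factor by which Mercator inflates areas at a latitude."""
    return 1 / math.cos(math.radians(lat_degrees)) ** 2

for place, lat in [("Equator", 0.0),
                   ("San Francisco", 37.8),
                   ("Reykjavik", 64.1)]:
    print(f"{place}: areas inflated ~{mercator_area_inflation(lat):.1f}x")
```

A choropleth drawn in Mercator makes high-latitude regions look far more significant than their true area warrants; an equal-area projection such as Albers avoids that bias.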
I will take one data set and walk through visualizing it using the three approaches described above.
The first example will use Processing and TileMill to generate a zoomable animated map, playing back a month’s worth of data. I’ll show how to render the map to a movie for easy distribution.
The second example will use d3.js to show the same data at a county level in a choropleth map. I’ll discuss color schemes and interaction, and compare what can be done with d3.js to Fathom’s Stats of the Union project.
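One color-scheme decision that comes up immediately with choropleths is how to bucket values into classes. Quantile breaks, which give each color class roughly the same number of counties, can be sketched in a few lines (values invented):

```python
# Quantile breaks for a choropleth: each color class holds roughly the
# same number of regions, which keeps the map readable even when the
# underlying values are heavily skewed.

def quantile_breaks(values, n_classes):
    """Return the upper bound of each color class."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * (i + 1)) // n_classes - 1]
            for i in range(n_classes)]

county_rates = [2.1, 3.4, 5.0, 5.5, 6.2, 7.7, 8.1, 9.9, 12.3]
breaks = quantile_breaks(county_rates, 3)
print(breaks)
```

Equal-interval or Jenks natural breaks are alternatives; which to use depends on the distribution of the data and the story being told.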
The last example will talk about how to make a heatmap with millions of data points.
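At that scale the usual trick is aggregation: bin the points to a grid and color cells by count, so the renderer never touches individual points. A minimal sketch (points invented):

```python
# Heatmaps over millions of points are essentially 2-D binning: count
# points per grid cell, then map counts to a color ramp.
from collections import Counter

def bin_points(points, cell_size):
    """Count points per (col, row) grid cell."""
    return Counter((int(x // cell_size), int(y // cell_size))
                   for x, y in points)

points = [(0.1, 0.2), (0.3, 0.4), (1.2, 0.1), (1.4, 1.9), (0.2, 0.3)]
grid = bin_points(points, cell_size=1.0)
print(grid)
```

With millions of points the same pass would typically be done with NumPy's `histogram2d` or in Hadoop, but the structure – one count per cell, counts mapped to colors – is identical.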
by Mano Marks and Chris Broadfoot
Beautiful, useful and scalable techniques for analysing and displaying spatial information are key to unlocking important trends in geospatial and geotemporal data. Recent developments in HTML5 enable rendering of complex visualisations within the browser, facilitating fast, dynamic user interfaces built around web maps. Client-side visualisation allows developers to forgo expensive server-side rendering tasks. These new interfaces have enabled a new class of application, empowering any user to explore large, enterprise-scale spatial data without requiring specialised geographic information technology software. This session will examine existing enterprise-scale, server-side visualisation technologies and demonstrate how cutting-edge technologies can supplement and replace them while enabling additional capabilities.
by Robbie Allen
With recent advances in linguistic algorithms, data processing capabilities and the availability of large structured data sets, it is now possible for software to create long form narratives that rival humans in quality and depth. This means content development can take advantage of many of the positive attributes of software, namely, continuous improvement, collaborative development and significant computational processing.
Robbie Allen, the CEO of Automated Insights, and his team have done this to great effect by automatically creating over 100,000 articles covering college basketball, college football, the NBA, MLB, and the NFL in a 10-month period. Automated Insights is now branching out beyond sports into finance, real estate, government, and healthcare.
In this talk, Robbie will share the lessons his company has learned about the viability of automated content and where the technology is headed. It all started with short sentences of uniform content and has expanded to the point where software can generate several paragraphs of unique prose highlighting the important aspects of an event or story.
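For a flavor of the general technique – this is not Automated Insights’ actual system, just a deliberately simple illustration – the “short sentences of uniform content” starting point can be sketched as template-driven generation over structured game data (teams and scores invented):

```python
# Template-driven narrative generation: pick words from the data, then
# fill a sentence template. Real systems layer many such rules to build
# up paragraphs of varied prose.

def recap(game):
    margin = abs(game["home_score"] - game["away_score"])
    winner, loser = ((game["home"], game["away"])
                     if game["home_score"] > game["away_score"]
                     else (game["away"], game["home"]))
    # Word choice driven by the data keeps the output from sounding robotic.
    verb = "edged" if margin <= 3 else "beat" if margin <= 10 else "routed"
    return (f"{winner} {verb} {loser} "
            f"{max(game['home_score'], game['away_score'])}-"
            f"{min(game['home_score'], game['away_score'])}.")

print(recap({"home": "Duke", "away": "UNC",
             "home_score": 85, "away_score": 84}))
```

Scaling this up means more templates, richer statistics to condition on, and logic for selecting which facts are newsworthy enough to narrate.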
A story on the U.S. Census will tell the broad themes behind the data and use people to exemplify those themes. But every reader also wants answers to more specific questions: How did my community change? What happened where I live, in my neighborhood? Being able to provide those answers through an interactive visualization is what storytelling through data is all about. A story or report on a subject by its very nature summarizes the underlying data. But readers may have questions specific to a time, date or place. Visualizing the data and providing effective, targeted ways to drill deeper is key to giving the reader more than just the story. The visualization can enhance and deepen the experience. Cheryl Phillips will discuss data visualization strategies to do just that, providing examples from The Seattle Times and other journalism organizations.
28th February to 1st March 2012