by Jon Jenkins
The Kepler spacecraft launched on March 7, 2009, initiating NASA’s first search for Earth-size planets orbiting Sun-like stars, with stunning results after being on the job for just over two years. Designing and building the Kepler science pipeline software that processes and analyzes the resulting data to make the discoveries presented a daunting set of challenges.
Although the instrument can reach a precision near 20 ppm over 6.5 hours, enough to detect the 80-ppm drops in brightness corresponding to Earth-size transits, it is sensitive to its environment. Identifying and removing instrumental signatures from the data, as well as characterizing the variability of the stars themselves, has proven to be extremely important in the quest for Earth-size planets. In addition, the computational intensity of processing the accumulating data compelled us to port the detection and validation pipeline components to the Pleiades supercomputer at NASA Ames Research Center. As we look forward to an extended mission of up to 10 years of flight operations, balancing the need for speed against the requirement for ultrahigh precision presents a challenge.
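The detection problem described above can be illustrated with a toy sketch (this is not the Kepler pipeline, and the function, data, and threshold below are invented for illustration): flag any window of a light curve whose mean brightness dips below the series baseline by more than a threshold expressed in parts per million.

```python
def detect_transit(flux, window=5, threshold_ppm=80):
    """Return start indices of windows whose mean brightness drops
    below the overall baseline by more than threshold_ppm."""
    baseline = sum(flux) / len(flux)
    dips = []
    for i in range(len(flux) - window + 1):
        w = flux[i:i + window]
        depth_ppm = (baseline - sum(w) / window) / baseline * 1e6
        if depth_ppm > threshold_ppm:
            dips.append(i)
    return dips

# Synthetic light curve: flat at 1.0 with a 100-ppm dip at samples 40-49.
flux = [1.0] * 100
for i in range(40, 50):
    flux[i] = 1.0 - 100e-6

print(detect_transit(flux))
```

A real pipeline must also detrend instrumental signatures and stellar variability before any such depth test is meaningful, which is exactly the difficulty the abstract highlights.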
by John Lucker
Herbert Simon once wrote that “the central concern of administrative theory is with the boundary between rational and nonrational aspects of human social behavior.” Simon’s comment is especially pertinent to the still-emerging field of business analytics. The human dimension of business analytics might facetiously be called the discipline’s “dark matter”: it looms large while tending to remain hidden from view.
In many diverse domains, human experts must repeatedly make decisions that require weighing disparate pieces of information. Unfortunately, we are not very good at this. We rely on mental heuristics (rules of thumb) which, as psychological research shows, have surprising biases that limit our ability to make truly objective decisions. The implication is that society is replete with inefficient markets and business processes that can be improved with business analytics.
Analytics projects are often bedeviled – or simply stopped in their tracks – by challenges emanating from organizational culture, misunderstanding of statistical concepts, and discomfort with probabilistic reasoning. Compounding these challenges is the fact that data scientists often “speak a different language from” the business domain experts that they are charged to help. In our experience, these challenges can be among the most difficult ones faced in an analytics project, and are ignored at one’s peril. This talk will provide a number of case studies and vignettes; relate these examples to relevant ideas from the decision sciences; and offer practical tips for achieving organizational buy-in.
by Joseph Adler
Marketing is the art of telling potential customers or users about products or services that they might find useful. Some technology people might look down on marketing as a dirty, but necessary, part of running a company. That’s unfortunate, because marketing is one of the most interesting and valuable things that you can do with data.
At LinkedIn, we look at marketing as a recommendation problem, not a sales problem. Our goal is to help our users get the most benefit from our service. We use a lot of data and technology to market our own services. To do this, we use a variety of big data systems: recommendation engines, data processing, and content delivery. We rely on a team of marketing professionals, designers, engineers, and data scientists. We approach marketing scientifically, and constantly test new hypotheses to learn how to market better.
In this talk, I’m going to describe LinkedIn’s approach to personalized marketing, using the story of the award-winning “Year in Review” email message. I’ll talk about how we come up with ideas, how we test new ideas, and how we quickly turn ideas into scalable production processes. And finally, I’ll talk about Tickle, the Hadoop based system that we built to generate and prioritize marketing email messages.
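The constant hypothesis testing mentioned above is typically grounded in standard experiment statistics. As a minimal sketch (the function name and click-through counts are hypothetical, not LinkedIn's actual tooling), a two-proportion z-test can decide whether one email variant really outperforms another:

```python
import math

def ab_z_test(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test comparing conversion rates of
    variants A and B in an email experiment."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)          # pooled rate
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical click counts for two variants of a marketing email.
z = ab_z_test(conv_a=120, n_a=5000, conv_b=165, n_b=5000)
print(round(z, 2))  # |z| > 1.96 suggests a real difference at the 5% level
```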
‘Crowdsourcing big data’ might sound like a randomly generated selection of buzzwords, but it turns out to represent a powerful leap forward in the accuracy of predictive analytics. As companies and researchers are fast discovering, data prediction competitions provide a unique opportunity for advancing the state of the art in fields as diverse as astronomy, health care, insurance pricing, sports ratings systems and tourism forecasting. This session will focus not simply on the mechanics of data prediction competitions, but on why they work so effectively. As it turns out, the ‘why’ boils down to a couple of simple propositions, one associated with Archimedes and the other with record-breaking miler Roger Bannister. Those propositions are not unique to the world of data science, but, as this session will show, have a particularly compelling application to it.
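One statistical reason competitions work so well is that blending many independent entrants tends to beat any single one: uncorrelated errors partially cancel when predictions are averaged. A minimal simulation of that effect, with entirely invented data:

```python
import random

random.seed(42)

# "Ground truth" values the competitors are trying to predict.
truth = [random.uniform(0, 100) for _ in range(1000)]

def noisy(vals, sigma):
    """An unbiased competitor whose errors are independent noise."""
    return [v + random.gauss(0, sigma) for v in vals]

# Three competitors of equal skill, then a simple average blend.
preds = [noisy(truth, 10.0) for _ in range(3)]
blend = [sum(p[i] for p in preds) / len(preds) for i in range(len(truth))]

def rmse(pred):
    return (sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)) ** 0.5

print([round(rmse(p), 2) for p in preds], round(rmse(blend), 2))
```

With independent errors, the blend's RMSE shrinks roughly by the square root of the number of models averaged, which is one half of the "why" behind leaderboard ensembling.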
by Bill Schmarzo
Companies are wrestling with the challenges of managing and exploiting big data. Larger, more diverse data sources and the business need for low-latency access to that data combine to provide new data monetization opportunities. But “bolting” analytics onto your existing data warehouse and business intelligence environment does not work. How do business owners and IT work together to identify the right business problem and then design the right architecture, to exploit these new data monetization opportunities? How do you ensure the successful deployment of these new capabilities, given the historically high rate of failure for new technologies?
This session will present a tried and proven methodology that is based upon a simple premise—business opportunities must drive all information technology deployments. While a technology-led approach is useful for helping an organization gain insight into “what” a new technology does, it is critical that the business opportunities drive the “why,” “how,” and “where” to implement new technologies.
This methodology provides the following key benefits:
• Ensures that your big data analytics initiative is focused on the business opportunities that provide the optimal tradeoff between business benefit and implementation feasibility
• Builds the organizational consensus necessary for success by aligning corporate resources around common goals, assumptions, priorities, and metrics
Case study examples will demonstrate its use.
Banking is a data business, yet the tools the industry has had to leverage that asset have not worked (the credit crisis should be a hint). In this talk we set out to prove that the technological advances Hadoop brings (online access to data, cheap storage, parallelized analytics, massive scalability) are a perfect match for the massive data problems the industry faces. And we do it with real-life war stories and live examples.
Since the Haiti earthquake, information technology has met disaster head on, with new software, crowdsourced inputs, and mapping tools showing incredible potential. How big data can best benefit disaster response, on both the humanitarian-relief and the business-continuity side, has yet to mature. I will discuss needs (filtering, interfaces, real-time data processing) specific to the unique sociological and extreme environmental constraints of professional disaster response, as well as untapped potential for business continuity.
by Robert Munro
Pandemics are the greatest current threat to humanity. Many unidentified pathogens are already hiding out in the open, reported in local online media as sudden clusters of ‘influenza-like’ or ‘pneumonia-like’ clinical cases many months or even years before careful lab tests confirm a new microbial scourge. For some current epidemics like HIV, SARS, and H1N1, the microbial enemies were anonymously in our midst for decades. With each new infection, viruses and bacteria mutate and evolve into ever more harmful strains, and so we are in a race to identify and isolate new pathogens as quickly as possible.
Until now, no organization has succeeded in the task of tracking every global outbreak and epidemic. The necessary information is spread across too many locations, languages and formats: a field report in Spanish, a news article in Chinese, an email in Arabic, a text-message in Swahili. Even among open data, simple key-word or white-list based searches tend to fall short as they are unable to separate the signal (an outbreak of influenza) from the noise (a new flu remedy). In a project called EpidemicIQ, the Global Viral Forecasting Initiative has taken on the challenge of tracking all outbreaks. We are complementing existing field surveillance efforts in 23 countries with a new initiative that leverages large-scale processing of outbreak reports across a myriad of formats, utilizing machine learning, natural language processing and microtasking coupled with advanced epidemiological analysis.
EpidemicIQ intelligently mines open web-based reports, social media, transportation networks and direct reports from healthcare providers globally. Machine-learning and natural language processing allows us to track epidemic-related information across several orders of magnitude more data than any prior health efforts, even across languages that we do not ourselves speak. By leveraging a scalable workforce of microtaskers we are able to quickly adapt our machine-learning models to new sources, languages and even diseases of unknown origin. During peak times, the use of a scalable microtasking workforce also takes much of the information processing burden off the professional epidemic intelligence officers and field scientists, allowing them to apply their full domain knowledge when needed most.
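The signal-versus-noise problem described above (an outbreak report versus a flu-remedy ad) is at heart a text-classification task. A minimal naive Bayes sketch, with invented training examples and no connection to EpidemicIQ's actual models, shows why learned word weights beat a simple keyword list:

```python
import math
from collections import Counter

# Toy labeled reports; texts and labels are invented for illustration.
train = [
    ("cluster of influenza-like illness reported in village", "outbreak"),
    ("hospital reports sudden spike in pneumonia cases", "outbreak"),
    ("new flu remedy now available at pharmacies", "noise"),
    ("tips for avoiding a cold this winter", "noise"),
]

def fit(data):
    """Count per-class word frequencies for naive Bayes."""
    counts, totals = {}, Counter()
    for text, label in data:
        words = text.split()
        counts.setdefault(label, Counter()).update(words)
        totals[label] += len(words)
    vocab = {w for c in counts.values() for w in c}
    return counts, totals, vocab

def classify(text, counts, totals, vocab):
    """Pick the class with the highest Laplace-smoothed log-likelihood."""
    scores = {}
    for label in counts:
        logp = 0.0
        for w in text.split():
            logp += math.log((counts[label][w] + 1) /
                             (totals[label] + len(vocab)))
        scores[label] = logp
    return max(scores, key=scores.get)

model = fit(train)
print(classify("sudden cluster of pneumonia cases reported", *model))
```

A real system would add priors, multilingual features, and the microtasker feedback loop the abstract describes; the point here is only the mechanism of scoring a report against learned class vocabularies.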
At Strata, we propose to introduce EpidemicIQ’s architecture, strategies, successes and challenges in big-data to date.
Global Pulse is a United Nations innovation initiative that is developing a new approach to crisis impact monitoring. One of the key outputs of the project is HunchWorks, a place where experts can post hypotheses—or hunches—that may warrant further exploration and then crowdsource data and verification. HunchWorks will be a key global platform for rapidly detecting emerging crises and their impacts on vulnerable communities. Using it, experts will be able to quickly surface ground truth and detect anomalies in data about collective behavior for further analysis, investigation and action.
The presentation will open with an introduction by Chris van der Walt (Project Lead, Global Pulse) to the problem that HunchWorks is being designed to address: How to detect the emerging impacts of global crises in real-time? A short discussion of the design thinking behind HunchWorks will follow plus an overview of the HunchWorks feature set.
Dane Petersen (Experience Designer, Adaptive Path) will then discuss some of the complex user experience design challenges that emerged as the team started to wrestle with developing HunchWorks and the approaches used to address them.
Sara Farmer (Chief Platform Architect, Global Pulse) will follow up with a discussion of the technology powering HunchWorks, which is based on autonomy, uncertain reasoning, and human-machine teaming theories, and is designed to allow users and automated tools to work collaboratively to reduce the uncertainty and missing-data issues inherent in hunch formation and management.
The presentation will conclude with 10 minutes of Q&A from the audience.
Analytical culture is the last-mile problem of organizations. More data and analytics frequently lead to decision ambiguity. Insights are either not actionable or, when they are actionable, not widely adopted at an operational level.
There has been a lot of emphasis on the technology and data-quality aspects of analytics; however, without an analytical culture, most organizations will not be able to realize the benefits.
After partnering with more than 100 client organizations as a consultant, from small point solution pilots to deploying large decision support systems, I have developed a series of principles which I think are critical to create and foster an analytical culture. I want to introduce the framework and highlight the organizational principles with some real life war stories.
Some of the organizational principles that I will speak about include:
I hope that the audience will embrace some of the principles and implement them as they build their analytical organizations and solutions.
Many enterprises are being overwhelmed by the proliferation of machine data. Websites, communications, networking, and complex IT infrastructures constantly generate massive streams of data in highly variable and unpredictable formats that are difficult to process and analyze by traditional methods or in a timely manner. Yet this data holds a definitive record of all activity and behavior, including user transactions, customer behavior, system behavior, security threats, and fraudulent activity. Quickly understanding and using this data can add value to a company's services, customer satisfaction, revenue growth, and profitability. This session examines the challenges and approaches for collecting, organizing, and deriving real-time insights from terabytes to petabytes of data, with examples from Salesforce.com, the nation's leading enterprise cloud computing company.
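As a toy illustration of extracting structure from machine data (this is not Salesforce.com's or any vendor's actual approach; the log format and lines are invented), the sketch below parses a few web-server log entries and computes a server-error rate:

```python
import re
from collections import Counter

# A common-log-style format, chosen for illustration only.
LOG_RE = re.compile(r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
                    r'"(?P<req>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+)')

lines = [
    '10.0.0.1 - - [22/Sep/2011:10:00:01 +0000] "GET /login HTTP/1.1" 200 512',
    '10.0.0.2 - - [22/Sep/2011:10:00:02 +0000] "POST /api HTTP/1.1" 500 87',
    '10.0.0.1 - - [22/Sep/2011:10:00:03 +0000] "GET /home HTTP/1.1" 200 1024',
]

def summarize(lines):
    """Tally HTTP status codes and compute the 5xx error rate."""
    statuses = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m:
            statuses[m.group("status")] += 1
    total = sum(statuses.values())
    error_rate = sum(n for s, n in statuses.items() if s.startswith("5")) / total
    return statuses, error_rate

statuses, err = summarize(lines)
print(statuses, err)
```

The hard parts at enterprise scale are exactly what this sketch omits: unpredictable formats, streaming ingestion, and petabyte volumes.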
As CTO of DoubleClick, I helped scale the platform to serve 400,000 ads/second. We developed and used many custom data stores long before “NoSQL” was a buzzword. Over the years, I’ve seen companies I’ve worked with struggle with both scalability and agility. Writing the first lines of MongoDB code in 2007, we drew upon these experiences building large-scale, high-availability, robust systems. We wanted MongoDB to be a new kind of database that tackled the challenges we were trying to solve at DoubleClick.
This session will focus on internet infrastructure scaling and also cover the history and philosophy of MongoDB.
22nd–23rd September 2011