Since the Haiti earthquake, information technology has met disaster head on: new software, crowdsourced inputs, and mapping tools have shown enormous potential. How big data can best serve disaster response, for both humanitarian relief and business continuity, has yet to mature. I will discuss the needs (filtering, interfaces, real-time data processing) specific to the unique sociological and extreme environmental constraints of professional disaster response, as well as the untapped potential for business continuity.
by Robert Munro
Pandemics are the greatest current threat to humanity. Many unidentified pathogens are already hiding in plain sight, reported in local online media as sudden clusters of ‘influenza-like’ or ‘pneumonia-like’ clinical cases many months or even years before careful lab tests confirm a new microbial scourge. For some current epidemics, like HIV, SARS, and H1N1, the microbial enemies were anonymously in our midst for decades. With each new infection, viruses and bacteria mutate and evolve into ever more harmful strains, and so we are in a race to identify and isolate new pathogens as quickly as possible.
Until now, no organization has succeeded in the task of tracking every global outbreak and epidemic. The necessary information is spread across too many locations, languages and formats: a field report in Spanish, a news article in Chinese, an email in Arabic, a text message in Swahili. Even among open data, simple keyword- or whitelist-based searches tend to fall short, as they are unable to separate the signal (an outbreak of influenza) from the noise (a new flu remedy). In a project called EpidemicIQ, the Global Viral Forecasting Initiative has taken on the challenge of tracking all outbreaks. We are complementing existing field surveillance efforts in 23 countries with a new initiative that leverages large-scale processing of outbreak reports across a myriad of formats, utilizing machine learning, natural language processing and microtasking coupled with advanced epidemiological analysis.
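The signal-versus-noise problem above can be made concrete with a toy sketch. This is not EpidemicIQ's actual method; it is a minimal, hypothetical illustration of why bare keyword matching over-triggers, and how even a crude context-aware filter (standing in for a learned classifier) does better:

```python
reports = [
    "Sudden cluster of influenza-like illness reported in rural clinic",
    "New flu remedy hits pharmacy shelves this winter",
]

def keyword_match(text, keywords=("influenza", "flu")):
    """Naive search: flags any text containing a disease keyword, signal or noise alike."""
    return any(k in text.lower() for k in keywords)

# Hypothetical context cues a real classifier might learn from labeled data.
OUTBREAK_CUES = ("cluster", "cases", "reported", "illness")

def looks_like_outbreak(text):
    """Toy stand-in for a classifier: require a disease term plus an outbreak cue."""
    t = text.lower()
    return keyword_match(t) and any(cue in t for cue in OUTBREAK_CUES)

# Keyword search flags both reports; the cue-based filter keeps only the first.
print([keyword_match(r) for r in reports])       # [True, True]
print([looks_like_outbreak(r) for r in reports]) # [True, False]
```

A real system would replace the hand-written cue list with features learned from labeled examples, which is where microtasking and machine learning enter the pipeline.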
EpidemicIQ intelligently mines open web-based reports, social media, transportation networks and direct reports from healthcare providers globally. Machine learning and natural language processing allow us to track epidemic-related information across several orders of magnitude more data than any prior health effort, even across languages that we do not ourselves speak. By leveraging a scalable workforce of microtaskers, we are able to quickly adapt our machine-learning models to new sources, languages and even diseases of unknown origin. During peak times, the scalable microtasking workforce also takes much of the information-processing burden off the professional epidemic intelligence officers and field scientists, allowing them to apply their full domain knowledge when it is needed most.
At Strata, we propose to introduce EpidemicIQ’s architecture, strategies, successes and challenges in big-data to date.
Global Pulse is a United Nations innovation initiative that is developing a new approach to crisis impact monitoring. One of the key outputs of the project is HunchWorks, a place where experts can post hypotheses—or hunches—that may warrant further exploration and then crowdsource data and verification. HunchWorks will be a key global platform for rapidly detecting emerging crises and their impacts on vulnerable communities. Using it, experts will be able to quickly surface ground truth and detect anomalies in data about collective behavior for further analysis, investigation and action.
The presentation will open with an introduction by Chris van der Walt (Project Lead, Global Pulse) to the problem that HunchWorks is being designed to address: How to detect the emerging impacts of global crises in real-time? A short discussion of the design thinking behind HunchWorks will follow plus an overview of the HunchWorks feature set.
Dane Petersen (Experience Designer, Adaptive Path) will then discuss some of the complex user experience design challenges that emerged as the team started to wrestle with developing HunchWorks and the approaches used to address them.
Sara Farmer (Chief Platform Architect, Global Pulse) will follow up with a discussion of the technology powering HunchWorks, which is based on autonomy, uncertain reasoning and human-machine team theories, and is designed to allow users and automated tools to work collaboratively to reduce the uncertainty and missing-data issues inherent in hunch formation and management.
The presentation will conclude with 10 minutes of Q&A from the audience.
Analytical culture is the last-mile problem for organizations. More data and analytics frequently lead to decision ambiguity. Insights are often not actionable, and when they are, they are not widely adopted at an operational level.
There has been a lot of emphasis on the technology and data-quality aspects of analytics; however, without an analytical culture, most organizations will not be able to take advantage of the benefits.
After partnering with more than 100 client organizations as a consultant, from small point solution pilots to deploying large decision support systems, I have developed a series of principles which I think are critical to create and foster an analytical culture. I want to introduce the framework and highlight the organizational principles with some real life war stories.
Some of the organizational principles that I will speak about include:
I hope that the audience will embrace some of the principles and implement them as they build their analytical organizations and solutions.
Many enterprises are being overwhelmed by the proliferation of machine data. Websites, communications, networking and complex IT infrastructures constantly generate massive streams of data in highly variable and unpredictable formats that are difficult to process and analyze by traditional methods or in a timely manner. Yet this data holds a definitive record of all activity and behavior, including user transactions, customer behavior, system behavior, security threats and fraudulent activity. Quickly understanding and using this data can provide added value to a company's services, customer satisfaction, revenue growth and profitability. This session examines the challenges and approaches for collecting, organizing and deriving real-time insights from terabytes to petabytes of data, with examples from Salesforce.com, the nation’s leading enterprise cloud computing company.
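The core difficulty the abstract describes is turning variably formatted machine data into structured events that can be aggregated as they arrive. The sketch below is a hypothetical illustration, assuming an invented timestamp/event/key=value log format, not any vendor's actual pipeline:

```python
import re
from collections import Counter

# Hypothetical line format: "<timestamp> <event_type> key=value key=value ..."
LINE = re.compile(r"(?P<ts>\S+) (?P<event>\w+) (?P<kv>.*)")

def parse(line):
    """Parse one log line into a dict of fields; return None if unparseable."""
    m = LINE.match(line)
    if not m:
        return None
    event = {"ts": m.group("ts"), "event": m.group("event")}
    for pair in m.group("kv").split():
        if "=" in pair:
            key, value = pair.split("=", 1)
            event[key] = value
    return event

logs = [
    "2011-09-22T10:00:01Z login user=alice status=ok",
    "2011-09-22T10:00:02Z login user=mallory status=failed",
    "2011-09-22T10:00:03Z purchase user=alice amount=42",
]

# Streaming-style aggregate: count events by type as records arrive,
# silently skipping lines that fail to parse.
counts = Counter(e["event"] for e in map(parse, logs) if e)
print(counts)  # Counter({'login': 2, 'purchase': 1})
```

Real machine data is far messier (multi-line stack traces, binary records, schema drift), which is why the parse step is usually pluggable per source, but the parse-then-aggregate shape is the same.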
As CTO of DoubleClick, I helped scale the platform to serve 400,000 ads/second. We developed and used many custom data stores long before “NoSQL” was a buzzword. Over the years, I’ve seen companies I’ve worked with struggle with both scalability and agility. When we wrote the first lines of MongoDB code in 2007, we drew upon these experiences of building large-scale, high-availability, robust systems. We wanted MongoDB to be a new kind of database, one that tackled the challenges we had faced at DoubleClick.
This session will focus on internet infrastructure scaling and also cover the history and philosophy of MongoDB.
22nd–23rd September 2011