Since the Haiti earthquake, information technology has met disaster head on: new software, crowdsourced inputs, and mapping tools have shown incredible potential. But how big data can best benefit disaster response, from both the humanitarian relief and the business continuity side, has yet to mature. I will discuss the needs (filtering, interfaces, real-time data processing) specific to the unique sociological and extreme environmental constraints of professional disaster response, as well as the untapped potential for business continuity.
by Robert Munro
Pandemics are the greatest current threat to humanity. Many unidentified pathogens are already hiding out in the open, reported in local online media as sudden clusters of ‘influenza-like’ or ‘pneumonia-like’ clinical cases many months or even years before careful lab tests confirm a new microbial scourge. For some current epidemics like HIV, SARS, and H1N1, the microbial enemies were anonymously in our midst for decades. With each new infection, viruses and bacteria mutate and evolve into ever more harmful strains, and so we are in a race to identify and isolate new pathogens as quickly as possible.
Until now, no organization has succeeded in the task of tracking every global outbreak and epidemic. The necessary information is spread across too many locations, languages and formats: a field report in Spanish, a news article in Chinese, an email in Arabic, a text message in Swahili. Even among open data, simple keyword- or whitelist-based searches tend to fall short, as they are unable to separate the signal (an outbreak of influenza) from the noise (a new flu remedy). In a project called EpidemicIQ, the Global Viral Forecasting Initiative has taken on the challenge of tracking all outbreaks. We are complementing existing field surveillance efforts in 23 countries with a new initiative that leverages large-scale processing of outbreak reports across a myriad of formats, utilizing machine learning, natural language processing and microtasking coupled with advanced epidemiological analysis.
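To make the signal-versus-noise problem concrete, here is a minimal sketch of how a keyword search flags outbreak reports and flu-remedy advertising alike, while even a simple trained classifier can separate them. The toy training data and model choice are illustrative assumptions, not EpidemicIQ's actual pipeline:

```python
# Illustrative only: keyword matching vs. a trained classifier for
# separating outbreak signal from "flu"-mentioning noise.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reports = [
    "Sudden cluster of influenza-like illness reported at rural clinic",
    "Dozens hospitalized with influenza-like symptoms in port city",
    "New flu remedy promises relief this winter season",
    "Pharmacy chain discounts influenza vaccines and cold medicine",
]
labels = [1, 1, 0, 0]  # 1 = possible outbreak signal, 0 = noise

# Keyword baseline: every text mentions a flu term, so all four are flagged.
print([("flu" in r.lower()) for r in reports])  # [True, True, True, True]

# Classifier: learns that words like "cluster" and "hospitalized" carry the
# signal, while "remedy" and "discounts" mark noise.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(reports, labels)
print(model.predict(["Officials probe sudden cluster of influenza-like illness"]))
# -> [1]: flagged as a possible outbreak report
```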
EpidemicIQ intelligently mines open web-based reports, social media, transportation networks and direct reports from healthcare providers globally. Machine learning and natural language processing allow us to track epidemic-related information across several orders of magnitude more data than any prior health effort, even across languages that we do not ourselves speak. By leveraging a scalable workforce of microtaskers, we are able to quickly adapt our machine-learning models to new sources, languages and even diseases of unknown origin. During peak times, this scalable microtasking workforce also takes much of the information-processing burden off the professional epidemic intelligence officers and field scientists, allowing them to apply their full domain knowledge when it is needed most.
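The adaptation loop described above can be sketched in a few lines: documents the current model is least confident about are routed to microtaskers, and their labels are folded back into the training set before retraining. The function names and threshold below are hypothetical, not a real EpidemicIQ API:

```python
# Hedged sketch of a human-in-the-loop microtasking cycle; all names and
# the confidence threshold are illustrative assumptions.

def ask_microtasker(text: str) -> int:
    """Stand-in for a crowd task such as 'Is this an outbreak report?'."""
    return 1  # in practice, a judgment returned by a microtasking platform

def adapt(model, incoming, texts, labels, threshold=0.8):
    """Send uncertain documents to humans, then retrain on their labels."""
    for doc in incoming:
        confidence = model.predict_proba([doc]).max()
        if confidence < threshold:        # model is unsure: ask a human
            texts.append(doc)
            labels.append(ask_microtasker(doc))
    model.fit(texts, labels)              # fold the new labels back in
    return model

# e.g. adapt(model, new_reports, reports, labels) with the sketch above
```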
At Strata, we propose to introduce EpidemicIQ’s architecture, strategies, successes and challenges in big data to date.
Many enterprises are being overwhelmed by the proliferation of machine data. Websites, communications, networking and complex IT infrastructures constantly generate massive streams of data in highly variable and unpredictable formats that are difficult to process and analyze in a timely manner with traditional methods. Yet this data holds a definitive record of all activity and behavior, including user transactions, customer behavior, system behavior, security threats and fraudulent activity. Quickly understanding and using this data can add value to a company’s services, customer satisfaction, revenue growth and profitability. This session examines the challenges and approaches for collecting, organizing and deriving real-time insights from terabytes to petabytes of data, with examples from Salesforce.com, the nation’s leading enterprise cloud computing company.
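To illustrate the "highly variable and unpredictable formats" problem, here is a small sketch that normalizes two invented log formats into structured events and keeps a running count of failures per minute, the kind of real-time rollup the session will examine. The formats and field names are made up for the example:

```python
# Illustrative sketch (invented log formats): normalize heterogeneous
# machine-data lines into structured events, then keep a running count
# of failures per minute.
import re
from collections import Counter

PATTERNS = [
    # Web access log (hypothetical sample format)
    re.compile(r'(?P<ts>\S+ \S+) (?P<host>\S+) "(?P<req>[^"]+)" (?P<status>\d{3})'),
    # Application log with key=value pairs (hypothetical sample format)
    re.compile(r'(?P<ts>\S+ \S+) level=(?P<level>\w+) msg="(?P<msg>[^"]*)"'),
]

def parse(line):
    """Try each known format; return a dict of fields or None."""
    for pattern in PATTERNS:
        match = pattern.match(line)
        if match:
            return match.groupdict()
    return None  # unrecognized format: route to a catch-all store

failures_per_minute = Counter()
for line in [
    '2011-09-22 14:03:11 web01 "GET /checkout HTTP/1.1" 500',
    '2011-09-22 14:03:12 level=ERROR msg="payment gateway timeout"',
]:
    event = parse(line)
    if event and (event.get("status", "").startswith("5")
                  or event.get("level") == "ERROR"):
        minute = event["ts"][:16]  # truncate timestamp to YYYY-MM-DD HH:MM
        failures_per_minute[minute] += 1

print(failures_per_minute)  # Counter({'2011-09-22 14:03': 2})
```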
22nd–23rd September 2011