Dr. Mahout: Analyzing clinical data using scalable and distributed computing

A session at ApacheCon North America 2011

  • Shannon Quinn

Thursday 10th November, 2011

5:00pm to 5:50pm (PST)

Of the few realms in which cloud computing has not solidly taken root, one with great potential is medicine. Clinicians generate massive amounts of data during the diagnostic process, and analyzing it, whether manually or computationally, can take a great deal of time. For example, the rare genetic disease primary ciliary dyskinesia (PCD) affects the cilia on cells, causing them to behave erratically and leading to breathing problems at best and necessitating lung transplants at worst. Cutting-edge diagnostic tools capture ciliary motion with high-speed video and use automated methods to describe the motion patterns quantitatively. These methods, however, are computationally intensive and would benefit from parallelization. Here we propose using the Mahout framework to efficiently learn models that capture the motion patterns observed in the videos and aid in objective diagnosis. Additionally, Hadoop's storage system will allow us to construct and preserve libraries of these motion models in the cloud for later comparison. The library will be in constant flux as new patterns are added and existing patterns are retrained, which requires a scalable, distributed architecture to handle new data and integrate it into the existing library. Ultimately this framework will be a boon for clinicians: they need only take biopsies, gather data as images or videos, upload them to a Mahout/Hadoop cluster, and wait for the results. Patient privacy is maintained by retaining only the low-dimensional motion models, computation time is reduced by parallelizing model learning and comparison, and models are available to clinicians everywhere through the cloud.

