For more than a decade, people have imagined a future in which sequencing a person's DNA would be as routine as a visit to the doctor. We now stand on the cusp of that future, but the volume and complexity of the data exceed our ability to interpret it. Within this challenge lies a major opportunity for software to shape the future of medicine.
In his keynote at OSCON Data 2011, Steve Yegge lamented that so many talented software engineers were working on “ways for people to share cat pictures” instead of on the hard problems society faces, such as understanding the human genome. In that spirit, this talk is meant as a timely introduction to genome sequencing for software engineers, data scientists, and technologists. I say “timely” because, at the current rate of decline in sequencing costs, we are tantalizingly close to the inflection point where sequencing a person’s entire genome will be more cost-effective than traditional (targeted) genetic testing. At the same time, the United States has a strong public policy effort driving the adoption of electronic medical records. This confluence of events means that medicine will soon face an unprecedented data deluge, which represents a major opportunity for data scientists and software developers to work on interesting problems with a major impact on society.
Our group of software developers, analysts, and clinicians at The Children’s Hospital of Philadelphia, supported in part by a grant from the National Human Genome Research Institute, is researching methods of integrating genomic sequence data with patient care. In this talk, attendees will get a gentle introduction to genomic sequencing: what it means to “sequence” a genome, what information can be obtained, and how physicians and patients will use the findings. I will also discuss the computational challenges associated with genome sequencing, including the massive storage requirements, the algorithmic approaches used for sequence data analysis, and the human-computer interface issues involved in displaying this very complex data to busy physicians in a clinical setting. Finally, I will discuss the ways open source projects can contribute, and already are contributing, to this work, which offers a major potential inroad into an industry traditionally dominated by commercial enterprise solutions.
16th–20th July 2012