The genome inside each cell works like a massively parallel computer. Proteins called transcription factors (TFs) bind to specific regions of DNA called "promoters". This binding starts a complex process that can have different outcomes. One possible outcome is the production of another TF, which in turn binds to one or more promoters, creating a cascade of events. TFs are like functions that have side effects, call other TFs, and can also call themselves recursively. In this talk, I will describe a machine learning technique that attempts to reverse engineer the genome. A task this tricky needs versatile tools. First, Clojure plays an instrumental role in developing the visualizations and data processing pipelines: it makes it really easy to filter, visualize, and synthesize many gigabytes of data. Second, similarity search is used extensively to find patterns in a huge set of possibilities. I hope to convince you that similarity search is the next "NoSQL" and that Clojure is an ideal tool for data science projects.
10th–12th November 2011