by Jay Kreps
The last few years have brought a wealth of new data technologies organized around horizontal scalability. This talk will cover the essential infrastructure areas: real-time stream processing, offline data crunching, large-scale data deployments and live serving. The focus will be on how these ingredients come together to enable innovative data-driven products at LinkedIn.
by Tom Wilkie
The standard Linux storage stack wasn't designed for write-heavy big data workloads, nor is it well-suited to modern hardware: large, slow SATA disks, SSDs, or many cores. Castle, an open-source project, is a ground-up overhaul of RAID, file systems, and the POSIX interface.
Synthetic biology is a new field in which basic biological components are engineered to create something new. It often involves DNA synthesizers, ligation, promoters, and polymerase chain reaction -- which may or may not be safe for your in silico environment. However, as the size and complexity of the systems increase, tools become more and more important, and so CAD for biology has emerged.
Time series sensors are being integrated ubiquitously in places like cell phones, environmental sensors, and the smart grid. As this type of data scales out, RDBMSs strain to keep up with the high insertion rates and real-time query requirements. In this talk we introduce “Lumberyard”, a scalable system for indexing and low-latency fuzzy pattern searching of time series data.
Location-based services are hot, but geographic datasets are complex. That shouldn’t put you off writing awesome location-aware services. This talk will show how to create spatial models and query the OpenStreetMap dataset together with social data using the Neo4j graph database.
A talk about scaling foursquare using MongoDB and Scala.
by Lindsay Snider and Andy Blyler
Solr, an open source enterprise search server, scales very well within an index (vertical scaling). It is when you have multiple indexes (horizontal scaling) that it starts to get hairy, which happens a lot when you are hosting a cloud-based solution for multiple users. In this session we will discuss these issues in depth, along with techniques for overcoming them.