Working with Dimensional data in Distributed Hash Tables

A session at Strange Loop 2010

Friday 15th October, 2010

2:00pm to 2:50pm (CST)

Recently a new class of database technologies has developed offering massively scalable distributed hash table functionality. Relative to more traditional relational database systems, these systems are simple to operate and capable of managing massive data sets. These characteristics come at a cost though: an impoverished query language that, in practice, can handle little more than exact-match lookups at scale.

This talk will explore the real world technical challenges we faced at SimpleGeo while building a web-scale spatial database on top of Apache Cassandra. Cassandra is a distributed database that falls into the broad category of second-generation systems described above. We chose Cassandra after carefully considering desirable database characteristics based on our prior experiences building large scale web applications. Cassandra offers operational simplicity, decentralized operations, no single points of failure, online load balancing and re-balancing, and linear horizontal scalability.

Unfortunately, Cassandra fell far short of providing the sort of sophisticated spatial queries we needed. We developed a short term solution that was good enough for most use cases, but far from optimal. Long term, our challenge was to bridge the gap without compromising any of the desirable qualities that led us to choose Cassandra in the first place.

The result is a robust general purpose mechanism for overlaying sophisticated data structures on top of distributed hash tables. By overlaying a spatial tree, for example, we’re able to durably persist massive amounts of spatial data and service complex nearest-neighbor and multidimensional range queries across billions of rows fast enough for an online consumer facing application. We continue to improve and evolve the system, but we’re eager to share what we’ve learned so far.

About the speaker

This person is speaking at this event.
Mike Malone

I make the stuff that makes the interwebs. Sometimes I do other stuff too. bio from Twitter

Coverage of this session

Sign in to add slides, notes or videos to this session

Tell your friends!


Time 2:00pm2:50pm CST

Date Fri 15th October 2010

Short URL


Official event site


View the schedule



See something wrong?

Report an issue with this session