Querying Big Data Rapidly and Robustly with Cascalog

A session at Strange Loop 2010

Cascalog is a tool for querying data on Hadoop with Clojure in a concise, expressive, and highly readable manner. It combines two cutting-edge technologies, Clojure and Hadoop, and resurrects an older one, Datalog. Cascalog is high-performance, flexible, and robust.

Most query languages, such as SQL, Pig, and Hive, are custom languages, and this leads to a great deal of accidental complexity. Constructing queries dynamically through string manipulation is haphazard and invites further problems such as SQL injection attacks. Because Cascalog is a domain-specific language embedded in Clojure, it avoids these accidental complexities and lets a programmer manipulate queries as first-class entities within the language. Cascalog's Datalog syntax is simpler and more expressive than that of SQL-based languages.
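The contrast between string-built queries and queries as first-class values can be sketched in a few lines. The Python below is only an illustration of the general idea and is not Cascalog's API: the helpers `build_sql`, `where`, and `run`, and the sample `PEOPLE` data, are all hypothetical names invented for this sketch.

```python
# Hypothetical illustration only: none of these names come from Cascalog.

def build_sql(name):
    # String manipulation: the input is spliced directly into the query text,
    # so the input can change the structure of the query itself.
    return "SELECT * FROM people WHERE name = '" + name + "'"

def where(field, op, value):
    # A clause is an ordinary value (a tuple), not a fragment of text.
    return (field, op, value)

def run(query, rows):
    # A query is a list of clauses; a row matches when every clause holds.
    ops = {"=": lambda a, b: a == b, "<": lambda a, b: a < b}
    return [r for r in rows if all(ops[op](r[f], v) for f, op, v in query)]

PEOPLE = [{"name": "alice", "age": 28}, {"name": "bob", "age": 33}]

malicious = "x' OR '1'='1"

# With string building, the input rewrites the WHERE clause.
injected_sql = build_sql(malicious)

# With queries as data, the same input is just a value to compare against.
by_name = [where("name", "=", malicious)]

# Queries compose with ordinary list operations.
young = [where("age", "<", 30)]
combined = young + [where("name", "=", "alice")]
```

Here `run(by_name, PEOPLE)` matches nothing, because the malicious string is treated purely as a value, while `combined` is built by concatenating two existing queries rather than by editing text.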

Besides being a valuable tool in itself, Cascalog is a demonstration of the power of the Clojure programming language. Building an integrated query language like Cascalog is just not possible in any other language.

This talk will include a live demo of Cascalog.

About the speaker

Nathan Marz

Twitter engineer. Author of Storm and Cascalog. Writing the upcoming book Big Data (http://manning.com/marz/). Bio from Twitter.

Short URL

lanyrd.com/scccw

Official event site

strangeloop2010.com
