Big Data for the Masses: How We Opened Up the Doors to Google's Dremel

A session at Strata New York 2012

  • Jim Caputo
  • Michael Manoochehri

Wednesday 24th October, 2012

10:50am to 11:30am (EST)

60 hours of video are uploaded to YouTube every minute. The Google search index contained 100 million gigabytes of data in 2010. Other Google services have hundreds of millions of users. Each of these products generates massive amounts of data, and Google has developed custom technologies to analyze this data and make intelligent product decisions.

Dremel is a scalable, interactive ad-hoc query system. By combining multi-level execution trees and columnar data layout, Dremel allows users to run queries in a SQL-like language over tables with billions of rows in seconds. Dremel uses an architecture distinct from MapReduce-based platforms to improve efficiency when running multiple simultaneous query jobs. The system scales to thousands of CPUs and petabytes of data, and has thousands of users at Google querying web logs, ad analytics and financial data.
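The two ideas named above, columnar layout and multi-level execution trees, can be illustrated with a toy model. The sketch below is hypothetical and is not Dremel's actual implementation or API. It assumes each shard stores every column as a separate list, that leaf servers compute a partial `COUNT(*) ... GROUP BY` by reading only the one column the query needs, and that intermediate servers and the root merge partial results from a fixed number of children (`fan_out`, an illustrative parameter).

```python
from collections import Counter
from typing import Dict, List

# Hypothetical columnar shard: each column is stored as its own list,
# so a query can read one column without touching the others.
Shard = Dict[str, List]


def leaf_scan(shard: Shard, group_col: str) -> Counter:
    """Leaf server: read only `group_col` and compute a partial
    COUNT(*) ... GROUP BY group_col for this shard."""
    return Counter(shard[group_col])


def merge(partials: List[Counter]) -> Counter:
    """Intermediate or root server: add up partial counts from its children."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


def execute_tree(shards: List[Shard], group_col: str, fan_out: int = 2) -> Counter:
    """Aggregate bottom-up through a tree whose nodes each merge
    at most `fan_out` children, until a single root result remains."""
    if not shards:
        return Counter()
    # Leaf level (run sequentially here; a real system scans shards in parallel).
    level = [leaf_scan(shard, group_col) for shard in shards]
    # Each pass builds one level of the tree above the previous one.
    while len(level) > 1:
        level = [merge(level[i:i + fan_out]) for i in range(0, len(level), fan_out)]
    return level[0]


shards = [
    {"country": ["US", "DE", "US"], "bytes": [10, 20, 30]},
    {"country": ["US", "FR"], "bytes": [5, 7]},
    {"country": ["DE"], "bytes": [1]},
]
result = execute_tree(shards, "country")
print(dict(result))  # prints {'US': 3, 'DE': 2, 'FR': 1}
```

The `bytes` column is never read, which is the benefit of a columnar layout for queries that touch few fields. Pushing partial aggregation down to the leaves means each level of the tree only passes small summaries upward instead of raw rows.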

Google’s situation is no longer unique. As more and more companies collect massive amounts of data, they need to quickly analyze it without large investments in infrastructure or human capital. We want everyone to have the power of Dremel.

BigQuery puts the powerful interactive querying capabilities of Dremel into the hands of users everywhere. It is designed for accessibility and ease of use, featuring a REST API as well as a web-based interface. BigQuery enables users to ingest 1 TB of data and run hundreds of queries on it with a SQL-like language in less than an hour.

This session will discuss the development and capabilities of Dremel, in particular its performance characteristics and its ability to support interactive ad-hoc querying on a multi-tenant architecture. We'll also dive into the design challenges involved in making Dremel technology accessible and performant for third-party developers and business users working with massive data sets.

About the speakers

Jim Caputo

Tech Lead Manager, Cloud, Google, Inc.

Michael Manoochehri

Xoogler, traveler, writer, data hacker. Author of Data Just Right. My goal is to help make computing more ubiquitous, affordable, and accessible.
