Parquet: open columnar storage for Hadoop

A session at DEVIEW 2013

Monday 14th October, 2013

2:00pm to 2:45pm (KST)

We would like to introduce Parquet, a columnar file format for Hadoop.
The performance and compression benefits of columnar storage formats for storing and processing large amounts of data are well documented in the academic literature and have been demonstrated by several commercial analytical databases.
Parquet supports deeply nested structures as well as efficient encoding and column compression schemes, and it is designed to be compatible with a variety of higher-level type systems. It is available as a standalone library, allowing any Hadoop framework or tool to build support for it with minimal dependencies. As of this release, Parquet is supported by Apache Pig, plain Hadoop MapReduce, and Cloudera’s Impala, and is being put into production at Twitter.
We will discuss Parquet’s design and share performance numbers.
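
To illustrate why a columnar layout tends to compress well, here is a minimal Python sketch. It is not Parquet's on-disk format or API: the records, the field names, and the helper functions `to_columns` and `run_length_encode` are hypothetical, and run-length encoding stands in as one example of an encoding that benefits from storing each column's values together.

```python
from itertools import groupby

# Hypothetical records; the field names are illustrative only.
rows = [
    {"country": "KR", "status": "ok", "latency_ms": 12},
    {"country": "KR", "status": "ok", "latency_ms": 15},
    {"country": "KR", "status": "error", "latency_ms": 40},
    {"country": "US", "status": "ok", "latency_ms": 11},
    {"country": "US", "status": "ok", "latency_ms": 13},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of per-column value lists."""
    return {name: [record[name] for record in records] for name in records[0]}


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)

# Values of one column sit next to each other, so repeats form long runs.
encoded_country = run_length_encode(columns["country"])
print(encoded_country)

# A query that needs only one field reads a single column list
# instead of scanning every full row.
print(sum(columns["latency_ms"]) / len(columns["latency_ms"]))
```

In a row-oriented layout the same `country` values would be interleaved with the other fields, so the runs that the encoder exploits here would not be adjacent.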

About the speaker

Julien Le Dem

@DremioHQ Architect, formerly tech lead at @Twitter, @Yahoo, Kelkoo, @ApacheParquet co-author and VP, Apache Pig and @ApacheArrow PMC, Kitesurfer, parent



Seoul, South Korea

14th to 15th October 2013



