Sqoop on Spark: Data Ingestion Reimagined

A session at Spark Summit 2015

Tuesday 16th June, 2015

5:45pm to 6:00pm PST

Apache Sqoop has primarily been used to transfer data between relational databases and HDFS using the Hadoop MapReduce engine. Recently the system was redesigned to be more generic, allowing data transfer between any two data sources; for instance, the latest Apache Sqoop can transfer data from Apache Kafka to MySQL and vice versa. This talk will focus on how the Apache Sqoop project has evolved to support ingestion between any two data sources, with the Apache Spark engine's parallelism, speed, and reliability under the hood. We will demo how Sqoop connectors can be written to take advantage of this and to chain data ingestion with data processing. We will also discuss future work to integrate Apache Sqoop with near-real-time and streaming ingestion.
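The parallelism the abstract mentions rests on one core idea: splitting a source's key space into ranges so that each range can be ingested as an independent task. As a minimal illustration (this is a sketch of the general technique, not code from the talk or from Sqoop itself; the function name is hypothetical):

```python
def split_ranges(lo, hi, num_partitions):
    """Split the half-open key range [lo, hi) into num_partitions
    near-equal sub-ranges, the way a Sqoop-style connector might
    partition a numeric split column for parallel extraction."""
    width = (hi - lo) // num_partitions
    extra = (hi - lo) % num_partitions  # spread the remainder over the first partitions
    bounds = []
    start = lo
    for i in range(num_partitions):
        end = start + width + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds

# Each (start, end) pair would back one parallel task, e.g. a query
# with "WHERE id >= start AND id < end" executed by its own worker.
print(split_ranges(0, 10, 3))  # → [(0, 4), (4, 7), (7, 10)]
```

Whether the tasks run as MapReduce mappers or Spark partitions, the ranges together cover the full key space with no overlap, which is what makes the transfer both parallel and complete.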

About the speakers

Vinoth Chandar

Distributed Data Systems, Dreamer, Retweeter (bio from Twitter)

Veena Basavaraj

one too many traits to survive, one too many balls to juggle in the air and I like it this way (bio from Twitter)




