Tuesday 16th June, 2015
5:45pm to 6:00pm
Apache Sqoop has primarily been used to transfer data between relational databases and HDFS using the Hadoop MapReduce engine. Recently the system has been redesigned to be more generic, allowing data transfer between any two data sources; for instance, the latest Apache Sqoop can transfer data from Apache Kafka to MySQL and vice versa. This talk will focus on how the Apache Sqoop project has evolved to support ingestion between any two data sources, with the Apache Spark engine's parallelism, speed, and reliability under the hood. We will demo how Sqoop connectors can be written to take advantage of this and to chain data ingestion with data processing. We will also discuss future work to integrate Apache Sqoop with near-real-time and streaming ingestion.
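The connector model the talk describes can be sketched roughly as follows: a source is split into partitions, and the engine (MapReduce or Spark) runs an extract-then-load step for each partition in parallel. This is a minimal, self-contained illustration of that pattern only; the class and method names here are illustrative assumptions, not the actual Sqoop connector SPI.

```java
import java.util.ArrayList;
import java.util.List;

public class ConnectorSketch {
    // A partition describes one independent slice of the source data.
    record Partition(int start, int end) {}

    // Extractor: reads the records of one partition from the "from" source.
    interface Extractor { List<String> extract(Partition p); }

    // Loader: writes records into the "to" sink.
    interface Loader { void load(List<String> records); }

    public static void main(String[] args) {
        // Toy in-memory source standing in for e.g. a Kafka topic.
        List<String> source = List.of("a", "b", "c", "d");

        // Partitioner: split the source into two independent slices.
        List<Partition> partitions =
                List.of(new Partition(0, 2), new Partition(2, 4));

        Extractor extractor =
                p -> new ArrayList<>(source.subList(p.start(), p.end()));

        // Toy in-memory sink standing in for e.g. a MySQL table.
        List<String> sink = new ArrayList<>();
        Loader loader = sink::addAll;

        // The engine would run extract -> load per partition in parallel;
        // here we just loop sequentially for clarity.
        for (Partition p : partitions) {
            loader.load(extractor.extract(p));
        }

        System.out.println(sink);
    }
}
```

Because each partition is extracted and loaded independently, the engine can schedule them as parallel tasks, which is where Spark's parallelism comes in.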
Bio (from Twitter): one too many traits to survive, one too many balls to juggle in the air, and I like it this way.