Petabyte-Scale Text Processing with Spark

A session at JEEConf 2016

At Grammarly, we have long used Amazon EMR with Hadoop and Pig to support our big data processing needs. However, we were excited about the improvements the maturing Apache Spark offers over Hadoop and Pig, so we set about getting Spark to work with our petabyte-scale text data set. This talk describes the challenges we encountered along the way and the scalable, working Spark setup we arrived at as a result.
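To give a flavor of the kind of distributed text job the talk discusses, here is a minimal, self-contained sketch of the map/reduce word-count pattern that Spark parallelizes across partitions. This is plain Python with no Spark dependency; the corpus, the partitioning, and the per-partition combine are all illustrative assumptions, not the setup described in the talk.

```python
from collections import Counter
from functools import reduce

# Hypothetical mini-corpus standing in for a (much larger) text data set.
corpus = [
    "spark replaces pig for text processing",
    "spark scales text processing on emr",
]

# Simulate two partitions, as Spark would split the input across executors.
partitions = [corpus[:1], corpus[1:]]

# Map side: tokenize each partition and build a local count
# (the role of flatMap + a map-side combine in Spark).
partials = [
    Counter(word for line in part for word in line.split())
    for part in partitions
]

# Reduce side: merge the per-partition counts
# (the role of reduceByKey after the shuffle).
merged = reduce(lambda a, b: a + b, partials)

print(merged["spark"])       # appears once in each partition
print(merged["processing"])  # likewise merged across partitions
```

In Spark itself the same pipeline would be expressed as `flatMap` followed by `reduceByKey`, with the framework handling partitioning and the shuffle.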

About the speaker

Alex Slusarenko

Research Engineer at Grammarly


JEEConf 2016

Kiev, Ukraine

20th to 21st May 2016
