Deconstructing Recommendations on Spark

A session at Spark Summit 2015

Monday 15th June, 2015

4:30pm to 5:00pm PST

This talk focuses on the practical details of building a recommendation engine on top of Spark MLlib's ALS collaborative-filtering algorithm, one that can reliably generate predictions for 25 million users over a space of 5 million products. Two aspects of this work are distinctive. First, we generate scores for every combination of user and product (125 trillion possible values) on a small 6-node cluster. Second, targeted optimizations yield an improvement of several orders of magnitude over MLlib's predictive step, with performance scaling linearly as cores are added to the system. The primary goal is to present the optimizations and parameter tuning needed to achieve these gains, together with a discussion of the Spark internals that come into play. The talk is tailored to intermediate Spark developers who want to understand the trickier aspects of Spark and how they affect both stability and performance.
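
For readers unfamiliar with the predictive step, here is a minimal sketch of the general idea, not the speaker's implementation. In an ALS model each user and each product has a latent-factor vector, and the predicted score is their dot product. The function names, the toy factor values, and the block-wise scan (used here only to keep the candidates held in memory bounded) are assumptions made for illustration.

```python
import heapq

def score(user_vec, product_vec):
    """Predicted rating: dot product of the two latent-factor vectors."""
    return sum(u * p for u, p in zip(user_vec, product_vec))

def top_k_blocked(user_factors, product_factors, k, block_size):
    """Keep the k best (score, product_id) pairs per user, scanning products in blocks."""
    best = {user_id: [] for user_id in user_factors}
    product_ids = list(product_factors)
    for start in range(0, len(product_ids), block_size):
        block = product_ids[start:start + block_size]
        for user_id, user_vec in user_factors.items():
            # Merge the current best with this block's scores, then trim to k.
            candidates = best[user_id] + [
                (score(user_vec, product_factors[pid]), pid) for pid in block
            ]
            best[user_id] = heapq.nlargest(k, candidates)
    return best

# Size of the full score space quoted in the abstract: users x products.
total_scores = 25_000_000 * 5_000_000  # 125 trillion

# Toy rank-2 factors, made up for illustration only.
user_factors = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
product_factors = {"p1": [0.9, 0.1], "p2": [0.2, 0.8], "p3": [0.5, 0.5]}
recs = top_k_blocked(user_factors, product_factors, k=2, block_size=2)
```

In this sketch only k candidates per user survive each block, so memory stays proportional to users times k rather than to the full users-by-products score matrix.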

About the speaker

Ilya Ganelin

Senior Data Engineer at Capital One Labs
