Tuesday 16th June, 2015
3:30pm to 4:00pm
Recommendation systems are among the most popular applications of machine learning. MLlib implements alternating least squares (ALS), a widely used algorithm for collaborative filtering. MLlib's ALS uses Spark's in-memory caching and a special partitioning strategy to stay efficient and scalable: it runs 10x faster than Apache Mahout's implementation and scales up to billions of ratings. In this talk, we present a more scalable implementation of ALS, motivated by the issues we experienced with the old implementation, along with scalability results on 100 billion ratings. We will review the ALS algorithm, describe the internal data storage used in the new implementation, and cover the techniques used to accelerate the computation and improve JVM efficiency. We will also discuss the next steps for recommendation algorithms in MLlib.
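For readers unfamiliar with ALS, the following is a minimal single-machine sketch of the alternating update, not MLlib's implementation. It assumes explicit ratings given as (user, item, rating) triples and plain L2 regularization, and it omits the partitioning, caching, and storage techniques the talk covers. All function names (`solve`, `update_factors`, `als`, `predict`) are hypothetical and chosen for illustration only.

```python
import random


def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def update_factors(ratings_by_row, fixed, rank, reg):
    """Recompute one side's factors while the other side (`fixed`) is held constant.

    For each row this solves the normal equations
    (sum v v^T + reg * I) x = sum rating * v over that row's observed ratings.
    """
    updated = {}
    for row_id, observed in ratings_by_row.items():
        a = [[reg if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for col_id, rating in observed:
            v = fixed[col_id]
            for i in range(rank):
                b[i] += rating * v[i]
                for j in range(rank):
                    a[i][j] += v[i] * v[j]
        updated[row_id] = solve(a, b)
    return updated


def als(ratings, rank=2, reg=0.1, iterations=10, seed=0):
    """Alternate between solving for user factors and item factors."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for user, item, rating in ratings:
        by_user.setdefault(user, []).append((item, rating))
        by_item.setdefault(item, []).append((user, rating))
    # Random initial item factors; user factors are computed in the first pass.
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    users = {}
    for _ in range(iterations):
        users = update_factors(by_user, items, rank, reg)
        items = update_factors(by_item, users, rank, reg)
    return users, items


def predict(users, items, user, item):
    """Predicted rating is the dot product of the user and item factor vectors."""
    return sum(u * v for u, v in zip(users[user], items[item]))
```

Each half-step is an independent least-squares solve per user or per item, which is what makes the algorithm a natural fit for parallel execution. A call such as `users, items = als([(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0)])` followed by `predict(users, items, 1, 0)` gives an estimated rating for a known user and item.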