Flyby: Improved dense matrix multiplication

A session at Spark Summit 2015

Monday 15th June, 2015

5:00pm to 5:30pm

At Thomson Reuters Reseach and Development, we have found several business tasks with the same computational dependencies as distributed matrix multiplication, such as computing user-product recommendations and interpretating LDA topics. While matrix multiplation is an elementary distributed algorithm, Spark is inefficient on it. This talk proposes a formalism to express these kinds of tasks and demonstrates gains from optimizing it. In naive dense matrix multiplication, a cartesian or join generates n**3 data, which are shuffled and reduced to n**2 data. The shuffle is expensive; morever, the partitioning of the reduction is known ahead of time becaue of the key assignment pattern, and this can be used to optimize the procedure. We propose a method called "flyby," which allows the reduction to be localized and avoids the shuffle. I well introduce flyby and provide timing comparisons on business tasks and synthetic problems.

About the speaker

This person is speaking at this event.
Thomas Vacek

Research Scientist at Thomson Reuters bio from LinkedIn

Sign in to add slides, notes or videos to this session

Tell your friends!


Time 5:00pm5:30pm PST

Date Mon 15th June 2015

Short URL


Official event site


View the schedule


See something wrong?

Report an issue with this session