Use of Spark MLlib for Predicting the Offlining of Digital Media

A session at Spark Summit 2015

Monday 15th June, 2015

1:00pm to 1:30pm

Our file cluster stores hundreds of terabytes of media files for international cable TV distribution. Managing this online storage effectively is necessary to support distribution to our international clients. We therefore set out to develop a machine learning system that learns from a combination of factors (e.g., file age, future schedule, and days since last airing) to predict whether a file is likely to go unused and can therefore be taken offline. During development, we investigated several methods before settling on Spark MLlib's Support Vector Machines as the best choice because of their accuracy and robustness. The system has been tested in production for a couple of months with positive results, and we plan to move it into full production use later this year.

About the speaker

Christopher Burdorf

Software Engineer at NBCUniversal, Inc.

Time 1:00pm to 1:30pm PST
