Building, Debugging, and Tuning Spark Machine Learning Pipelines

A session at Spark Summit 2015

Monday 15th June, 2015

3:00pm to 3:30pm

Machine learning workflows are often complex. This talk discusses Pipelines, which were introduced in Spark 1.2 and 1.3 to facilitate ML development. We will cover the basic concepts, usage examples, a few implementation details, and plans for the future. Key takeaways: (1) Motivation: ML workflows are complex, and Pipelines simplify constructing them. (2) Concepts: Pipelines are sequences of ML algorithms that transform datasets. (3) Datasets: Pipelines use DataFrames as ML datasets, so they support diverse data types. (4) Usage: This talk will give examples of usage and the API.
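To make the "sequence of algorithms that transform datasets" idea concrete, here is a minimal toy sketch in plain Python. It is not the Spark API and is not taken from the talk: every class name below is hypothetical, and a list of dicts stands in for a DataFrame. It only illustrates the general pattern of stages that either transform a dataset directly or are first fitted on it.

```python
# Toy illustration of the pipeline idea in plain Python (NOT the Spark API).
# All class names are hypothetical; a list of dicts stands in for a DataFrame.

class Tokenizer:
    """Transformer stage: adds a 'words' column derived from 'text'."""
    def transform(self, rows):
        return [dict(row, words=row["text"].lower().split()) for row in rows]


class WordCounter:
    """Transformer stage: adds a 'num_words' column derived from 'words'."""
    def transform(self, rows):
        return [dict(row, num_words=len(row["words"])) for row in rows]


class ThresholdModel:
    """Fitted stage: flags rows whose word count exceeds the learned threshold."""
    def __init__(self, threshold):
        self.threshold = threshold

    def transform(self, rows):
        return [dict(row, is_long=row["num_words"] > self.threshold) for row in rows]


class MeanLengthThreshold:
    """Estimator stage: learns the mean word count, returns a fitted model."""
    def fit(self, rows):
        mean = sum(row["num_words"] for row in rows) / len(rows)
        return ThresholdModel(mean)


class PipelineModel:
    """A fitted pipeline: applies each fitted stage in order."""
    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline:
    """An unfitted pipeline: fits estimator stages on the running dataset."""
    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(rows)   # estimator becomes a fitted model
            rows = stage.transform(rows)  # feed the output to the next stage
            fitted.append(stage)
        return PipelineModel(fitted)


training = [
    {"text": "spark is fast"},
    {"text": "pipelines chain many ml stages together"},
]
model = Pipeline([Tokenizer(), WordCounter(), MeanLengthThreshold()]).fit(training)
result = model.transform([{"text": "a b c d e f"}])
```

Because each stage only adds columns to the rows it receives, stages can be reordered or swapped without touching the rest of the chain, which is the simplification the abstract attributes to Pipelines.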

About the speaker

Joseph Bradley

Software Engineer at Databricks

Time 3:00pm to 3:30pm PST
