Scaling Machine Learning in Python

A session at PyData Silicon Valley 2013

Tuesday 19th March, 2013

3:20pm to 4:10pm (PST)

In this talk we will introduce the typical predictive modeling tasks on "not-so-big-data-but-not-quite-small-either" that benefit from distributing the work across several cores or nodes in a small cluster (e.g. 20 * 8 cores).

We will talk about cross validation, grid search, ensemble learning, model averaging, numpy memory mapping, Hadoop or Disco MapReduce, MPI AllReduce and disk & memory locality.

We will also feature some quick demos using scikit-learn and IPython.parallel from the notebook on a spot-instance EC2 cluster managed by StarCluster.

About the speaker

Olivier Grisel

Datageek, engineer @Parietal_INRIA, contributor to scikit-learn. I like Python, NumPy, Spark & interested in Machine Learning, NLProc, {Big|Linked|Open} Data. (bio from Twitter)
