Tuesday 19th March, 2013
3:20pm to 4:10pm
In this talk we will introduce the typical predictive modeling tasks on "not-so-big-data-but-not-quite-small-either" that benefit from distributing the work over several cores or nodes in a small cluster (e.g. 20 * 8 cores).
We will talk about cross-validation, grid search, ensemble learning, model averaging, NumPy memory mapping, Hadoop or Disco MapReduce, MPI AllReduce, and disk & memory locality.
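As a taste of the first two topics, here is a minimal sketch of a cross-validated grid search parallelized across cores with scikit-learn. The dataset, estimator, and parameter grid are illustrative choices, not taken from the talk, and the example uses the current scikit-learn API:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Each (parameter combination, CV fold) pair is an independent task, so
# the search is embarrassingly parallel: n_jobs=-1 uses all local cores.
param_grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]}
search = GridSearchCV(SVC(), param_grid, cv=5, n_jobs=-1)
search.fit(X, y)
print(search.best_params_)
```

The same task decomposition is what makes it natural to scale the search beyond one machine, e.g. by shipping the independent fits to an IPython.parallel cluster.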
We will also feature some quick demos using scikit-learn and IPython.parallel from the notebook on a spot-instance EC2 cluster managed by StarCluster.
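NumPy memory mapping, one of the topics listed above, is what lets several worker processes on the same machine read one shared on-disk copy of the data instead of each holding a private copy in RAM. A minimal sketch (the file path is illustrative):

```python
import os
import tempfile
import numpy as np

# Write an array to disk once in .npy format.
path = os.path.join(tempfile.mkdtemp(), "data.npy")
np.save(path, np.random.rand(1000, 50))

# Each worker opens it read-only via mmap_mode: no bytes are loaded
# until pages are actually touched, and the OS page cache is shared
# across processes on the same node.
data = np.load(path, mmap_mode="r")
print(data.shape)
```

This is one reason disk and memory locality matter: workers scheduled on the node that already holds the data in its page cache avoid both network transfer and redundant RAM usage.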
Datageek, engineer @Parietal_INRIA, contributor to scikit-learn. I like Python, NumPy, Spark and am interested in Machine Learning, NLProc, {Big|Linked|Open} Data. (Bio from Twitter.)