Sessions at Strata 2012 about Machine Learning and Random Forests and Algorithms on Tuesday 28th February

  • The Two Most Important Algorithms in Predictive Modeling Today

    by Mike Bowles and Jeremy Howard

    When doing predictive modelling, there are two situations in which you might find yourself:

    1. You need to fit a well-defined parameterised model to your data, so you require a learning algorithm which can find those parameters on a large data set without over-fitting.
    2. You just need a “black box” which can predict your dependent variable as accurately as possible, so you need a learning algorithm which can automatically identify the structure, interactions, and relationships in the data.

    For case (1), lasso and elastic-net regularized generalized linear models are a set of modern algorithms which meet all these needs. They are fast, work on huge data sets, and avoid over-fitting automatically. They are available in the “glmnet” package in R.
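
    As a rough illustration of the idea behind the lasso, here is a minimal sketch in plain Python. It is not the glmnet implementation and not the presenters' material. It fits a linear model without an intercept by cyclic coordinate descent with soft-thresholding, which is how the penalty pushes weak coefficients to exactly zero. The function names, the toy data, and the penalty value are all assumptions made for this example, and the elastic-net variant (which adds a ridge term) is left out.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values inside the band become 0.0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(rows, targets, penalty, sweeps=100):
    """Minimise (1/2n) * sum((y - x.w)^2) + penalty * sum(|w|), with no intercept.

    Hypothetical helper for illustration: updates one weight at a time
    while holding the others fixed.
    """
    n = len(rows)
    p = len(rows[0])
    weights = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0   # feature j times the residual that ignores feature j
            norm = 0.0  # sum of squares of feature j
            for x, y in zip(rows, targets):
                partial = sum(x[k] * weights[k] for k in range(p) if k != j)
                rho += x[j] * (y - partial)
                norm += x[j] * x[j]
            if norm > 0:
                weights[j] = soft_threshold(rho / n, penalty) / (norm / n)
            else:
                weights[j] = 0.0  # a constant-zero feature carries no signal
    return weights


# Invented toy data: the target depends strongly on feature 0 and only
# weakly on feature 1 (y = 2.0 * x0 + 0.1 * x1).
rows = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
targets = [2.1, -1.9, 1.9, -2.1]

# With this penalty the strong coefficient is shrunk to roughly 1.5 and
# the weak one is set to exactly zero.
weights = lasso_coordinate_descent(rows, targets, penalty=0.5)
print(weights)
```

    Setting the penalty to zero in this sketch recovers the ordinary least-squares coefficients, and raising it removes more features from the model. In practice the penalty strength would normally be chosen by cross-validation rather than fixed by hand.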

    For case (2), ensembles of decision trees (often known as “Random Forests”) have been the most successful general-purpose algorithm in modern times. For instance, most Kaggle competitions have at least one top entry that heavily uses this approach. This algorithm is very simple to understand, and is fast and easy to apply. It is available in the “randomForest” package in R.
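
    The ensemble idea can also be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the randomForest package and not the presenters' code. Each tree is grown on a bootstrap sample of the rows, each split considers only a random subset of the features, and the forest predicts by majority vote. All function names, parameters, and the toy data are invented for this example, and class labels are assumed not to be tuples because tuples mark internal nodes.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in the list."""
    return Counter(labels).most_common(1)[0][0]


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def grow_tree(rows, labels, rng, n_try, depth=0, max_depth=3):
    """Grow one tree; each split tries only n_try randomly chosen features."""
    if depth == max_depth or len(set(labels)) == 1:
        return majority(labels)  # leaf: a bare class label
    best = None
    for feature in rng.sample(range(len(rows[0])), n_try):
        values = sorted(set(row[feature] for row in rows))
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2.0
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    if best is None:  # every sampled feature is constant in this node
        return majority(labels)
    _, feature, threshold = best
    left_idx = [i for i, row in enumerate(rows) if row[feature] <= threshold]
    right_idx = [i for i, row in enumerate(rows) if row[feature] > threshold]
    left_tree = grow_tree([rows[i] for i in left_idx], [labels[i] for i in left_idx],
                          rng, n_try, depth + 1, max_depth)
    right_tree = grow_tree([rows[i] for i in right_idx], [labels[i] for i in right_idx],
                           rng, n_try, depth + 1, max_depth)
    return (feature, threshold, left_tree, right_tree)  # internal node


def predict_tree(tree, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, tuple):
        feature, threshold, left_tree, right_tree = tree
        tree = left_tree if row[feature] <= threshold else right_tree
    return tree


def grow_forest(rows, labels, n_trees=25, n_try=1, seed=0):
    """Grow n_trees trees, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(grow_tree([rows[i] for i in idx], [labels[i] for i in idx], rng, n_try))
    return forest


def predict_forest(forest, row):
    """Majority vote over the predictions of all trees."""
    return majority([predict_tree(tree, row) for tree in forest])


# Invented toy data: two well-separated clusters, one per class.
rows = [[0.0, 0.2], [0.3, 1.0], [1.0, 0.0], [0.6, 0.7],
        [4.0, 4.5], [4.4, 5.0], [5.0, 4.0], [4.7, 4.2]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest = grow_forest(rows, labels)
print(predict_forest(forest, [0.5, 0.5]), predict_forest(forest, [4.5, 4.5]))
```

    The two sources of randomness (resampled rows and a restricted feature choice at each split) are what make the individual trees differ, so that averaging their votes is more stable than relying on any single tree.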

    Mike and Jeremy will explain in simple terms, using no complex math, how these algorithms work, and will also explain using numerous examples how to apply them using R. They will also provide advice on how to select from these algorithms, and will show how to prepare the data, and how to use the trained models in practice.

    From 1:30pm to 5:00pm, Tuesday 28th February

    In Ballroom CD, Santa Clara Convention Center