The Two Most Important Algorithms in Predictive Modeling Today

A session at Strata 2012

  • Mike Bowles

Tuesday 28th February, 2012

1:30pm to 5:00pm (PST)

When doing predictive modelling, there are two situations in which you might find yourself:

1. You need to fit a well-defined parameterised model to your data, so you require a learning algorithm which can find those parameters on a large data set without over-fitting.
2. You just need a “black box” which can predict your dependent variable as accurately as possible, so you need a learning algorithm which can automatically identify the structure, interactions, and relationships in the data.

For case (1), lasso and elastic-net regularized generalized linear models are a set of modern algorithms which meet all these needs. They are fast, work on huge data sets, and avoid over-fitting automatically. They are available in the “glmnet” package in R.
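
To give a feel for how the lasso penalty avoids over-fitting, here is a minimal pure-Python sketch of coordinate descent with soft-thresholding. It is an illustration only, not the glmnet implementation: it covers the lasso case (no elastic-net mixing, no intercept, no standardisation), and the function names, the toy data, and the penalty value are all assumptions made for this example.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within the threshold becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(rows, targets, penalty, sweeps=200):
    """Minimise (1/(2n)) * sum((y - x.w)^2) + penalty * sum(|w|), with no intercept."""
    n = len(rows)
    p = len(rows[0])
    weights = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0   # covariance of feature j with the residual that ignores feature j
            norm = 0.0  # sum of squares of feature j
            for x, y in zip(rows, targets):
                others = sum(weights[k] * x[k] for k in range(p) if k != j)
                rho += x[j] * (y - others)
                norm += x[j] * x[j]
            # Update one coefficient at a time, shrinking weak ones to exactly zero.
            weights[j] = soft_threshold(rho / n, penalty) / (norm / n) if norm else 0.0
    return weights


# Hypothetical toy data: the target depends only on the first column.
rows = [[1.0, 1.0], [2.0, -1.0], [3.0, 1.0], [4.0, -1.0]]
targets = [2.0, 4.0, 6.0, 8.0]
weights = lasso_coordinate_descent(rows, targets, penalty=0.5)
```

On this toy data the coefficient of the uninformative second column is set to exactly zero, while the first coefficient is shrunk slightly below its unpenalised value of 2. That combination of shrinkage and variable selection is the behaviour the penalty is meant to produce.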

For case (2), ensembles of decision trees (often known as “Random Forests”) have been the most successful general-purpose algorithm in modern times. For instance, most Kaggle competitions have at least one top entry that heavily uses this approach. This algorithm is very simple to understand, and is fast and easy to apply. It is available in the “randomForest” package in R.
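
The core idea behind a tree ensemble can be sketched in a few lines of pure Python: train many small trees, each on a bootstrap resample of the rows and a randomly chosen feature, then let them vote. This is a deliberately simplified illustration, not the randomForest package's algorithm: each "tree" here is a one-split stump restricted to a single random feature, whereas a real random forest grows deep trees and samples features at every split. All names and the toy data are assumptions for the example.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature):
    """Find the threshold on one feature that misclassifies the fewest rows."""
    fallback = majority(labels)
    best = (len(rows) + 1, None, fallback, fallback)  # (errors, threshold, left, right)
    values = sorted({row[feature] for row in rows})
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        left_label, right_label = majority(left), majority(right)
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return feature, best[1], best[2], best[3]


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    if threshold is None:  # the resample had a single value, so no split was possible
        return left_label
    return left_label if row[feature] <= threshold else right_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        picks = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample rows with replacement
        feature = rng.randrange(p)                    # random feature for this tree
        forest.append(fit_stump([rows[i] for i in picks], [labels[i] for i in picks], feature))
    return forest


def predict_forest(forest, row):
    """Each tree votes; the ensemble returns the most common vote."""
    return majority([predict_stump(stump, row) for stump in forest])


# Hypothetical toy data: class 0 has small values, class 1 has large values.
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [6, 7], [7, 6], [8, 9], [9, 8]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(rows, labels)
```

Calling `predict_forest(forest, [0, 0])` and `predict_forest(forest, [10, 10])` classifies two unseen points by majority vote. Averaging over many randomised trees is what makes the ensemble more stable than any single tree.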

Mike and Jeremy will explain in simple terms, using no complex math, how these algorithms work, and will also explain using numerous examples how to apply them using R. They will also provide advice on how to select from these algorithms, and will show how to prepare the data, and how to use the trained models in practice.

About the speakers

Mike Bowles

Sole Proprietor

Jeremy Howard

Kaggle

Strata 2012

Santa Clara, United States

28th February to 1st March 2012

When

Time 1:30pm to 5:00pm PST

Date Tue 28th February 2012

Where

Ballroom CD, Santa Clara Convention Center

Short URL

lanyrd.com/smmtq
