An introduction to machine learning on small scale datasets (PyData)

A session at PyCon Ireland 2015

Saturday 24th October, 2015

2:30pm to 3:00pm (DMT)

An introduction to machine learning on small scale datasets – identifying Irish farmers who plant forests on their farms.

The purpose of this talk is to illustrate the differences between explanatory modelling (classical statistics) and predictive modelling (machine learning), as these two approaches are often conflated. The scikit-learn machine learning library was used to classify Irish farmers who planted forests on their land. The dataset was relatively small, providing data on 799 Irish farmers and approximately 135 different variables. Prior to classifying farmers, irrelevant and redundant variables were removed from the dataset using a feature wrapper technique, which improves the predictive power of models. This illustrates the power of machine learning for inductive analysis by uncovering previously unknown relationships between variables (features). As the IPython notebooks were computationally demanding, the final code was run with runipy on gaia, a high-performance computer at UCD. Earlier versions of the IPython notebooks were run on Amazon EC2 using StarCluster, which makes high-performance computing available to the general public at reasonable cost.
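The abstract does not say which wrapper technique or classifier was used, so the following is only a minimal sketch of the general idea. It assumes recursive feature elimination with cross-validation (scikit-learn's RFECV) as the wrapper, logistic regression as the classifier, and a synthetic dataset that merely matches the stated shape (799 rows, 135 variables). None of the data or parameter values below come from the talk.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the farmer dataset (assumption: same shape only,
# with a handful of informative and redundant variables).
X, y = make_classification(
    n_samples=799,
    n_features=135,
    n_informative=10,
    n_redundant=20,
    random_state=0,
)

# Inner folds are used by the wrapper to decide how many features to keep.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
# Outer folds estimate predictive accuracy on data the wrapper never saw.
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# Wrapper step: repeatedly fit the model, drop the 10 weakest features,
# and keep the feature count with the best cross-validated accuracy.
selector = RFECV(
    LogisticRegression(max_iter=1000),
    step=10,
    cv=inner_cv,
    scoring="accuracy",
)

# Scaling, selection and classification sit in one pipeline, so feature
# selection is redone inside every outer training fold (no leakage).
model = make_pipeline(
    StandardScaler(),
    selector,
    LogisticRegression(max_iter=1000),
)

scores = cross_val_score(model, X, y, cv=outer_cv, scoring="accuracy")

# Refit on all rows to see how many variables the wrapper retains.
model.fit(X, y)
n_kept = model.named_steps["rfecv"].n_features_

print("Mean cross-validated accuracy:", scores.mean())
print("Features kept:", n_kept, "of", X.shape[1])
```

Judging the model on held-out folds rather than on coefficient significance is what makes this a predictive workflow in the sense the abstract describes; an explanatory analysis would instead fit a single model to all rows and interpret its parameters.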

About the speaker

Conor Lynch

PhD Fellow at Earth Institute, UCD (bio from LinkedIn)

I started my professional career as a civil engineer. I completed an MSc in applied geographical information systems (GIS) at Kingston University London and worked with Mallon Technology Ltd. I am just finishing a PhD with UCD that combines behavioural economics and machine learning techniques to identify Irish farmers who plant forests. My main interest is using Python code and high-performance computers to investigate complex patterns in human behaviour that are driven by a multitude of factors.

Next session in Goldsmith 3

3pm Analysing user behaviour - from histograms to random forests (PyData) by David Brodigan

13 attendees

  • Andrew McCarthy
  • Sri Harsha
  • Brian Ward
  • Conor Lynch
  • Felipe Guth
  • Barry Kennedy
  • Naomi Ceder
  • Nicolas Laurance
  • Orestes Gonzalo Manzanilla-Salazar
  • Peter Mulholland
  • Sorcha (Nic Amhalaí) Bowler
  • Maciek S.
  • Xevi

PyCon Ireland 2015

Dublin, Ireland

24th to 25th October 2015

Goldsmith 3, Radisson Blu Royal Hotel
