I will introduce the theory and goals of clustering algorithms. The literature in statistical analysis is made up of dense mathematical equations; so I will translate equations into pseudocode to make the topic more accessible to programmers.
I will expand on the theoretical discussing by demonstrating a simple example of a clustering problem: how to group volcanos in Alaska by geographical proximity. I will move on to algorithms with real-world applications, such as how to group users with similar tastes given a database of user ratings.
I may touch on more advanced techniques to improve the accuracy of resulting clusters. I will also discuss current limitations of statistical analysis. As an example, Netflix' ongoing competition for an algorithm that can predict whether or not a user will like the movie Napolean Dynamite.
17th–19th June 2009