Scaling Machine Learning Algorithms with Hadoop for Transactional and Clickstream Data

A session at Strata New York 2012

  • Abhijit Bose
  • Venkat Varadachary

Wednesday 24th October, 2012

5:00pm to 5:50pm (EST)

American Express operates a closed-loop network in which it is both an issuer of credit cards and an acquirer of merchants. This uniquely positions it among financial services companies to understand customer transaction behavior and the goals and aspirations of its merchants. By analyzing vast amounts of data from our network, our web properties, and our partners, we can personalize our products and services for all of our customers, including merchants. Through recent partnerships with social media companies, we have also started to offer many of these products and services through digital and social channels.

We have been using Hadoop and its ecosystem of tools for some of our routine machine learning tasks: generating recommendations, personalizing offers, and running marketing studies. In this session, we will consider several commonly used algorithms, such as k-means clustering, association rule mining, and CART, and discuss our experience implementing them on a production Hadoop cluster from the ground up. We will describe our experiments with different compression and serialization techniques for our unique datasets, along with some of the algorithm enhancements we have made. We will also share our experience implementing these algorithms in Hive, Pig, and native Java, and what we learned as our initial Hadoop deployment transitioned into a production environment. Attend this session if your organization is experimenting with Hadoop for similar machine learning tasks on transactional and clickstream data.

About the speakers

Abhijit Bose

Director and Sr. Data Scientist, American Express

Venkat Varadachary

Head, Strategic Insights and Digital Analytics, American Express
