Do your 500 "friends" on social networks really know what you will like? How many of your friends' shared links that you click each day are interesting to you? The social graph brings trust and meaning to the web, but often creates information overload from over-sharing. And because real-time updates and feeds emphasize recency over relevance, rare gems often fall through the cracks. This talk will discuss the issues and considerations when designing a personalized discovery engine, one that combines the social, peer and taste graphs to produce relevant, peer-sourced recommendations and serendipitous discovery of new online content. StumbleUpon CEO Garrett Camp will go over the concepts and mechanisms behind such recommendation systems, and highlight findings from analysis of StumbleUpon's database of over 15 billion personalized stumbles.
As more people use the internet to share statuses, tweets, links, and other content, the task of separating the wheat from the chaff is becoming increasingly important. Luckily, a number of approaches to finding the most interesting content are in use across the internet, both analyzing the content itself and giving users the tools to identify what is good.
Our panel will explore the details of how sites we use every day have attempted to solve this problem. We’ll talk about voting systems where democracy works on a smaller scale, social systems that try to figure out who you care about or whose style you share, content analysis approaches that try to show you things based on your explicit or implicit set of interests, and other interesting algorithms for scoring and ranking content. We’ll also talk about implementation, touching on scaling distributed databases and training machine learning models, among other topics.
We’ll talk about some common issues across these systems. Something as simple as counting votes can actually turn into a long lesson in statistics. And there are other factors our algorithms must balance, including showing recent stuff vs. the overall best, ensuring new content gets a fair chance to prove itself, and keeping a site simple with all this complexity happening behind the scenes. Finally, we’ll talk about how algorithms that control content distribution end up being big targets for gaming and abuse.
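To give a flavor of why counting votes becomes a statistics lesson, here is a minimal sketch of one well-known approach: ranking items by the lower bound of the Wilson score interval rather than by raw vote ratio. This is an illustration only, not a description of how any site on the panel ranks content. The function name `wilson_lower_bound` and the 95% confidence default (`z = 1.96`) are assumptions made for the example.

```python
import math


def wilson_lower_bound(upvotes: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for an upvote ratio.

    upvotes: number of positive votes (hypothetical input)
    total:   number of votes cast, positive plus negative
    z:       normal quantile; 1.96 corresponds to roughly 95% confidence
    """
    if total == 0:
        # No votes yet, so there is no evidence the item is good.
        return 0.0
    p = upvotes / total                      # observed upvote ratio
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt((p * (1 - p) + z2 / (4 * total)) / total)
    return (centre - margin) / (1 + z2 / total)


# An item with 1 upvote out of 1 has a perfect raw ratio, but very little
# evidence behind it; an item with 90 out of 100 has a lower raw ratio but
# far more evidence, so it ends up with the higher score.
items = {"one_vote": (1, 1), "well_tested": (90, 100)}
ranked = sorted(items, key=lambda k: wilson_lower_bound(*items[k]), reverse=True)
```

The point of the design is that a score based on a confidence bound penalizes small samples, which is one way to keep a single early vote from outranking consistently well-rated content. It does not by itself address recency or giving new content a fair chance, which need separate handling.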
11th–15th March 2011