In the context of building predictive models, predictability is usually considered a blessing. After all, that is the goal: build the model with the highest predictive performance. The rise of ‘big data’ has in fact vastly improved our ability to predict human behavior, thanks to the introduction of much more informative features. In practice, however, things are more nuanced. In many applications, the outcome of interest can occur for very different reasons. In such mixed scenarios, the model will automatically gravitate toward the cause that is easiest to predict, at the expense of the others. This holds even if the predictable scenario is far less common or less relevant. We present a number of such scenarios: clicks on ads performed ‘intentionally’ vs. ‘accidentally’, online forms filled out by people vs. fraudulent bots, and finally consumers visiting store locations vs. their phones pretending to be there. The combination of different and highly informative features can have a significantly negative overall impact on the usefulness of predictive modeling.