Economists use a data-analysis toolkit and intuition that can be very helpful to data scientists. In particular, econometric methods are useful for disentangling correlation from causation, a use case that standard machine learning and statistical techniques do not handle well. This session will cover examples of econometric methods in action, along with other economics-related insights. Think of it as a crash course in the basic econometric intuition one acquires during a PhD in Economics (I received my PhD from Stanford in 2008).
Why econometrics? The difference between econometrics and statistics is largely one of emphasis: statistical modeling is more concerned with fit, while econometric modeling is more concerned with correctly estimating the coefficients in a regression. Getting the “right” (consistent and unbiased) estimates lets the analyst measure more reliably how strongly a change in one variable predicts (or causes) a change in the dependent variable. These techniques can help answer questions about social and web data that previously could only be answered by collecting new data from randomized multivariate experiments.
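To make the fit-versus-coefficients distinction concrete, here is a minimal sketch on simulated data (numpy is assumed; the data-generating process and all variable names are hypothetical). An unobserved variable u drives both the predictor x and the outcome y, so a regression of y on x alone fits reasonably well (R² of roughly 0.7) yet estimates a slope near 2 when the true effect is 1. Only when u is included does the coefficient on x return to about 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data: u is an unobserved confounder that drives both x and y.
u = rng.normal(size=n)
x = u + rng.normal(size=n)
y = 1.0 * x + 2.0 * u + rng.normal(size=n)  # true effect of x on y is 1.0


def ols(X, y):
    """Return OLS coefficients and R^2 for a design matrix X that includes an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - resid.var() / y.var()
    return beta, r2


ones = np.ones(n)

# Short regression: u is omitted, so x is correlated with the error term.
beta_short, r2_short = ols(np.column_stack([ones, x]), y)

# Long regression: only possible if u were actually observed.
beta_long, r2_long = ols(np.column_stack([ones, x, u]), y)

print(f"omit u:    slope on x = {beta_short[1]:.2f}, R^2 = {r2_short:.2f}")
print(f"include u: slope on x = {beta_long[1]:.2f}, R^2 = {r2_long:.2f}")
```

The point is that a decent R² says nothing about whether the slope is the quantity you care about.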
To do this, the analyst first develops an intuition for whether there is a source of “endogeneity” in the regression, which is largely determined by whether the predictors are correlated with the error term. Once the source of endogeneity is understood, econometric techniques such as fixed/random effects and instrumental variables can be quite useful. The type of data collected and available largely determines how much of the power of these techniques can be brought to bear. [I might also cover some other techniques, but these are the most useful.]
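A minimal sketch of the fixed-effects (within) estimator, assuming a simulated panel rather than real data: each unit has an unobserved, time-invariant effect alpha that is correlated with the predictor, which is exactly the kind of endogeneity described above. Pooled OLS leaves alpha in the error term and lands near 2; subtracting each unit's own mean removes alpha and recovers the true coefficient of 0.5. The sizes, coefficients, and helper names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_periods = 2_000, 10
unit = np.repeat(np.arange(n_units), n_periods)  # unit id for each row of the panel

alpha = rng.normal(size=n_units)[unit]                   # unobserved, time-invariant unit effect
x = alpha + rng.normal(size=unit.size)                   # predictor correlated with alpha
y = 0.5 * x + 3.0 * alpha + rng.normal(size=unit.size)   # true effect of x is 0.5


def slope(x, y):
    """Simple-regression slope: cov(x, y) / var(x)."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / (xc @ xc)


def demean_by_group(v, groups):
    """Within transformation: subtract each group's mean from its observations."""
    sums = np.bincount(groups, weights=v)
    counts = np.bincount(groups)
    return v - (sums / counts)[groups]


pooled = slope(x, y)  # biased: alpha sits in the error and is correlated with x
within = slope(demean_by_group(x, unit), demean_by_group(y, unit))  # alpha differenced out
print(f"pooled OLS: {pooled:.2f}, fixed effects (within): {within:.2f}")
```

This only shows why the point estimate changes; a real analysis would also need appropriate standard errors (for example, clustered by unit).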
The methods will be presented so that a non-technical audience can follow the basic intuition and a practitioner can apply them afterward. Examples will be provided. For panel data econometrics, we will discuss how to identify the early actions of a LinkedIn member that predict their future engagement with the product, a problem made difficult by the confounding of correlation and causation. For instrumental variables techniques, we will discuss how to use random variation in the weather to draw interesting conclusions about politics, economics, and web usage.
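The instrumental-variables idea can be sketched with two-stage least squares on simulated data. In this hypothetical setup, a rainy-day indicator shifts time spent online (x) and is assumed to affect the outcome y only through x (the exclusion restriction), while an unobserved confounder u raises x and lowers y. OLS then estimates a slope near 0.1 instead of the true 0.8; regressing x on rain, then y on the fitted values, recovers roughly 0.8. All variables and magnitudes are invented for illustration and are not the session's actual weather examples.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

rain = rng.binomial(1, 0.3, size=n).astype(float)  # hypothetical instrument: rainy-day indicator
u = rng.normal(size=n)                             # unobserved confounder
x = 2.0 * rain + u + rng.normal(size=n)            # endogenous regressor (e.g. time online)
y = 0.8 * x - 2.0 * u + rng.normal(size=n)         # true effect of x on y is 0.8


def fit(X, y):
    """OLS coefficients for design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


ones = np.ones(n)
ols_slope = fit(np.column_stack([ones, x]), y)[1]  # biased by u

# Two-stage least squares.
Z = np.column_stack([ones, rain])
x_hat = Z @ fit(Z, x)                                   # stage 1: part of x explained by rain
tsls_slope = fit(np.column_stack([ones, x_hat]), y)[1]  # stage 2: regress y on fitted x

print(f"OLS: {ols_slope:.2f}, 2SLS: {tsls_slope:.2f}")
```

Only the point estimate is valid here: standard errors from a manual second-stage regression are incorrect, which is one reason to use a dedicated IV routine in practice.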
In addition to the discussion of applied econometric techniques, there may also be time for some economics-related data insights. We are currently developing unemployment rate prediction models using time-series econometrics, as well as indexes that measure changes in the supply of and demand for talent across regions and industries.
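For the time-series side, a minimal forecasting sketch: fit an AR(1) model, rate_t = c + phi * rate_{t-1} + e_t, by OLS and produce a one-step-ahead forecast. The series is simulated as a stand-in for an unemployment rate, and the AR(1) specification is an assumption chosen for simplicity, not the model being developed for this work.

```python
import numpy as np

rng = np.random.default_rng(3)
T, c, phi = 600, 0.5, 0.9  # hypothetical parameters; long-run mean = c / (1 - phi) = 5.0

# Simulate a stand-in "unemployment rate" series (not real data).
rate = np.empty(T)
rate[0] = c / (1 - phi)
for t in range(1, T):
    rate[t] = c + phi * rate[t - 1] + rng.normal(scale=0.2)

# Fit AR(1) by OLS: regress rate[t] on a constant and rate[t-1].
X = np.column_stack([np.ones(T - 1), rate[:-1]])
(c_hat, phi_hat), *_ = np.linalg.lstsq(X, rate[1:], rcond=None)

forecast = c_hat + phi_hat * rate[-1]  # one-step-ahead forecast
print(f"phi_hat = {phi_hat:.2f}, next-period forecast = {forecast:.2f}")
```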
22nd–23rd September 2011