Wednesday 24th October, 2012
2:30pm to 3:10pm
Reproducibility is key in quantitative research. For data scientists, easily replicated results mean greater confidence in correctness, less time spent investigating discrepancies, and more robust research regardless of its complexity. At the organizational level, processes that emphasize reproducibility improve robustness and help scale the depth and complexity of research.
In this presentation, I will discuss and demonstrate best practices for data scientists that emphasize the goal of reproducibility. The principles presented are generally applicable, but the examples will focus on financial research in Python. I will use a case study in quantitative asset management to outline four main processes.
First, raw data should be properly stored and managed, with an emphasis on retrieving historical time series data as it was known on a particular observation date. Second, processed data and intermediate results should be persisted so that a researcher can pinpoint the causes of discrepancies in final results. Third, code, trading signals, and model configurations should be version controlled so that individual pieces of the computation can be rolled back, limiting the number of variables that change when comparing new and previous output. Fourth, regular testing of the entire research data and code stack helps catch and document bugs in code as well as changes to configuration. This makes it possible to attribute changes in final output either to changes in underlying fundamentals or to changes in data, computations, and configurations.
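The first process, retrieving a time series as it was known on a given date, can be illustrated with a small append-only store. This is a minimal sketch, not the speaker's implementation: it assumes each stored value carries both the period it describes and the date it became available, and that revisions (such as an earnings restatement) are appended as new records rather than overwriting old ones. The names `Record`, `PointInTimeStore`, `as_of`, and the `eps_ACME` series are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class Record:
    """One stored value (hypothetical schema): what was known about `period` on `recorded_on`."""
    series: str
    period: date       # the date the value describes, e.g. a quarter end
    recorded_on: date  # the date this value became available to the researcher
    value: float


class PointInTimeStore:
    """Append-only store (a sketch): revisions are new records, never overwrites."""

    def __init__(self) -> None:
        self._records: List[Record] = []

    def append(self, record: Record) -> None:
        self._records.append(record)

    def as_of(self, series: str, as_of: date) -> Dict[date, float]:
        """Return the series exactly as it could have been seen on `as_of`."""
        latest: Dict[date, Record] = {}
        for rec in self._records:
            if rec.series != series or rec.recorded_on > as_of:
                continue  # unknown on `as_of`, so it must not leak into the result
            current = latest.get(rec.period)
            # Keep the most recent revision; on a tie, the later append wins.
            if current is None or rec.recorded_on >= current.recorded_on:
                latest[rec.period] = rec
        return {period: rec.value for period, rec in sorted(latest.items())}


store = PointInTimeStore()
store.append(Record("eps_ACME", date(2012, 3, 31), date(2012, 4, 20), 1.10))
store.append(Record("eps_ACME", date(2012, 3, 31), date(2012, 7, 25), 1.02))  # restatement
store.append(Record("eps_ACME", date(2012, 6, 30), date(2012, 7, 25), 1.15))

print(store.as_of("eps_ACME", date(2012, 5, 1)))  # only the original Q1 figure
print(store.as_of("eps_ACME", date(2012, 8, 1)))  # restated Q1 plus Q2
```

Because nothing is overwritten, rerunning a backtest with the same `as_of` date always sees the same inputs, which is what makes later discrepancies traceable to something other than silently revised data.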
The process of identifying the cause of discrepancies in results is a controlled experiment whose main inputs are raw and processed data, computational code, and configuration parameters for models and factors. Practitioners with the right tools and habits will be able to maximize the reproducibility of their results, easily diagnose differences, and ultimately be more productive and effective.
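Treating the diagnosis as a controlled experiment can be sketched in a few lines: start from the baseline inputs, swap in exactly one changed input at a time, and record which swaps move the output. This is an illustrative sketch under stated assumptions, not a method from the talk: it assumes the pipeline is a deterministic function of a dictionary of inputs (here `data`, `code_version`, and `config`), and `attribute_discrepancy` and `toy_pipeline` are hypothetical names. Varying one factor at a time will not detect effects that appear only when two inputs change together.

```python
from typing import Any, Callable, Dict, List

Inputs = Dict[str, Any]


def attribute_discrepancy(
    pipeline: Callable[[Inputs], float],
    baseline: Inputs,
    candidate: Inputs,
    tol: float = 1e-12,
) -> List[str]:
    """Swap one input at a time from baseline to candidate; return the ones that move the output."""
    base_out = pipeline(baseline)
    responsible = []
    for key in sorted(baseline):  # assumes both runs share the same input keys
        if baseline[key] == candidate[key]:
            continue  # an unchanged input cannot explain the difference
        trial = dict(baseline)
        trial[key] = candidate[key]  # vary exactly one factor, hold the rest fixed
        if abs(pipeline(trial) - base_out) > tol:
            responsible.append(key)
    return responsible


def toy_pipeline(inputs: Inputs) -> float:
    """Hypothetical signal: scaled mean return; code version 2 clips outliers to +/-5%."""
    returns = inputs["data"]
    if inputs["code_version"] >= 2:
        returns = [max(min(r, 0.05), -0.05) for r in returns]
    return inputs["config"]["scale"] * sum(returns) / len(returns)


baseline = {"data": [0.01, 0.02, 0.10], "code_version": 1, "config": {"scale": 1.0}}
candidate = {"data": [0.01, 0.02, 0.10], "code_version": 2, "config": {"scale": 1.0}}
print(attribute_discrepancy(toy_pipeline, baseline, candidate))  # ['code_version']
```

In practice each input would be identified by something reproducible, such as a version-control commit for code and an as-of date for data, so that the baseline run can be reconstructed exactly.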