Best Practices for Reproducible Research: A Case Study in Quantitative Finance

A session at Strata New York 2012

Wednesday 24th October, 2012

2:30pm to 3:10pm (EST)

Reproducibility is key in quantitative research. For data scientists, easily replicated results mean greater confidence in correctness, less time spent investigating discrepancies, and improved robustness regardless of complexity. At the organizational level, processes that emphasize reproducibility improve robustness and help scale the depth and complexity of research.

In this presentation, I will discuss and demonstrate best practices for data scientists that emphasize the goal of reproducibility. The principles presented are generally applicable, but the examples will focus on financial research in Python. I will present a case study in quantitative asset management to outline four main processes.

First, raw data should be properly stored and managed, with an emphasis on enabling the retrieval of historical time series data anchored by a particular observation date. Second, processed data and intermediate results should be persisted to enable a researcher to pinpoint the causes of discrepancies in final results. Third, code, trading signals, and model configurations should be version controlled so that various pieces of the computation can be rolled back to control the number of variables when comparing new and previous output. Fourth, regular testing of the entire research data and code stack helps catch and document bugs in code as well as changes to configuration. This aids in attributing changes in final output to either changes in underlying fundamentals, or changes in data, computations, and configurations.
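As an illustration of the first process, here is a minimal sketch of retrieving a time series anchored to an observation date. It is not taken from the talk: the revision-log layout, the sample values, and the names `REVISIONS` and `snapshot` are all assumptions made for the example.

```python
from datetime import date

# Hypothetical revision log (an assumption for illustration). Each entry is
# (value_date, as_of, value): the value for the period ending on value_date,
# as it was known on the observation date as_of.
REVISIONS = [
    (date(2012, 3, 31), date(2012, 4, 15), 1.9),  # first release
    (date(2012, 3, 31), date(2012, 5, 20), 2.0),  # later revision
    (date(2012, 6, 30), date(2012, 7, 15), 1.3),
]


def snapshot(revisions, observation_date):
    """Return {value_date: value} as the series was known on observation_date."""
    latest = {}
    # Walk the revisions in the order they became known, so that a later
    # revision overwrites an earlier one for the same value_date.
    for value_date, as_of, value in sorted(revisions, key=lambda r: r[1]):
        if as_of <= observation_date:
            latest[value_date] = value
    return dict(sorted(latest.items()))


# The same series, viewed from two different observation dates.
print(snapshot(REVISIONS, date(2012, 5, 1)))
print(snapshot(REVISIONS, date(2012, 8, 1)))
```

Storing every revision rather than overwriting values in place is what makes it possible to rerun an old analysis against exactly the data that was available at the time.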

The process of identifying the cause of discrepancies in results is a controlled experiment whose main inputs are raw and processed data, computational code, and configuration parameters for models and factors. Practitioners with the right tools and habits will be able to maximize the reproducibility of their results, easily diagnose differences, and ultimately be more productive and effective.
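One way to treat a discrepancy as a controlled experiment is to record a fingerprint of each input alongside every run and then compare two runs input by input. The sketch below is an assumption about how this could be done, not the speaker's tooling; `fingerprint`, `run_manifest`, `changed_inputs`, and the sample inputs are hypothetical.

```python
import hashlib
import json


def fingerprint(obj):
    """Hash a JSON-serializable object in a key-order-independent way."""
    payload = json.dumps(obj, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_manifest(raw_data, processed_data, code_version, config):
    """Summarize the inputs of one research run (hypothetical layout)."""
    return {
        "raw_data": fingerprint(raw_data),
        "processed_data": fingerprint(processed_data),
        "code_version": code_version,  # e.g. a version-control revision id
        "config": fingerprint(config),
    }


def changed_inputs(old, new):
    """Return the names of the inputs that differ between two manifests."""
    return sorted(key for key in old if old[key] != new.get(key))


# Two runs that differ only in a model parameter.
previous = run_manifest([1.0, 2.0], [0.5, 1.0], "rev-a", {"lookback": 60})
current = run_manifest([1.0, 2.0], [0.5, 1.0], "rev-a", {"lookback": 90})
print(changed_inputs(previous, current))  # → ['config']
```

If no recorded input differs between two runs, a change in the final output is more plausibly due to the underlying fundamentals than to data, code, or configuration.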

About the speaker

Chang She

Ex-Quant Trader, Python/pandas developer
