Overview
Our workflow for joint projects draws extensively from Hunt Allcott's lab wiki and the Gentzkow and Shapiro RA manual on GitHub. In some places we simply link to their manuals or quote them directly. This workflow is closer to Hunt's in that it is stripped down relative to the full Gentzkow and Shapiro workflow. The simpler workflow has the benefit of being accessible to a broader range of collaborators, including those who cannot pay the fixed costs of using the full system. For example, we do not use SCons, because SCons requires additional coding and file maintenance that may not be necessary except on larger projects. An excellent overview of the Gentzkow-Shapiro RA manual is the PDF Code and Data for the Social Sciences: A Practitioner's Guide. While outdated on some of the specifics (e.g., SVN vs. GitHub), this overview is a worthwhile read on principles for managing code and data.
The workflow has three core principles:
- One-stroke production. The entire project, from initial data to all final results, tables, and figures, can be run from one command, typically GitHub/ProjectName/MakePaper.sh. This means that all intermediate steps (e.g., importing and analyzing data, taking a CSV output table and inserting it into the paper, compiling the paper) are fully automated. This prevents us from, for example, changing a data prep routine but forgetting to update all the results that depend on it.
- Coding for replication. At the end of the project, we will publicly post all code and data that we are legally able to post.
- Unambiguous processes. Once the concept for the project has been decided on, any new collaborator should be able to jump in and continue where a previous person left off. We keep task management (GitHub Issues) up to date. We have no half-finished or legacy files in the folders.
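
To make the one-stroke production principle concrete, here is a minimal sketch of what a MakePaper.sh could look like. It is an illustration under stated assumptions, not the script any particular project uses: the step commands, the code/ and paper/ paths, and the DRY_RUN switch are all hypothetical. The sketch defaults to a dry run, so it only prints the steps and is safe to execute as-is.

```shell
#!/usr/bin/env bash
# MakePaper.sh (hypothetical sketch): run every step from raw data to the
# compiled paper with one command, stopping at the first failure.
set -euo pipefail

# Assumed pipeline steps, in execution order. The script names and paths
# are placeholders; replace them with the project's actual steps.
STEPS=(
    "python code/prepare_data.py"
    "python code/run_analysis.py"
    "python code/make_tables.py"
    "pdflatex -output-directory=paper paper/paper.tex"
)

# DRY_RUN=1 (the default in this sketch) prints each step without running it.
# Set DRY_RUN=0 to execute the steps for real.
DRY_RUN="${DRY_RUN:-1}"

# Print a step and, unless in dry-run mode, execute it.
run_step() {
    echo "==> $1"
    if [ "$DRY_RUN" != "1" ]; then
        bash -c "$1" || return 1
    fi
}

# Run all steps in order; abort as soon as one of them fails, so that
# later steps never run on stale or partial intermediate output.
make_paper() {
    local step
    for step in "${STEPS[@]}"; do
        run_step "$step" || return 1
    done
    echo "All steps completed."
}

make_paper
```

Under these assumptions, running `DRY_RUN=0 bash MakePaper.sh` from the project root would rebuild everything in order. Because the run stops at the first failing step, a change to the data prep routine cannot silently leave downstream tables or the compiled paper out of date.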