How Stata and a Makefile can make your day

By relying on scripts (do-files), Stata encourages you to work in a structured, efficient and reproducible way. This text-based approach is familiar and attractive to anyone who has ever used a Unix shell and its standard utilities. In fact, Unix-flavoured utilities can make your Stata experience even better. One non-obvious candidate is make, which is usually used for programming projects that require some sort of compilation.

Consider the following scenario. You have two ASCII files of raw data, micro.raw and macro.raw. You want to read in both files, correct some errors, convert them to Stata’s .dta format, merge them, apply some recodes, do a lot of preliminary analyses, and finally produce a PostScript file and a table for inclusion in LaTeX. Of course, you could write one large do-file that does the job. But then, changing a tiny detail of the final graph would require you to repeat the whole procedure, which is time-consuming (especially when you work with very large datasets). Moreover, you often want to perform some interactive checks at an intermediate stage. So breaking the large job down into a number of smaller jobs, which are much easier to maintain and which produce a set of intermediate files, seems like a good idea.

However, running these jobs in the correct order (and determining whether a job needs to be re-run at all) now becomes crucial. This is where make can help you. A Makefile simply contains information on the dependencies amongst your files as well as instructions for getting from A to B. Here is an example for the above scenario:

#your comments here

#Define new implicit rule: if you want to create dataset x.dta, run
%.dta :
tab wstata -e do $<

micro.dta : micro.raw

macro.dta: macro.raw

combined.dta: micro.dta macro.dta

final.dta: combined.dta

finalmodel.ster: final.dta
tab wstata -e do

tabandgraph: finalmodel.ster
tab wstata -e do

The first two lines after the comment define a new rule: whenever you ask make to make a dataset, it will run Stata in batch mode with a do-file of the same name as input. The next line specifies that micro.dta depends on the raw data (micro.raw) as well as on the code that transforms micro.raw into micro.dta. If either file changes, make knows that micro.dta has to be recreated; otherwise, it can be left alone. The same applies to macro.dta, combined.dta and final.dta. So if you type “make final.dta” at the command prompt, make will (re-)create final.dta, and before it combined.dta, micro.dta, and macro.dta, if and only if this is necessary (i.e. if either the scripts or the raw data have changed). The same goes for your tables and graphs. If you change only the do-file that produces them, only that do-file will be re-executed. If, on the other hand, you correct an error in the do-file that processes micro.raw or get a new release of micro.raw, all the intermediate datasets and the saved estimates will be re-created in the correct order before the tables and graphs are produced again.
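To make the dependency logic concrete, here is a minimal sketch of a complete Makefile for this scenario. It assumes GNU make and that every dataset x.dta is built by a do-file called x.do in the same directory. The names finalmodel.do and tabandgraph.do are hypothetical and only serve as an illustration; wstata -e is the batch-mode call used in the listing above.

```makefile
# Sketch only: finalmodel.do and tabandgraph.do are hypothetical names.
# Every recipe line must begin with a real tab character, not spaces.

# Pattern rule: to build x.dta, run x.do in batch mode.
# $< expands to the first prerequisite of this rule, i.e. x.do.
%.dta: %.do
	wstata -e do $<

# Additional prerequisites for each dataset. These lines have no recipe,
# so make takes the recipe from the pattern rule above.
micro.dta: micro.raw
macro.dta: macro.raw
combined.dta: micro.dta macro.dta
final.dta: combined.dta

# Targets that are not .dta files need their own recipe.
finalmodel.ster: finalmodel.do final.dta
	wstata -e do finalmodel.do

# tabandgraph names a task rather than a file, so it is declared phony
# and its recipe runs every time the target is requested.
.PHONY: tabandgraph
tabandgraph: tabandgraph.do finalmodel.ster
	wstata -e do tabandgraph.do
```

With a file like this in place, "make tabandgraph" would rebuild whatever is out of date and then produce the tables and graphs, while "make -n tabandgraph" only prints the commands make would run without executing them, which is a cheap way to check the dependency chain.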

Two fine points: a) instructions for making something must start with a tab character, and b) -e (instead of /E) is required because MS-DOS-style options confuse make. If your project is complex, makefiles can save your day. If you work in a *nix environment, make and its documentation are most probably installed on your system. In a Windows environment, you will have to install either Cygwin or MinGW to get make. The Wikipedia article is an excellent starting point for learning the syntax of makefiles.

7 thoughts on “How Stata and a Makefile can make your day”

  1. One drawback is that Stata doesn’t give error codes to the shell. Thus the make process cannot stop when it fails. I use a wrapper script (on Linux) that scans for errors in the Stata output and posts status codes accordingly.

