kai arzheimer

How Stata and a Makefile can make your day

By relying on scripts (do-files), Stata encourages you to work in a structured, efficient and reproducible way. This text-based approach is familiar and attractive to anyone who has ever used a Unix shell and the standard utilities. In fact, Unix-flavoured utilities can make your Stata experience even better. One non-obvious candidate is make, which is usually used for programming projects that require some sort of compilation.

Consider the following scenario. You have two ASCII files of raw data, micro.raw and macro.raw. You want to read in both files, correct some errors, convert them to Stata’s .dta format, merge them, apply some recodes, do a lot of preliminary analyses, and finally produce a PostScript file and a table for inclusion in LaTeX. Of course, you could write one large do-file that does the job. But then, changing a tiny detail of the final graph would require you to repeat the whole procedure, which is time-consuming (especially when you work with very large datasets). Moreover, you often want to perform some interactive checks at an intermediate stage. So breaking down the large job into a number of smaller jobs that are much easier to maintain and that produce a set of intermediate files seems like a good idea.

However, running these jobs in the correct order (and determining whether a job needs to be re-run at all) now becomes crucial. And this is where make can help you. A Makefile simply contains information on the dependencies amongst your files as well as instructions for getting from A to B. Here is an example for the above scenario:


#your comments here

#Define new implicit rule: if you want to create dataset x.dta, run x.do
%.dta : %.do
	wstata -e do $<

micro.dta : micro.raw micro.do

macro.dta: macro.raw macro.do

combined.dta: micro.dta macro.dta combined.do

final.dta: combined.dta final.do

finalmodel.ster: finalanalysis.do final.dta
	wstata -e do finalanalysis.do

tabandgraph: finalmodel.ster graphtab.do
	wstata -e do graphtab.do

The first two lines after the comment define a new rule: whenever you ask make to make a dataset, it will run Stata in batch mode with a do-file of the same name as input. The next line specifies that micro.dta depends on the raw data (micro.raw) as well as on the code that transforms micro.raw into micro.dta. If either file changes, make knows that micro.dta has to be recreated; otherwise, it can be left alone. The same applies to macro.dta, combined.dta and final.dta. So if you type “make final.dta” at the command prompt, make will (re-)create micro.dta, macro.dta, and combined.dta if and only if this is necessary (i.e. if either the scripts or the raw data have changed). The same goes for your tables and graphs. If you change graphtab.do but none of the other files, only graphtab.do will be re-executed. If, on the other hand, you correct an error in macro.do or get a new release of micro.raw, all the affected intermediate datasets and the saved estimates will be re-created in the correct order before graphtab.do is re-executed.
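One caveat: tabandgraph is not a file that graphtab.do creates, so make has no timestamp to compare and will run graphtab.do every time you ask for that target. A minimal sketch of one way around this, assuming (hypothetically) that graphtab.do writes a file called table.tex, is to make that file the real target and keep tabandgraph as a phony alias. The file name table.tex is an assumption, not something taken from the original project; recipe lines must start with a tab.

```make
# Assumption: graphtab.do writes table.tex (hypothetical file name).
# tabandgraph is only a convenient alias, not a file on disk.
.PHONY: tabandgraph
tabandgraph: table.tex

# Re-run graphtab.do only if the saved estimates or the script are newer
# than table.tex.
table.tex: finalmodel.ster graphtab.do
	wstata -e do graphtab.do
```

With a rule like this, a second “make tabandgraph” without any changes should report that there is nothing to be done instead of starting Stata again.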

Two fine points: a) instructions for making something must start with a tab character, and b) -e (instead of /E) is required because MS-DOS-style options confuse make. If your project is complex, makefiles can save your day. If you work in a *nix environment, make and its documentation are most probably installed on your system. In a Windows environment, you will have to install either Cygwin or MinGW to get make. The Wikipedia article is an excellent starting point for learning the syntax of makefiles.
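If the same project has to run on more than one machine, it may help to keep the Stata call in a single make variable rather than repeating it in every recipe. The following is a sketch only: the variable name STATA is an arbitrary choice, and the commented-out Unix alternative assumes that a console binary called stata is on the path and accepts -b for batch mode, which you should check against your own installation.

```make
# How to call Stata in batch mode (variable name chosen for illustration).
# Windows, as in the example above:
STATA = wstata -e
# Assumed Unix-style alternative; verify the binary name on your system:
# STATA = stata -b

# Implicit rule: to create x.dta, run x.do.
# The recipe line below starts with a tab character, not spaces.
%.dta : %.do
	$(STATA) do $<
```

The variable can also be overridden for a single run from the command line, e.g. “make final.dta STATA="stata -b"”, without editing the Makefile.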
