Nov 10, 2013

Sometimes, things take a teeny-weeny bit longer. Thanks to a student asking for it, I realised that I had failed to deposit replication data for Liz Carter’s and my 2006 EJPR piece on the POS/Extreme Right nexus. This has been rectified.

Arzheimer, Kai; Carter, Elisabeth, 2013, “Replication data for: Political Opportunity Structures and Right-Wing Extremist Party Success”, doi:10.7910/DVN/23280 UNF:5:zWdBbOEjWqbNoUiRlLTKwg== Harvard Dataverse Network [Distributor] V2

The data, which cover Western Europe during the 1984-2001 period, are now obviously of historical interest in more than one sense. The micro information is drawn from a host of national election studies (this was before the ESS) that were lovingly harmonised under the auspices of EREPS. Clearly, those were the days. But hey, according to Google Scholar, this still attracted 25 citations in 2013 and is my second-most cited piece overall. More importantly, I’m still rather happy with it.

If you are interested and don’t have access to the EJPR, our pre-publication version is here.

Sep 23, 2010

Someone asked me for the syntax/data required to replicate my old Electoral Studies piece on party identification in Germany 1977-2002. Its slightly preposterous title notwithstanding, as of today it holds a proud 18th rank in Electoral Studies’ list of its current “Top 25 hottest articles” (bringing Web 2.0 and the sciences together was always bound to end in disaster). So in the very unlikely event that you have been holding your breath for this, breathe out: I finally got around to de-cluttering my old files and uploaded the replication information to my dataverse:

Kai Arzheimer, 2010, “Replication data for: Dead Men Walking? Party Identity in Germany 1977-2002”, hdl:1902.1/15091 UNF:5:AkxnvIlPgabqb2zsZ39k1A==

Aug 31, 2009

Should one weight one’s survey data? Is it worth the effort? The short answer must be ‘maybe’ or ‘it depends’. A slightly longer and much more useful answer was given by Leslie Kish in his enormously helpful paper ‘Weighting: Why, when and how’. Today (well, actually I submitted the final manuscript 2.5 years ago – that’s scientific progress for you!), I have added my own two cents with a short chapter (in German) that looks at the effects and non-effects of common weighting procedures. The bottom line is that if you employ the usual weighting variables (age, gender, education and maybe class or region) as controls in your regression, weighting will make next to no difference to the coefficients but might mess with your standard errors.
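The logic is easy to demonstrate with a toy simulation (everything below – numbers, variable names, the selection mechanism – is made up for illustration and is not taken from the chapter): over-sample one group, then compare weighted and unweighted least squares when the weighting variable is also a control.

```python
import numpy as np

# Toy illustration: when the weighting variable (here a binary 'group')
# is included as a regressor, weighting by the inverse sampling
# probability barely moves the coefficients.
rng = np.random.default_rng(0)

n = 20_000
group = rng.binomial(1, 0.5, n)           # population: 50/50 split
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + 1.5 * group + rng.normal(size=n)

# Over-sample group 1: keep group-0 cases with probability 0.25 only
keep = (group == 1) | (rng.random(n) < 0.25)
xs, ys, gs = x[keep], y[keep], group[keep]

# Design weights: inverse of the selection probability
w = np.where(gs == 1, 1.0, 1 / 0.25)

X = np.column_stack([np.ones(gs.size), xs, gs])

def wls(X, y, w):
    """Closed-form (weighted) least squares: solve (X'WX) b = X'Wy."""
    return np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

b_unw = wls(X, ys, np.ones(gs.size))
b_wtd = wls(X, ys, w)
print(b_unw, b_wtd)   # the slopes on x agree closely once 'group' is controlled
```

With ‘group’ in the equation, both runs recover essentially the same slope; the weights only re-scale information the model already has.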

Nov 03, 2008

The US might face unprecedented levels of turnout in tomorrow’s election, but historically, non-voters are the biggest camp in American politics. One intriguing explanation for this well-known fact is that low turnout could be a consequence of the very high (by any standard) levels of income inequality: because voters lack experience with universalistic institutions, they are less likely to adopt norms and values that foster participation in elections. This is the gist of an article that appeared recently (by social science standards) in the British Journal of Politics and International Relations. While the thesis is interesting enough, I did not find the evidence (design, operationalisation, statistical model) particularly convincing and consequently embarked on a major replication exercise. As it turned out, there are indeed major problems with the original analysis, including a rather problematic application of the ever-popular time-series cross-sectional approach (aka Beck & Katz). Last week, my own article on the (non-)relationship between inequality and turnout finally appeared in the BJPIR. If you don’t have access to the journal, you can still download the preprint version (“Something Old, Something New, Something Borrowed, Something True?”) from my homepage. And if you in turn find this rather unconvincing, you can download the replication data for the various inequality/turnout models and do your own analysis. Enjoy.
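For the curious, the Beck & Katz estimator at issue – OLS point estimates with panel-corrected standard errors (PCSE) – can be sketched in a few lines. This is a hedged illustration on simulated data, not the replication code from either article; the function and variable names are mine.

```python
import numpy as np

def pcse(X, y, n_units, n_periods):
    """OLS with Beck & Katz panel-corrected standard errors.

    Assumes a balanced panel stacked unit-major: rows 0..T-1 are unit 1,
    rows T..2T-1 are unit 2, and so on."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = (y - X @ b).reshape(n_units, n_periods)  # residuals, one row per unit
    sigma = (e @ e.T) / n_periods                # N x N contemporaneous covariance
    omega = np.kron(sigma, np.eye(n_periods))    # full NT x NT error covariance
    bread = np.linalg.inv(X.T @ X)
    cov_b = bread @ X.T @ omega @ X @ bread      # sandwich estimator
    return b, np.sqrt(np.diag(cov_b))

# Toy data: 10 countries observed over 30 years, with a common period
# shock that induces contemporaneous correlation across countries.
rng = np.random.default_rng(1)
N, T = 10, 30
x = rng.normal(size=(N, T))
shock = rng.normal(size=T)
y = 0.5 * x + shock + rng.normal(size=(N, T))
X = np.column_stack([np.ones(N * T), x.ravel()])
b, se = pcse(X, y.ravel(), N, T)
print(b, se)
```

The point estimates are plain OLS; only the variance formula changes, which is exactly why PCSE can look deceptively reassuring when the model itself is mis-specified.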

Mar 20, 2008

By relying on scripts (do-files), Stata encourages you to work in a structured, efficient and reproducible way. This text-based approach is familiar and attractive to anyone who has ever used a Unix shell and the standard utilities. In fact, Unix-flavoured utilities can make your Stata experience even better. One non-obvious candidate is make, which is normally used for programming projects that require some sort of compilation.

Consider the following scenario. You have two ASCII files of raw data, micro.raw and macro.raw. You want to read in both files, correct some errors, convert them to Stata’s .dta format, merge them, apply some recodes, run a lot of preliminary analyses, and finally produce a PostScript file and a table for inclusion in LaTeX. Of course, you could write one large do-file that does the job. But then changing a tiny detail of the final graph would require you to repeat the whole procedure, which is time-consuming (especially when you work with very large datasets). Moreover, you often want to perform some interactive checks at an intermediate stage. So breaking the large job down into a number of smaller jobs that are much easier to maintain and that produce a set of intermediate files seems like a good idea.

However, running these jobs in the correct order (and determining whether a job needs to be re-run at all) now becomes crucial. And this is where make can help you. A Makefile simply contains information on the dependencies amongst your files as well as instructions for getting from A to B. Here is an example for the scenario above:

# your comments here

# Implicit rule: to build any dataset x.dta, run Stata in batch mode
# on the do-file of the same name, x.do
%.dta : %.do
	wstata -e do $<

micro.dta : micro.raw

macro.dta : macro.raw

combined.dta : micro.dta macro.dta

final.dta : combined.dta

finalmodel.ster : final.dta finalmodel.do
	wstata -e do finalmodel.do

tabandgraph : finalmodel.ster tabandgraph.do
	wstata -e do tabandgraph.do
The first two lines after the comment define a new rule: whenever you ask make to build a dataset, it will run Stata in batch mode with the do-file of the same name as input. The next line specifies that micro.dta depends on the raw data (micro.raw) as well as on the do-file that transforms micro.raw into micro.dta. If either file changes, make knows that micro.dta has to be recreated; otherwise, it can be left alone. The same applies to macro.dta, combined.dta and final.dta. So typing “make final.dta” at the command prompt will (re-)create combined.dta, micro.dta and macro.dta if and only if this is necessary (i.e. if either the scripts or the raw data have changed). The same goes for your tables and graphs. If you change only the final do-file and none of the other files, only that last job will be re-executed. If, on the other hand, you correct an error in one of the earlier scripts or get a new release of micro.raw, all the intermediate datasets and the saved estimates will be re-created in the correct order before the final job is re-executed.

Two fine points: a) the instruction lines for making something must start with a tab (not spaces), and b) -e (instead of /E) is required because MS-DOS-style options confuse make. If your project is complex, makefiles can save your day. If you work in a *nix environment, make and its documentation are most probably already installed on your system. In a Windows environment, you will have to install either Cygwin or MinGW to get make. The Wikipedia article is an excellent starting point for learning the syntax of makefiles.
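Make’s core decision rule is simple: a target must be rebuilt whenever it is missing or older than one of its prerequisites. A toy sketch of that rule (this is an illustration of the idea, not how GNU make is actually implemented):

```python
import os
import time
import tempfile

def needs_rebuild(target, prerequisites):
    """A target must be (re-)made if it is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    t = os.path.getmtime(target)
    return any(os.path.getmtime(p) > t for p in prerequisites)

# Tiny demonstration with temporary files standing in for micro.raw / micro.dta
with tempfile.TemporaryDirectory() as d:
    raw = os.path.join(d, "micro.raw")
    dta = os.path.join(d, "micro.dta")
    open(raw, "w").close()
    print(needs_rebuild(dta, [raw]))   # True: micro.dta does not exist yet
    open(dta, "w").close()
    past = time.time() - 60
    os.utime(raw, (past, past))        # pretend the raw data are older
    print(needs_rebuild(dta, [raw]))   # False: the target is newer than its prerequisite
```

Make simply applies this check recursively along the dependency graph, which is why it re-runs exactly the jobs that are out of date and nothing else.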

