## Working with repeated comparative survey data – almost a howto

There is now a bonanza of studies that rely on surveys which are replicated across countries and time, often with fairly short intervals, with the ESS arguably one of the most prominent examples (but also see the “barometer” studies in various regions). Multi-level analysis is now the weapon of choice to tackle these data, but the appropriate structure of such models is not immediately obvious: are we looking at waves nested in countries? Countries nested in waves? Or rather at surveys cross-classified by year and country? What’s the role of the small-n problem when we are talking about countries? And does the notion of sampling even make sense when we are talking about what is effectively the whole population of countries that could be studied?

• Schmidt-Catran, A. W., & Fairbrother, M. (2016). The random effects in multilevel models: getting them wrong and getting them right. European Sociological Review, 32(1), 23–38. http://dx.doi.org/10.1093/esr/jcv090
• Schmidt-Catran, A. W., Fairbrother, M., & Andreß, H. (2019). Multilevel models for the analysis of comparative survey data: common problems and some solutions. KZfSS Kölner Zeitschrift für Soziologie und Sozialpsychologie, 71(1), 99–128. http://dx.doi.org/10.1007/s11577-019-00607-9
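To make the competing structures from the opening paragraph a bit more tangible, here is a minimal Python sketch. Everything in it is an assumption for illustration: the respondent records are made up, and `group_ids` and `count_units` are hypothetical helpers, not something taken from the two papers. It only shows that each respondent sits in exactly one survey (a country-year), and that the surveys are in turn cross-classified by country and year.

```python
from collections import Counter

# Hypothetical respondent records: (respondent id, country, survey year).
respondents = [
    (1, "DE", 2002), (2, "DE", 2002), (3, "DE", 2004),
    (4, "FR", 2002), (5, "FR", 2004), (6, "FR", 2004),
    (7, "SE", 2004),
]


def group_ids(country, year):
    """Return the three grouping identifiers a respondent carries."""
    return {
        "country": country,
        "year": year,
        "country_year": f"{country}-{year}",
    }


def count_units(records):
    """Count distinct higher-level units and respondents per survey."""
    ids = [group_ids(country, year) for _, country, year in records]
    return {
        "countries": len({g["country"] for g in ids}),
        "years": len({g["year"] for g in ids}),
        "country_years": len({g["country_year"] for g in ids}),
        "per_survey": Counter(g["country_year"] for g in ids),
    }


units = count_units(respondents)
print(units["countries"], units["years"], units["country_years"])
```

However many respondents there are, the number of countries (three in this toy example) is the n that matters for anything estimated at the country level, which is where the small-n problem comes from.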

## What we liked

It’s difficult to have a discussion about a text that provides a lot of factual information about methodological bits and bobs, especially when you have little prior knowledge. Having said that, students found both texts (which are related but complementary) remarkably accessible and helpful.

Sad but true: comparative analysis is hard, and multi-level models are no panacea. Nothing ever is. Bugger.

## What we did not like so much

Nothing. Students liked these two. So did I. Period.

Like social networks, multilevel data structures are everywhere once you start thinking about it. People live in neighbourhoods, neighbourhoods are nested in municipalities, which make up provinces – well, you get the picture. Even if we have no substantive interest in their effects, it often makes sense to control for structures in our data to get more realistic standard errors.
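A rough way to see why ignoring the structure gives overly optimistic standard errors is Kish's design effect for equally sized clusters, deff = 1 + (m - 1) * ICC. The sketch below is only a back-of-the-envelope illustration; the numbers (2,000 respondents, neighbourhoods of 20, an intraclass correlation of 0.05) are invented, and the function names are my own.

```python
import math


def design_effect(cluster_size, icc):
    """Kish's approximate design effect for equally sized clusters."""
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_total, cluster_size, icc):
    """Number of independent observations the clustered sample is worth."""
    return n_total / design_effect(cluster_size, icc)


# Hypothetical example: 2,000 respondents in neighbourhoods of 20 each,
# with an (assumed) intraclass correlation of 0.05.
deff = design_effect(20, 0.05)
n_eff = effective_sample_size(2000, 20, 0.05)
se_inflation = math.sqrt(deff)  # factor by which naive standard errors are too small

print(round(deff, 2), round(n_eff), round(se_inflation, 2))
```

Even a modest intraclass correlation roughly halves the effective sample size here, which is why it pays to model (or at least adjust for) the grouping.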

Now the good folks over at the European Social Survey have reacted and spent the Descartes Prize money on compiling multilevel information and merging it with their own data. So far, the selection is a little bit disappointing in some respects. Homicide rates, for instance, are reported on the national level only. But there are some pleasant surprises (I guess due to Eurostat, who collect such things): We get unemployment, GDP growth and even student numbers at the NUTS-3 level. Since you asked, NUTS is the Nomenclature of Territorial Units for Statistics, and level 3 is the lowest level for which comparative data are normally published.

Regrettably, the size and number of level 3 units are not necessarily comparable across countries: For Germany, level 3 corresponds to about 400 local government districts, while European France is divided into 96 départements. But if you need to combine top-notch survey data with small(ish) regional data, it’s a start, and not a bad one.
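Combining survey records with regional context data boils down to a lookup on the region code. The following Python sketch assumes that each respondent carries a five-character NUTS-3 code; the codes merely follow the NUTS pattern, the unemployment figures are invented, and `nuts_parent` and `attach_context` are hypothetical helpers rather than anything shipped with the ESS files. Because NUTS codes are hierarchical (two characters for the country, plus one character per level), a higher-level code can be recovered by truncation.

```python
def nuts_parent(code, level):
    """Truncate a NUTS code to a higher level (0 = country, 1-3 = regions)."""
    if not 0 <= level <= 3:
        raise ValueError("NUTS level must be between 0 and 3")
    length = 2 + level
    if len(code) < length:
        raise ValueError(f"{code!r} is too short for NUTS level {level}")
    return code[:length]


def attach_context(rows, context, key="nuts3", name="unemployment"):
    """Add a regional indicator to each respondent; None if no data exist."""
    return [{**row, name: context.get(row[key])} for row in rows]


# Made-up regional unemployment rates (per cent), keyed by NUTS-3 code.
regional_unemployment = {"DEA23": 8.0, "FR101": 7.0}

# Made-up respondents; the third lives in a region without context data.
respondents = [
    {"id": 1, "nuts3": "DEA23"},
    {"id": 2, "nuts3": "FR101"},
    {"id": 3, "nuts3": "DE300"},
]

merged = attach_context(respondents, regional_unemployment)
for row in merged:
    print(row["id"], nuts_parent(row["nuts3"], 0), row["unemployment"])
```

Keeping unmatched respondents with a missing value (rather than silently dropping them) makes it easier to spot how patchy the regional coverage is.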

In the past, I did a lot of multi-level modelling with MLwiN 2.02, which I quickly learned to loathe. Back in the late 1990s, MLwiN was perhaps the first ML software that had a somewhat intuitive interface, i.e. it allowed one to build a model by pointing and clicking. Moreover, it printed updated estimates on the screen while cycling merrily through the parameter space. That was sort of cool, as it could take minutes to reach convergence, and without the updating, one would never have been sure that the program had not crashed yet. Which it did quite often, even for simple models.

Worse than the bugs was the lack of proper scriptability. Pointing and clicking loses its appeal when you need to run the same model on 12 different datasets, or when you are looking at three variants of the same model and 10 recodes of the same variable. Throw in the desire to semi-automatically re-compile the findings from these exercises into two nice tables for inclusion in LaTeX again and again after finding yet another problem with a model, and you will agree that any piece of software that is not scriptable is pretty useless for scientists.
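What scriptability buys you is easy to sketch. The Python snippet below is a stand-in, not a multi-level model: it runs the same (deliberately trivial) bivariate regression over several made-up datasets and writes the estimates into a LaTeX table, so that the whole table can be regenerated after every fix. The dataset names, the numbers and the helpers `ols_slope` and `latex_table` are all assumptions for illustration.

```python
from statistics import mean


def ols_slope(xs, ys):
    """Slope of a bivariate least-squares regression of y on x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def latex_table(results):
    """Render {dataset name: estimate} as a LaTeX tabular environment."""
    lines = [r"\begin{tabular}{lr}", r"Dataset & Slope \\", r"\hline"]
    for name, estimate in results.items():
        lines.append(rf"{name} & {estimate:.2f} \\")
    lines.append(r"\end{tabular}")
    return "\n".join(lines)


# Made-up datasets: name -> (x values, y values).
datasets = {
    "wave1": ([1, 2, 3], [2, 4, 6]),
    "wave2": ([1, 2, 3], [1.0, 1.5, 2.0]),
}

# Same model, every dataset, one table.
results = {name: ols_slope(xs, ys) for name, (xs, ys) in datasets.items()}
table = latex_table(results)
print(table)
```

Swap the toy regression for a call to whatever estimation command you actually use, and the loop-and-table part stays the same.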

MLwiN’s command language was unreliable and woefully underdocumented, and everything was a pain. So I embraced xtmixed when it came along with Stata 9/10, which solved all of these problems.

runmlwin presentation (pdf)

But xtmixed and its siblings are slow with large datasets/complex models. For non-linear models (e.g. xtmelogit), Stata relies on numerical quadrature, which is accurate but computationally intensive. MLwiN works with approximations of the likelihood function (quick and dirty) or MCMC (strictly speaking a Bayesian approach, but people don’t ask too many questions because it tends to be faster than quadrature). Moreover, MLwiN can run a lot of fancy models that xtmixed cannot, because it is a highly specialised program that has been around for a very long time.
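For readers who wonder what quadrature actually does: in a random-intercept logit model, the likelihood contribution of each cluster is an integral over the unobserved intercept, and Gauss-Hermite quadrature replaces that integral with a weighted sum over a handful of nodes. The Python sketch below illustrates the idea for a single made-up cluster, using the standard 5-point Gauss-Hermite rule and a slow brute-force integral as a reference. It is an assumption-laden toy, not a description of what Stata or MLwiN do internally, and all function names are my own.

```python
import math

# 5-point Gauss-Hermite nodes and weights for the weight function exp(-x^2).
GH_NODES = [-2.020182870456086, -0.958572464613819, 0.0,
            0.958572464613819, 2.020182870456086]
GH_WEIGHTS = [0.019953242059046, 0.393619323152241, 0.945308720482942,
              0.393619323152241, 0.019953242059046]


def cluster_likelihood(u, ys, beta0):
    """Likelihood of the binary outcomes ys given the random intercept u."""
    p = 1.0 / (1.0 + math.exp(-(beta0 + u)))
    return math.prod(p if y == 1 else 1.0 - p for y in ys)


def marginal_gh(ys, beta0, sigma):
    """Integrate u ~ N(0, sigma^2) out with Gauss-Hermite quadrature."""
    total = sum(
        w * cluster_likelihood(math.sqrt(2.0) * sigma * x, ys, beta0)
        for x, w in zip(GH_NODES, GH_WEIGHTS)
    )
    return total / math.sqrt(math.pi)


def marginal_brute(ys, beta0, sigma, steps=4000):
    """The same integral by a fine midpoint rule, as a slow reference."""
    lo, hi = -8.0 * sigma, 8.0 * sigma
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        u = lo + (k + 0.5) * h
        density = math.exp(-0.5 * (u / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        total += density * cluster_likelihood(u, ys, beta0) * h
    return total


ys = [1, 1, 0, 1]  # made-up outcomes for one cluster
gh = marginal_gh(ys, beta0=0.0, sigma=1.0)
ref = marginal_brute(ys, beta0=0.0, sigma=1.0)
print(gh, ref)
```

Five function evaluations per cluster look cheap, but the cost multiplies with every additional random effect and every iteration of the optimiser, which is where the waiting comes from.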

Enter the good people over at the Centre for Multilevel Modelling at Bristol, who have come up with runmlwin, an ado that essentially makes the functionality of MLwiN available as a Stata command, postestimation analysis and all. Can’t wait to see if this works with Linux, wine and my ancient binaries, too.

MLwiN is one of the granddaddies of multi-level modelling software (the other being HLM). Essentially, it is a 1990s-ish looking and sometimes quirky GUI wrapped around an old DOS program (MLn). The one feature that set MLwiN apart in the late 1990s is its point-and-click interface that allows you to build the equations for a multi-level model in a stepwise fashion. The underlying command language is still slightly confusing and less than well documented, and some of the modern features (such as modelling categorical dependent variables) are implemented as external macros, which does not need to concern you unless something goes horribly wrong, which happens occasionally.

That said, MLwiN is reasonably fast, does now incorporate modern MCMC estimators, has an interface with WinBUGS and can be convinced to do most things you would possibly want to do with it. I bought version 1.10 ca. 1998, received free upgrades to 2.02 and good support well until 2004/2005 or so. These days, Stata, R and Mplus can all estimate multi-level models, but working with MLwiN may still be worthwhile for you (by the way, you can download the free stata2mlwin addon from UCLA academic technology to export your variables from Stata to MLwiN).

Rather amazingly, MLwiN is now freely available for anyone working in UK universities: just enter your details including your ac.uk email address, and a few days later, they will send you a download link.

Last year, the “Kölner Zeitschrift für Soziologie und Sozialpsychologie” published an article on the level of support for the European Union’s core principles (democracy, gender equality, religious freedom, rule of law) in Turkey. In essence, the author claimed that the level of support for these principles in Turkey is low because a) the level of economic development is low while b) the number of Muslims is very high. Thanks to the very efficient PR office at the University of Cologne, these findings made their way into the mainstream media in Germany (including the English service of the Deutsche Welle) and Turkey and eventually even into the more shady parts of the blogosphere (that are normally the object rather than the consumer of sociological studies).

I felt, however, that the analysis suffered from a whole host of serious methodological and theoretical shortcomings, and that the claims of the original paper are untenable. Therefore, I wrote a comment on “Paßt die Türkei zur EU und die EU zu Europa” (in German, also as PDF). The Kölner Zeitschrift has recently accepted my article, and it will appear in the next issue. Replication data and Stata scripts for my paper are available, too.
