Running MLwiN from within Stata

In the past, I did a lot of multi-level modelling with MLwiN 2.02, which I quickly learned to loathe. Back in the late 1990s, MLwiN was perhaps the first ML software that had a somewhat intuitive interface, i.e. it allowed one to build a model by pointing and clicking. Moreover, it printed updated estimates on the screen while cycling merrily through the parameter space. That was sort of cool, as it could take minutes to reach convergence, and without the updating, one would never have been sure that the program had not crashed yet. Which it did quite often, even for simple models.

Worse than the bugs was the lack of proper scriptability. Pointing and clicking loses its appeal when you need to run the same model on 12 different datasets, or when you are looking at three variants of the same model and 10 recodes of the same variable. Throw in the desire to semi-automatically re-compile the findings from these exercises into two nice tables again and again after finding yet another problem with a model, and you will agree that any piece of software that is not scriptable is pretty useless for scientists.

Robust Regression of Aggregate Data in Stata

I’m currently working on an analysis of the latest state election in Rhineland-Palatinate using aggregate data alone, i.e. electoral returns and structural information, which is available at the level of the state’s roughly 2300 municipalities. The state’s Green party (historically very weak) has roughly tripled their share of the vote since the last election in 2006, and I want to know where all these additional votes come from. And yes, I’m treading very carefully around the very large potential ecological fallacy that lurks at the centre of my analysis, regressing Green gains on factors such as tax receipts and distance from the nearest university town, but never claiming that the rich or the students or both turned to the Greens.

One common problem with this type of analysis is that not all municipalities are created equal. There is a surprisingly large number of flyspeck villages with only a few dozen voters, whereas the state’s capital boasts more than 140,000 registered voters. Most places are somewhere in between. Having many small municipalities in the regression feels wrong for at least two reasons. First, small-scale changes of political preferences in tiny electorates will result in relatively large percentage changes. Second, the behaviour of a relatively large number of voters who happen to live in a small number of relatively large municipalities will be grossly underrepresented, i.e. the countryside will drive the results.
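To put a rough number on the first point, here is a minimal Python sketch. The village of 40 voters and the three switchers are made up for illustration; only the 140,000 figure echoes the text above.

```python
def swing_in_points(switchers: int, electorate: int) -> float:
    """Percentage-point change in a party's share if `switchers` voters change sides."""
    return 100.0 * switchers / electorate

# Hypothetical numbers: three voters change their mind in each place.
village = swing_in_points(3, 40)        # flyspeck village with 40 voters (assumed)
capital = swing_in_points(3, 140_000)   # electorate about the size of the capital's

print(f"village: {village:.2f} points, capital: {capital:.4f} points")
```

The same three voters move the village by several percentage points and leave the capital essentially unchanged, which is why unweighted municipality-level regressions can be dominated by noise from tiny places.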

Are Germans More Afraid of Neo-Nazis Than of Islamists?

Who’s afraid of whom?

The liberal German weekly Zeit has commissioned a YouGov poll which demonstrates that Germans are more afraid of right-wing terrorists than of Islamist terrorists. The question read “What is, in your opinion, the biggest terrorist threat in Germany?” On offer were right-wingers (41 per cent), Islamists (36.6 per cent), left-wingers (5.6 per cent), other groups (3.8 per cent), or (my favourite) “no threat” (13 per cent). This is a pretty daft question anyway. Given the news coverage of the Neo-Nazi gang that has killed at least ten people more or less under the eyes of the authorities, and given that the authorities have so far managed to stop would-be terrorists in their tracks, the result is hardly surprising.

Sampling from a Multinomial Distribution in Stata

Sometimes, a man’s gotta do what a man’s gotta do. Which, in my case, might be a little simulation of a random process involving an unordered categorical variable. In R, sampling from a multinomial distribution is trivial.

rmultinom(1,1000,c(.1,.7,.2,.1))
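For readers who want to see what such a draw involves under the hood, here is a minimal Python sketch of one generic approach: draw a uniform random number per trial and look it up in the cumulative probabilities. It is an illustration of the idea, not R's or Stata's actual implementation. The function name `sample_multinomial` and the seed are made up. Note that the probabilities in the R call sum to 1.1; R rescales them internally, and the sketch assumes the same rescaling.

```python
import random
from itertools import accumulate

def sample_multinomial(n, probs, rng):
    """Return category counts for n trials; probs are rescaled to sum to 1."""
    total = sum(probs)
    cum = list(accumulate(p / total for p in probs))  # cumulative probabilities
    counts = [0] * len(probs)
    for _ in range(n):
        u = rng.random()  # uniform draw on [0, 1)
        for k, upper in enumerate(cum):
            if u < upper:
                counts[k] += 1
                break
        else:
            # Floating-point guard: the last cumulative value may fall just short of 1.
            counts[-1] += 1
    return counts

rng = random.Random(12345)  # arbitrary seed, for reproducibility only
counts = sample_multinomial(1000, [.1, .7, .2, .1], rng)
print(counts)
```

The same logic (uniform draw, then recode against cumulative cut points) carries over to any package that can generate uniform random numbers.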

Me at the Margins: Average Marginal Effects, Marginal Effects at the Mean, and Stata’s margins command

Seems that I am not the only one who is startled by Stata 11’s margins command, which does all sorts of amazing things. At a mere 50 pages (not counting the remarks on margins postestimation), the documentation is a little overwhelming, and there are just too many options. There are two separate issues that seem to confuse a lot of people (see this discussion on statalist on the then new margins command).

Marginal Effects at the Mean vs Average Marginal Effects

Statistics and Data links roundup for January through September 2010

Which of my students are most likely to gang up against me?

I’m teaching a lecture course on Political Sociology at the moment, and because everyone is so excited about social capital and social network analysis these days, I decided to run a little online experiment with and on my students. The audience is large (at the beginning of this term, about 220 students had registered for this lecture series) and quite diverse, with some students still in their first year, others in their second, third or fourth and even a bunch of veterans who have spent most of their adult lives in university education.

Who knows whom in a large group of learners?

How to get from Stata to Pajek

I’m teaching an introductory SNA class this year. Following a time-honoured tradition, I conducted a small network survey at the beginning of the class using Limesurvey. Getting the data from Limesurvey to Stata via CSV was easy enough. Here is the data set. But how does one get the data from Stata to Pajek for analysis? Actually, it’s quite easy.

First, we need to change the layout of the data. In the data set, there is one record for each of the 13 respondents. Each record has 13 variables, one for each (potential) arc connecting the respondent to other students in the class. This is equivalent to Stata’s “wide” form. Stata’s reshape command will happily re-arrange the data to the “long” form, with one record for each arc. This is what Pajek requires.
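The wide-to-long step described above can be sketched outside Stata as well. The following Python snippet is an illustration under assumptions, not the post's own code: the function names, the vertex labels `s1`, `s2`, … and the toy three-respondent matrix are made up, and self-nominations are dropped by assumption. It turns a wide 0/1 matrix into one record per arc and writes the result in Pajek's plain-text .net layout (`*Vertices`, then `*Arcs`).

```python
def wide_to_arcs(wide):
    """wide[i][j] is 1 if respondent i+1 named respondent j+1, else 0."""
    arcs = []
    for i, row in enumerate(wide, start=1):
        for j, tie in enumerate(row, start=1):
            if tie and i != j:  # assumption: ignore self-nominations
                arcs.append((i, j))
    return arcs

def to_pajek(n, arcs):
    """Build the text of a Pajek .net file with n vertices and the given arcs."""
    lines = [f"*Vertices {n}"]
    lines += [f'{v} "s{v}"' for v in range(1, n + 1)]  # hypothetical labels
    lines.append("*Arcs")
    lines += [f"{i} {j} 1" for i, j in arcs]  # weight 1 for every arc
    return "\n".join(lines) + "\n"

# Toy data: three respondents instead of the 13 in the class survey.
wide = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 0, 0],
]
arcs = wide_to_arcs(wide)
net_text = to_pajek(len(wide), arcs)
print(net_text)
```

Each tuple in `arcs` corresponds to one record of the “long” form, i.e. one line below `*Arcs` in the file.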

Software for Social Network Analysis: Pajek and Friends

Our project on social (citation and collaboration) networks in British and German political science involves networks with hundreds and thousands of nodes (scientists and articles). At the moment, our data come from the Social Science Citation Index (part of the ISI Web of Knowledge), and we use a bundle of rather eclectic (erratic?) scripts written in Perl to convert the ISI records into something that programs like Pajek or Stata can read. Some canned solutions (Wos2pajek, network workbench, bibexcel) are available for free, but I was not aware of them when I started this project, did not manage to install them properly, or was not happy with the results. Perl is the Swiss Army Chainsaw (TM) for data pre-processing, incredibly powerful (my scripts are typically less than 50 lines, and I am not an efficient programmer), and every time I want to do something in a slightly different way (i.e. I spot a bug), all I have to do is to change a few lines in the scripts.
After trying a lot of other programs available on the internet, we have chosen Pajek for doing the analyses and producing those intriguing graphs of cliques and inner circles in Political Science. Pajek is closed source but free for non-commercial use and runs on Windows or (via wine) Linux. It is very fast, can (unlike many other programs) easily handle very large networks, produces decent graphs and does many standard analyses. Its user interface may be slightly less than straightforward but I got used to it rather quickly, and it even has basic scripting capacities.

The Missing Manual

Makefile helps with latex, too

A couple of weeks ago, I posted an article on how make and Makefiles can help you to organise your Stata projects. If you are working in a Unix environment, you’ll already have make installed. If you work under Windows, install GNU make – it’s free, and it can make your Stata day. Rather unsurprisingly, make is also extremely useful if you have a large or medium-sized LaTeX project, or if you want to include tables and/or graphs produced by Stata in a LaTeX document. For instance, this comes in handy if you have EPS figures and use pdflatex. pdflatex produces PDF files instead of DVI files. If you produce slides, this can save you a lot of time because you don’t have to go through the latex – dvips – ps2pdf cycle. However, pdflatex cannot read EPS files: you have to convert your EPS files with pstoedit to the MetaPost format, then use MetaPost to convert them to mps (which can be read by pdflatex). With this Makefile snippet, everything happens automagically:


# New implicit rules for conversion of eps->mp->mps
# Change path if you have installed pstoedit in some other place
# Note: the recipe line below must start with a tab character
%.mp : %.eps
	c:\pstoedit/pstoedit.exe -f mpost $*.eps $*.mp
