Stata is my favourite general-purpose stats package. Sadly, it is also one of my favourite pastimes, but there you are. Here is my collection of Stata-related blog posts. If this is relevant for you, you might also be interested in a series of slides for a Stata course I taught some years ago (in German).
The Stata idiom capture quietly makes it so that any output from the subsequent command is suppressed, and that even critical failures are happily ignored. Your script soldiers on, and you are none the wiser. I always thought that this is a wonderful metaphor for organisational behaviour.
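For the uninitiated, a minimal illustration (the file name is, of course, made up):

capture quietly use nosuchfile.dta, clear
* the error is silently swallowed; only the return code records that anything went wrong
display _rc
* prints 601 (file not found), and the do-file soldiers on regardless
display "still running"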
In unrelated news, every other summer, StataCorp comes up with a new version of its product. Every other summer, I succumb to some Pavlovian reflex and decide to spend some institutional money on upgrading my unit’s licences for some interesting but usually quite marginal benefits.
It is the same story in other units and departments, and by coordinating and pooling our orders, we can get substantial discounts. And so, come autumn, the university’s IT centre is collating expressions of interest and communicating tentative prices, going back and forth until some equilibrium is reached. From then on, it can still take months until the new licences arrive, in spite of shipments being just codes and downloads now. Yesterday, I realised that Stata 17 came out in April, i.e. nine months ago, and so decided to find out what had happened to our order. As it turned out, the IT centre required our charge codes to proceed, but had never bothered to ask for them.
A year ago, I wrote a slightly maudlin blog post about the good and the not-so-good reasons for solo-blogging in this day and age. Good reasons or not, I kept up the good work with 35 posts in 2018. That is a bit less than my long-term annual average, but chapeau to my good self nonetheless.
But what were the most popular (used in a strictly relative sense) posts on the blog in 2018? Here is your handy guide:
#10 Sampling from a Multinomial Distribution in Stata. Who hasn’t found themselves in the situation where they want to sample from a multinomial distribution (IRONY KLAXON!)? It’s easily done in R. In Stata, you have to go through a few hoops. This short 2011 post is still reasonably popular, seven years down the line, because it tells you how to do it.
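The gist (a quick sketch, not necessarily the exact code from that post): draw a uniform random number and cut it at the cumulative probabilities, here 0.2, 0.5, and 0.3 for three categories.

clear
set obs 1000
set seed 42
generate double u = runiform()
* category 1 if u <= .2, category 2 if .2 < u <= .7, category 3 otherwise
generate byte draw = 1 + (u > .2) + (u > .7)
tabulate draw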
#6 A few thoughts on the framing of the AfD’s result in Bavaria. In the Bavarian state election in October 2018, the AfD actually did less well than many had expected. But to the party’s immense delight, many journalists remained in full democratic-crisis mode. I disagree with that, for reasons.
#5 Three and a half Special Issues on (Right-Wing) populism. And then two more. 2018 was a horrible year for the world, and hence a good year for Radical Right/Populism research. It showed early on, i.e. in January, with a number of special issues on the topic.
#4 I looked up the AfD’s women’s organisation on Facebook. You will not believe what I found. I wrote this one on December 30 2017. What can I say? Sometimes, reality imitates art. You will not believe it until you see what I saw.
This is not the late 1990s. Hey, it’s not even the early Naughties, and has not been for a while. I have had my own tiny corner of the Internet (then hosted on university Web space, as was the norm in those days) since Mosaic came under pressure from Netscape and the NYT experimented with releasing content as (I kid you not) postscript files, because PDF had not been invented yet. I did this mostly because I liked computers, because it was new, and because it provided an excellent distraction from the things I should have been doing. By and large, not much has changed over 25 years.
Later (that was before German universities had repositories or policies for such things), my webspace became a useful resource for teaching-related material. Reluctantly and with a certain resentment, I have copied slides and handouts from one site to the next, adding layers of disclaimers instead of leaving them behind, because some of this stuff carries hundreds of decade-old backlinks and gets downloaded / viewed dozens of times each day.
And of course, I started posting pre-publication versions of my papers, boldly ignoring / blissfully ignorant of the legal muddle surrounding the issue back in the day. Call me old-fashioned, but making research visible and accessible is what the Web was invented for.
In summer 2008, I set up my own domain on a woefully underpowered shared webspace (since replaced by an underpowered virtual server). A bit earlier in the same year, already late to the party, I had started my own “Weblog” on wordpress.com, writing and ranting about science, politics, methods, and all that. A year down the road, I converted www.kai-arzheimer.com to wordpress, moved my blog over there, and have never looked back / continuously wondered why I keep doing this.
Why keep blogging?
In those days of old, we had trackbacks and pingbacks & stuff (now a distant memory), and social media was the idea of having a network of interlinking personal blogs, whose authors would comment on each other’s posts. Even back in 2008 on wordpress, my blog was not terribly popular, but for a couple of years, there was a bunch of people who had similar interests, with whom I would interact occasionally.
Then, academically minded multi-author blogs came along, which greatly reduced fragmentation and aimed at making social science accessible for a much bigger audience whilst removing the need to set up and maintain a site. For similar reasons, Facebook and particularly Twitter became perfect outlets for ranting “microblogging”, while Medium bypasses the fragmentation issue for longer texts and is far more aesthetically pleasing and faster than anything any of us could run by ourselves.
It is therefore only rational that many personal academic blogs died a slow death. People I used to read left Academia completely, gave up blogging, or moved on to the newer platforms. Do you remember blogrolls? No, you wouldn’t. Because I’m a dinosaur, I still get my news through an RSS reader (and you should, too). While there are a few exceptions (Chris Blattman and Andrew Gelman spring to mind), most of the sources in my “blog” drawer are run by collectives / institutions (the many LSE blogs, the Monkey Cage, the Duck etc.). I recently learned that I made it into an only slightly dubious looking list of the top 100 political science blogs, but that is surely because there are not many individual political science bloggers left. So why am I still rambling in this empty Platonic man-cave? Off the top of my head, I can think of about five reasons:
Total editorial control. I have written for the Monkey Cage, The Conversation, the LSE, and many other outlets. Working with their editors has made my texts much better, but sometimes I am not in the mood for clarity and accessibility. I want to rant, and be quick about it.
Pre-prints. I like to have pre-publication versions of my work on my site, although again, institutional hosting makes much more sense. Once I upload them, I’m usually so happy that I want to say something about it.
An open journal. If I need to remember some sequence of events in German or European politics for the day job, it’s helpful if I have blogged about it as it happened. Similarly, sometimes I work out the solution to some software issue but quickly forget the details. Five months later, a blog post is a handy reference and may help others.
Irrelevance. Often, something annoys or interests me so much that I need to write a short piece about it, although few other people will care. I would have a better chance of finding an audience on Medium, but then again, on my own wordpress-powered site I have a perfectly serviceable CMS which happens to have blogging functionality built in.
Ease of use. I do almost all of my writing in Emacs and keep (almost) all my notes in orgmode. Thanks to org2blog, turning a few paragraphs into a post is just a few hard-to-remember keystrokes away.
Bonus track: the five most popular posts in 2017
As everyone knows, I’m not obsessed with numbers, thank you very much. I keep switching between various types of analytic software and have no idea how much (or rather little) of an audience I actually have. Right now I’m back to the basic wordpress statistics and have been for over a year, so here is the list of the five posts that were the most popular in 2017.
#5 nlcom and the Delta Method. This is a short explainer of the Delta Method and its implementation in a Stata command. It was written in the summer of 2013, presumably when we were working on surveybias, as a note-to-future-self post. It was viewed 343 times in 2017. Not too shabby for an oldie.
#4 State of the German polls: The Schulz effect was real. Part of my 2017 poll-pooling exercise, this post demonstrates that the bounce for the SPD early in the campaign was real but short-lived. Just like this post? It got 620 views, but most of them (559) in March, right when it was published.
#3 Five Quick takes on the German election. Similarly, this one was viewed 869 times, but almost exclusively on election night and on the following day. Which is a pity, because some of it is still relevant (I think).
#2 I looked up the AfD’s women’s organisation on Facebook. You will not believe what I found. Posted only on December 30, this one got 878 views in the few remaining hours of the old year, almost bringing down my server in the process. Traffic was driven by Twitter, thanks to the click-baity title and the incredible image. You will not believe what I saw until you see it.
How can we usefully summarise the accuracy of an election opinion poll compared to the real result of an election? In this blog, we describe a score we have devised to allow people to see how different polls compare in their reflection of the final election result, no matter how many parties or candidates are standing. This index, B, can be compared across time, polling company and even election to provide a simple demonstration of how the polls depicted public opinion in the run-up to polling day.
Just how badly biased is your pre-election survey? Once the election results are in, our scalar measures B and B_w provide convenient, single-number summaries. Our surveybias add-on for Stata will calculate these and other measures from either raw data or from published margins. Its latest iteration (version 1.4) has just appeared on SSC. Surveybias 1.4 improves on the previous version by ditching the last remnants of the old numerical approximation code for calculating standard errors and is hence much faster in many applications. Install it now from within Stata by typing:
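ssc install surveybias, replace

(The replace option simply makes sure that any older version already on your machine is overwritten.)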
Last week, my introduction to structural equation modelling was published by Springer/VS. The book shows how the most common models (including simple and multi-group confirmatory factor analyses (CFA/MGCFA)) can be implemented in Stata, Lisrel, and MPlus. The examples are drawn from attitude research in political science (xenophobia, political alienation, political interest …).
We have updated our add-on (or ado) surveybias, which calculates our multinomial generalisation of the old Martin, Traugott, and Kennedy (2005) measure for survey bias. If you have any dichotomous or multinomial variable in your survey whose true distribution is known (e.g. from the census, electoral counts, or other official data), surveybias can tell you just how badly damaged your sample really is with respect to that variable. Our software makes it trivially easy to assess bias in any survey.
Within Stata, you can install/update surveybias by entering ssc install surveybias. We’ve also created a separate page with more information on how to use surveybias, including a number of worked examples.
The new version is called 1.3b (please don’t ask). New features and improvements include:
Support for (some) complex variance estimators including Stata’s survey estimator (sample points, strata, survey weights etc.)
Improvements to the numerical approximation. surveybias is roughly seven times faster now
A new analytical method for simple random samples that is even faster
Convenience options for naming variables created by surveybiasseries
Lots of bug fixes and improvements to the code
If you need to quantify survey bias, give it a spin.
Contrary to popular belief, it’s not always the third reviewer that gives you grief. In our case, it is the one and only reviewer that shot down a manuscript, because at the very least, s/he would have expected (and I quote) an “analytical derivation of the estimator”. For some odd reason of his own, the editor, instead of simply rejecting us, dared us to do just that, and against all odds, we succeeded after some months of gently banging various heads against assorted walls.
Needless to say, on second thought the reviewer found the derivation “interesting but unnecessarily complicated” and now recommends relegating the material to a footnote. To make up for this, s/he delved into the code of our software, spotted some glaring mistakes, and recommended a few changes (actually sending us a dozen lines of code) that resulted in a speed gain of some 600 per cent. This is very cool, very good news for end users, very embarrassing for us, and generally wrong on so many levels.
While the interwebs are awash with headline findings from countless surveys, commercial companies (and even some academics) are reluctant to make their raw data available for secondary analysis. But fear not: quite often, media outlets and aggregator sites publish survey margins, and that is all the information you need. It’s as easy as that.
The Solution: surveybiasi
After installing our surveybias add-on for Stata, you will have access to surveybiasi. surveybiasi is an “immediate command” (Stata parlance) that compares the distribution of a categorical variable in a survey to its true distribution in the population. Both distributions need to be specified via the popvalues() and samplevalues() options, respectively. The elements of these two lists may be specified in terms of counts, of percentages, or of relative frequencies, as each list is internally rescaled so that its elements sum up to unity. surveybiasi will happily report the k category-specific estimates as well as B and B_w (check out our paper for more information on these multinomial measures of bias) for variables with 2 to 12 discrete categories.
Bias in a 2012 CBS/NYT Poll
A week before the 2012 election for the US House of Representatives, 563 likely voters were polled for CBS/The New York Times. 46 per cent said they would vote for the Republican candidate in their district, 48 per cent said they would vote for the Democratic candidate. Three per cent said it would depend, and another two per cent said they were unsure, or refused to answer the question. In the example these five per cent are treated as “other”. Due to rounding error, the numbers do not exactly add up to 100, but surveybiasi takes care of the necessary rescaling.
In the actual election, the Republicans won 47.6 and the Democrats 48.8 per cent of the popular vote, with the rest going to third-party candidates. To see if these differences are significant, run surveybiasi like this:
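* a sketch of the call; I assume here that the sample size is passed via an n() option
* (see the help file for the exact syntax)
surveybiasi, popvalues(47.6 48.8 3.6) samplevalues(46 48 5) n(563)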
Given the small sample size and the close match between survey and electoral counts, it is not surprising that there is no evidence for statistically or substantively significant bias in this poll.
An alternative approach is to follow Martin, Traugott and Kennedy (2005) and ignore third-party voters, undecided respondents, and refusals. This requires minimal adjustments: n is now 535, as the analytical sample size is reduced by five per cent, while the figures representing the “other” category can simply be dropped. Again, surveybiasi internally rescales the values accordingly:
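* same logic, two-party version: drop the "other" figures and use the reduced sample size
* (again assuming an n() option)
surveybiasi, popvalues(47.6 48.8) samplevalues(46 48) n(535)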
Under this two-party scenario, the estimate for the Republicans is identical to Martin, Traugott, and Kennedy’s original A (and all other estimates are identical to its absolute value). Its negative sign points to the (tiny) anti-Republican bias in this poll, which is of course even less significant than in the previous example.