I’ve just realised that Her Majesty’s government has released the results of the 2011 (!) census at the level of individual output areas. Output areas are small, socially homogeneous cells with an average size of about 300 residents. In other words: The information is incredibly detailed, which is a bit worrying on one level (I see your prism and raise you a Tempora), but on another level quite exhilarating. And that is because HM has also released various geographies, meaning that it is comparatively easy to find and plot these clusters of people in space. The possibilities for social research are endless (but don’t bother looking for a link here – these data are distributed and duplicated over various sites run by the ONS, the Ordnance Survey, Edina, UK data etc., so it’s best to google “output area” and start digging from there).
Electoral Wards and Census Output Areas in Camden
This nice map of Camden is just an illustration for a paper I am currently working on. It was created using the very useful sp and OpenStreetMap packages for R and shows the borough’s 18 electoral wards (the pinkish polygons) as well as the population-weighted centroids for the output areas contained in these wards.
The upshot is that people are sometimes very unevenly distributed within wards (perhaps unsurprisingly, nobody lives on Hampstead Heath), so if you are looking for neighbourhood effects on voting behaviour, even the very detailed ward level data can be misleading. In an ideal world, we would get electoral counts at the output area level, but even I can see that this might be a bit problematic.
More than two years ago, a research paper by Dr Malte Steinbrink and his students created quite a stir in German Human Geography. Using Social Network Analysis, the group identified a tight-knit cluster of academic geographers who basically run the show – in other words, an oligarchy. “Berichte zur deutschen Landeskunde” has now published an issue devoted exclusively to the debate on these findings. Colleague Harald Schoen and I were invited to the party because we are perfectly detached outsiders (and nice guys to boot). Here is our comment on the paper that rocked German geography.
A mere 2.75 years after the fact, the Definitive Volume (TM) on the German Federal Election of 2009 is almost (almost!) ready to go to the printers’. And so is our chapter on East-West differences in German voting behaviour, which is vintage before it is even out (Pirate party, anyone?). Obviously, the details are becoming more and more blurry, so going through the proofs actually made for a pleasant read.
Political Science is the magpie amongst the social sciences, which borrows heavily from other disciplines. These days, many political scientists are actually failed economists (even more failed economists are actually economists, however). I used to think of myself as a failed sociologist, but reading the proofs it dawned on me that I might actually aspire to become a failed geographer.
Local deviations from regional voting patterns
One particularly nice map that should have been discussed more thoroughly in the paper shows the local deviation from regional voting patterns. Yes, you read that right: I calculate an index (basically Pedersen’s) that summarises local (i.e. district-level) deviations from the regional (East vs West) result and roll that into a choropleth. This way, it is easy to see how heterogeneous the two regions really are. Most striking (in my view) is the difference between Bavaria and the other Western Länder, which is of course a result of the CSU’s still relatively strong position. The PDS/Left party’s hold over the eastern districts of Berlin is clearly visible, too.
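For the curious, here is a minimal Python sketch of how such an index could be computed. It assumes that “basically Pedersen’s” means half the sum of the absolute differences between the local and the regional vote shares; the chapter’s exact formula may differ. The function name and the vote shares are made up for illustration.

```python
def deviation_index(local_shares, regional_shares):
    """Half the sum of absolute differences between two sets of vote shares.

    Both arguments map party names to vote shares in percent. A party that is
    missing from one of the two dictionaries is treated as having a share of 0.
    """
    parties = set(local_shares) | set(regional_shares)
    return 0.5 * sum(
        abs(local_shares.get(p, 0.0) - regional_shares.get(p, 0.0))
        for p in parties
    )


# Invented shares for one district and its region (NOT real election results).
district = {"A": 60.0, "B": 40.0}
region = {"A": 50.0, "B": 50.0}
print(deviation_index(district, region))  # 10.0
```

An index of 0 means that the district mirrors its region exactly; with shares in percent, 100 would mean that the two have no voters in common, so to speak.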
For our piece on distance effects in English elections we geocoded the addresses of hundreds of candidates. For the uninitiated: Geocoding is the fine art of converting addresses into geographical coordinates (longitude and latitude). Thanks to Google and some other providers like OpenStreetMap, this is now a relatively painless process. But when one needs more than a few addresses geocoded, one does not rely on pointing-and-clicking. One needs an API, i.e. an interface (usually wrapped in a software library) that makes the service accessible from R, Python or some other programming language.
The upside is that I learned a bit about the wonders of Python in general and the charms of geopy in particular. The downside is that writing a simple script that takes a number of strings from a Stata file, converts them into coordinates and gets them back into Stata took longer than I ever thought possible. Just now, I’ve learned about a possible shortcut (via the excellent data monkey blog): geocode is a user-written Stata command that takes a variable containing address strings and returns two new variables containing the latitude/longitude information. Now that would have been a bit of a time-saver. You can install geocode by typing
net from http://www.stata-journal.com/software/sj11-1
net install dm0053
There is, however, one potential drawback: Google limits the number of free queries per day (and possibly per minute). Via Python, you can easily stagger your requests, and you can also use an API key that is supposed to give you a bigger quota. Geocoding a large number of addresses from Stata in one go, on the other hand, will probably result in an equally large number of parsing errors.
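Staggering requests in Python can be as simple as pausing between calls. The following is a sketch, not the script we actually used: the function name, the one-second default delay and the idea of passing the geocoder in as a plain function are my assumptions, chosen so that the example runs without any external package or network access.

```python
import time


def geocode_staggered(addresses, geocode_fn, delay_seconds=1.0, sleep=time.sleep):
    """Geocode addresses one by one, pausing between requests.

    geocode_fn is any callable that takes an address string and returns an
    object with .latitude and .longitude attributes, or None if nothing was
    found. Failed lookups are recorded as (address, None, None) so that one
    bad address does not abort the whole run.
    """
    results = []
    for i, address in enumerate(addresses):
        if i > 0:
            sleep(delay_seconds)  # wait before every request except the first
        try:
            location = geocode_fn(address)
        except Exception:
            # Deliberately broad: treat any service error as "not found".
            location = None
        if location is None:
            results.append((address, None, None))
        else:
            results.append((address, location.latitude, location.longitude))
    return results
```

With geopy, the geocode method of a geocoder object (for instance Nominatim, created with a user_agent string of your choice) should work as geocode_fn, since it returns a location with latitude and longitude attributes or None. The appropriate delay depends on the provider’s current terms, so check those rather than trusting my default.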