Like social networks, multilevel data structures are everywhere once you start thinking about it. People live in neighbourhoods, neighbourhoods are nested in municipalities, which make up provinces – well, you get the picture. Even if we have no substantive interest in their effects, it often makes sense to control for structures in our data to get more realistic standard errors.
Now the good folks over at the European Social Survey have reacted and spent the Descartes Prize money on compiling multilevel information and merging it with their own data. So far, the selection is a little bit disappointing in some respects. Homicide rates, for instance, are reported on the national level only. But there are some pleasant surprises (I guess due to Eurostat, who collect such things): We get unemployment, GDP growth and even student numbers at the NUTS-3 level. Since you asked, NUTS is the Nomenclature of Territorial Units for Statistics, and level 3 is the lowest level for which comparative data are normally published.
Regrettably, the size and number of level 3 units are not necessarily comparable across countries: For Germany, level 3 corresponds to about 400 local government districts, while France is divided into 96 European Departments. But if you need to combine top-notch survey data with small(ish) regional data, it’s a start, and not a bad one.
If this post’s title does make any sense to you, chances are that you are one of us anoraks who had a brilliant weekend of extreme right spotting. In France, Jean-Marie Le Pen stepped down as leader of the Front National, just under 40 years after he founded the party. He is succeeded by his youngest daughter, who is portrayed as a moderniser (hey, she’s twice divorced) and a moderate (by FN standards). While this story might conjure the image of Prince Charles, Marine’s rise through the Front’s ranks was quick, largely unexpected and a major source of aggravation for Bruno Gollnisch, the controversial academic who became the party’s number two after the old number two, Bruno Mégret, left the party to found the MNR in 1999. Whether Gollnisch (who was soundly beaten by Le Pen the younger in the leadership contest) aims to repeat that stunt remains to be seen. I’m sure there is a silly story about men named Bruno who turn out to be the real Princes Charles here (both spent a lot of time eyeing the leadership and are in their early 60s now), but more importantly, Marine is going to change le Front, though her father might be tempted to meddle. These right-wingers know a thing or two about family values.
Meanwhile, in Germany the NPD, arguably the most radical amongst the country’s electorally viable right-wing parties, has celebrated its merger with its old rival DVU. The DVU was founded in the early 1970s as a marketing device for right-wing books, journals and paraphernalia and became a party in the 1980s. For nearly 40 years, it was completely dominated by its founder Gerhard Frey, who finally stepped down in 2009, aged 76. While his successor planned for the merger that became effective from January 1, some regional leaders are less than happy and seem willing either to take the issue to the courts over some alleged irregularities, or to set up a new party of their own. Either way, it would seem that the German extreme right remains divided, as it has been since the 1980s.
These days, a bonanza of political information is freely available on the internet. Sometimes this information comes in the guise of Excel sheets, comma-separated data or other formats which are more or less readily machine readable. But more often than not, information is presented as tables designed to be read by humans. This is where the gentle art of screen scraping, web scraping or spidering comes in. In the past, I have used kludgy Perl scripts to get electoral results at the district level off sites maintained by the French ministry of the interior or by universities (very interesting if you do not really speak/read French). A slightly more elegant approach might be to use R’s built-in Perl-like capabilities for doing the job, as demonstrated by Simon Jackman. Finally, Python, which has some very decent libraries for screen/web scraping, is gaining ground in the political science community – see this elaborate post on Drew Conway’s Zero Intelligence Agents blog. But, let’s face it: I am lazy. I want to spend time analysing the data, not scraping them. And so I was very pleased when I came across outwit, a massive plugin for the Firefox browser (Linux, Mac and Windows versions available) that acts as a point-and-click scraper.
French Départements (from Wikipedia)
Say you need a dataset with the names and Insee numbers for all the French Départements. The (hopefully trustworthy) Wikipedia page has a neat table, complete with information on the Prefecture and many tiny coats of arms which are of absolutely no use at all. We could either key in the relevant data (doable, but a nuisance), or we could try to copy and paste the table into a word processor, hoping that we do not lose accents and other funny characters, and that WinWord or whatever we use converts the HTML table into something that we can edit to extract the information we really need.
Scraping a table with outwit
Or we could use outwit. One push of the button loads the page into a sub-window, a second push (data->tables) extracts the HTML tables on the page. Now, we can either mark the lines we are interested in by hand (often the quickest option) or use a filter to select them. One final click, and they are exported as a CSV file that can be read into R, OpenOffice, or Stata for post-processing and analysis.
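For those who prefer the scripted route after all, here is a minimal sketch of the same idea (pull the tables out of a page, write one of them to CSV) using only Python’s standard library. It is not how outwit works internally; the class name `TableExtractor`, the function `tables_to_csv` and the tiny `SAMPLE` page are made up for illustration, and the parser assumes reasonably well-formed HTML without nested tables.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row, table by table."""

    def __init__(self):
        super().__init__()
        self.tables = []   # list of tables; each table is a list of rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def tables_to_csv(html, table_index=0):
    """Return the chosen table of an HTML page as CSV text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.tables[table_index])
    return buffer.getvalue()


# Hypothetical stand-in for a downloaded page; a real page would be read
# from a file or fetched first.
SAMPLE = """
<table>
  <tr><th>Code</th><th>Département</th><th>Prefecture</th></tr>
  <tr><td>01</td><td>Ain</td><td>Bourg-en-Bresse</td></tr>
  <tr><td>2A</td><td>Corse-du-Sud</td><td>Ajaccio</td></tr>
</table>
"""

print(tables_to_csv(SAMPLE))
```

Accents survive because everything stays in Unicode until the CSV is written. Dropping unwanted columns (coats of arms and the like) would be one more list comprehension over the rows.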
While I’m all in favour of scriptable and open-source tools like Perl, Python and R, outwit has a lot going for it if all you need is a quick hack. Outwit also has functions to mass-download files (say PDFs) from a page and give them unique names. If the job is complex, there is even more functionality under the hood, and you can use the point-and-click interface to program your own scraper, though I would tend to use a real programming language for these cases. At any rate, outwit is a useful and free tool for the lazy data analyst.
In a recent article in the European Journal of Political Research, Kestilä and Söderlund claim (amongst other things) that in the French regional elections of 2004, turnout and district magnitude have significant negative effects on the extreme right vote whereas the effects of the number of party lists and unemployment are positive and significant. Most interestingly, immigration (which is usually a very good predictor for the radical right vote) had no effect on the success of the Front National. More generally, they argue that a subnational approach can control for a wider range of factors and provide more reliable results than cross-national analyses (now the most common approach to this phenomenon). My colleague Liz Carter and I disagreed and engaged in a massive replication/re-analysis endeavour. The outcome is a critique of the KS model of subnational political opportunity structures in regional elections. In this paper, we dispute Kestilä and Söderlund’s claims on theoretical, conceptual and methodological grounds and demonstrate that their findings are spurious. Today, the European Journal has accepted the article for publication (probably in 2009) 🙂
If you are interested in subnational politics, France is an interesting case for many reasons. On the one hand, the country is highly centralised and divided into 96 (European) Departements (administrative units) with equal legal rights (though Corsica is a bit of an exception to this). In fact, Departements were created after the revolution in an attempt to replace the provinces of the Ancien Regime with something rational and neat. On the other hand, the Departements are vastly different in terms of their size, population, economic, political and social structure, which gives you a lot of variance that can be modelled. Electoral data is often made available at the level of the Departement (see e.g. the useful book by Caramani for historical results and the CDSP and government websites for recent elections) or can be aggregated to that level since electoral districts are nested in Departements. The French National Institute for Statistics and Economic Studies (INSEE) has a wealth of data from the 1999 census and other sources, and even more is available from Eurostat. One thing that is incredibly annoying, however, is that many sources like Caramani, INSEE and the Wikipedia use the traditional French system. This system (which is part of the ISO standard ISO 3166-2) assigns numbers from 1 to 95 that once reflected the alphabetical order of the Departements’ names, though this initial order was a bit scrambled by territorial changes. The most obvious result of these changes is the odd pair of 2A/2B codes for Corsica (after 1975, see this article on the French Official Geographic Code for the details). Rather unsurprisingly, Eurostat (and a few others) prefer the European NUTS-3 codes, which have a hierarchical structure that consists of a country (FR), region, and subregion (=Departement) code. If you want to merge Departemental data from various sources you obviously have to map one system to the other, which is cumbersome and prone to error.
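To make the mapping problem concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function names are invented, the data values are placeholders rather than real figures, and the three code pairs are an excerpt as I understand the NUTS 2006/2010 classification (codes changed in later NUTS revisions, so check Eurostat’s correspondence tables for the version your data use).

```python
# Illustrative excerpt only: INSEE code -> NUTS-3 code (NUTS 2006/2010).
# Verify against Eurostat's tables before relying on it.
INSEE_TO_NUTS3 = {
    "75": "FR101",  # Paris
    "2A": "FR831",  # Corse-du-Sud
    "2B": "FR832",  # Haute-Corse
}


def normalise_insee(code):
    """Return the code as a two-character string: 1 -> '01', '2a' -> '2A'."""
    return str(code).strip().upper().zfill(2)


def merge_by_department(insee_rows, nuts_rows, mapping=INSEE_TO_NUTS3):
    """Join rows keyed by INSEE code with rows keyed by NUTS-3 code.

    Returns the merged rows plus the INSEE codes that could not be matched,
    so that silent losses in the merge become visible.
    """
    merged, unmatched = [], []
    for insee_code, values in insee_rows.items():
        key = normalise_insee(insee_code)
        nuts = mapping.get(key)
        if nuts is None or nuts not in nuts_rows:
            unmatched.append(key)
            continue
        merged.append({"insee": key, "nuts3": nuts, **values, **nuts_rows[nuts]})
    return merged, unmatched


# Placeholder values, not real statistics.
electoral = {75: {"electoral_value": 1}, "2a": {"electoral_value": 2},
             "99": {"electoral_value": 3}}
eurostat = {"FR101": {"eurostat_value": 10}, "FR831": {"eurostat_value": 20}}

rows, missing = merge_by_department(electoral, eurostat)
print(rows)
print("unmatched INSEE codes:", missing)
```

Normalising first matters because the same Departement may show up as `1`, `"01"` or `"2a"` depending on the source, and a numeric column quietly breaks on the Corsican codes.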
That’s why I wrote a little script in Perl that reads a table of Departemental codes and creates a do-file for Stata, which does the actual mapping. From within Stata, you can simply type net from https://www.kai-arzheimer.com/stata to get the whole package. It should be fairly easy to adapt this to your own needs – enjoy!
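If you would rather roll your own, the general idea of such a generator can be sketched in a few lines of Python. To be clear, this is not the Perl script from the package, and the do-file it writes need not resemble the one you get from there: the column names `insee` and `nuts3`, the function `make_do_file` and the three-row table are assumptions for illustration (the code pairs follow the NUTS 2006/2010 classification as I understand it), and the generated Stata code assumes that the INSEE code is stored as a string variable.

```python
import csv
import io

# Hypothetical mapping table; in practice this would be read from a file.
MAPPING_CSV = """insee,nuts3
75,FR101
2A,FR831
2B,FR832
"""


def make_do_file(table_text, source_var="insee", target_var="nuts3"):
    """Turn a two-column code table into the text of a Stata do-file."""
    lines = [
        "* generated file: maps INSEE codes to NUTS-3 codes",
        f'generate str5 {target_var} = ""',
    ]
    for row in csv.DictReader(io.StringIO(table_text)):
        insee = row["insee"].strip()
        nuts = row["nuts3"].strip()
        # One replace statement per Departement.
        lines.append(f'replace {target_var} = "{nuts}" if {source_var} == "{insee}"')
    return "\n".join(lines) + "\n"


print(make_do_file(MAPPING_CSV))
```

Writing the result to a file and running it with `do` from within Stata would then add the new variable to whatever dataset is in memory.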