Apr 26 2012
 

For our piece on distance effects in English elections we geocoded the addresses of hundreds of candidates. For the uninitiated: geocoding is the fine art of converting addresses into geographical coordinates (longitude and latitude). Thanks to Google and some other providers like OpenStreetMap, this is now a relatively painless process. But when one needs more than a few addresses geocoded, one does not rely on pointing-and-clicking. One needs an API and, ideally, a software library that makes the service accessible through R, Python or some other programming language.

The upside is that I learned a bit about the wonders of Python in general and the charms of geopy in particular. The downside is that writing a simple script that takes a number of strings from a Stata file, converts them into coordinates and gets them back into Stata took longer than I ever thought possible. Just now, I’ve learned about a possible shortcut (via the excellent data monkey blog): geocode is a user-written Stata command that takes a variable containing address strings and returns two new variables containing the latitude/longitude information. Now that would have been a bit of a time-saver. You can install geocode by typing

net from http://www.stata-journal.com/software/sj11-1
net install dm0053

There is, however, one potential drawback: Google limits the number of free queries per day (and possibly per minute). Via Python, you can easily stagger your requests, and you can also use an API key that is supposed to give you a bigger quota. Geocoding a large number of addresses from Stata in one go, on the other hand, will probably result in an equally large number of parsing errors.
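To illustrate what staggering requests could look like, here is a minimal sketch in plain Python. It is not the script we actually used. The function name geocode_in_batches and its parameters are made up for this example, and the geocode argument stands for any callable that turns an address string into a (latitude, longitude) pair, returns None when nothing is found, and raises an exception when the service refuses the request. A thin wrapper around a geopy geocoder would be one way to supply it.

```python
import time


def geocode_in_batches(addresses, geocode, pause=1.0, max_retries=3,
                       sleep=time.sleep):
    """Geocode addresses one by one, pausing between requests.

    `geocode` is a hypothetical callable: address string in,
    (latitude, longitude) tuple or None out; it may raise on
    transient failures such as an exceeded quota.
    """
    results = {}
    for address in addresses:
        if address in results:          # duplicates cost quota, so skip them
            continue
        for attempt in range(1, max_retries + 1):
            try:
                results[address] = geocode(address)
                break
            except Exception:
                if attempt == max_retries:
                    results[address] = None       # give up, leave it missing
                else:
                    sleep(pause * 2 ** attempt)   # back off before retrying
        sleep(pause)                    # stagger the requests
    return results
```

The sleep argument is only there so that the waiting can be switched off when trying the function out. The returned dictionary can then be written to a text file and merged back into the Stata data set on the address string.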

Apr 21 2012
 

I’m more and more intrigued by the potential spatial data holds for political science. Once you begin to think about it, concepts like proximity and clustering are basic building blocks for explaining social phenomena. Even better, since the idea of open data has gone mainstream, more and more spatially referenced information is becoming available, and when it comes to free, open source software, we are spoilt for choice or, at least in my case, up to and beyond the point of utter confusion.

For our paper on the effect of spatial distance between candidates and their prospective voters, we needed a choropleth map of English Westminster constituencies that shows how many of the mainstream candidates live within the constituency’s boundaries. Basically, we had three options (not counting the rather few user-contributed packages for Stata): GRASS, a motley collection of Python packages, and a host of libraries for R.

GRASS is a full-blown open source GIS, whose user interface is perfect for keyboard aficionados and brings back happy memories of the 1980s. While GRASS can do amazing things with raster and vector maps, it is suboptimal for dealing with rectangular data. In the end, we used only its underrated cartographic ps.map module, which reliably creates high-resolution postscript maps.

Python has huge potential for social scientists, both in its own right and as a kind of glue that binds various programs together. In principle, a lot of GIS-related tasks could be done with Python alone. We used the very useful geopy toolbox for converting UK postcodes to LatLong co-ordinates, with a few lines of code and a little help from Google.

Candidate locations by constituency

The real treasure trove, however, is R. The quality of packages for spatial analysis is amazing, and their scope is a little overwhelming. Applied Spatial Data Analysis with R by Roger Bivand, who wrote much of the relevant code, provides much-needed guidance.

Counting the number of mainstream candidates living in a constituency is a point-in-polygon problem: each candidate is a co-ordinate enclosed by a constituency boundary. Function overlay from package sp carries out the relevant operation. Once I had it located, I was seriously tempted to loop over constituencies and candidates. Just in time, I remembered the R mantra of vectorisation. Provided that points (candidates) and polygons (constituencies) have been transformed to the same projection, all that is needed is this:

mymap@data$homeconst1 <- overlay(candpos1, mymap)
mymap@data$homeconst2 <- overlay(candpos2, mymap)
mymap@data$homeconst3 <- overlay(candpos3, mymap)

This works because candpos1 is a vector of points that represent the spatial positions of all Labour candidates. These are tested against all constituency boundaries. The result is another vector of indices, i.e. sequence numbers of the constituencies the candidates are living in. Put differently, overlay takes a list of points and a bunch of polygons and returns a list that maps the former to the latter. With a bit of boolean logic, a vector of zeros (candidate outside constituency) and ones (candidate living in their constituency) ensues. Summing up the respective vectors for Labour, Tories, and LibDems then gives the required count that can be mapped. Result!
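For readers who do not use R, the logic can be sketched in a few lines of plain Python. This is an illustration under assumptions, not a translation of sp's internals: the function names are made up, the point-in-polygon test is a simple ray-casting routine, polygons are plain lists of (x, y) vertices in the same projection as the points, and I assume that candidate k of each party stands in constituency k, so that "living in their constituency" means the returned index equals the candidate's own position.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: is (x, y) inside the polygon (a list of vertices)?"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray starting at the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def overlay_index(points, polygons):
    """For each point, the index of the first enclosing polygon, or None."""
    return [
        next((i for i, poly in enumerate(polygons)
              if point_in_polygon(p, poly)), None)
        for p in points
    ]


def count_local_candidates(parties, polygons):
    """parties: one list of points per party, ordered like `polygons`."""
    counts = [0] * len(polygons)
    for points in parties:
        home = overlay_index(points, polygons)
        for k, idx in enumerate(home):
            counts[k] += int(idx == k)   # 1 if candidate k lives in polygon k
    return counts


# Two toy "constituencies" (unit squares side by side), three parties.
squares = [[(0, 0), (1, 0), (1, 1), (0, 1)],
           [(1, 0), (2, 0), (2, 1), (1, 1)]]
labour = [(0.5, 0.5), (0.5, 0.2)]
tories = [(1.5, 0.5), (1.5, 0.5)]
libdems = [(0.2, 0.2), (5.0, 5.0)]
local_counts = count_local_candidates([labour, tories, libdems], squares)
```

Real constituency boundaries have holes, islands and shared borders, which is exactly why one would leave this job to a tested library rather than to a toy routine like the one above.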

Apr 02 2012
 

In first-past-the-post systems, voters should prefer local candidates for all sorts of reasons. From a rational choice perspective, you could argue that local candidates should, on average, be more similar to their constituency in socio-economic terms and therefore more likely to represent its interests. A more socio-psychological-minded explanation would refer to shared ideological traits, positive stereotypes and collective identities. Or you could argue that local candidates are simply better known and have more opportunities for canvassing. Either way, even your granny knew that local is better when it comes to politics.

Only that she could never prove this assertion, while we can. Almost two years after the event, Political Geography has accepted our paper on the effect of (driving) distance between English mainstream candidates and their voters in the 2010 General Election. Controlling for incumbency, socio-economic distance and pre-campaign feeling towards the major parties, we demonstrate that physical distance (derived from candidates’ addresses and the centroid of their prospective voters’ neighbourhood) has a small but politically relevant effect. And yes, this is a brilliant start to this week!
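To give a feel for how a distance between two geocoded points can be computed, here is a small Python sketch. Note the swap: the paper works with driving distance, which requires a routing service, whereas the function below returns the straight-line (great-circle) distance from the haversine formula. The function name and the choice of the mean Earth radius are assumptions made for this illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # min() guards against rounding pushing the square root above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
```

Calling haversine_km with a candidate's latitude/longitude and those of a neighbourhood centroid yields a lower bound on the distance one would actually have to drive.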

Update: I have moved the preprint to a separate page. You can access the PDF, replication data etc. by clicking on the links below.

    Arzheimer, Kai and Jocelyn Evans. “Geolocation and voting: candidate-voter distance effects on party choice in the 2010 General Election in England.” Political Geography 31.5 (2012): 301-310. doi:10.1016/j.polgeo.2012.04.006

    The effect of geographical distance between candidate and voter on vote likelihood in the UK is essentially untested. In systems where constituency representatives vie for local inhabitants’ support in elections, candidates living closer to a voter would be expected to have a greater probability of receiving that individual’s support, other things being equal. In this paper, we present a first test of this concept using constituency data (specifically, notice of poll address data) from the British General Election of 2010 and the British Election Survey, together with geographical data from Ordnance Survey and Royal Mail, to test the hypothesis that candidate distance matters in voters’ choice of candidate. Using a conditional logit model, we find that the distance between voter and candidates from the three main parties (Conservative, Labour and Liberal Democrat) matters in English constituencies, even when controlling for strong predictors of vote-choice, such as party feeling and incumbency advantage.

    @Article{arzheimer-evans-2012,
    author = {Arzheimer, Kai and Evans, Jocelyn},
    title = {Geolocation and voting: candidate-voter distance effects on party choice in the 2010 General Election in England},
    number = {5},
    volume = {31},
    abstract = {The effect of geographical distance between candidate and voter on vote likelihood in the UK is essentially untested. In systems where constituency representatives vie for local inhabitants' support in elections, candidates living closer to a voter would be expected to have a greater probability of receiving that individual's support, other things being equal. In this paper, we present a first test of this concept using constituency data (specifically, notice of poll address data) from the British General Election of 2010 and the British Election Survey, together with geographical data from Ordnance Survey and Royal Mail, to test the hypothesis that candidate distance matters in voters' choice of candidate. Using a conditional logit model, we find that the distance between voter and candidates from the three main parties (Conservative, Labour and Liberal Democrat) matters in English constituencies, even when controlling for strong predictors of vote-choice, such as party feeling and incumbency advantage.},
    journal = {Political Geography},
    year = 2012,
    doi = {10.1016/j.polgeo.2012.04.006},
    pages = {301--310},
    keywords = {uk, gis},
    html = {http://www.kai-arzheimer.com/paper/geolocation-voting-candidate-voter-distance-effects-party-choice-2010-general-election-england},
    data = {http://hdl.handle.net/1902.1/17940},
    url = {http://www.kai-arzheimer.com/arzheimer-evans-geolocation-vote-england.pdf}
    }