Open Census Data and Open Geo Data in the UK

I’ve just realised that Her Majesty’s government has released the results of the 2011 (!) census at the level of individual output areas. Output areas are small, socially homogeneous cells with an average size of about 300 residents. In other words, the information is incredibly detailed, which is a bit worrying on one level (I see your PRISM and raise you a Tempora), but on another level quite exhilarating. That is because HM has also released various geographies, meaning that it is comparatively easy to find and plot these clusters of people in space. The possibilities for social research are endless. Don’t bother looking for a link here, though: these data are distributed and duplicated over various sites run by the ONS, the Ordnance Survey, Edina, UK Data etc., so it’s best to google “output area” and start digging from there.

Electoral Wards and Census Output Areas in Camden

This nice map of Camden is just an illustration for a paper I am currently working on. It was created using the very useful sp and OpenStreetMap packages for R and shows the borough’s 18 electoral wards (the pinkish polygons) as well as the population-weighted centroids for the output areas contained in these wards.

The upshot is that people are sometimes very unevenly distributed within wards (perhaps unsurprisingly, nobody lives on Hampstead Heath), so if you are looking for neighbourhood effects on voting behaviour, even the very detailed ward-level data can be misleading. In an ideal world, we would get electoral counts at the output area level, but even I can see that this might be a bit problematic.

David Spiegelhalter on Risk, Knife-Crime and the Probability of Being Killed in London

Radio 4 never fails to amaze me. This morning, just three minutes before the 9 o’clock news, they interviewed David Spiegelhalter. Spiegelhalter is obviously the man who gave us BUGS. But he is also Winton Professor of the Public Understanding of Risk at the University of Cambridge, and a man who can (within the 90 seconds they allocated him) explain to a lay public why a spike in knife crime (last summer, four people were killed in the space of just one day) is not totally unlikely and does not necessarily indicate an increase in the murder rate, illustrating the idea of clustered risks in passing. He even convinced the anchor that stats is actually fun, even if you look at 170 murders per year in a population of just 7 million Londoners. I was duly impressed (you can listen here to the interview with Spiegelhalter). In fact, I was so impressed that I googled him once I reached the office and came across his website, which has full coverage of the London murder mystery (which is solved by modelling the incidents as a Poisson distribution).
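To get a feel for the argument, here is a minimal back-of-the-envelope sketch in Python. It is not Spiegelhalter’s own analysis: it simply assumes that the 170 murders per year quoted above arrive independently and at a constant daily rate, so that the daily count follows a Poisson distribution. The helper names (`poisson_pmf`, `prob_at_least`) are mine and purely illustrative.

```python
import math

MURDERS_PER_YEAR = 170   # figure quoted in the post
DAYS_PER_YEAR = 365      # assumption: ignore leap years


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def prob_at_least(k: int, lam: float) -> float:
    """Probability of k or more events: one minus P(0) + ... + P(k-1)."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))


# Assumption: a constant rate, with murders independent of each other.
daily_rate = MURDERS_PER_YEAR / DAYS_PER_YEAR

# Chance that any given day sees four or more murders.
p_four_plus = prob_at_least(4, daily_rate)

# Expected number of such days in a year.
expected_days_per_year = p_four_plus * DAYS_PER_YEAR

print(f"Daily rate: {daily_rate:.3f}")
print(f"P(4 or more on a given day): {p_four_plus:.5f}")
print(f"Expected such days per year: {expected_days_per_year:.2f}")
```

Under these simplifying assumptions, the chance of four or more murders on any single day is tiny, but with 365 chances per year such a day should still turn up every couple of years or so. That is the point about clustered risks: a bad day does not by itself show that the underlying rate has gone up.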

