For our piece on distance effects in English elections we geocoded the addresses of hundreds of candidates. For the un-initiated: Geocoding is the fine art of converting addresses into geographical coordinates (longitude and latitude). Thanks to Google and some other providers like OpenStreeMap, this is now a relatively painless process. But when one needs more than a few addresses geocoded, one does not rely on pointing-and-clicking. One needs an API, i.e. a software library that makes the service accessible through R, Python or some other programming language.
The upside is that I learned a bit about the wonders of Python in general and the charms of geopy in particular. The downside is that writing a simple script that takes a number of strings from a Stata file, converts them into coordinates and gets them back into Stata took longer than I ever thought possible. Just now, I’ve learned about a possible shortcut (via the excellent data monkey blog): geocode is a user-written Stata command that takes a variable containing address strings and returns two new variables containing the latitude/longitude information. Now that would have been a bit of a time-saver. You can install geocode by typing
net from http://www.stata-journal.com/software/sj11-1
net install dm0053
There is, however, one potential drawback: Google limits the number of free queries per day (and possibly per minute). Via Python, you can easily stagger your requests, and you can also use an API key that is supposed to give you a bigger quota. Geocoding a large number of addresses from Stata in one go, on the other hand, will probably result in an equally large number of parsing errors.