The upside is that I learned a bit about the wonders of Python in general and the charms of geopy in particular. The downside is that writing a simple script that takes a number of strings from a Stata file, converts them into coordinates and gets them back into Stata took longer than I ever thought possible. Just now, I’ve learned about a possible shortcut (via the excellent data monkey blog): geocode is a user-written Stata command that takes a variable containing address strings and returns two new variables containing the latitude/longitude information. Now that would have been a bit of a time-saver. You can install geocode by typing
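For readers who want the Python route anyway, here is a minimal sketch of the Stata-to-Python-to-Stata round trip. It is not the script described above. It swaps in a CSV file as the hand-off format instead of reading the .dta file directly, and it assumes an `address` column. The geocoder is passed in as a plain function that returns `(latitude, longitude)` or `None`; that signature is a hypothetical convention for this example, not geopy's own API.

```python
import csv
from typing import Callable, Optional, Tuple

# Hypothetical convention for this sketch: a geocoder is any function that maps
# an address string to (latitude, longitude), or None if it cannot be found.
Geocoder = Callable[[str], Optional[Tuple[float, float]]]


def geocode_csv(in_path: str, out_path: str, geocode: Geocoder,
                address_col: str = "address") -> int:
    """Read addresses from a CSV exported by Stata, append latitude/longitude
    columns, and write a new CSV for Stata to read back in.
    Returns the number of rows that could not be geocoded."""
    failures = 0
    with open(in_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames or []) + ["latitude", "longitude"]
        rows = []
        for row in reader:
            coords = geocode(row[address_col])
            if coords is None:
                failures += 1
                # Leave the cells empty; Stata should read them as missing values.
                row["latitude"], row["longitude"] = "", ""
            else:
                row["latitude"], row["longitude"] = coords
            rows.append(row)
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return failures
```

On the Stata side, the hand-off could look like `export delimited id address using addresses.csv` before running the script and `import delimited using addresses_geo.csv, clear` afterwards (file names are placeholders). With geopy, a small wrapper around something like `GoogleV3(api_key=...).geocode(address)`, which returns a location object with `.latitude` and `.longitude`, could serve as the `geocode` argument. Check the geopy documentation for the version you have installed.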
net from http://www.stata-journal.com/software/sj11-1
net install dm0053
There is, however, one potential drawback: Google limits the number of free queries per day (and possibly per minute). In Python, you can easily stagger your requests, and you can also use an API key that is supposed to give you a bigger quota. Geocoding a large number of addresses from Stata in one go, on the other hand, will probably run into that limit and result in an equally large number of parsing errors.
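Staggering requests can be as simple as pausing between calls. The sketch below assumes a geocoder function that returns `(latitude, longitude)` or `None`; that is a hypothetical convention, not geopy's API. It also skips duplicate addresses so they don't use up quota twice, and it records failed requests as missing instead of aborting the whole batch. A pause only helps with a per-minute limit. It does nothing for a daily quota.

```python
import time
from typing import Callable, Dict, Iterable, Optional, Tuple

Coords = Optional[Tuple[float, float]]


def geocode_staggered(addresses: Iterable[str],
                      geocode: Callable[[str], Coords],
                      delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> Dict[str, Coords]:
    """Geocode each distinct address once, waiting `delay` seconds between
    consecutive requests. Failed lookups are stored as None."""
    results: Dict[str, Coords] = {}
    first = True
    for address in addresses:
        if address in results:  # duplicate: reuse the earlier answer, no new request
            continue
        if not first:
            sleep(delay)  # stagger requests to stay under a per-minute limit
        first = False
        try:
            results[address] = geocode(address)
        except Exception:  # e.g. a quota or network error; treat as missing
            results[address] = None
    return results
```

The `sleep` parameter is only there so the pause can be replaced in tests. In normal use, the default `time.sleep` applies. Newer geopy releases also ship a `RateLimiter` helper (in `geopy.extra.rate_limiter`) that wraps a geocoder with a minimum delay, which may be preferable if your version has it.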
@Jaime L Dettwiler Geocoding in Python is relatively straightforward with geopy (see http://code.google.com/p/geopy/wiki/GettingStarted). You should also get your own API key from Google’s API console (https://code.google.com/apis/console), which should give you a bigger quota.
Hope that helps!
Thanks for the excellent information. Do you know the API key, or the Python script, that would increase the number of queries per day?
Regards from Chile.
Jaime L. Dettwiler