Mar 23 2013

Like many fellow information junkies, I was shocked that Google is killing Reader, then realised that I had not used it very regularly lately. This is partly because I rely a bit more on Twitter these days, partly because last year I began using feedly, a Reader frontend that looks good on my various devices and pushes the more popular stories to the top. While feedly quickly announced that they would replace Reader with a backend of their own, Google’s move prompted me to reconsider my reading habits. Although feedly is great for the lazy Sunday afternoon reading experience (flipboard is even more tempting/worse in this respect), I sometimes want to check those feeds that publish new items only every other week, or to make sure that I really see the headline for every post published on the Monkey Cage in the last couple of days or so. For that, one needs a proper feed reader.

Feed readers as a species were mostly driven to extinction by Reader’s advent in 2005. Synchronisation in the cloud is a killer feature when you are reading on more than a single device, but Reader was also fast, never missed an item (because it pulled feeds centrally and continually), and had early social sharing features and content suggestions. That’s why most people are now looking for cloud-based replacements like the Old Reader.

At the margins of the nerdosphere, however, people kept muttering about Tiny Tiny RSS (ttrss), which is an odd beast: an open source online RSS reader. Since spring is not coming this year, I decided to give it a spin. Somewhat surprisingly, I like it rather well.

ttrss requires that you have access to a webserver with PHP and a SQL daemon. I installed it on my office machine, which is almost constantly online, but these days, you can probably build your own cloud on a Raspberry Pi that you attach to your router. Installation was mostly painless. Once installed, ttrss looks not too different from Reader and almost feels like a desktop application (with lots of keyboard shortcuts – good!). It is somewhat sparsely documented, but importing my feeds was no big deal, and as a proof of concept, I quickly managed to set up a filter that collects all items mentioning Cyprus. Like Reader, it monitors my feeds continuously, so if I really want to catch up with a feed that I have ignored for a month or two, I have the complete backlog. There are more goodies under the hood, I suppose. Phones and tablets are supported out of the box, but there is an even better webapp available here that needs to be installed separately.

Just for the fun of it, I then installed newsbeuter, which claims to be the Mutt of RSS readers. If that statement does not make sense to you, newsbeuter is probably not right for you. It is the antithesis of those apps that turn feeds into pages that could be taken from a glossy magazine. It lives in the text-based world of the console, is keyboard driven and blindingly fast. It has macros and its own language for filters. And its newer versions happily sync with ttrss, so that I can casually read on the phone or the tablet, and slash through hundreds of feeds on the laptop or the office machine.

I’m not sure if I will stick with this slightly geeky setup. I like the way feedly and flipboard integrate social signals, but the echo chamber effect is quite pronounced. In ye olden days, I thought nothing of running my own mail server. Perhaps it’s time to get off that pampered cloud.


Apr 26 2012

For our piece on distance effects in English elections we geocoded the addresses of hundreds of candidates. For the uninitiated: geocoding is the fine art of converting addresses into geographical coordinates (longitude and latitude). Thanks to Google and some other providers like OpenStreetMap, this is now a relatively painless process. But when one needs more than a few addresses geocoded, one does not rely on pointing and clicking. One needs an API, i.e. a programmatic interface to the service, together with a software library that makes it accessible from R, Python or some other programming language.

The upside is that I learned a bit about the wonders of Python in general and the charms of geopy in particular. The downside is that writing a simple script that takes a number of strings from a Stata file, converts them into coordinates and gets them back into Stata took longer than I ever thought possible. Just now, I’ve learned about a possible shortcut (via the excellent data monkey blog): geocode is a user-written Stata command that takes a variable containing address strings and returns two new variables containing the latitude/longitude information. Now that would have been a bit of a time-saver. You can install geocode by typing

net from
net install dm0053

There is, however, one potential drawback: Google limits the number of free queries per day (and possibly per minute). Via Python, you can easily stagger your requests, and you can also use an API key that is supposed to give you a bigger quota. Geocoding a large number of addresses from Stata in one go, on the other hand, will probably result in an equally large number of parsing errors.
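
What "staggering" requests might look like in Python is sketched below. This is only an illustration, not the script we actually used: the function name `geocode_staggered`, the `lookup` callable and the default delay are all assumptions made for the example, and the real quota rules of any provider need to be checked separately. The sketch pauses between requests, waits a little longer after each failure, and records `None` for addresses that never resolve.

```python
import time
from typing import Callable, Dict, Iterable, Optional, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)


def geocode_staggered(
    addresses: Iterable[str],
    lookup: Callable[[str], Optional[Coordinates]],
    delay_seconds: float = 1.0,   # assumed pause; not a documented quota
    max_retries: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Optional[Coordinates]]:
    """Geocode addresses one by one, pausing before every request but the first."""
    results: Dict[str, Optional[Coordinates]] = {}
    first_request = True
    for address in addresses:
        if address in results:
            continue  # do not spend quota on duplicate addresses
        for attempt in range(max_retries + 1):
            if not first_request:
                # Wait longer after each failed attempt for this address.
                sleep(delay_seconds * (attempt + 1))
            first_request = False
            try:
                results[address] = lookup(address)
                break
            except Exception:
                if attempt == max_retries:
                    results[address] = None  # give up on this address
    return results
```

With geopy, `lookup` could be a small wrapper around a geocoder's `geocode` method that returns the `latitude` and `longitude` of the resulting location, or `None` when nothing is found.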

Sep 16 2008
[Slightly off topic] Having your own domain is obviously attractive, but when I moved to the UK two years ago, I left my main site with all my presentations, pre-prints and other goodies in a subdirectory of my old institution’s website where it had resided since about 1999. They have a decent server with loads of space that is backed up regularly, and they don’t charge me a penny. But more importantly, over the years I have accumulated a whopping 160 MB worth of files (about 6000 of them), and people (and Google) know where to find my stuff. In the past, I have moved single pages of special interest groups from one domain to another with JavaScript redirects but had no clue how this would translate to a huge and fairly overgrown structure of PDFs, powerpoints etc. And so I simply left everything as it was (i.e. working).

However, during the summer break I had a little spare time and decided that it was time to move my stuff to a domain of my own. This is what I did:

  • I registered my own domain and rented 250 MB of webspace from a small but very keen provider for less than 18 Euros per year. Crucially, they give me ssh access to the server and a handy set of tools (bash, textutils, emacs, perl, python and even gcc)
  • I carefully read the advice on moving to a new domain that Google gives on its webmaster blog. I registered both the old and the new site with them and installed their tool for generating sitemaps.
  • I copied everything to the new site without making any changes.
  • I brushed up my knowledge on generating 301 redirects. A “301” means that whatever content was available at a given URL has moved permanently to another URL. Most browsers take you to this new address in the blink of an eye without you ever realising that the URL has changed. And Google will eventually update its index and will interpret any links pointing to the old URL as pointing to the new one. At least this is what they promise.
  • I found out that I was extremely lucky because my old institution runs Apache with the mod_rewrite module enabled and gives ordinary users access to this mechanism via .htaccess files. This is obviously techno-babble but the upshot is this: I put a file named .htaccess in the top-level directory of my old site and changed its content to
    Options +FollowSymLinks
    RewriteEngine on
    RewriteRule (.*)$1 [R=301,L]

    This instructs the server at Mainz to do a search-and-replace operation on URLs that refer to my old site and rewrite them into redirects to my new site. This works for PDFs, powerpoints, single pages, pictures, anything. That also means that external links to duly forgotten working papers on other people’s sites which have (just like the working papers) not been updated since 1999 still work. The object does not even have to exist: if you ask for a file that was never there, you will be served a 404 page from my new site. How neat is that?

  • Finally, I found a Perl one-liner that would correct the absolute references to the old site that might or might not be buried deep in the HTML code of ancient pages: perl -pi.bak -e 's!!!ig' *.htm* There is probably a cleverer way to do this, but I applied the same changes in the lower-level directories by changing the last few characters to */*.htm*, */*/*.htm* and so on. Rather amazingly, the same trick worked for PDF files: by applying the patch to *.pdf and so on, I could change URLs in files that had been generated by Office 97.
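
For what it is worth, the directory-by-directory repetition in the last step could be avoided with a short script that walks the whole tree. The following is a rough Python sketch of the same idea, not what I actually ran: the two URL prefixes are hypothetical placeholders, and the function name `rewrite_links` is made up for the example. Like the Perl version, it replaces the old prefix case-insensitively and keeps a `.bak` copy of every file it changes.

```python
import re
from pathlib import Path
from typing import List

# Hypothetical placeholders; substitute the real old and new base URLs.
OLD_PREFIX = "http://old.example.edu/~user/"
NEW_PREFIX = "http://www.example.org/"


def rewrite_links(root: str, old: str = OLD_PREFIX, new: str = NEW_PREFIX,
                  pattern: str = "*.htm*") -> List[Path]:
    """Replace `old` with `new` in all matching files below `root`.

    Works on raw bytes so that ancient pages in unknown encodings are
    left untouched apart from the replaced URLs.
    """
    old_re = re.compile(re.escape(old.encode("ascii")), re.IGNORECASE)
    new_bytes = new.encode("ascii")
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        # Skip directories and backups left behind by an earlier run.
        if not path.is_file() or path.name.endswith(".bak"):
            continue
        data = path.read_bytes()
        # A function as replacement avoids backslash interpretation.
        updated = old_re.sub(lambda match: new_bytes, data)
        if updated != data:
            path.with_name(path.name + ".bak").write_bytes(data)  # backup
            path.write_bytes(updated)
            changed.append(path)
    return changed
```

Calling `rewrite_links(".")` from the top-level directory of the copied site would then cover all subdirectories in one go; files without a reference to the old site are not touched and get no backup.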

On the next day, results from the new site began very slowly to replace the pages from the old site. For a couple of days, pages from the new site would disappear and re-appear, but this doesn’t really matter because thanks to the redirect, people find you either way. Three weeks on, the transition seems to be mostly complete. So far, it has been a surprisingly painless experience.
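
For readers who prefer code to techno-babble, the effect of the rewrite rule in the .htaccess step above can be mimicked in a few lines of Python. This is merely a toy model of what the server does, under the assumption that the rule captures the whole path relative to the old site's directory and appends it to the new base URL; the base URL and the function name `redirect_for` are invented for the illustration.

```python
import re
from typing import Tuple

# Hypothetical new base URL, standing in for the real domain.
NEW_SITE = "http://www.example.org/"


def redirect_for(relative_path: str, new_site: str = NEW_SITE) -> Tuple[int, str]:
    """Return the status code and target URL for a request to the old site.

    `relative_path` is the part of the requested URL below the directory
    that holds the .htaccess file, without a leading slash.
    """
    match = re.match(r"(.*)", relative_path)  # capture everything, like (.*)
    # 301 = moved permanently; the captured path is appended unchanged.
    return 301, new_site + match.group(1)
```

Since the pattern matches anything, the redirect is issued whether or not the requested file ever existed, which is why a request for a missing file ends up as a 404 on the new site rather than on the old one.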