Sep 16, 2008
 
[Slightly off topic] Having your own domain is obviously attractive, but when I moved to the UK two years ago, I left my main site with all my presentations, pre-prints and other goodies in a subdirectory of my old institution’s website, where it had resided since about 1999. They have a decent server with loads of space that is regularly backed up, and they don’t charge me a penny. But more importantly, over the years I have accumulated a whopping 160 MB worth of files (about 6000 of them), and people (and Google) know where to find my stuff. In the past, I had moved single pages of special interest groups from one domain to another with JavaScript redirects, but I had no clue how this would translate to a huge and fairly overgrown structure of PDFs, PowerPoint files etc. And so I simply left everything as it was (i.e. working).

However, during the summer break I had a little spare time and decided that it was time to move my stuff to a domain of my own. This is what I did:

  • I registered my own domain kai-arzheimer.com and rented 250 MB of webspace from a small but very keen provider for less than 18 Euros per year. Crucially, they give me ssh access to the server and a handy set of tools (bash, textutils, Emacs, Perl, Python and even gcc).
  • I carefully read the advice on moving to a new domain that Google gives on its webmaster blog. I registered both the old and the new site with them and installed their tool for generating sitemaps.
  • I copied everything to the new site without making any changes.
  • I brushed up my knowledge on generating 301 redirects. A “301” means that whatever content was available at a given URL has moved permanently to another URL. Most browsers take you to this new address in the blink of an eye without you ever realising that the URL has changed. And Google will eventually update its index and will treat any links pointing to the old URL as pointing to the new one. At least this is what they promise.
  • I found out that I was extremely lucky because my old institution runs Apache with the mod_rewrite module enabled and lets ordinary users configure it for their own directories via .htaccess files. This is obviously techno-babble, but the upshot is this: I put a file named .htaccess in the top-level directory of my old site (www.politik.uni-mainz.de/kai.arzheimer/) and changed its content to
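    # mod_rewrite in a per-directory (.htaccess) context needs FollowSymLinks
    Options +FollowSymLinks
    RewriteEngine on
    # Send every request below this directory to the same path on the new site
    # as a permanent (301) redirect; L = stop processing further rules
    RewriteRule (.*) http://www.kai-arzheimer.com/$1 [R=301,L]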

    This instructs the server at Mainz to run a search-and-replace operation on every URL that refers to my old site and to turn it into a permanent redirect to the same path on my new site. This works for PDFs, PowerPoint files, single pages, pictures, anything. It also means that external links to duly forgotten working papers on other people’s sites, which (just like the working papers) have not been updated since 1999, still work. The object does not even have to exist: if you ask for http://www.politik.uni-mainz.de/kai.arzheimer/meaning-of-life.html, you will be served a 404 page from my new site. How neat is that?

  • Finally, I found a Perl one-liner that corrects the absolute references to the old site that might or might not be buried deep in the HTML code of ancient pages (-i.bak edits the files in place and keeps a backup copy of each with the suffix .bak; the dots in the old address are escaped so that they match only literal dots):
    perl -pi.bak -e 's!www\.politik\.uni-mainz\.de/kai\.arzheimer!www.kai-arzheimer.com!ig' *.htm*

    There is probably a cleverer way to do this, but I applied the same change in the lower-level directories by replacing the final *.htm* with */*.htm*, */*/*.htm* and so on. Rather amazingly, the same trick worked for PDF files: by running the substitution on *.pdf and so on, I could even change URLs in files that had been generated by Office 97.
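To make the behaviour of the three-line .htaccess file a little more concrete, here is a small Python sketch that computes the redirect Apache would send back for a given old URL. It is an illustration, not Apache’s code: the function name redirect_for is hypothetical, and the sketch assumes, as the mod_rewrite documentation describes for rules in .htaccess files, that the directory prefix /kai.arzheimer/ is stripped before (.*) is matched and that the query string is passed through unchanged.

```python
from __future__ import annotations

from urllib.parse import urlsplit

OLD_PREFIX = "/kai.arzheimer/"           # directory that holds the .htaccess file
NEW_BASE = "http://www.kai-arzheimer.com/"


def redirect_for(old_url: str) -> tuple[int, str]:
    """Return (status, Location) as the RewriteRule would (hypothetical helper)."""
    parts = urlsplit(old_url)
    if not parts.path.startswith(OLD_PREFIX):
        raise ValueError("URL is outside the redirected directory")
    # In .htaccess context the directory prefix is stripped first,
    # so (.*) -> $1 captures only the part of the path below it.
    rest = parts.path[len(OLD_PREFIX):]
    location = NEW_BASE + rest
    # The substitution contains no '?', so the original query string is kept.
    if parts.query:
        location += "?" + parts.query
    # R=301 marks the redirect as permanent.
    return 301, location


print(redirect_for("http://www.politik.uni-mainz.de/kai.arzheimer/meaning-of-life.html"))
```

Note that the sketch never checks whether the requested file exists: every path below the old directory is mapped, and it is the new server that decides whether to answer with the file or with its 404 page.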
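One way to cover all the subdirectories at once, instead of stacking */*.htm* style patterns, is to walk the directory tree. The following is a minimal Python sketch of that idea, not the one-liner itself: rewrite_tree is a hypothetical name, the dots in the old host name are escaped so they match only literal dots, and the matching is case-insensitive like the /i flag. It works on raw bytes, so it could in principle also be pointed at PDFs with patterns=("*.pdf",); it only writes a .bak copy for files it actually changes, and it skips existing .bak files, which *.htm* would otherwise match on a second run.

```python
import fnmatch
import os
import re
import shutil

# Case-insensitive, with escaped dots so '.' matches only a literal dot.
OLD = re.compile(rb"www\.politik\.uni-mainz\.de/kai\.arzheimer", re.IGNORECASE)
NEW = b"www.kai-arzheimer.com"


def rewrite_tree(root, patterns=("*.htm*",)):
    """Replace old-site references in matching files anywhere below root.

    A backup with the suffix .bak is written for each changed file.
    Returns the list of paths that were changed.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Skip backups from an earlier run (index.html.bak matches *.htm*).
            if name.endswith(".bak"):
                continue
            if not any(fnmatch.fnmatch(name, p) for p in patterns):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                data = f.read()
            new_data, count = OLD.subn(NEW, data)
            if count == 0:
                continue  # nothing to replace, leave the file untouched
            shutil.copy2(path, path + ".bak")
            with open(path, "wb") as f:
                f.write(new_data)
            changed.append(path)
    return changed
```

A caveat if this is used on PDFs: the new address is shorter than the old one, so the byte offsets recorded in a PDF’s cross-reference table no longer line up after the substitution. Many viewers quietly repair this, which may be why the trick worked, but it is worth opening a few of the changed files to check.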

On the next day, pages from the new site very slowly began to replace those from the old site in Google’s results. For a couple of days, pages from the new site would disappear and reappear, but this doesn’t really matter because, thanks to the redirect, people find you either way. Three weeks on, the transition seems to be mostly complete. So far, it has been a surprisingly painless experience.
