Oct 22 2016

Which publishers are the most relevant for Radical Right research? Good question.

Radical Right research by type of publication

Currently, most of the items in The Eclectic, Erratic Bibliography on the Extreme Right in Western Europe (TM) are journal articles. The books/chapters/articles ratios have shifted somewhat over the years, reflecting both general trends in publishing and my changing reading habits, and by now the dominance of journal articles is rather striking.


The most important journals for Radical Right research (add pinch of salt as required)

One in three of these articles has been published in one of the four apparent top journals for Radical Right research: the European Journal of Political Research, West European Politics, Party Politics, and Acta Politica. I say ’apparent’ here, because this result may be a function of my (Western) Eurocentrism and my primary interest in Political Science and Sociology. Other Social Sciences are underrepresented, and literature from national journals that publish in languages other than English is virtually absent.

But hey: Laying all scruples aside, here is a table of the ten most important journals for Radical Right research:

Journal                                    No. of articles
European Journal of Political Research     38
West European Politics                     35
Party Politics                             24
Acta Politica                              22
Electoral Studies                          15
Parliamentary Affairs                      13
Patterns of Prejudice                      12
Comparative European Politics              10
Comparative Political Studies              10
Government and Opposition                   9

Neat, isn’t it?

I did a similar analysis nearly two years ago. Government and Opposition as well as Comparative European Politics are new additions to the top ten (replacing Österreichische Zeitschrift für Politikwissenschaft and Osteuropa), but otherwise, the picture is much the same. So if you publish on the Radical Right and want your research to be noticed, you should probably aim for these journals.

Oct 20 2016

For the past 15 years or so, I have maintained an extensive collection of references on the Radical/Extreme/Populist/New/Whatever Right in Western Europe. Because I love TeX and other command line tools of destruction, these references live in a large BibTeX file. BibTeX is a well-documented format for bibliographic text files that has been around for decades and can be written and read by a large number of reference managers.

Because BibTeX is so venerable, it’s unsurprising that there is even an R package (RefManageR) that can read and write BibTeX files, effectively turning bibliographic data into a dataset that can be analysed, graphed and otherwise mangled to one’s heart’s desire. And so my totally unscientific analysis of the Radical Right literature (as reflected in my personal preferences and interests) is just three lines of code away:

# read the bibliography into R
library(RefManageR)
ex <- ReadBib("/home/kai/Work/bibliography/xr-bibliography/extreme-right-western-europe-bibliography.bib")
year publications
2014 34
2012 38
2000 42
2002 54
2015 57
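Roughly, the counting step looks like this (a sketch that assumes RefManageR’s as.data.frame() method for BibEntry objects; showing the five busiest years is an arbitrary choice):

bibdata  <- as.data.frame(ex)          # one row per reference
per_year <- sort(table(bibdata$year))  # publications per year, ascending
tail(per_year, 5)                      # the five years with the most entries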

So 2012, 2014 and 2015(!) saw a lot of publications that ended up on my list, but 2000 and particularly 2002 (the year Jean-Marie Le Pen made it into the second round of the French presidential election) were not bad either. 2013 and 2003 (not listed) were also relatively strong years, with 33 publications each.

To get a more complete overview, it’s best to plot the whole time series (ignoring some very old titles):
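A sketch of one way to draw such a plot with base graphics (again assuming RefManageR’s as.data.frame() method; the 1980 cut-off is an arbitrary stand-in for ‘some very old titles’):

years  <- as.integer(as.data.frame(ex)$year)
counts <- table(years[years >= 1980])    # ignore some very old titles
plot(counts, type = "h", xlab = "Year", ylab = "Number of publications")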


There is a distinct upwards trend all through the 1990s, a post-millennial decline in the mid-noughties (perhaps due to the fact that I completed a book manuscript then and became temporarily negligent in my collector’s duties, but I don’t think so), and then a new peak during the last five years, undoubtedly driven by recent political events and countless eager postdocs and PhD students. I’m just beginning to understand the structure of the data objects that RefManageR creates from my bibliography, but I think it’s time for some league tables next.

Jul 26 2014

I’ve recently discovered Rfacebook, which lets you access public information on Facebook from R. In terms of convenience, no package for R or Python that I have seen so far comes near. Get yourself a long-lived token, store it as a variable, and put all posts on a fanpage you are interested in into one R object with a single function call. Check it out here.
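A minimal sketch of what this looks like – the page name is a placeholder, and the token would come from Facebook’s Graph API Explorer or from fbOAuth():

library(Rfacebook)

token <- "PASTE-LONG-LIVED-TOKEN-HERE"                              # obtained once, stored, reused
posts <- getPage(page = "somefanpage", token = token, n = 1000)     # all posts in one call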

Jan 26 2014

R Package Parallel: How Not to Solve a Problem That Does Not Exist

Somewhat foolishly, my university has granted me access to Mogon: not the god, not the death metal band, but rather their supercomputer, which currently holds the 182nd spot in the top 500 list of the fastest computers on the planet. It has some 34,000+ cores and more than 80 TB of RAM, but basically it’s just a very large bunch of Linux boxes. That means that I have a rough idea how to handle it, and that it happily runs my native Linux Stata and Mplus (and hopefully JAGS) binaries for me. It also has R installed, and this is where my misery began.

I have a lengthy R job that deals with census data. Basically, it looks up the absolute number of minority residents in some 25,000 output areas and their immediate neighbours and calculates a series of percentages from these figures. I think this could in principle be done in Stata, but R provides convenient libraries for dealing with geo-coded data (sp and friends), non-rectangular data structures and all the trappings of a full-featured programming language, so it would be stupid not to make use of it. The only problem is that R is relatively slow and single-threaded, and that my script is what they call embarrassingly parallel: The same trivial function is applied to 33 vectors with 25,000 elements each. Each calculation on a vector takes about eight seconds to complete, which amounts to roughly five minutes in total. Add the time it takes to read in the data and some fairly large lookup-tables (it would be very time-consuming to repeatedly calculate which output area is close enough to each other output area to be considered a neighbour), and we are looking at eight to ten minutes for one run.


Mogon. Image Credit: ZDV JGU Mainz

While I do not plan to run this script very often – once the calculations are done and saved, the results can be used in the analysis proper over and over again – I fully expect that I might change some operationalisations, include different variables etc., and so I began toying with the parallel package for R to make use of the many cores suddenly at my disposal.

Twelve hours later, I had learned the basics of the scheduling system (LSF), solved the problem of syncing my data between home, office, central, and super-computer, gained some understanding of the way parallel works and otherwise achieved basically nothing: even the best attempt at running a parallelised version of the script on the supercomputer was a little slower than the serial version on my very capable office machine – and that is without the 15 to 90 seconds the script spends waiting to be transferred to a suitable node of the cluster. I tried different things: replacing lapply with mclapply, which was slower, regardless of the number of cores; using clusterApply instead of lapply (same result); and forking the 33 serial jobs into the background, which was even worse, presumably because storing the returned values resulted in changes to rather large data structures that were propagated to all cores involved.
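For the record, the substitutions were of this kind (a sketch – shares and minority_share() are stand-ins for the actual objects, and the core count is merely illustrative):

library(parallel)

# serial baseline: the same function applied to 33 large vectors
res <- lapply(shares, minority_share)

# fork-based replacement, which was slower regardless of the number of cores
res <- mclapply(shares, minority_share, mc.cores = 16)

# socket-cluster variant, with the same result
cl  <- makeCluster(16)
res <- clusterApply(cl, shares, minority_share)
stopCluster(cl)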

Lessons Learned?

So yes, to save a few minutes in a script that I will presumably run not more than four or five times over the next couple of weeks, I spent 12 hours, with zilch results. But at least I learned a few things (apart from the obvious re-iteration of the old ‘never change a half-way running system’ mantra). First, even if it takes eight seconds to do the sums, a vector of 25,000 elements is probably too short to really benefit from shifting the calculations to more cores: while forking should be cheap, the overhead of setting up the additional threads dominates any savings. Second, running jobs in parallel without really understanding what overhead this creates is a stupid idea, and working out what overhead it creates and how to avoid it is probably not worth the candle (see above). Third, I can always re-use the infrastructure I’ve created (for more pointless experiments). Fourth, my next go at Mogon shall avoid half-baked middle-level parallelisation altogether. Instead I shall combine fine-grained implicit parallelism (built into Stata and Mplus) with very coarse explicit parallelism (by breaking up lengthy scripts into small chunks that can be run independently). Further research is definitely needed.

Apr 26 2012

For our piece on distance effects in English elections we geocoded the addresses of hundreds of candidates. For the un-initiated: geocoding is the fine art of converting addresses into geographical coordinates (longitude and latitude). Thanks to Google and some other providers like OpenStreetMap, this is now a relatively painless process. But when one needs more than a few addresses geocoded, one does not rely on pointing-and-clicking. One needs an API, i.e. an interface that makes the service accessible from R, Python or some other programming language.

The upside is that I learned a bit about the wonders of Python in general and the charms of geopy in particular. The downside is that writing a simple script that takes a number of strings from a Stata file, converts them into coordinates and gets them back into Stata took longer than I ever thought possible. Just now, I’ve learned about a possible shortcut (via the excellent data monkey blog): geocode is a user-written Stata command that takes a variable containing address strings and returns two new variables containing the latitude/longitude information. Now that would have been a bit of a time-saver. You can install geocode by typing

* install the user-written geocode command (Stata Journal package dm0053)
net from http://www.stata-journal.com/software/sj11-1
net install dm0053

There is, however, one potential drawback: Google limits the number of free queries per day (and possibly per minute). Via Python, you can easily stagger your requests, and you can also use an API key that is supposed to give you a bigger quota. Geocoding a large number of addresses from Stata in one go, on the other hand, will probably result in an equally large number of parsing errors.
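For those who prefer to stay in R rather than Python or Stata, the ggmap package offers similar convenience (a sketch, not what we used; newer ggmap versions require a Google API key registered via register_google()):

library(ggmap)

addresses <- c("10 Downing Street, London", "Palace of Westminster, London")
coords <- geocode(addresses)   # returns a data frame with lon/lat columns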

Apr 21 2012

I’m more and more intrigued by the potential spatial data hold for political science. Once you begin to think about it, concepts like proximity and clustering are basic building blocks for explaining social phenomena. Even better, since the idea of open data has gone mainstream, more and more spatially referenced information becomes available, and when it comes to free, open source software, we are spoilt for choice or, at least in my case, pushed up to and beyond the point of utter confusion.

For our paper on the effect of spatial distance between candidates and their prospective voters,  we needed  a choropleth map of English Westminster constituencies that shows how many of the mainstream candidates live within the constituency’s boundaries. Basically, we had three options (not counting the rather few user-contributed packages for Stata): GRASS, a motley collection of Python packages, and a host of libraries for R.

GRASS is a full-blown open source GIS, whose user interface is perfect for keyboard aficionados and brings back happy memories of the 1980s. While GRASS can do amazing things with raster and vector maps, it is suboptimal for dealing with rectangular data. In the end, we used only its underrated cartographic ps.map module, which reliably creates high-resolution postscript maps.

Python has huge potential for social scientists, both in its own right and as a kind of glue that binds various programs together. In principle, a lot of GIS-related tasks could be done with Python alone. We used the very useful geopy toolbox for converting UK postcodes to LatLong co-ordinates, with a few lines of code and a little help from Google.

Candidate locations by constituency

The real treasure trove, however, is R. The quality of packages for spatial analysis is amazing, and their scope is a little overwhelming. Applied Spatial Data Analysis with R by Roger Bivand, who wrote much of the relevant code, provides much-needed guidance.

Counting the number of mainstream candidates living in a constituency is a point-in-polygon problem: each candidate is a co-ordinate enclosed by a constituency boundary. Function overlay from package sp carries out the relevant operation. Once I had it located, I was seriously tempted to loop over constituencies and candidates. Just in time, I remembered the R mantra of vectorisation. Provided that points (candidates) and polygons (constituencies) have been transformed to the same projection, all that is needed is this:

mymap@data$homeconst1 <- overlay(candpos1, mymap)
mymap@data$homeconst2 <- overlay(candpos2, mymap)
mymap@data$homeconst3 <- overlay(candpos3, mymap)

This works because candpos1 is a vector of points that represent the spatial positions of all Labour candidates. These are tested against all constituency boundaries. The result is another vector of indices, i.e. sequence numbers of the constituencies the candidates are living in. Put differently, overlay takes a list of points and a bunch of polygons and returns a list that maps the former to the latter. With a bit of boolean logic, a vector of zeros (candidate outside constituency) and ones (candidate living in their constituency) ensues. Summing up the respective vectors for Labour, Tories, and LibDems then gives the required count that can be mapped. Result!
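The Boolean step might look like this – assuming one candidate per party and constituency, stored in the same row order as the map data (homecands is just an illustrative variable name):

own   <- seq_len(nrow(mymap@data))
home1 <- as.integer(mymap@data$homeconst1 == own)   # Labour candidate lives in the constituency
home2 <- as.integer(mymap@data$homeconst2 == own)   # Conservative
home3 <- as.integer(mymap@data$homeconst3 == own)   # Liberal Democrat

# number of mainstream candidates (0 to 3) living in each constituency
mymap@data$homecands <- home1 + home2 + home3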

Apr 09 2011

Sometimes, a man’s gotta do what a man’s gotta do. Which, in my case, might be a little simulation of a random process involving an unordered categorical variable. In R, sampling from a multinomial distribution is trivial.
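Something like the following call is what is meant here (a sketch – the probabilities for ‘3’ and ‘4’ are not spelled out below and are chosen arbitrarily):

# draw 1,000 values from {1, 2, 3, 4} with the given probabilities
x <- sample(1:4, size = 1000, replace = TRUE, prob = c(0.1, 0.7, 0.15, 0.05))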


gives me a vector of random numbers from a multinomial distribution with outcomes 1, 2, 3, and 4, where the probability of observing a ‘1’ is 10 per cent, the probability of observing a ‘2’ is 70 per cent, and so on. But I could not find an equivalent function in Stata. Generating the artificial data in R and importing it into Stata is not very elegant, so I kept digging and found a solution in section M-5 of the Mata handbook. Hidden in the entry on runiform is a reference to rdiscrete(r,c,p), a Mata function which generates an r*c matrix of draws from a multinomial distribution defined by a vector p of probabilities.

That leaves but one question: Is wrapping a handful of lines around a Mata call to replace a non-existent Stata function more elegant than calling an external program?

Jan 10 2010

Statistics and Data links roundup for November 23rd through December 29th:

  • The Data and Story Library – DASL (pronounced “dazzle”) is an online library of datafiles and stories that illustrate the use of basic statistics methods. We hope to provide data from a wide variety of topics so that statistics teachers can find real-world examples that will be interesting to their students. Use DASL’s powerful search engine to locate the story or datafile of interest.
  • Drawing graphs using tikz/pgf & gnuplot | politicaldata.org
Nov 23 2009

Statistics and Data links roundup for November 14th through November 23rd:

It’s surprisingly difficult to find suitable datasets for an SNA (social network analysis) workshop that are relevant for political scientists.

Nov 14 2009