Dec 08 2018
 

Research question

For a long time, people working in the field of European Radical Right Studies could not even agree on a common name for the thing that they were researching. Should it be the Extreme Right, the Radical Right, or what? Utterly unimpressed by this fact, I argue in an in-press contribution that this sorry state has not seriously hindered communication amongst authors. Do I have any evidence to back up this claim? Hell yeah! Fasten your seatbelts and watch me turn innocent publications into tortured data, or more specifically, a Radical Right network of co-citations. Or was it the Extreme Right?

How to turn citations into data

Short of training a hypercomplex and computationally expensive neural network (i.e. a grad student) to look at the actual content of the texts, analysing citation patterns is the most straightforward way to address the research question. Because I needed citation information, I harvested the Social Science Citation Index (SSCI) instead of my own bibliography. The Web of Science interface to the SSCI lets you save records as plain text files, which is all that was required. The key advantage of the SSCI data is that all the sources that each item cites are recorded, too, and can be exported along with the item itself. This includes (most) items that are themselves not covered by the SSCI, opening up the wonderful world of monographs and chapters. To identify the two literatures, I simply ran queries for the phrases “Extreme Right” and “Radical Right” for the 1980-2017 period. I used the “TS” (topic) field tag, which searches titles, abstracts, and keywords. These queries returned 596 and 551 hits, respectively. Easy.
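The post lets bibliometrix do the parsing. Purely to show what an exported record looks like, here is a minimal Python sketch of a reader for the Web of Science plain-text (tagged) format. It assumes the usual layout: each field starts with a two-letter tag (e.g. AU for authors, CR for cited references), continuation lines are indented by three spaces, and a line reading ER closes a record. The function name is hypothetical, and real exports have quirks that a dedicated package handles better.

```python
def parse_wos_plaintext(text):
    """Parse a Web of Science plain-text export into a list of records.

    Each record is a dict mapping a two-letter field tag (e.g. "AU", "CR")
    to the list of values found for that tag.
    """
    records, record, tag = [], {}, None
    for line in text.lstrip("\ufeff").splitlines():
        if line.startswith("   ") and tag is not None:
            # Continuation line: another value for the current field
            record[tag].append(line.strip())
        elif line.strip() == "ER":
            # End of the current record
            records.append(record)
            record, tag = {}, None
        elif len(line) >= 2 and line[:2].isalnum() and (len(line) == 2 or line[2] == " "):
            tag = line[:2]
            if tag in ("FN", "VR", "EF"):
                # File header/footer lines, not part of any record
                tag = None
                continue
            record.setdefault(tag, []).append(line[3:].strip())
        # Anything else (e.g. blank lines) is ignored
    return records
```

Each entry in a record's "CR" list is one cited source, which is the raw material for both citation and co-citation counts.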

But how far separated are the two strands of the literature? To find out, I first looked at the overlap between the two. By overlap, I mean items that use both phrases. This applies to 132 pieces, or just under 12 per cent of the whole stash. This is not a state of zilch communication, yet by this criterion alone, it would seem that there are indeed two relatively distinct literatures. But what I’m really interested in are (co-)citation patterns. How could I beat two long plain text lists of articles and the sources they cite into a usable data set?
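The overlap figures boil down to simple set operations. In this sketch the inputs stand in for record identifiers (for instance Web of Science accession numbers) of the two query results; the function name is hypothetical. With the reported counts, 132 of 596 + 551 = 1,147 hits is about 11.5 per cent, which matches “just under 12 per cent” if the combined hit count is the denominator, while the two queries together contain 596 + 551 − 132 = 1,015 distinct items.

```python
def overlap_stats(extreme_ids, radical_ids):
    """Summarise the overlap between the results of two literature queries."""
    extreme, radical = set(extreme_ids), set(radical_ids)
    both = extreme & radical                # items that use both phrases
    total_hits = len(extreme) + len(radical)
    return {
        "both": len(both),
        "unique_items": len(extreme | radical),
        "share_of_hits": len(both) / total_hits,
    }

# Stand-in IDs that reproduce the reported counts (596 and 551 hits, 132 shared)
stats = overlap_stats(range(596), range(464, 1015))
print(stats["both"], stats["unique_items"], round(100 * stats["share_of_hits"], 1))  # 132 1015 11.5
```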

When you are asking this kind of question, usually “there is an R package for that”™, unless the question is too silly. In my case, the magic bullet for turning information from the SSCI into crunchable data is the wonderful bibliometrix package. Bibliometrix reads saved records from Web of Science/SSCI (in BibTeX format) and converts them into data frames. It also provides functions for extracting bibliometric information from the data. Before I move on to co-citations, here’s the gist of the code that reads the data and generates a handy list of the 10 most-cited titles:

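library(bibliometrix)
# Read the records saved from Web of Science (BibTeX export) and convert them
# into a bibliographic data frame with one row per item.
# (API as used in 2018; in bibliometrix 3.x readFiles() is deprecated and the
# equivalent call is convert2df(file = "savedrecs-all.bib", dbsource = "wos", format = "bibtex").)
D <- readFiles("savedrecs-all.bib")
M <- convert2df(D, dbsource = "isi", format = "bibtex")
# Remove some obviously unrelated items. The three steps run in sequence,
# so the row numbers in each step refer to the already-reduced data frame.
M <- M[-c(65,94,96,97,104,105,159,177,199,457,459,497,578,579,684,685,719,723),]
M <- M[-c(659,707),]
M <- M[-c(622),]

# Descriptive overview (top 10 authors, sources, etc.)
results <- biblioAnalysis(M, sep = ";")
S <- summary(object = results, k = 10, pause = FALSE)
# Citations: how often each cited reference appears; show the 10 most cited
CR <- citations(M, field = "article", sep = ".  ")
CR$Cited[1:10]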
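For readers who do not use R, the counting behind a most-cited list can be sketched in a few lines of Python. This is not what bibliometrix does internally; it assumes each record is a dict whose "CR" key holds a list of cited-reference strings (the Web of Science tag for cited references), counts each reference at most once per citing item, and does no normalisation of the many ways the same book can be abbreviated, which real data would need.

```python
from collections import Counter

def most_cited(records, k=10):
    """Return the k most frequently cited references as (reference, count) pairs."""
    counts = Counter()
    for record in records:
        # set(): a reference listed twice in one item still counts once
        counts.update(set(record.get("CR", [])))
    return counts.most_common(k)
```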

So what are the most cited titles in Extreme/Radical Right studies?

The ten most cited sources in 726 SSCI items
Source                           Number of times cited
Mudde (2007)                     160
Kitschelt (1995)                 147
Betz (1994)                      123
Lubbers et al. (2002)            97
Norris (2005)                    90
Golder (2003)                    86
R.W. Jackman & Volpert (1996)    77
Carter (2005)                    66
Arzheimer & Carter (2006)        65
Brug et al. (2005)               65

Importantly, this top ten contains (in very prominent positions) a number of monographs. The SSCI itself only lists articles in (some) peer-reviewed journals, so without the citation data we would have no idea which items outside these journals are important. Having said that, the situation is still far from perfect: we only observe co-citation patterns through the lens of the 1,000-odd SSCI publications. But that’s still better than nothing, right?

What about the substantive results of this exercise? The table clearly shows the impact that Cas Mudde’s 2007 (“Populist Radical Right”) book had on the field. It is the most cited and at the same time the youngest item on the list, surpassing the much older monographs by Betz (“Radical Right Wing Populism”) and Kitschelt (“Radical Right”). Two other monographs, by Carter (“Extreme Right”) and Norris (“Radical Right”), are also frequently cited but appreciably less popular than the books by Betz, Kitschelt, and Mudde. The five other items are journal articles with a primarily empirical outlook and mostly without conceptual ambitions. Taken together, this suggests that the “Extreme Right” label lacked a strong proponent whose conceptual work was widely accepted in the literature. Once someone presented a clear rationale for using the “Radical Right” label instead, many scholars were willing to jump ship.

Getting to the co-citation network: are the Extreme / Radical Right literatures separated from each other?

If my claim is correct, the literature should display a low degree of separation between users of the two labels. Looking for co-citation patterns is a straightforward way to operationalise (a lack of) separation. A co-citation occurs when two publications are both cited by some later source. By definition, co-citations reflect a view on the older literature as it is expressed in a newer publication. When two titles from the “Extreme Right” and “Radical Right” literatures are co-cited, this is a small piece of evidence that the literature has not split into two isolated streams. The SSCI aims at recording every source that is cited, even if the source itself is not in the SSCI. This makes for a very large number of publications that could be candidates for co-citations (18,255), even if most of them are peripheral to European Radical Right studies, and a whopping 743,032 actual co-citations.
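The definition of a co-citation translates almost directly into code. This sketch (function name hypothetical) takes one list of cited sources per citing item and counts every unordered pair once per item; in principle, the off-diagonal cells of a co-citation matrix hold these same counts. Because an item citing n sources contributes n(n-1)/2 pairs, the total grows quickly, which is how a few hundred articles can generate hundreds of thousands of co-citations.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(reference_lists):
    """Count how often each pair of sources is cited together by the same item."""
    pairs = Counter()
    for refs in reference_lists:
        unique = sorted(set(refs))            # ignore duplicates within one item
        for a, b in combinations(unique, 2):  # every unordered pair, once
            pairs[(a, b)] += 1
    return pairs
```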

To get a handle on this, I extracted the 20 publications with the largest total number of co-citations and their interconnections. They represent something like the backbone of the literature. How did I reconstruct this network from textual data? Once more, R and its packages came to the rescue and helped me to produce a reasonably nice plot (after some additional cleaning up):

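library(igraph)
library(RColorBrewer)
# Co-citation matrix (M as created above): cell [i, j] counts the items that
# cite both i and j; the diagonal holds how often each reference is cited.
NetMatrix <- biblioNetwork(M, analysis = "co-citation", network = "references", sep = ".  ")
# Undirected graph without self-loops (diag = FALSE). Because no weights are
# requested (weighted = NULL, the default), a cell value of k becomes k parallel
# edges, so degree() counts each item's total number of co-citations.
g <- graph_from_adjacency_matrix(NetMatrix, mode = "max", diag = FALSE)
# Extract (roughly) the top 20 most co-cited items: those whose degree exceeds
# the (1 - 20/n) quantile of the degree distribution
f <- induced_subgraph(g, degree(g) > quantile(degree(g), probs = 1 - 20 / vcount(g)))
# Now build a vector of relevant terms (requires knowledge of these titles)
# 1: extreme, 2: radical, 3: none/other
# Show all names, to check the order of the coding vector below
V(f)$name
term <- c(3,2,1,1,2,1,1,2,1,2,3,2,2,2,3,1,1,1,1,1)
mycolours <- brewer.pal(3, "Greys")
V(f)$term <- term
V(f)$color <- mycolours[term]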
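The R code keeps the items whose degree exceeds a quantile threshold, which yields roughly the top 20 (ties permitting). The same idea can be sketched without igraph: sum each work's co-citations, keep the k largest, and count only the co-citations among those k works, i.e. the induced subgraph. The two numbers returned per work correspond to the two columns of the top-20 table in the results section. The input format (a mapping from sorted pairs of works to counts) and the function names are assumptions; ties at the cut-off are broken by Counter.most_common's ordering rather than by a quantile.

```python
from collections import Counter

def total_cocitations(pairs):
    """Sum the co-citation counts of each work (its weighted degree)."""
    totals = Counter()
    for (a, b), n in pairs.items():
        totals[a] += n
        totals[b] += n
    return totals

def top_k_backbone(pairs, k=20):
    """Map each of the k most co-cited works to (co-citations within top k, total)."""
    totals = total_cocitations(pairs)
    top = {work for work, _ in totals.most_common(k)}
    within = Counter()
    for (a, b), n in pairs.items():
        if a in top and b in top:  # edge of the induced subgraph
            within[a] += n
            within[b] += n
    return {work: (within[work], totals[work]) for work in top}
```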

Co-citation analysis: results

So, what are the results? First, here is the top 20 of co-cited items in the field of Extreme/Radical Right studies:

The twenty most co-cited sources in 726 SSCI items
Source                           Co-citations within top 20   Total co-citations
Kitschelt (1995)                 745                          7700
Mudde (2007)                     740                          8864
Lubbers et al. (2002)            600                          5212
Norris (2005)                    568                          5077
Golder (2003)                    564                          4687
Betz (1994)                      542                          6151
R.W. Jackman & Volpert (1996)    477                          4497
Brug et al. (2005)               462                          3523
Arzheimer & Carter (2006)        460                          3551
Knigge (1998)                    445                          3487
Carter (2005)                    389                          3291
Arzheimer (2009)                 376                          3301
Ignazi (2003)                    344                          2876
Ivarsflaten (2008)               334                          3221
Ignazi (1992)                    331                          3230
Rydgren (2007)                   300                          3353
Bale (2003)                      297                          3199
Brug et al. (2000)               276                          2602
Meguid (2005)                    246                          2600
Bale et al. (2010)               134                          2449
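Each title's within-top-20 share can be recomputed from the two numeric columns of the table above. The figures below are transcribed from that table; the dictionary and function names are only illustrative. The shares run from about 5.5 per cent (Bale et al. 2010) to about 13 per cent (Brug et al. 2005), with a median of about 10.6 per cent.

```python
from statistics import median

# (co-citations within top 20, total co-citations), transcribed from the table
COCITATIONS = {
    "Kitschelt (1995)": (745, 7700),
    "Mudde (2007)": (740, 8864),
    "Lubbers et al. (2002)": (600, 5212),
    "Norris (2005)": (568, 5077),
    "Golder (2003)": (564, 4687),
    "Betz (1994)": (542, 6151),
    "R.W. Jackman & Volpert (1996)": (477, 4497),
    "Brug et al. (2005)": (462, 3523),
    "Arzheimer & Carter (2006)": (460, 3551),
    "Knigge (1998)": (445, 3487),
    "Carter (2005)": (389, 3291),
    "Arzheimer (2009)": (376, 3301),
    "Ignazi (2003)": (344, 2876),
    "Ivarsflaten (2008)": (334, 3221),
    "Ignazi (1992)": (331, 3230),
    "Rydgren (2007)": (300, 3353),
    "Bale (2003)": (297, 3199),
    "Brug et al. (2000)": (276, 2602),
    "Meguid (2005)": (246, 2600),
    "Bale et al. (2010)": (134, 2449),
}

def within_shares(table):
    """Share of each title's co-citations that link it to another top-20 title."""
    return {source: within / total for source, (within, total) in table.items()}

shares = within_shares(COCITATIONS)
lowest = min(shares, key=shares.get)
highest = max(shares, key=shares.get)
print(lowest, round(100 * shares[lowest], 1))    # Bale et al. (2010) 5.5
print(highest, round(100 * shares[highest], 1))  # Brug et al. (2005) 13.1
print(round(100 * median(shares.values()), 1))   # 10.6
```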

Many of these titles are familiar, because they also appear in the top ten of most cited titles and are classics to boot. And here is another nugget: for each title, a substantial share of all co-citations (roughly 10 per cent, ranging from about 5 to 13 per cent) happens within this top twenty. This is exactly the (sub)network of co-citations I’m interested in. So here is the plot I promised:

Co-citations within top 20 titles in Extreme / Radical Right studies


But what does it all mean? Stay tuned for the next episode, or read the full article (author’s version, no paywall):

  • Arzheimer, Kai. “Conceptual Confusion is not Always a Bad Thing: The Curious Case of European Radical Right Studies.” Demokratie und Entscheidung. Eds. Marker, Karl, Michael Roseneck, Annette Schmitt, and Jürgen Sirsch. Wiesbaden: Springer, 2018. forthcoming.
    @InCollection{arzheimer-2018,
    author = {Arzheimer, Kai},
    title = {Conceptual Confusion is not Always a Bad Thing: The Curious Case of
    European Radical Right Studies},
    booktitle = {Demokratie und Entscheidung},
    publisher = {Springer},
    address = {Wiesbaden},
    pages = {forthcoming},
    year = 2018,
    url =
    {https://www.kai-arzheimer.com/conceptual-confusion-european-radical-right-studies.pdf},
    html =
    {https://www.kai-arzheimer.com/conceptual-confusion-european-radical-right-studies},
    editor = {Marker, Karl and Roseneck, Michael and Schmitt, Annette and Sirsch,
    Jürgen},
    dateadded = {01-06-2018}
    }

Apr 03 2018
 
Feb 28 2018
 

I’m still collecting references for the next iteration of the Extreme Right Bibliography (but I am almost there. Honest to God. Really). In the meantime, when I should probably have been doing other things, I’ve brushed up my fairly rudimentary R skills and taught myself how to write a similarly rudimentary twitterbot.


If you are reading this, the chances that you are interested in the Radical/Extreme/Etc Right are high. If you also happen to be on Twitter, you will want to follow the Radical Right Research Robot for all sorts of serendipitous insights, e.g. that reference to the article you always suspected exists but were too shy to ask about.

And if that does not appeal, it has a cutesy profile pic. So follow it (him? her?). Resistance is futile.


Dec 17 2017
 
Oct 22 2016
 

Which publishers are the most relevant for Radical Right research? Good question.

Radical Right research by type of publication

Currently, most of the items in The Eclectic, Erratic Bibliography on the Extreme Right in Western Europe (TM) are journal articles. The ratios of books, chapters, and articles have shifted somewhat over the years, reflecting both general trends in publishing and my changing reading habits, and by now the dominance of journal articles is rather striking.

[Figure: Radical Right research by type of publication]

The most important journals for Radical Right research (add pinch of salt as required)

One in three of these articles has been published in one of the four apparent top journals for Radical Right research: the European Journal of Political Research, West European Politics, Party Politics, and Acta Politica. I say ‘apparent’ here, because this result may be a function of my (Western) Eurocentrism and my primary interest in Political Science and Sociology. Other Social Sciences are underrepresented, and literature from national journals that publish in languages other than English is virtually absent.

But hey: Laying all scruples aside, here is a table of the ten most important journals for Radical Right research:

Journal                                   No. of articles
European Journal of Political Research    38
West European Politics                    35
Party Politics                            24
Acta Politica                             22
Electoral Studies                         15
Parliamentary Affairs                     13
Patterns of Prejudice                     12
Comparative European Politics             10
Comparative Political Studies             10
Government and Opposition                 9

Neat, isn’t it?

I did a similar analysis nearly two years ago. Government and Opposition as well as Comparative European Politics are new additions to the top ten (replacing Österreichische Zeitschrift für Politikwissenschaft and Osteuropa), but otherwise, the picture is much the same. So if you publish on the Radical Right and want your research to be noticed, you should probably aim for these journals.
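A league table like the one above is a frequency count over the journal field of article entries. This minimal sketch assumes the bibliography has already been parsed into a list of dicts with a "type" key (the BibTeX entry type, e.g. "article") and a "journal" key; that structure and the function names are hypothetical and differ from RefManageR's BibEntry objects. top_share makes the “one in three” statement checkable: applied to the whole bibliography with k=4, it should return roughly one third. Variant spellings of journal names would need to be normalised first.

```python
from collections import Counter

def journal_league_table(entries, k=10):
    """Rank journals by their number of article entries (k=None returns all)."""
    journals = Counter(
        entry["journal"].strip()
        for entry in entries
        if entry.get("type") == "article" and entry.get("journal")
    )
    return journals.most_common(k)

def top_share(entries, k=4):
    """Fraction of all articles that appeared in the k most frequent journals."""
    ranking = journal_league_table(entries, k=None)
    total = sum(n for _, n in ranking)
    return sum(n for _, n in ranking[:k]) / total if total else 0.0
```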

Oct 20 2016
 

For the past 15 years or so, I have maintained an extensive collection of references on the Radical/Extreme/Populist/New/Whatever Right in Western Europe. Because I love TeX and other command line tools of destruction, these references live in a large BibTeX file. BibTeX is a well-documented format for bibliographic text files that has been around for decades and can be written and read by a large number of reference managers.

Because BibTeX is so venerable, it’s unsurprising that there is even an R package (RefManageR) that can read and write BibTeX files, effectively turning bibliographic data into a dataset that can be analysed, graphed and otherwise mangled to one’s heart’s desire. And so my totally unscientific analysis of the Radical Right literature (as reflected in my personal preferences and interests) is just three lines of code away:

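library("RefManageR")
# Read the BibTeX file into a BibEntry object
ex <- ReadBib("/home/kai/Work/bibliography/xr-bibliography/extreme-right-western-europe-bibliography.bib")
# Tabulate the year field and show the five years with the most entries
tail(sort(table(unlist(ex$year))), 5)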
year    publications
2014    34
2012    38
2000    42
2002    54
2015    57

So 2012, 2014 and 2015(!) saw a lot of publications that ended up on my list, but 2000 and particularly 2002 (the year Jean-Marie Le Pen made it into the second round of the French presidential election) were not bad either. 2013 and 2003 (not listed) were also relatively strong years, with 33 publications each.
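To show what tail(sort(table(...)), 5) computes, here is a Python sketch that pulls the year field out of raw BibTeX text with a regular expression and counts it (largest count first, whereas the R call lists the largest last). The regex only copes with simple year = 2014, year = {2014} or year = "2014" lines and ignores entries that use a date field or string macros, so a proper BibTeX parser is the safer choice for a real bibliography. Names are hypothetical.

```python
import re
from collections import Counter

# Matches e.g.  year = 2014,   year = {2014},   year = "2014"  (case-insensitive)
YEAR_FIELD = re.compile(r'^\s*year\s*=\s*[{"]?\s*(\d{4})', re.IGNORECASE | re.MULTILINE)

def publications_per_year(bibtex_text):
    """Count entries per publication year."""
    return Counter(int(year) for year in YEAR_FIELD.findall(bibtex_text))

def top_years(bibtex_text, n=5):
    """The n years with the most entries, largest count first."""
    return publications_per_year(bibtex_text).most_common(n)
```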

To get a more complete overview, it’s best to plot the whole time series (ignoring some very old titles):

[Figure: publications in the bibliography per year]

There is a distinct upwards trend all through the 1990s, a post-millennial decline in the mid-naughties (perhaps due to the fact that I completed a book manuscript then and became temporarily negligent in my collector’s duties, but I don’t think so), and then a new peak during the last five years, undoubtedly driven by recent political events and countless eager postdocs and PhD students. I’m just beginning to understand the structure of the data objects that RefManageR creates from my bibliography, but I think it’s time for some league tables next.

Jul 26 2014
 

I’ve recently discovered Rfacebook, which lets you access public information on Facebook from R. In terms of convenience, no package for R or Python that I have seen so far comes near. Get yourself a long-lived token, store it as a variable, and put all posts on a fanpage you are interested in into one R object with a single function call. Check it out here.

Jan 26 2014
 

R Package Parallel: How Not to Solve a Problem That Does Not Exist

Somewhat foolishly, my university has granted me access to Mogon: not the god, not the death metal band, but rather their supercomputer, which currently holds the 182nd spot in the top 500 list of the fastest computers on the planet. It has more than 34,000 cores and more than 80 TB of RAM, but basically it’s just a very large bunch of Linux boxes. That means that I have a rough idea how to handle it, and that it happily runs my native Linux Stata and Mplus (and hopefully JAGS) binaries for me. It also has R installed, and this is where my misery began.

I have a lengthy R job that deals with census data. Basically, it looks up the absolute number of minority residents in some 25,000 output areas and their immediate neighbours and calculates a series of percentages from these figures. I think this could in principle be done in Stata, but R provides convenient libraries for dealing with geo-coded data (sp and friends), non-rectangular data structures and all the trappings of a full-featured programming language, so it would be stupid not to make use of it. The only problem is that R is relatively slow and single-threaded, and that my script is what they call embarrassingly parallel: The same trivial function is applied to 33 vectors with 25,000 elements each. Each calculation on a vector takes about eight seconds to complete, which amounts to roughly five minutes in total. Add the time it takes to read in the data and some fairly large lookup-tables (it would be very time-consuming to repeatedly calculate which output area is close enough to each other output area to be considered a neighbour), and we are looking at eight to ten minutes for one run.
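The core computation can be sketched in Python (the original is an R script built on sp; names and data layout here are hypothetical). The expensive step, deciding which output areas count as neighbours, is assumed to have been done once and stored as a lookup table, so each pass only sums counts over an area and its neighbours. In the setup described above, such a function would be applied once per vector, i.e. 33 times.

```python
def neighbourhood_percentages(minority, population, neighbours):
    """Percentage of minority residents in each area and its immediate neighbours.

    minority, population: dicts mapping area id -> head count
    neighbours: precomputed lookup table mapping area id -> neighbouring area ids
    """
    result = {}
    for area in population:
        # The area itself plus its neighbours, each counted once
        zone = {area, *neighbours.get(area, ())}
        pop = sum(population[a] for a in zone)
        hits = sum(minority.get(a, 0) for a in zone)
        result[area] = 100.0 * hits / pop if pop else float("nan")
    return result
```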


Mogon. Image Credit: ZDV JGU Mainz

While I do not plan to run this script very often – once the calculations are done and saved, the results can be used in the analysis proper over and over again – I fully expect that I might change some operationalisations, include different variables etc., and so I began toying with the parallel package for R to make use of the many cores suddenly at my disposal.

Twelve hours later, I had learned the basics of the scheduling system (LSF), solved the problem of synching my data between home, office, central, and super-computer, gained some understanding of the way parallel works, and otherwise achieved basically nothing: even the best attempt at running a parallelised version of the script on the supercomputer was a little slower than the serialised version on my very capable office machine, and that is without the 15 to 90 seconds the script spends waiting to be transferred to a suitable node of the cluster. I tried different things: replacing lapply with mclapply, which was slower regardless of the number of cores; using clusterApply instead of lapply (same result); and forking the 33 serial jobs into the background, which was even worse, presumably because storing the returned values resulted in changes to rather large data structures that were propagated to all cores involved.

Lessons Learned?

So yes, to save a few minutes in a script that I will presumably run no more than four or five times over the next couple of weeks, I spent 12 hours, with zilch results. But at least I learned a few things (apart from the obvious reiteration of the old ‘never change a half-way running system’ mantra). First, even if it takes eight seconds to do the sums, a vector of 25,000 elements is probably too short to really benefit from shifting the calculations to more cores. While forking should be cheap, the overhead of setting up the additional processes dominates any savings. Second, running jobs in parallel without really understanding what overhead this creates is a stupid idea, and learning what overhead this creates and how to avoid it is probably not worth the candle (see above). Third, I can always re-use the infrastructure I’ve created (for more pointless experiments). Fourth, my next go at Mogon shall avoid half-baked middle-level parallelisation altogether. Instead I shall combine fine-grained implicit parallelism (built into Stata and Mplus) and very coarse explicit parallelism (by breaking up lengthy scripts into small chunks that can be run independently). Further research is definitely needed.
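The lesson about overhead can be made concrete with a simple back-of-envelope model of wall time. This is a sketch, not a measurement: it assumes the setup (reading data and lookup tables) is not parallelised, that tasks run in ceil(n_tasks / workers) rounds, and that the cost of starting each worker and copying data to it is paid once per worker, in sequence. The 33 tasks of about 8 seconds come from the post; the 300-second setup and the per-worker overhead are illustrative guesses.

```python
import math

def serial_time(n_tasks, task_s, setup_s):
    """Wall time when all tasks run one after the other."""
    return setup_s + n_tasks * task_s

def parallel_time(n_tasks, task_s, setup_s, workers, overhead_per_worker_s, queue_wait_s=0.0):
    """Rough wall time when the tasks are spread over `workers` (see assumptions above)."""
    rounds = math.ceil(n_tasks / workers)
    return queue_wait_s + setup_s + rounds * task_s + workers * overhead_per_worker_s

print(serial_time(33, 8, 300))               # 564
print(parallel_time(33, 8, 300, 33, 0, 15))  # 323: no overhead, bounded by the setup
print(parallel_time(33, 8, 300, 33, 8, 15))  # 587: modest overhead, slower than serial
```

Even with zero overhead, the best case is bounded by the part of the script that is not parallelised (Amdahl's law), and a modest per-worker cost is enough to make the parallel version slower than the serial one, which is consistent with what happened on Mogon.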

Apr 26 2012
 

For our piece on distance effects in English elections we geocoded the addresses of hundreds of candidates. For the uninitiated: geocoding is the fine art of converting addresses into geographical coordinates (longitude and latitude). Thanks to Google and some other providers like OpenStreetMap, this is now a relatively painless process. But when one needs more than a few addresses geocoded, one does not rely on pointing and clicking. One needs an API, i.e. an interface (usually wrapped in a software library) that makes the service accessible through R, Python or some other programming language.

The upside is that I learned a bit about the wonders of Python in general and the charms of geopy in particular. The downside is that writing a simple script that takes a number of strings from a Stata file, converts them into coordinates and gets them back into Stata took longer than I ever thought possible. Just now, I’ve learned about a possible shortcut (via the excellent data monkey blog): geocode is a user-written Stata command that takes a variable containing address strings and returns two new variables containing the latitude/longitude information. Now that would have been a bit of a time-saver. You can install geocode by typing

net from http://www.stata-journal.com/software/sj11-1
net install dm0053

There is, however, one potential drawback: Google limits the number of free queries per day (and possibly per minute). Via Python, you can easily stagger your requests, and you can also use an API key that is supposed to give you a bigger quota. Geocoding a large number of addresses from Stata in one go, on the other hand, will probably result in an equally large number of parsing errors.
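Staggering requests is straightforward in Python. The sketch below is generic: it takes any callable that turns an address into coordinates (the stand-in geocoder in the example is hypothetical) and enforces a minimum delay between consecutive calls, recording failures as None instead of aborting the whole batch. The one-second default is an assumption; check the provider's current usage limits.

```python
import time

def geocode_all(addresses, geocode, min_delay_s=1.0, sleep=time.sleep, clock=time.monotonic):
    """Geocode addresses one by one, waiting at least min_delay_s between requests.

    geocode: any callable mapping an address string to coordinates (or None).
    sleep/clock are injectable so the throttling can be tested without waiting.
    Duplicate addresses end up as a single key in the result.
    """
    results = {}
    last_request = None
    for address in addresses:
        if last_request is not None:
            wait = min_delay_s - (clock() - last_request)
            if wait > 0:
                sleep(wait)
        last_request = clock()
        try:
            results[address] = geocode(address)
        except Exception:
            # e.g. quota exceeded, network error or unparsable answer: keep going
            results[address] = None
    return results

# Usage with a stand-in geocoder (hypothetical; replace with a real service call)
def fake_geocode(address):
    return (0.0, 0.0)

coords = geocode_all(["Mainz, Germany", "London, UK"], fake_geocode, min_delay_s=0.1)
```

With geopy, a real geocoder such as Nominatim(user_agent="...").geocode returns a Location (or None) whose latitude and longitude attributes can be passed back; recent geopy versions also ship geopy.extra.rate_limiter.RateLimiter, which wraps a geocode function in a similar way.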