Mar 02 2019

Every remotely relevant reference I came across during the last 15 years or so resides in a single bibtex file. That is not a problem. The problem is that I’m moving into a shiny, new but somewhat smaller office, together with hundreds of copies of journal articles and hundreds of PDFs. Wouldn’t it be good to know which physical copies are effectively redundant (unreadable comments in the margins aside) and can therefore stay behind?

The trouble is that bibtex files have a rather flexible, human-readable format. Each entry begins with the @ sign, followed by a type (book, article etc.), a reference key, lots of key/value pairs (fields) in arbitrary order, and even more curly braces.
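
For illustration, a minimal entry looks like this (a made-up reference; binder is the custom field discussed below):

```bibtex
@article{smith2001,
  author  = {Smith, Jane},
  title   = {A Hypothetical Article},
  journal = {Journal of Examples},
  year    = {2001},
  binder  = {A3},
  file    = {smith2001.pdf}
}
```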

grep @ full.bib|wc -l tells me that I have 2914 references in total. grep binder full.bib|wc -l (binder is a custom field that I use to keep track of the location of my copies) shows that I have printed out/copied 712 texts over the years, and grep file full.bib|wc -l indicates that there are 504 PDFs residing on my filesystem. But what is the magnitude of the intersection?
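
The same counts can be reproduced without grep; here is a quick Python sketch of the logic (the toy bibliography and its entries are made up):

```python
import re

def count_entries_and_fields(bibtex, fields=("binder", "file")):
    """Replicate the grep-based counts: lines containing '@' approximate
    the number of entries, plus the number of lines mentioning each field."""
    lines = bibtex.splitlines()
    counts = {"entries": sum(1 for l in lines if "@" in l)}
    for field in fields:
        counts[field] = sum(1 for l in lines if field in l)
    return counts

# A toy bibliography standing in for full.bib (hypothetical entries):
toy = """\
@article{smith2001,
  author = {Smith, J.},
  binder = {A3},
}
@book{jones2005,
  author = {Jones, K.},
  file = {jones2005.pdf},
}
"""
print(count_entries_and_fields(toy))  # {'entries': 2, 'binder': 1, 'file': 1}
```

Like the grep one-liners, this counts matching lines, not parsed fields, so an @ in an email address would be miscounted, which is good enough for a quick inventory.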

My first inclination was to look for a suitable Python parser/library. Pybtex looked good in principle but is underdocumented and had trouble reading full.bib, because that file is encoded in Latin-1. So endless hours of amateurish coding and procrastination lay ahead. Then I remembered the “do one thing, and do it really well” mantra of old. Enter bibtool, which is a fast and reasonably stable bibtex file filter and pretty printer. Bibtool reads “resource files”, which are really just short scripts containing filtering/formatting directives. select = {binder ".+"} keeps those references whose “binder” field contains at least one character (.+ is a regular expression that matches any non-empty string). select = {file ".+"} selects all references for which I have a PDF. But bibtool applies a logical OR to these conditions, while I’m interested in the references that meet both criteria.
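
For reference, each resource file contains little more than the directive already quoted. A sketch of find-binder.rsc (the % comment is not required):

```
% find-binder.rsc: keep entries with a non-empty "binder" field
select = {binder ".+"}
```

find-pdf.rsc is identical, with file in place of binder.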

The quick solution is to store each statement in a file of its own and apply bibtool twice, using a pipeline for extra efficiency: bibtool -r find-binder.rsc full.bib|bibtool -r find-pdf.rsc >intersection.bib does the trick and solves my problem in under a minute, without any coding.

As it turns out, there were just 65 references in both groups. Apparently, I stopped printing (or at least filing away) some time ago. Eventually, I binned two copies, but it is the principle that matters.

2019 Update

I still use bibtool for quick filtering/reformatting tasks at the command line, but for more complex jobs involving programmatic access to bibtex files from R, RefManageR is a wonderful package.  I have used it here in a bibliometric study of the Radical/Extreme Right literature. And my nifty RRResRobot also relies heavily on RefManageR. If you are interested at all in RefManageR, here is a short and sweet introduction.

Dec 13 2018

Reprise: The co-citation network in European Radical Right studies

In the last post, I tried to reconstruct the co-citation network in European Radical Right studies and ended up with this neat graph.

Co-citations within top 20 titles in Extreme / Radical Right studies


The titles are arranged in groups, with the “Extreme Right” camp on the right, the “Radical Right” group in the lower-left corner, and a small number of publications that is committed to neither in the upper-left corner. The width of the lines represents the number of co-citations connecting the titles.

What does the pattern look like? The articles by Knigge (1998) and Bale et al. (2010) are both in the “nothing in particular” group, but are never cited together, at least not in the data that I extracted. One potential reason is that they are twelve years apart and address quite different research questions.

Want to watch a video version of this post? “The Extreme / Radical Right network of co-citations” is available on YouTube.

Apart from this gap, the network is complete, i.e. within the top 20, everyone is co-cited with everyone else. This is already rather compelling evidence against the idea of a split into two incompatible strands. Intriguingly, there are even some strong ties that bridge alleged intellectual cleavages, e.g. between Kitschelt’s monograph and the article by Golder, or between Lubbers, Gijsberts and Scheepers on the one hand and Norris and Kitschelt on the other.

While the use of identical terminology seems to play a minor role, the picture also suggests that co-citations are chiefly driven by the general prominence of the titles involved. However, network graphs can be notoriously misleading.

Modelling the number of co-citations in European Radical Right studies

Modelling the number of co-citations provides a more formal test of this intuition. There are (20 × 19)/2 = 190 counts of co-citations amongst the top 20 titles, ranging from 0 to 5476, with a mean count of 695 and a variance of 651,143. Because the variance is so much bigger than the mean, a regression model that assumes a negative binomial distribution, which can accommodate such overdispersion, is more appropriate than one built around a Poisson distribution. “General prominence” is operationalised as the sum of external co-citations of the two titles involved. Here are the results.
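
The dyad count and the overdispersion check can be verified in a couple of lines (a quick sketch using the summary statistics reported above):

```python
from math import comb

# Number of unordered pairs among the top 20 titles: 20*19/2
print(comb(20, 2))  # 190

# Reported mean and variance of the 190 co-citation counts
mean, var = 695, 651_143
# Dispersion ratio: far above the value of 1 implied by a Poisson model
print(var / mean)
```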

Variable              | Coefficient | S.E.    | p
external co-citations | 0.0004      | 0.00002 | <0.05
same terminology      | 0.424       | 0.120   | <0.05
Constant              | 2.852       | 0.219   | <0.05

The findings show that, controlling for general prominence (operationalised as the sum of co-citations outside the top 20), using the same terminology (coded as “extreme” / “radical” / “unspecific or other”) does have a positive effect on the expected number of co-citations. But what do the numbers mean?

The model is additive in the logs. To recover the counts (and transform the model into its multiplicative form), one needs to exponentiate the coefficients. Accordingly, the effect of using the same terminology translates into a factor of exp(0.424) = 1.53.
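
To make the multiplicative interpretation concrete, here is the arithmetic (coefficients as reported in the table above; the prediction function is my own illustration of the model form, not output from the fitted model):

```python
from math import exp

# Coefficients as reported above
b0, b_ext, b_same = 2.852, 0.0004, 0.424

def expected_cocitations(ext, same_terminology):
    """Expected co-citation count implied by the (log-linear) model."""
    return exp(b0 + b_ext * ext + b_same * same_terminology)

# Exponentiating the coefficient gives the multiplicative effect
rate_ratio = exp(b_same)
print(round(rate_ratio, 2))  # 1.53: same terminology raises the expected count by ~53%
```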

What do these numbers mean?

But how relevant is this in practical terms? Because the model is non-linear, it’s best to plot the expected counts for equal/unequal terminology, together with their confidence bands, against a plausible range of external co-citations.

Effect of external co-citations and use of terminology on predicted number of co-citations within top 20


As it turns out, terminology has only a small effect on the expected number of co-citations for works that have between 6,000 and 8,000 external co-citations. From this point on, the expected number of co-citations grows somewhat more quickly for dyads that share the same terminology. However, over the whole range of 6,000 to 12,000 external co-citations, the confidence intervals overlap and so this difference is not statistically significant.

Unless two titles have a very high number of external co-citations, the probability of them being both cited in a third work does not depend on the terminology they use. Even for the (few) heavily cited works, the evidence is insufficient to reject the null hypothesis that terminology makes no difference.

While the analysis is confined to the relationships between just 20 titles, these titles matter most, because they form the core of European Radical Right studies. If we cannot find separation here, that does not necessarily mean that it does not happen elsewhere, but if it happens elsewhere, it is much less relevant. So: no two schools. Everyone is citing the same prominent stuff, whether the respective authors prefer “Radical” or “Extreme”. Communication happens, which seems good to me.

Are you surprised?

Go to the first part of this mini series, or read the full article on concepts in European Radical Right research here:

  • Arzheimer, Kai. “Conceptual Confusion is not Always a Bad Thing: The Curious Case of European Radical Right Studies.” Demokratie und Entscheidung. Eds. Marker, Karl, Michael Roseneck, Annette Schmitt, and Jürgen Sirsch. Wiesbaden: Springer, 2018. 23-40. doi:10.1007/978-3-658-24529-0_3
    @InCollection{arzheimer-2018,
      author = {Arzheimer, Kai},
      title = {Conceptual Confusion is not Always a Bad Thing: The Curious Case of European Radical Right Studies},
      booktitle = {Demokratie und Entscheidung},
      editor = {Marker, Karl and Roseneck, Michael and Schmitt, Annette and Sirsch, Jürgen},
      publisher = {Springer},
      address = {Wiesbaden},
      year = 2018,
      pages = {23-40},
      doi = {10.1007/978-3-658-24529-0_3},
      url = {https://www.kai-arzheimer.com/conceptual-confusion-european-radical-right-studies.pdf},
      html = {https://www.kai-arzheimer.com/conceptual-confusion-european-radical-right-studies},
      dateadded = {01-06-2018}
    }

Dec 08 2018

Research question

For a long time, people working in the field of European Radical Right Studies could not even agree on a common name for the thing that they were researching. Should it be the Extreme Right, the Radical Right, or what? Utterly unimpressed by this fact, I argue in an in-press contribution that this sorry state has not seriously hindered communication amongst authors. Do I have any evidence to back up this claim? Hell yeah! Fasten your seatbelts and watch me turning innocent publications into tortured data, or more specifically, a Radical Right network of co-citations. Or was it the Extreme Right?

Want to watch a video version of this post? “The Extreme / Radical Right network of co-citations” is available on YouTube.

How to turn citations into data

Short of training a hypercomplex and computationally expensive neural network (i.e. a grad student) to look at the actual content of the texts, analysing citation patterns is the most straightforward way to address the research question. Because I needed citation information, I harvested the Social Science Citation Index (SSCI) instead of my own bibliography. The Web of Science interface to the SSCI lets you save records as plain text files, which is all that was required. The key advantage of the SSCI data is that all the sources that each item cites are recorded, too, and can be exported with the title. This includes (most) items that are themselves not covered by the SSCI, opening up the wonderful world of monographs and chapters. To identify the two literatures, I simply ran queries for the phrases “Extreme Right” and “Radical Right” for the 1980-2017 period. I used the “TS” operator to search in titles, abstracts, and keywords. These queries returned 596 and 551 hits, respectively. Easy.
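
In Web of Science’s search syntax, the two queries look roughly like this (a sketch; the 1980-2017 timespan is set separately in the interface):

```
TS=("extreme right")
TS=("radical right")
```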


But how far separated are the two strands of the literature? To find out, I first looked at the overlap between the two. By overlap, I mean items that use both phrases. This applies to 132 pieces, or just under 12 per cent of the whole stash. This is not a state of zilch communication, yet by this criterion alone, it would seem that there are indeed two relatively distinct literatures. But what I’m really interested in are (co-)citation patterns. How could I beat two long plain-text lists of articles and the sources they cite into a usable data set?
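
The overlap figure is easy to verify (assuming, as the numbers suggest, that the “whole stash” denominator is the combined hit count of the two queries):

```python
# Hits returned by the two queries; items using both phrases appear in each
extreme_right_hits, radical_right_hits = 596, 551
overlap = 132

share = 100 * overlap / (extreme_right_hits + radical_right_hits)
print(round(share, 1))  # just under 12 per cent
```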

When you are asking this kind of question, usually “there is an R package for that”™, unless the question is too silly. In my case, the magic bullet for turning information from the SSCI into crunchable data is the wonderful bibliometrix package. Bibliometrix reads saved records from Web of Science/SSCI (in bibtex format) and converts them into data frames. It also provides functions for extracting bibliometric information from the data. Before I move on to co-citations, here’s the gist of the code that reads the data and generates a handy list of the 10 most-cited titles:

library(bibliometrix)

# Read the records saved from Web of Science and convert them to a data frame
D <- readFiles("savedrecs-all.bib")
M <- convert2df(D, dbsource = "isi", format = "bibtex")

# Remove some obviously unrelated items. Note: the removals are sequential,
# so the row numbers in each step refer to the already-reduced data frame
M <- M[-c(65, 94, 96, 97, 104, 105, 159, 177, 199, 457, 459, 497, 578, 579, 684, 685, 719, 723), ]
M <- M[-c(659, 707), ]
M <- M[-c(622), ]

results <- biblioAnalysis(M, sep = ";")
S <- summary(object = results, k = 10, pause = FALSE)

# Citations
CR <- citations(M, field = "article", sep = ".  ")
CR$Cited[1:10]

So what are the most cited titles in Extreme/Radical Right studies?

The ten most cited sources in 726 SSCI items

Source                        | Number of times cited
Mudde (2007)                  | 160
Kitschelt (1995)              | 147
Betz (1994)                   | 123
Lubbers et al. (2002)         | 97
Norris (2005)                 | 90
Golder (2003)                 | 86
R.W. Jackman & Volpert (1996) | 77
Carter (2005)                 | 66
Arzheimer & Carter (2006)     | 65
Brug et al. (2005)            | 65

Importantly, this top ten contains (in very prominent positions) a number of monographs. The SSCI itself only lists articles in (some) peer-reviewed journals. Without the citation data, we would have no idea which items published outside these journals are important. Having said that, the situation is still far from perfect: we only observe co-citation patterns through the lens of the 1,000-odd SSCI publications. But that’s still better than nothing, right?

What about the substantive results of this exercise? The table clearly shows the impact that Cas Mudde’s 2007 (“Populist Radical Right”) book had on the field. It is the most cited and at the same time the youngest item on the list, surpassing the much older monographs by Betz (“Radical Right Wing Populism”) and Kitschelt (“Radical Right”). Two other monographs by Carter (“Extreme Right”) and Norris (“Radical Right”) are also frequently cited but appreciably less popular than the books by Betz, Kitschelt, and Mudde. The five other items are journal articles with a primarily empirical outlook and mostly without conceptual ambitions.

Taken together, this suggests that the “Extreme Right” label lacked a strong proponent whose conceptual work was widely accepted in the literature. Once someone presented a clear rationale for using the “Radical Right” label instead, many scholars were willing to jump ship.

Getting to the co-citation network: are the Extreme / Radical Right literatures separated from each other?

If this were indeed the case, the literature should display a low degree of separation between users of both labels. Looking for co-citation patterns is a straightforward operationalisation of (lack of) separation. A co-citation occurs when two publications are both cited by some later source. By definition, co-citations reflect a view of the older literature as it is expressed in a newer publication. When two titles from the “Extreme Right” and “Radical Right” literatures are co-cited, this is a small piece of evidence that the literature has not split into two isolated streams. The SSCI aims at recording every source that is cited, even if the source itself is not in the SSCI. This makes for a very large number of publications that could be candidates for co-citations (18,255), even if most of them are peripheral to European Radical Right studies, and a whopping 743,032 actual co-citations.
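
The counting logic behind co-citation is simple enough to sketch (a toy Python example with made-up citing papers, not the bibliometrix implementation):

```python
from itertools import combinations
from collections import Counter

def cocitations(reference_lists):
    """Count how often each unordered pair of sources is cited together."""
    pairs = Counter()
    for refs in reference_lists:
        # Every pair of sources in one reference list is one co-citation
        for a, b in combinations(sorted(set(refs)), 2):
            pairs[(a, b)] += 1
    return pairs

# Three citing papers and the sources they cite (hypothetical reference lists):
citing = [
    ["Betz 1994", "Kitschelt 1995", "Mudde 2007"],
    ["Kitschelt 1995", "Mudde 2007"],
    ["Betz 1994", "Norris 2005"],
]
print(cocitations(citing)[("Kitschelt 1995", "Mudde 2007")])  # 2: co-cited twice
```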

To get a handle on this, I extracted the 20 publications with the biggest total number of co-citations and their interconnections. They represent something like the backbone of the literature. How did I reconstruct this network from textual data? Once more, R and its packages came to the rescue and helped me to produce a reasonably nice plot (after some additional cleaning up).

library(igraph)        # graph.adjacency(), induced_subgraph(), degree(), V()
library(RColorBrewer)  # brewer.pal()

NetMatrix <- biblioNetwork(M, analysis = "co-citation", network = "references", sep = ".  ")
# Careful: we are not interested in loops or in separate parallel edges
# between nodes. We convert the latter to weights
g <- graph.adjacency(NetMatrix, mode = "max", diag = FALSE)
# Extract the top 20 most co-cited items
f <- induced_subgraph(g, degree(g) > quantile(degree(g), probs = (1 - 20 / length(V(g)))))
# Now build a vector of relevant terms (requires knowledge of these titles)
# 1: extreme, 2: radical, 3: none/other
# Show all names
V(f)$name
term <- c(3, 2, 1, 1, 2, 1, 1, 2, 1, 2, 3, 2, 2, 2, 3, 1, 1, 1, 1, 1)
mycolours <- brewer.pal(3, "Greys")
V(f)$term <- term
V(f)$color <- mycolours[term]

Co-citation analysis: results

So, what are the results? First, here is the top 20 of co-cited items in the field of Extreme/Radical Right studies:

The twenty most co-cited sources in 726 SSCI items

Source                        | Co-citations within top 20 | Total co-citations
Kitschelt (1995)              | 745 | 7700
Mudde (2007)                  | 740 | 8864
Lubbers et al. (2002)         | 600 | 5212
Norris (2005)                 | 568 | 5077
Golder (2003)                 | 564 | 4687
Betz (1994)                   | 542 | 6151
R.W. Jackman & Volpert (1996) | 477 | 4497
Brug et al. (2005)            | 462 | 3523
Arzheimer & Carter (2006)     | 460 | 3551
Knigge (1998)                 | 445 | 3487
Carter (2005)                 | 389 | 3291
Arzheimer (2009)              | 376 | 3301
Ignazi (2003)                 | 344 | 2876
Ivarsflaten (2008)            | 334 | 3221
Ignazi (1992)                 | 331 | 3230
Rydgren (2007)                | 300 | 3353
Bale (2003)                   | 297 | 3199
Brug et al. (2000)            | 276 | 2602
Meguid (2005)                 | 246 | 2600
Bale et al. (2010)            | 134 | 2449

Many of these titles are familiar, because they also appear in the top ten of most cited titles and are classics to boot. And here is another nugget: for each title, a substantial share of about 10 per cent of all co-citations happen within this top twenty. This is exactly the (sub)network of co-citations I’m interested in. So here is the plot I promised:

Co-citations within top 20 titles in Extreme / Radical Right studies


But what does it all mean? Read the second part of this mini series, or go to the full article (author’s version, no paywall):

  • Arzheimer, Kai. “Conceptual Confusion is not Always a Bad Thing: The Curious Case of European Radical Right Studies.” Demokratie und Entscheidung. Eds. Marker, Karl, Michael Roseneck, Annette Schmitt, and Jürgen Sirsch. Wiesbaden: Springer, 2018. 23-40. doi:10.1007/978-3-658-24529-0_3
    @InCollection{arzheimer-2018,
      author = {Arzheimer, Kai},
      title = {Conceptual Confusion is not Always a Bad Thing: The Curious Case of European Radical Right Studies},
      booktitle = {Demokratie und Entscheidung},
      editor = {Marker, Karl and Roseneck, Michael and Schmitt, Annette and Sirsch, Jürgen},
      publisher = {Springer},
      address = {Wiesbaden},
      year = 2018,
      pages = {23-40},
      doi = {10.1007/978-3-658-24529-0_3},
      url = {https://www.kai-arzheimer.com/conceptual-confusion-european-radical-right-studies.pdf},
      html = {https://www.kai-arzheimer.com/conceptual-confusion-european-radical-right-studies},
      dateadded = {01-06-2018}
    }

Mar 03 2010

Over the last two decades I have accumulated thousands of references that have travelled with me all the way from bibtex-mode through Endnote, Citavi and some more obscure packages until we finally came full circle and ended up in bibtex-mode again. To my mild surprise, my use of (some) keywords has been fairly consistent, so it was relatively easy (using make, bibtool and bibtex2html) to create an online bibliography of 380+ entries on the Extreme Right in Western Europe. Enjoy.
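
A pipeline of this kind can be sketched in a few lines of make (the file names, the find-er.rsc resource file, and the keyword pattern are all hypothetical; the actual setup differs):

```make
# Filter the master bibliography with bibtool, then render the result as
# HTML with bibtex2html. find-er.rsc would contain something like
#   select = {keywords "extreme right"}
extreme-right.html: full.bib find-er.rsc
	bibtool -r find-er.rsc full.bib -o extreme-right.bib
	bibtex2html extreme-right.bib
```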