Topic modelling does not work well for (my) research paper abstracts
The Radical Right Research Robot is a fun side project whose life began exactly one year ago. The Robot exists to promote the very large body of knowledge on Radical Right parties and their voters that social scientists have accumulated over decades. At its core is a loop that, every few hours, randomly selects one of the more than 800 titles in my online bibliography on the Extreme/Radical Right and spits it out on Twitter.
Yet the little android’s aim was always for some sort of serendipity, and so it tries to extract meaning from the abstracts (where available), sometimes with rather funny consequences. The robot’s first idea was to make use of (structural) topic modelling. There are some implementations available in R, and the first results looked promising, but in the end, topic modelling did not find meaningful clusters of papers that could easily be labelled with a common theme. One possible reason is that the abstracts are short, and that there are relatively few of them (fewer than 400). And so the Robot reverted to using a small and fairly arbitrary set of keywords for identifying topics.
This approach produced some embarrassing howlers like this one:
the media and the #RadicalRight D. Halikiopoulou and T. Vlandas. “Risks, Costs and Labour Markets: Explaining Cross-national Patterns of Far Right Party Success in European Parliament Elections”. In: Jcms-Journal of Common Market Studies 54.3 (2016), pp. 636-655. https:// pic.twitter.com/K4YHAoZvNL
— Radical Right Research Robot (@RRResRobot) December 23, 2018
Or this one (clearly the robot has a thing for media studies – who doesn’t?):
the media: K. Loxbo. “The Impact of the Radical Right: Lessons from the Local Level in Sweden, 2002-2006”. In: Scandinavian Political Studies 33.3 (2010), pp. 295-315. https://t.co/p8JmK4ovXF. <URL: https://t.co/GEDZ3p4MnI>. pic.twitter.com/QhSF914y4x
— Radical Right Research Robot (@RRResRobot) December 18, 2018
There are two problems here: first, even a single instance of a keyword in a given abstract is enough to trigger a classification, and second, the bot’s pedestrian implementation would classify an abstract using the last keyword that it detected, even if it was the most peripheral of several hits. Not good enough for world domination, obviously.
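The bot's actual code is not shown here, but the bug it describes can be reconstructed in a few lines. In this toy sketch (keyword list and function name are made up for illustration), every keyword hit overwrites the label, so an abstract ends up tagged with whichever keyword happens to be matched last, however peripheral it is:

```python
# Toy reconstruction of the naive keyword classifier (hypothetical
# keyword-to-topic map, not the bot's real one). A single occurrence
# triggers a classification, and each later hit overwrites the label.

KEYWORDS = {
    "election": "elections",
    "vote": "voting behaviour",
    "media": "the media",
}

def classify_naive(abstract):
    label = None
    for keyword, topic in KEYWORDS.items():
        if keyword in abstract.lower():
            label = topic  # last hit wins -- this is the bug
    return label

# An abstract that is mostly about elections but mentions the media
# in passing still gets filed under "the media":
abstract = ("Explaining cross-national patterns of far right party "
            "success in European Parliament elections, with a short "
            "aside on media coverage.")
print(classify_naive(abstract))  # -> the media
```

A single-pass scoring of all matches (rather than keeping only the last one) would already avoid the worst of these howlers.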
Newsmap works reasonably well for classifying topics in research paper abstracts
Looking for an alternative solution, the robot came across newsmap (now also available within quanteda), a geographical news classifier developed by Kohei Watanabe. Newsmap is semi-supervised: it starts with a dictionary of proper nouns and adjectives that all refer to geographical entities, say
```
'France':  [Paris, France, French*]
'Germany': [German*, Berlin]
...
```
But newsmap is able to pick up additional words that also help to identify the respective country with high probability, e.g. “Macron”, “Merkel”, “Marseille”, “Hamburg”, or even “Lederhosen”. In a (limited) sense, it learns to identify geographical context even when the country in question is not mentioned explicitly.
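Newsmap's actual estimator is more involved than this, but the general semi-supervised idea can be sketched with a toy frequency-based version (all names and example texts below are made up): label training texts by their seed-word matches, then treat words that occur almost exclusively under one label as new indicators for it.

```python
from collections import Counter, defaultdict

# Toy sketch of the semi-supervised idea (NOT newsmap's actual model):
# 1. label training texts by seed-word matches,
# 2. count how often every non-seed word occurs under each label,
# 3. words seen repeatedly under exactly one label become new
#    indicators for it.

SEEDS = {"France": {"paris", "france", "french"},
         "Germany": {"german", "germany", "berlin"}}

docs = [
    "Macron spoke in Paris about French pension reform",
    "Merkel addressed the Bundestag in Berlin",
    "German unions reacted to Merkel",
    "Macron visited Marseille in southern France",
]

counts = defaultdict(Counter)
for doc in docs:
    words = doc.lower().split()
    for label, seeds in SEEDS.items():
        if seeds & set(words):                  # a seed match labels the doc
            counts[label].update(w for w in words if w not in seeds)

def learned_words(label, min_count=2):
    """Words seen at least min_count times under this label only."""
    others = set().union(*(counts[l] for l in counts if l != label))
    return {w for w, c in counts[label].items()
            if c >= min_count and w not in others}

print(learned_words("France"))   # picks up "macron"
print(learned_words("Germany"))  # picks up "merkel"
```

In this miniature corpus, "Macron" and "Merkel" never appear in the seed dictionary, yet they end up associated with the right country, which is exactly the kind of vocabulary expansion the post describes.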
But the algorithm is not restricted to geographical entities: it can also identify topics from a list. And so these days, the robot starts with a dictionary of seed words that is still a work in progress, but currently looks mostly like this:
```
'religion & culture': [muslim*, islam*, relig*, cultur*]
'media':              [TV, newspaper*, journalis*]
'group conflict':     [group*, contact, prejudice, stereotyp*, competition]
...
```
Results are not perfect, but at least they are less embarrassing than those from the simple keyword approach. One remaining problem is that newsmap tags each abstract with (at most) one topic. In reality, any given article will refer to two or more themes in the literature. Topic models are much more attractive in this respect, because they treat each text as a mixture of topics, and so the robot may have to revisit them in the future.