Oct 082015
Surveybias 1.4 is out 1

Just how badly biased is your pre-election survey? Once the election results are in, our scalar measures B and B_w provide convenient, single number summaries. Our surveybias add-on for Stata will calculate these and other measures from either raw data or from published margins. Its latest iteration (version 1.4) has just appeared on SSC. Surveybias 1.4 improves on the previous version by ditching the last remnants of the old numerical approximation code for calculating standard errors and is hence much faster in many applications. Install it now from within Stata by typing

ssc install surveybias


Mar 312015

Worried about survey bias?

We have updated our add-on (or ado) surveybias, which calculates our multinomial generalisation of the old Martin, Traugott, and Kennedy (2005) measure for survey bias. If you have any dichotomous or multinomial variable in your survey whose true distribution is known (e.g. from the census, electoral counts, or other official data), surveybias can tell you just how badly damaged your sample really is with respect to that variable. Our software makes it trivially easy to asses bias in any survey.

Within Stata, you can install/update surveybias by entering ssc install surveybias. We’ve also created a separate page with more information on how to use surveybias, including a number of worked examples.survey bias

The new version is called 1.3b (please don’t ask). New features and improvements include:

  • Support for (some) complex variance estimators including Stata’s survey estimator (sample points, strata, survey weights etc.)
  • Improvements to the numerical approximation. survebias is roughly seven times faster now
  • A new analytical method for simple random samples that is even faster
  • Convenience options for naming variables created by survebiasseries
  • Lots of bug fixes and improvements to the code

If you need to quantify survey bias, give it a spin.

May 202014

The Problem: Assessing Bias without the Data Set

While the interwebs are awash with headline findings from countless surveys, commercial companies (and even some academics) are reluctant to make their raw data available for secondary analysis. But fear not: Quite often, media outlets and aggregator sites publish survey margins, and that is all the information you need. It’s as easy as \pi.

The Solution: surveybiasi

After installing our surveybias add-on for Stata, you will have access to surveybiasi. surveybiasi is an “immediate command” (Stata parlance) that compares the distribution of a categorical variable in a survey to its true distribution in the population. Both distributions need to be specified via the popvalues() and samplevalues() options, respectively. The elements of these two lists may be specified in terms of counts, of percentages, or of relative frequencies, as the list is internally rescaled so that its elements sum up to unity. surveybiasi will happily report k A^{\prime}_{i}s, B and B_{w} (check out our paper for more information on these multinomial measures of bias) for variables with 2 to 12 discrete categories.

Bias in a 2012 CBS/NYT Poll

A week before the 2012 election for the US House of Representatives, 563 likely voters were polled for CBS/The New York Times. 46 per cent said they would vote for the Republican candidate in their district, 48 per cent said they would vote for the Democratic candidate. Three per cent said it would depend, and another two per cent said they were unsure, or refused to answer the question. In the example these five per cent are treated as “other”. Due to rounding error, the numbers do not exactly add up to 100, but surveybiasi takes care of the necessary rescaling.

In the actual election, the Republicans won 47.6 and the Democrats 48.8 per cent of the popular vote, with the rest going to third-party candidates. To see if these differences are significant, run surveybiasi like this:

. surveybiasi , popvalues(47.6 48.8 3.6) samplevalues(46 48 5) n(563)
      catvar |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
A'           |
           1 |  -.0426919   .0844929    -0.51   0.613     -.208295    .1229111
           2 |  -.0123999   .0843284    -0.15   0.883    -.1776805    .1528807
           3 |   .3375101   .1938645     1.74   0.082    -.0424573    .7174776
B            |
           B |   .1308673   .0768722     1.70   0.089    -.0197994    .2815341
         B_w |   .0385229   .0247117     1.56   0.119    -.0099112    .0869569
    Ho: no bias
    Degrees of freedom: 2
    Chi-square (Pearson) = 3.0945337
    Pr (Pearson) = .21282887
    Chi-square (LR) = 2.7789278
    Pr (LR) = .24920887

Given the small sample size and the close match between survey and electoral counts, it is not surprising that there is no evidence for statistically or substantively significant bias in this poll.

An alternative approach is to follow Martin, Traugott and Kennedy (2005) and ignore third-party voters, undecided respondents, and refusals. This requires minimal adjustments: n is now 535 as the analytical sample size is reduced by five per cent, while the figures representing the “other” category can simply be dropped. Again, surveybiasiinternally rescales the values accordingly:

. surveybiasi , popvalues(47.6 48.8) samplevalues(46 48) n(535)
      catvar |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
A'           |
           1 |  -.0162297   .0864858    -0.19   0.851    -.1857388    .1532794
           2 |   .0162297   .0864858     0.19   0.851    -.1532794    .1857388
B            |
           B |   .0162297   .0864858     0.19   0.851    -.1532794    .1857388
         B_w |   .0162297   .0864858     0.19   0.851    -.1532794    .1857388
    Ho: no bias
    Degrees of freedom: 1
    Chi-square (Pearson) = .03521623
    Pr (Pearson) = .85114329
    Chi-square (LR) = .03521898
    Pr (LR) = .85113753

Under this two-party scenario, A^{\prime}_{1} is identical to Martin, Traugott, and Kennedy’s original A (and all other estimates are identical to A‘s absolute value). Its negative sign points to the (tiny) anti-Republican bias in this poll, which is of course even less significant than in the previous example.

May 102014

In a recent publication (Arzheimer & Evans 2014), we propose a new multinomial measure B for bias in opinion surveys. We also supply a suite of ado files for Stata, surveybias, which plugs into Stata’s framework for estimation programs and provides estimates for this and other measures along with their standard errors.  This is the first instalment in a mini series of posts that show how our commands can be used with real-world data. Here, we analyse the quality of a single French pre-election poll.

Installing surveybias for Stata

You can install surveybias directly from this website (net from https://www.kai-arzheimer.com/stata), but it may more convenient to install from SSC ssc install surveybias

Assessing Bias in Presidential Pre-Election Surveys

. use onefrenchsurvey

The French presidential campaign of 2012 attracted considerable political interest. Accordingly, numerous surveys were fielded. onefrenchsurvey.dta (included in our package) contains data from one of them, taken a couple of weeks before the actual election. The command I will discuss in this post is called (*drumroll*) surveybias and is the main workhorse in our package. surveybias needs exactly one variable as a mandatory argument: the voting intention as measured in the survey, which is appropriately called “vote” in this example. Moreover, surveybias requires an option through which must submit the true distribution of this variable. Absolute or relative frequencies will do just as well as percentages, since surveybias will automatically rescale any of them.

Ten candidates stood in the first round of the French presidential election in 2012, but only two of them would progress to the run-off. While surveybias can handle variables with up to twelve categories, requesting estimates for very small parties increases the computational burden, may lead to numerically unstable estimates and is often of little substantive interest. In onefrenchsurvey.dta support for the two-lowest ranking candidates has therefore been recoded to a generic “other” category. The first-round results, which serve as a yardstick for the accuracy of the poll, are submitted in popvalues(). For other options, have a look at the documentation.

. surveybias vote, popvalues(28.6 27.18 17.9 9.13 11.1 2.31 1.15 1.79 0.8)
______________ ________________________________________________________________
vote       Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
______________ ________________________________________________________________
Hollande   -.0757639   .0697397    -1.09   0.277    -.2124512    .0609233
Sarkozy    .0477294   .0689193     0.69   0.489    -.0873499    .1828087
LePen   -.0559812   .0823209    -0.68   0.496    -.2173271    .1053648
Bayrou    .3057213   .0953504     3.21   0.001     .1188379    .4926047
Melenchon   -.0058251   .0988715    -0.06   0.953    -.1996096    .1879594
Joly   -.0913924   .2154899    -0.42   0.671    -.5137449      .33096
Poutou   -.8802476   .4482915    -1.96   0.050    -1.758883   -.0016125
DupontAigna   -.5349338   .3031171    -1.76   0.078    -1.129032    .0591648
other    .1841789   .3177577     0.58   0.562    -.4386147    .8069724
______________ ________________________________________________________________
B    .2424193   .0767485     3.16   0.002     .0919949    .3928437
B_w    .0965423    .039022     2.47   0.013     .0200605    .1730241
______________ ________________________________________________________________

Ho: no bias
Degrees of freedom: 8
Chi-square (Pearson) = 18.695468
Pr (Pearson) = .01657592
Chi-square (LR) = 19.540804
Pr (LR) = .01222022

The top panel lists the Ai for the first eight candidates plus the “other” category alongside their standard errors, z- and p-values, and confidence intervals. Ai is a party-specific, multi-party version of Martin, Traugott, and Kennedy’s measure A and reflects bias for/against any specific party. By conventional standards (p 0.05), only two of these values are significantly different from zero: Support for François Bayrou was overestimated (A4 = 0.31) while support for Philippe Poutou was underestimated (A7 = 0.88).

Poutou was the little known candidate for the tiny “New Anticapitalist Party”. While he received more than twice the predicted number of votes (1exp(0.88) 2.4), the case of Bayrou is more interesting. Bayrou, a centre-right candidate, stood in the previous 2007 election and came third with a very respectable result of almost 19 per cent, taking many political observers by surprise. In 2012, when he stood for a new party that he had founded immediately after the 2007 election, his vote effectively halved. But this is not fully reflected in the poll, which overestimates his support by roughly a third (exp(0.31) 1.35). This could be due to (misguided) bandwagon effects, sampling bias, or political weighting of the poll by the company.

The lower panel of the output lists B and Bw, a weighted version of our measure. B, the unweighed average of the Ais absolute values, is much higher than Bw. This is because the estimates for all the major candidates with the exception of Bayrou were reasonably good. While support for Poutou and also for Dupont-Aignan was underestimated by large factors, Bw heavily discounts these differences, because they are of little practical relevance unless one is interested specifically in splinter parties.

As outlined in the article in which we derive B, B’s (and Bw’s) sampling distribution is non-normal, rendering the p-value of 0.002 somewhat dubious. surveybias therefore performs additional χ2-tests based on the Pearson and the likelihood-ratio formulae, whose results are listed below the main table. In this case, however, both tests agree that the null hypothesis of no bias is indeed falsified by the data.

While their p-values are clearly higher than the one resulting from the inappropriate z-test on B, they are close to the p-value for Bw. This is to be expected, because the upward bias and the non-normality become less severe as the number of categories increases, and because the weighting reduces the impact of differences that are small in absolute numbers but associated with large values on the log-ratio scale.

surveybias leaves the full variance-covariance matrix behind for your edification. Parameter estimates, chi-square values and probabilities are available, too, so that you can easily test all sorts of interesting variables about bias in this poll.

Apr 092014

European Identities in the Cloud

As previously reported on this blog, my PhD student and I are doing a CATI survey on European Identities. We opted for queXS (an open source CATI front-end for Limesurvey) and chose a solution hosted by the Australian Consortium for Social and Political Research on Amazon’s network.

Hosted queXS Is Reliable

Initially, we suffered from a few hick-ups that hit the system while interviewing was in full swing: The form would sometimes simply not open at the very beginning of an interview, which understandably drove our interviewers nuts. Support in Australia fixed the problem quickly, but because of the time difference, we had a somewhat anxious night. Voice over IP connectivity was integrated from Australia but provided by a German company. By and large, that worked well, too. We had one major outage but again, after contacting the ACSPR, that was fixed for good.

PCs and Interviewers not yet Virtualised

PCs and Interviewers not yet Virtualised

Lousy Response Rate Not a Software Problem

The one element that we did not virtualise were the interviewers. We had hired a large group of student helpers, which, with hindsight, was not necessarily a brilliant idea. queXS makes it very easy to track operator performance, and so we could quickly see that some of them generated very, very high refusal rates. They all received initial training and constant supervision from us, but some of them would barely manage to get one twenty-minute interview per four-hour shift. Others managed four or more. Our star and role model was a guy who attends acting school. If I could clone and upload him to the cloud, I would be a very happy chappy.

Apr 032014

Survey Accuracy

The accuracy of pre-election surveys is a matter of considerable debate. Obviously, any rigorous discussion of bias in opinion polls requires a scalar measure of survey accuracy. Martin, Traugott, and Kennedy (2005) propose such a measure A for the two-party case, and in our own work (Arzheimer/Evans 2014), Jocelyn Evans and I demonstrate how A can be generalised to the multi-party case, giving rise to a new measure B (seriously) and some friends A^{\prime}_{i} and B_w:

    Arzheimer, Kai and Jocelyn Evans. “A New Multinomial Accuracy Measure for Polling Bias.” Political Analysis 22.1 (2014): 31–44. doi:10.1093/pan/mpt012
    [BibTeX] [Abstract] [Download PDF] [HTML] [DATA]

    In this article, we propose a polling accuracy measure for multi-party elections based on a generalization of Martin, Traugott, and Kennedy s two-party predictive accuracy index. Treating polls as random samples of a voting population, we first estimate an intercept only multinomial logit model to provide proportionate odds measures of each party s share of the vote, and thereby both unweighted and weighted averages of these values as a summary index for poll accuracy. We then propose measures for significance testing, and run a series of simulations to assess possible bias from the resulting folded normal distribution across different sample sizes, finding that bias is small even for polls with small samples. We apply our measure to the 2012 French presidential election polls to demonstrate its applicability in tracking overall polling performance across time and polling organizations. Finally, we demonstrate the practical value of our measure by using it as a dependent variable in an explanatory model of polling accuracy, testing the different possible sources of bias in the French data.

    author = {Arzheimer, Kai and Evans, Jocelyn},
    title = {A New Multinomial Accuracy Measure for Polling Bias },
    journal = {Political Analysis},
    year = 2014,
    abstract = {In this article, we propose a polling accuracy measure for
    multi-party elections based on a generalization of Martin,
    Traugott, and Kennedy s two-party predictive accuracy index.
    Treating polls as random samples of a voting population, we first
    estimate an intercept only multinomial logit model to provide
    proportionate odds measures of each party s share of the vote, and
    thereby both unweighted and weighted averages of these values as a
    summary index for poll accuracy. We then propose measures for
    significance testing, and run a series of simulations to assess
    possible bias from the resulting folded normal distribution across
    different sample sizes, finding that bias is small even for polls
    with small samples. We apply our measure to the 2012 French
    presidential election polls to demonstrate its applicability in
    tracking overall polling performance across time and polling
    organizations. Finally, we demonstrate the practical value of our
    measure by using it as a dependent variable in an explanatory model
    of polling accuracy, testing the different possible sources of bias
    in the French data.},
    keywords = {meth-e},
    volume = {22},
    number = {1},
    pages = {31--44},
    url =
    doi = {10.1093/pan/mpt012},
    data = {http://hdl.handle.net/1902.1/21603},
    html =

The Surveybias Software 1.1

Calculating the accuracy measures is a matter of some algebra. Estimating standard errors is a bit trickier but could be done manually by making use of the relationship between A^{\prime}_{i} and the multinomial logistic model on the one hand and Stata’s very powerful implementation of the Delta method on the other. But these calculations are error-prone and become tedious rather quickly. This is why we created a suite of user written programs (surveybias, surveybiasi, and surveybiasseries). They do all the necessary legwork and return the estimates of accuracy, complete with standard errors and statistical tests.

Voter poll
Those Were the DaysFoter.com / CC BY-SA

We have just updated our software. The new version 1.1 of surveybias features some bug fixes, a better mechanism for automagically dealing with convergence problems, better documentation, and a new example data set that compiles information on 152 German pre-election polls conducted between January and September 2013.

Examples, Please?

surveybias comes with example data from the French presidential election 2012 and the German parliamentary election 2013. From within Stata, type help surveybias, help surveybiasi, and help surveybiasseries to see how you can make use of our software. If I can find the time, I will illustrate the use of surveybias in a mini series of blogs over the next week.

Updating Surveybias

The new version 1.1 should appear is now on SSC within the next couple of days or so, but the truly impatient can get it now. In your internet-aware copy of Stata (version 11 or later), type

net from https://www.kai-arzheimer.com/stata/

net install surveybias, replace

Or use SSC: ssc install surveybias, replace


Mar 222014

One of my very able PhD students is working on a better instrument for measuring the interaction of national and European identities. Thanks to the generosity of the Fritz Thyssen Stiftung, we can now road-test some of his ideas in a three-wave telephone survey. Fieldwork for the first wave will commence on Monday, and we are rather excited, not least because we are running this survey in our own “studio”, with a large number of student research assistants working as interviewers.

Worldclouds 2009
NASA Earth Observatory / Foter / Public domain

In the past, the university had installed the voxco software in a PC lab that was equipped with headsets and landlines. But the program never worked well and became de facto unusable once the service contract waterminated. Looking for alternatives when we moved into a new building, we came across queXS, an open source CATI software that is based on limesurvey. Limesurvey had worked well for us in the past, so we gave queXS a spin and rather liked it. The only remaining problem was that our IT support could not setup the necessary servers and patch them into the university’s voice over ip infrastructure in time (we want to be in the field well before the Euro 2014 campaign takes off in two weeks or so). So we got in touch with ACSPRI, the Australian Consortium for Social and Political Research Incorporated, which offers access to a Amazon cloud-based installation of queXS that can be rented on a monthly basis for a reasonable fee. ACSPRI also helped us to find a German VOIP provider whose network we will use to place the calls.

Now our “studio” is still based in a university PC lab. But this is mostly an issue of convenience, and of easy supervision. In fact, it could be run on laptops or even tablet computers anywhere on the planet. The software is browser-based and hosted in some unknown, unmarked data centre somewhere. Connectivity to German landlines is provided through software in another data centre, and this whole virtualised infrastructure is supported and maintained from the other end of the world. Apart from the headsets, the only tangible part of the studio is a bunch of pen-drives that hold the interviewers’ access codes. Eerie, isn’t it?

The tests went well, but will it work in practice? I’ll keep you posted.

Nov 222013

Measuring Survey Bias

In our recent Political Analysis paper (ungated authors’ version), Jocelyn Evans and I show how Martin, Traugott, and Kennedy’s two-party measure of survey accuracy can be extended to the multi-party case (which is slightly more relevant for comparativists and other people interested in the world outside the US). This extension leads to a series of party-specific measures of bias as well as to two scalar measures of overall survey bias.

Moreover, we demonstrate that our new measures are closely linked to the familiar multinomial logit model (just as the MTK measure is linked to the binomial logit). This demonstration is NOT an exercise in Excruciatingly Boring Algebra. Rather, it leads to a straightforward derivation of standard errors and facilitates the implementation of our methodology in standard statistical packages.

Voter poll
Those Were the DaysFoter.com / CC BY-SA

An Update to Our Free Software

We have programmed such an implementation in Stata, and it should not be too difficult to implement our methodology in R (any volunteers?). Our Stata code has been on SSC for a couple of months now but has recently been significantly updated. The new version 1.0 includes various bug fixes to the existing commands surveybias.ado and surveybiasi.ado, slightly better documentation, two toy data sets that should help you getting started with the methodology, and a new command surveybiasseries.ado.

surveybiasseries facilitates comparisons across a series of (pre-election) polls. It expects a data set in which each row corresponds to margins (predicted vote shares) from a survey. Such a dataset can quickly be constructed from published sources. Access to the original data is not required. surveybiasseries calculates the accuracy measures for each poll and stores them in a set of new variables, which can then be used as depended variable(s) in a model of poll accuracy.

Getting Started with Estimating Survey Bias

The new version of surveybias for Stata should appear be on SSC over the next couple of weeks or so (double check the version number (was 0.65, should now be 1.0) and the release date), but you can install it right now from this website:

net from https://www.kai-arzheimer.com/stata 
net install surveybias

To see the new command in action, try this

use fivefrenchsurveys, replace

will load information from five pre-election polls taken during the French presidential campaign (2012) into memory. The vote shares refer to eight candidates that competed in the first round.

surveybiasseries in 1/3 , popvaria(*true) samplev(fh-other) nvar(N) gen(frenchsurveys)

will calculate our accuracy measures and their standard errors for the first three surveys over the full set of candidates.

surveybiasseries in 4/5, popvariables(fhtrue-mptrue) samplevariables(fh-mp) nvar(N) gen(threeparty)

will calculate bias with respect to the three-party vote (i.e. Hollande, Sarkozy, Le Pen) for surveys no. 4 and 5 (vote shares a automatically rescaled to unity, no recoding required). The new variable names start with “frenchsurveys” and “threeparty” and should be otherwise self-explanatory (i.e. threepartybw is $B_w$ for the three party case, and threepartysebw the corresponding standard error). Feel free to plot and model to your heart’s content.