In a recent publication (Arzheimer & Evans 2014), we propose a new multinomial measure B for bias in opinion surveys. We also supply a suite of ado files for Stata, surveybias, which plugs into Stata’s framework for estimation programs and provides estimates for this and other measures along with their standard errors. This is the first instalment in a mini series of posts that show how our commands can be used with real-world data. Here, we analyse the quality of a single French pre-election poll.
Installing surveybias for Stata
You can install surveybias directly from this website (net from https://www.kai-arzheimer.com/stata), but it may be more convenient to install from SSC:
ssc install surveybias
Assessing Bias in Presidential Pre-Election Surveys
. use onefrenchsurvey
The French presidential campaign of 2012 attracted considerable political interest. Accordingly, numerous surveys were fielded. onefrenchsurvey.dta (included in our package) contains data from one of them, taken a couple of weeks before the actual election. The command I will discuss in this post is called (*drumroll*) surveybias and is the main workhorse in our package. surveybias needs exactly one variable as a mandatory argument: the voting intention as measured in the survey, which is appropriately called “vote” in this example. Moreover, surveybias requires an option through which you must submit the true distribution of this variable. Absolute or relative frequencies will do just as well as percentages, since surveybias will automatically rescale any of them.
Ten candidates stood in the first round of the French presidential election in 2012, but only two of them would progress to the run-off. While surveybias can handle variables with up to twelve categories, requesting estimates for very small parties increases the computational burden, may lead to numerically unstable estimates and is often of little substantive interest. In onefrenchsurvey.dta, support for the two lowest-ranking candidates has therefore been recoded to a generic “other” category. The first-round results, which serve as a yardstick for the accuracy of the poll, are submitted in popvalues(). For other options, have a look at the documentation.
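A quick aside on the rescaling: the rounded percentages passed to popvalues() in this post (28.6, 27.18, and so on) do not add up to exactly 100. Assuming that rescaling simply means dividing each value by the total (an assumption on my part, not a statement about the command's internals), the shares the poll is compared against can be sketched in Mata:

```stata
mata:
// first-round results as passed to popvalues() in this post
v = (28.6, 27.18, 17.9, 9.13, 11.1, 2.31, 1.15, 1.79, 0.8)
// total of the rounded percentages (slightly below 100)
sum(v)
// ASSUMPTION: rescaling = dividing by the total, so the shares sum to one
v :/ sum(v)
end
```

Under that assumption, percentages, proportions and raw vote counts all lead to the same set of shares.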
. surveybias vote, popvalues(28.6 27.18 17.9 9.13 11.1 2.31 1.15 1.79 0.8)

------------------------------------------------------------------------------
        vote |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
A'           |
    Hollande |  -.0757639   .0697397    -1.09   0.277    -.2124512    .0609233
     Sarkozy |   .0477294   .0689193     0.69   0.489    -.0873499    .1828087
       LePen |  -.0559812   .0823209    -0.68   0.496    -.2173271    .1053648
      Bayrou |   .3057213   .0953504     3.21   0.001     .1188379    .4926047
   Melenchon |  -.0058251   .0988715    -0.06   0.953    -.1996096    .1879594
        Joly |  -.0913924   .2154899    -0.42   0.671    -.5137449      .33096
      Poutou |  -.8802476   .4482915    -1.96   0.050    -1.758883   -.0016125
 DupontAigna |  -.5349338   .3031171    -1.76   0.078    -1.129032    .0591648
       other |   .1841789   .3177577     0.58   0.562    -.4386147    .8069724
-------------+----------------------------------------------------------------
B            |
           B |   .2424193   .0767485     3.16   0.002     .0919949    .3928437
         B_w |   .0965423    .039022     2.47   0.013     .0200605    .1730241
------------------------------------------------------------------------------

    Ho: no bias
    Degrees of freedom: 8
    Chi-square (Pearson) = 18.695468
    Pr (Pearson) = .01657592
    Chi-square (LR) = 19.540804
    Pr (LR) = .01222022
The top panel lists the Ai′ for the first eight candidates plus the “other” category alongside their standard errors, z- and p-values, and confidence intervals. Ai′ is a party-specific, multi-party version of Martin, Traugott, and Kennedy’s measure A and reflects bias for/against any specific party. By conventional standards (p ≤ 0.05), only two of these values are significantly different from zero: Support for François Bayrou was overestimated (A4′ = 0.31) while support for Philippe Poutou was underestimated (A7′ = –0.88).
Poutou was the little-known candidate of the tiny “New Anticapitalist Party”. While he received more than twice the predicted number of votes (1∕exp(–0.88) ≈ 2.4), the case of Bayrou is more interesting. Bayrou, a centre-right candidate, stood in the previous 2007 election and came third with a very respectable result of almost 19 per cent, taking many political observers by surprise. In 2012, when he stood for a new party that he had founded immediately after the 2007 election, his vote effectively halved. But this is not fully reflected in the poll, which overestimates his support by roughly a third (exp(0.31) ≈ 1.36). This could be due to (misguided) bandwagon effects, sampling bias, or political weighting of the poll by the company.
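The two back-transformations quoted above are easy to check with Stata's display command. This is only a sketch of the arithmetic, using the unrounded coefficients from the output table:

```stata
* Poutou: factor by which the result exceeded the poll, 1/exp(A') with A' = -.8802476
display 1/exp(-.8802476)

* Bayrou: factor by which the poll overstated his support, exp(A') with A' = .3057213
display exp(.3057213)
```

Because the table reports A′ on a log scale, exponentiating moves the estimates back to a ratio scale, which is usually easier to communicate.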
The lower panel of the output lists B and Bw, a weighted version of our measure. B, the unweighted average of the Ai′s absolute values, is much higher than Bw. This is because the estimates for all the major candidates with the exception of Bayrou were reasonably good. While support for Poutou and also for Dupont-Aignan was underestimated by large factors, Bw heavily discounts these differences, because they are of little practical relevance unless one is interested specifically in splinter parties.
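To make the difference between the two summary measures concrete, here is a minimal Mata sketch that recomputes both from the table. B is the plain mean of the absolute Ai′, as described above. For Bw, I assume that the weights are the true vote shares passed to popvalues(); that assumption is not spelled out in this post, but it reproduces the reported value.

```stata
mata:
// A' estimates copied from the output table
A = (-.0757639, .0477294, -.0559812, .3057213, -.0058251, -.0913924, -.8802476, -.5349338, .1841789)
// true first-round results, as passed to popvalues()
v = (28.6, 27.18, 17.9, 9.13, 11.1, 2.31, 1.15, 1.79, 0.8)
// B: unweighted mean of the absolute values
B = sum(abs(A)) / cols(A)
// B_w: ASSUMPTION, absolute values weighted by the true vote shares
Bw = sum(abs(A) :* v) / sum(v)
// print both; they should come out at roughly .2424 and .0965, as in the table
B, Bw
end
```

The large absolute values for Poutou and Dupont-Aignan carry weights of less than two per cent each, which is why they barely move Bw.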
As outlined in the article in which we derive B, B’s (and Bw’s) sampling distribution is non-normal, rendering the p-value of 0.002 somewhat dubious. surveybias therefore performs additional χ²-tests based on the Pearson and the likelihood-ratio formulae, whose results are listed below the main table. In this case, however, both tests agree that the null hypothesis of no bias is indeed falsified by the data.
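If you want to see where these p-values come from, they can be recomputed from the reported test statistics with Stata's built-in distribution functions. This is a sketch of the arithmetic only, not of what surveybias does internally:

```stata
* upper-tail probabilities of the two chi-square statistics, 8 degrees of freedom
display chi2tail(8, 18.695468)
display chi2tail(8, 19.540804)

* two-sided p-value of the z-test on B, for comparison (z = 3.16 in the table)
display 2*normal(-abs(3.16))
```

The first two results should match the Pr (Pearson) and Pr (LR) lines of the output, and the last one the P>|z| entry for B.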
While their p-values are clearly higher than the one resulting from the inappropriate z-test on B, they are close to the p-value for Bw. This is to be expected, because the upward bias and the non-normality become less severe as the number of categories increases, and because the weighting reduces the impact of differences that are small in absolute numbers but associated with large values on the log-ratio scale.
surveybias leaves the full variance-covariance matrix behind for your edification. Parameter estimates, chi-square values and probabilities are available, too, so that you can easily test all sorts of interesting hypotheses about bias in this poll.
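Since surveybias plugs into Stata's framework for estimation commands, the standard tools for inspecting stored results should work right after the call. A minimal sketch, assuming the estimates and their variance-covariance matrix are stored under the usual names e(b) and e(V):

```stata
* everything the command has left behind (scalars, macros, matrices)
ereturn list

* parameter estimates and their full variance-covariance matrix
* ASSUMPTION: stored as e(b) and e(V), as is conventional for estimation commands
matrix list e(b)
matrix list e(V)
```

The column names shown by matrix list e(b) are the ones you would need for follow-up commands such as test or lincom.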