Example: Assessing House Effects in German Pre-Election Surveys with surveybiasseries


The website www.wahlrecht.de publishes a wealth of information on German electoral law, including a series of margins from pre-election polls going back to the late 1990s. Building on this remarkable resource, our dataset of German pre-election surveys contains margins for 152 nationwide polls that were conducted by six leading pollsters between January 2013 (when the exact date of the Parliamentary election was agreed between the parties and subsequently officially announced) and mid-September 2013 (the week immediately before the election). Two main parties, the Christian Democrats (CDU/CSU) and the Social Democrats (SPD), contested this election, together with the smaller Liberal party (FDP), the Greens (Die Gruenen), a left-wing party (Die Linke), and a range of small parties which we code here as ‘other’.

Obviously, one cannot expect the early polls to provide accurate predictions of the final result: voting intentions are not very firm that long before election day, and polls conducted during spring and early summer reflect the waxing and waning popularity of the parties caused by events and campaign effects. But these effects can be modelled, and once they are taken into account, the relatively large number of cases (individual polls) makes it possible to assess the general reliability of individual polling houses as well as to identify any party-specific polling problems that afflict all pollsters in a similar fashion.

surveybiasseries calculates the accuracy measures for the 152 surveys with a single command.

. use german-pre-election-polls.dta, clear
. qui surveybiasseries, samplevariables(cducsu spd linke gruene fdp other) ///
   nvar(n) popvalues(41.5 25.7 8.6 8.4 4.8 10.9) generate(g) num desc
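
To make the measures concrete: $A'_i$ can be read as a log odds ratio comparing a poll's share for party $i$ with the official result, and $B_w$ as a vote-share-weighted summary of the absolute $A'_i$s. The following Python sketch is an illustration under those assumed definitions, not the package's actual implementation:

```python
import math

def a_prime(p, pi):
    """Assumed party-level accuracy score: log odds ratio of poll share p
    vs official share pi. Positive values mean the poll overestimated the party."""
    return math.log((p * (1 - pi)) / (pi * (1 - p)))

def b_w(poll, result):
    """Assumed overall inaccuracy: official-vote-share-weighted mean of |A'_i|."""
    return sum(pi * abs(a_prime(p, pi)) for p, pi in zip(poll, result))

# Official 2013 results as proportions (CDU/CSU, SPD, Linke, Gruene, FDP, other)
result = [0.415, 0.257, 0.086, 0.084, 0.048, 0.109]

# A hypothetical poll margin that overestimates the Greens
poll = [0.40, 0.26, 0.08, 0.12, 0.05, 0.09]

print([round(a_prime(p, pi), 3) for p, pi in zip(poll, result)])
print(round(b_w(poll, result), 3))
```

A poll that exactly matched the result would score zero on every $A'_i$ and hence zero on $B_w$.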

Once the accuracy measures have been estimated, assessing and modelling bias is straightforward.

. summ gaprime* gbw, separator(6)

    Variable |       Obs        Mean    Std. Dev.       Min        Max
gaprimecdu~u |       152   -.0260809    .0812008  -.1905379   .3416225
  gaprimespd |       152   -.0009817    .1027674  -.2059059   .3509032
gaprimelinke |       152   -.1953039    .1878546  -.8218191    .165164
gaprimegru~e |       152    .5237577    .1553622   .0710442     .80248
  gaprimefdp |       152   -.0968489    .2516042  -.9241087   .4017963
gaprimeother |       152   -.3630083    .2441565  -1.374638    .106989
         gbw |       152    .1586983    .0617779   .0622907   .3897365

On average, the polls measured support for the two major parties with very little bias. Moreover, the $A'$s for these parties also have small standard deviations, which means that their final vote shares were predicted consistently well.

Bias is more pronounced for the smaller parties, whose measured support displayed more fluctuation throughout the campaign. This is most pronounced for the Greens, whose final result was considerably and rather consistently overestimated by the polls. This is not necessarily a sign of any methodological problems: after a strong start to the campaign, the party presented a platform that called for a comprehensive ecological tax hike. This proved almost universally unpopular, and the party’s support declined markedly. It therefore stands to reason that the relatively high figure reflects some true change of opinion over the course of the campaign.

Finally, the last line shows the average estimate of $B_w$, which seems rather high compared to the French case. However, the French surveys were taken in the week preceding the election, whereas the German surveys cover a much longer time span.

A simple linear model of overall bias can be constructed by regressing $B_w$ on time to the election (measured in days) and a set of five dummy variables representing the six major polling companies. Since $B_w$ is biased away from zero, and since this bias is more pronounced in smaller samples, sample size should also be included in the model.
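
This small-sample inflation of $B_w$ is easy to demonstrate by simulation: even polls drawn from the true distribution score a positive $B_w$, and more so when samples are small. A Python sketch (the $A'$/$B_w$ formulas here are my assumed definitions, not the package's code):

```python
import math
import random
from collections import Counter

def a_prime(p, pi):
    # assumed log odds-ratio accuracy score for one party
    return math.log((p * (1 - pi)) / (pi * (1 - p)))

def b_w(poll, result):
    # assumed vote-share-weighted mean absolute bias
    return sum(pi * abs(a_prime(p, pi)) for p, pi in zip(poll, result))

shares = [0.415, 0.257, 0.086, 0.084, 0.048, 0.109]  # official 2013 results

def mean_bw(n, sims=100, seed=42):
    """Average B_w of `sims` simulated, perfectly unbiased polls of size n."""
    rng = random.Random(seed)
    parties = range(len(shares))
    total = 0.0
    for _ in range(sims):
        counts = Counter(rng.choices(parties, weights=shares, k=n))
        # small continuity correction keeps the log finite if a cell is empty
        est = [(counts[i] + 0.5) / (n + 3) for i in parties]
        total += b_w(est, shares)
    return total / sims

print(mean_bw(500), mean_bw(10000))  # inflation shrinks as n grows
```

Even these unbiased simulated polls score a clearly positive average $B_w$, which is why sample size belongs in the model.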

. reg gbw timetoelection n company1-company5

      Source |       SS       df       MS              Number of obs =     152
-------------+------------------------------           F(  7,   144) =   59.78
       Model |  .428748277     7  .061249754           Prob > F      =  0.0000
    Residual |  .147543926   144  .001024611           R-squared     =  0.7440
-------------+------------------------------           Adj R-squared =  0.7315
       Total |  .576292203   151  .003816505           Root MSE      =  .03201

         gbw |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
timetoelec~n |   .0004332   .0000336    12.90   0.000     .0003669    .0004996
           n |   9.78e-06   9.85e-06     0.99   0.322    -9.68e-06    .0000292
    company1 |  -.1093188   .0120523    -9.07   0.000    -.1331411   -.0854966
    company2 |  -.1266562   .0093368   -13.57   0.000     -.145111   -.1082013
    company3 |  -.1497661   .0121675   -12.31   0.000    -.1738161   -.1257162
    company4 |  -.1530201   .0142726   -10.72   0.000     -.181231   -.1248093
    company5 |   -.130271   .0134006    -9.72   0.000    -.1567584   -.1037837
       _cons |   .2078479   .0154906    13.42   0.000     .1772295    .2384663

The results show that sample size does not have any effect on bias, and that bias declines over time. More importantly, there are remarkable differences between the companies: each of the first five companies does significantly better (produces less biased results) than ‘Forschungsgruppe Wahlen’ (the reference category), once time to the election is controlled for. The average difference in the expected $B_w$ is about 0.13.

This is not necessarily what one would expect. Forschungsgruppe Wahlen is a highly respected company with roots in academia, and their polls are conducted on behalf of one of Germany’s biggest public broadcasters. But unlike the other pollsters, Forschungsgruppe releases two different series of headline results: their raw (though presumably design-weighted) data, and a model-based ‘projection’, which factors in party identification and long-term trends. In our dataset, we have used the former. The fact that the other companies are consistently closer to the final result than Forschungsgruppe suggests that they do not publish raw survey results but rather the product of some model-based weighting – something that they do not publicise.

Given the length of the observation span, a linear trend for time is somewhat implausible and could be misleading. With a little help from mfp, one can easily find a series of transformations that provide an optimal, non-linear fit. But changing to a more adequate functional form does not substantively alter the estimates of the house effects: Forschungsgruppe performs somewhat worse than the other five.

A similar modelling strategy can also be applied to party-specific bias. In recent years, pollsters have clashed in the media and even in court over the issue of measuring support for the Social Democrats. Forsa has accused Infratest dimap of overreporting SPD support for commercial and political reasons. Other observers claim that Forsa is trying to hurt the Social Democrats by systematically underreporting SPD support.

While it seems impossible to resolve this dispute, surveybiasseries makes modelling the extent of bias in favour of or against the SPD trivially easy. Because $A'_2$ (unlike $B_w$) is approximately normally distributed, there is no need to include sample size in the model. We therefore model bias in the estimate of Social Democratic support as a function of time to the election (this time allowing for non-linear effects) and house effects, pitting Infratest dimap (company2) and Forsa (company4) against the other four companies.

. mfp : reg gaprimespd timetoelection company2 company4
      Source |       SS       df       MS              Number of obs =     152
-------------+------------------------------           F(  4,   147) =   49.59
       Model |  .915948751     4  .228987188           Prob > F      =  0.0000
    Residual |  .678784023   147  .004617578           R-squared     =  0.5744
-------------+------------------------------           Adj R-squared =  0.5628
       Total |  1.59473277   151  .010561144           Root MSE      =  .06795

  gaprimespd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    Itime__1 |  -.1040717   .0214699    -4.85   0.000    -.1465012   -.0616423
    Itime__2 |   .3116465   .0530387     5.88   0.000     .2068297    .4164633
    company2 |   .0035551   .0136672     0.26   0.795    -.0234545    .0305647
    company4 |  -.1583322   .0134847   -11.74   0.000     -.184981   -.1316833
       _cons |   .0266413   .0086129     3.09   0.002     .0096203    .0436623
Deviance: -391.165.

Here, the (0 0.5) transformation provides the best fit. This functional form is J-shaped: bias in favour of the SPD declined over the course of the campaign but rose sharply in the last few polls taken immediately before the election. Controlling for time, the four companies that serve as the reference category performed very well. While their average bias of 0.027 is significantly different from zero, this number is small in absolute terms.
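
The J-shape can be checked by evaluating the fitted time component directly. A Python sketch, assuming the FP(0, 0.5) terms enter as $b_1\ln t + b_2\sqrt{t}$ and ignoring mfp's internal rescaling of the time variable:

```python
import math

# Itime__1 and Itime__2 coefficients from the output above
b1, b2 = -0.1040717, 0.3116465

def time_component(t):
    """Fitted FP(0, 0.5) contribution of (scaled) time to the election t > 0."""
    return b1 * math.log(t) + b2 * math.sqrt(t)

ts = [x / 100 for x in range(1, 1001)]        # grid of scaled times in (0, 10]
vals = [time_component(t) for t in ts]
t_min = ts[vals.index(min(vals))]
print(t_min)  # interior minimum: bias falls over the campaign, then rises as t -> 0
```

The analytic minimum of $b_1\ln t + b_2\sqrt{t}$ lies at $t = (-2b_1/b_2)^2$, an interior point of the observation span: the curve falls through most of the campaign and turns upward only at the very end, matching the J-shape described above.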

The coefficient for Infratest dimap is statistically indistinguishable from zero. Put differently, with respect to the estimate of the SPD vote, there is no evidence that the polls conducted by Infratest differ in any way from those produced by the other four companies. The estimate for Forsa, on the other hand, is statistically and substantively significant. While five companies including Infratest dimap got the SPD vote right on average, Forsa consistently underestimated support for the Social Democrats by a considerable margin.
