May 20, 2014
 

The Problem: Assessing Bias without the Data Set

While the interwebs are awash with headline findings from countless surveys, commercial companies (and even some academics) are reluctant to make their raw data available for secondary analysis. But fear not: Quite often, media outlets and aggregator sites publish survey margins, and that is all the information you need. It’s as easy as \pi.

The Solution: surveybiasi

After installing our surveybias add-on for Stata, you will have access to surveybiasi. surveybiasi is an “immediate command” (Stata parlance for a command that takes its input from numbers typed on the command line rather than from a data set in memory). It compares the distribution of a categorical variable in a survey to its true distribution in the population. The two distributions are specified via the samplevalues() and popvalues() options, respectively. The elements of these two lists may be given as counts, percentages, or relative frequencies, because each list is internally rescaled so that its elements sum to unity. For variables with 2 to 12 discrete categories, surveybiasi will happily report k A^{\prime}_{i}s, B and B_{w} (check out our paper for more information on these multinomial measures of bias).
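The rescaling step is easy to picture outside Stata. The following Python sketch is not part of surveybias: the helper name rescale is made up for illustration, and the respondent counts are hypothetical. It only shows why percentages that sum to 99 and raw counts lead to nearly the same shares once each list is divided by its own total.

```python
def rescale(values):
    """Divide each element by the list total so that the shares sum to one."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    return [v / total for v in values]


# The same sample distribution, once as rounded percentages (sum: 99)
# and once as hypothetical respondent counts (sum: 563).
as_percent = rescale([46, 48, 5])
as_counts = rescale([262, 273, 28])

print([round(share, 3) for share in as_percent])
print([round(share, 3) for share in as_counts])
```

Because only the relative sizes matter after rescaling, it makes no difference whether a media report gives percentages or counts.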

Bias in a 2012 CBS/NYT Poll

A week before the 2012 election for the US House of Representatives, 563 likely voters were polled for CBS/The New York Times. 46 per cent said they would vote for the Republican candidate in their district, 48 per cent said they would vote for the Democratic candidate. Three per cent said it would depend, and another two per cent said they were unsure, or refused to answer the question. In the example these five per cent are treated as “other”. Due to rounding error, the numbers do not exactly add up to 100, but surveybiasi takes care of the necessary rescaling.

In the actual election, the Republicans won 47.6 and the Democrats 48.8 per cent of the popular vote, with the rest going to third-party candidates. To see if these differences are significant, run surveybiasi like this:


. surveybiasi , popvalues(47.6 48.8 3.6) samplevalues(46 48 5) n(563)
------------------------------------------------------------------------------
      catvar |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
A'           |
           1 |  -.0426919   .0844929    -0.51   0.613     -.208295    .1229111
           2 |  -.0123999   .0843284    -0.15   0.883    -.1776805    .1528807
           3 |   .3375101   .1938645     1.74   0.082    -.0424573    .7174776
-------------+----------------------------------------------------------------
B            |
           B |   .1308673   .0768722     1.70   0.089    -.0197994    .2815341
         B_w |   .0385229   .0247117     1.56   0.119    -.0099112    .0869569
------------------------------------------------------------------------------
 
    Ho: no bias
    Degrees of freedom: 2
    Chi-square (Pearson) = 3.0945337
    Pr (Pearson) = .21282887
    Chi-square (LR) = 2.7789278
    Pr (LR) = .24920887


Given the small sample size and the close match between survey and electoral counts, it is not surprising that there is no evidence for statistically or substantively significant bias in this poll.
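For readers who want to see where the numbers in the table come from, here is a rough cross-check in Python. It is a sketch, not the surveybias implementation, and it rests on assumptions: A^{\prime}_{i} is taken to be the log of the ratio of the sample odds to the population odds of category i, B the unweighted mean of the absolute A^{\prime}_{i}s, B_{w} the mean weighted by population shares, and the rescaled sample shares are rounded to whole respondents. The function name bias_measures is hypothetical. Under these assumptions the coefficients and chi-square values should agree with the output above; standard errors are not attempted.

```python
import math


def rescale(values):
    """Divide each element by the list total so that the shares sum to one."""
    total = sum(values)
    return [v / total for v in values]


def bias_measures(pop_values, sample_values, n):
    """Point estimates of A', B, B_w and two goodness-of-fit statistics."""
    pop = rescale(pop_values)
    # Assumption: sample shares are turned into whole respondent counts.
    counts = [round(share * n) for share in rescale(sample_values)]
    total = sum(counts)

    # A'_i: log of (sample odds of category i / population odds of category i)
    a_prime = [
        math.log((c / (total - c)) / (p / (1 - p)))
        for c, p in zip(counts, pop)
    ]
    b = sum(abs(a) for a in a_prime) / len(a_prime)
    b_w = sum(p * abs(a) for p, a in zip(pop, a_prime))

    # Goodness-of-fit tests of the counts against the population shares
    # (the likelihood-ratio statistic requires all counts to be positive)
    expected = [p * total for p in pop]
    pearson = sum((c - e) ** 2 / e for c, e in zip(counts, expected))
    lr = 2 * sum(c * math.log(c / e) for c, e in zip(counts, expected))

    return {"counts": counts, "a_prime": a_prime, "b": b, "b_w": b_w,
            "pearson": pearson, "lr": lr}


result = bias_measures([47.6, 48.8, 3.6], [46, 48, 5], 563)

# With 2 degrees of freedom the chi-square upper tail is exp(-x / 2).
p_pearson = math.exp(-result["pearson"] / 2)
p_lr = math.exp(-result["lr"] / 2)

print(result["counts"])
print([round(a, 7) for a in result["a_prime"]])
print(round(result["b"], 7), round(result["b_w"], 7))
print(round(result["pearson"], 7), round(p_pearson, 8))
print(round(result["lr"], 7), round(p_lr, 8))
```

The shortcut for the p-values only holds for two degrees of freedom, that is, for three categories; other cases need a general chi-square distribution function.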

An alternative approach is to follow Martin, Traugott and Kennedy (2005) and ignore third-party voters, undecided respondents, and refusals. This requires minimal adjustments: n is now 535 because the analytical sample is reduced by five per cent (563 × 0.95 ≈ 535), and the figures representing the “other” category can simply be dropped. Again, surveybiasi internally rescales the values accordingly:


. surveybiasi , popvalues(47.6 48.8) samplevalues(46 48) n(535)
------------------------------------------------------------------------------
      catvar |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
A'           |
           1 |  -.0162297   .0864858    -0.19   0.851    -.1857388    .1532794
           2 |   .0162297   .0864858     0.19   0.851    -.1532794    .1857388
-------------+----------------------------------------------------------------
B            |
           B |   .0162297   .0864858     0.19   0.851    -.1532794    .1857388
         B_w |   .0162297   .0864858     0.19   0.851    -.1532794    .1857388
------------------------------------------------------------------------------
 
    Ho: no bias
    Degrees of freedom: 1
    Chi-square (Pearson) = .03521623
    Pr (Pearson) = .85114329
    Chi-square (LR) = .03521898
    Pr (LR) = .85113753

Under this two-party scenario, A^{\prime}_{1} is identical to Martin, Traugott, and Kennedy’s original A, and all other estimates (A^{\prime}_{2}, B and B_{w}) are identical to A’s absolute value. Its negative sign points to the (tiny) anti-Republican bias in this poll, which is of course even less significant than in the previous example.
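The two-party figures can be checked by hand in the same spirit. The Python sketch below is an illustration rather than surveybias code: it computes Martin, Traugott and Kennedy's A as the log of the Republican-to-Democrat ratio in the poll divided by the same ratio in the election, again assuming that the rescaled sample shares are rounded to whole respondents. All variable names are made up for this example.

```python
import math

n_full = 563
n_two_party = round(n_full * 0.95)  # drop the five per cent "other"

sample = [46, 48]    # Republican, Democrat (poll, per cent)
pop = [47.6, 48.8]   # Republican, Democrat (election, per cent)

# Assumption: rescaled sample shares become whole respondent counts.
counts = [round(v / sum(sample) * n_two_party) for v in sample]

# A = ln[(r / d) / (R / D)]; negative values indicate a Democratic lean.
a = math.log((counts[0] / counts[1]) / (pop[0] / pop[1]))

# Pearson chi-square against the rescaled two-party election result
expected = [v / sum(pop) * n_two_party for v in pop]
pearson = sum((c - e) ** 2 / e for c, e in zip(counts, expected))

# With 1 degree of freedom the chi-square upper tail is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(pearson / 2))

print(n_two_party, counts)
print(round(a, 7), round(abs(a), 7))
print(round(pearson, 8), round(p_value, 8))
```

With only two categories the odds of one party are the reciprocal of the odds of the other, which is why A^{\prime}_{2} is simply A with the sign flipped and both B and B_{w} collapse to its absolute value.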

