Example: Assessing Bias in Educational Attainment with surveybias Using the Survey Estimator

 

Various reforms and variations at the state level notwithstanding, there exists a strict hierarchy of schools and school-leaving qualifications in Germany. Historically, most pupils would leave school after nine years and would be awarded a ‘Hauptschulabschluss’, whereas a smaller proportion would proceed to the ‘Realschulabschluss’ (awarded after ten years) or even the ‘Abitur’ (the qualification required to enter German universities, awarded after twelve or thirteen years). Over the last couple of decades, however, the number of pupils educated to Abitur level has risen sharply. The true distribution of school-leaving qualifications in the population is known from census data and so can serve as a yardstick for assessing bias.

Normally, more educated voters are more likely to participate in opinion surveys and are therefore overrepresented in survey samples. But this pattern is not reflected in the pre-election wave of the German Longitudinal Election Study (GLES).

. use gles-preelection, replace

. surveybias educ, popvalues(4 36.1 30.5 29) 
------------------------------------------------------------------------------
   educ |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
A'           |
 noqualifica |  -.5113485   .1461372    -3.50   0.000    -.7977721   -.2249249
    upto9yrs |   .0115364   .0469027     0.25   0.806    -.0803912     .103464
       10yrs |    .308362   .0466371     6.61   0.000     .2169549    .3997691
      12yrs+ |   -.290088   .0532528    -5.45   0.000    -.3944617   -.1857144
-------------+----------------------------------------------------------------
B            |
      B |   .2803337          .        .       .            .           .
    B_w |    .203609          .        .       .            .           .
------------------------------------------------------------------------------

    Ho: no bias
    Degrees of freedom: 3
    Chi-square (Pearson) = 63.802082
    Pr (Pearson) = 9.048e-14
    Chi-square (LR) = 65.206022
    Pr (LR) = 4.532e-14

On the contrary, respondents with twelve or more years of schooling are clearly underrepresented, while there is no appreciable bias for respondents with nine years of schooling, and respondents with ten years of schooling are actually overrepresented. Only the misrepresentation of the small group of school dropouts is in line with expectations.
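The two summary measures can be checked by hand from the coefficients. The following is a minimal sketch, assuming that B is the unweighted mean of the absolute A' values and that B_w weights each absolute A' by its population share from popvalues(), rescaled to sum to one. The output itself does not spell this out, but under this assumption the arithmetic reproduces the reported B and B_w up to rounding.

```stata
* Unweighted B: mean of the absolute A' values from the table above
* (assumed definition, see the note preceding this block)
display (abs(-.5113485) + abs(.0115364) + abs(.308362) + abs(-.290088)) / 4

* Weighted B_w: absolute A' values weighted by the population shares
* given in popvalues(); dividing by their sum rescales them to one
display (4*abs(-.5113485) + 36.1*abs(.0115364) + 30.5*abs(.308362) + 29*abs(-.290088)) / (4 + 36.1 + 30.5 + 29)
```

Because the ‘no qualification’ group has a population share of only four per cent, its large A' contributes little to B_w, which is why B_w is smaller than B.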

There are several possible reasons for this unusual type of bias. One is the generational gap in educational attainment. Younger voters are much more likely to hold Abitur qualifications, but they are also more mobile and less likely to have a landline connection, and hence more difficult for interviewers to contact.

But another plausible and perhaps more interesting reason is the complex design of the GLES. The GLES is a multi-stage survey that deliberately oversamples respondents from the former East Germany (GDR) to account for persistent attitudinal, social, and economic differences between Germany’s Eastern and Western regions. In the GDR, the Communists phased out the Hauptschulabschluss and instead promoted a ten-year curriculum. At the same time, they limited access to the Abitur. As a consequence, the distribution of school-leaving qualifications in the former East Germany still differs markedly from that in the West.

surveybias supports Stata’s survey estimator, so it is possible to make use of the weights supplied by the GLES team as well as of the information on PSUs and stratification to see if this reduces the apparent bias.

. qui svyset vnvpoint [pweight=w_ipfges_1] , strata(distost)
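Because the svyset call above is run quietly, it can be worth confirming that the design was declared as intended before re-running the analysis. A short sketch using two standard Stata commands; nothing here is specific to surveybias, and the output will depend on the data set.

```stata
* Show the survey design settings currently stored with the data
svyset

* Summarise the strata and sampling units (PSUs) of the declared design
svydescribe
```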

. surveybias educ, popvalues(4 36.1 30.5 29) svy 
Using survey characteristics of your data


Warning: This requires switching to numerical methods

------------------------------------------------------------------------------
   educ |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
A'           |
 noqualifica |  -.2203508   .2777376    -0.79   0.428    -.7647065    .3240049
    upto9yrs |   .0665091   .0780089     0.85   0.394    -.0863856    .2194038
       10yrs |   .0202821   .0657158     0.31   0.758    -.1085185    .1490827
      12yrs+ |  -.0596029   .0943501    -0.63   0.528    -.2445258      .12532
-------------+----------------------------------------------------------------
B            |
      B |   .0916863          .        .       .            .           .
    B_w |   .0565208          .        .       .            .           .
------------------------------------------------------------------------------

    Ho: no bias
    Degrees of freedom: 3
    Chi-square (Wald) = 1.3289546
    Pr (Wald) = .72226926

Incorporating the information on the design of the survey massively reduces the estimates for bias. The A_{i}^{\prime} for the three major groups are now very small, while the A_{i}^{\prime} for the small ‘no qualification’ group is roughly halved, and none of them differs significantly from zero. B, the estimate for the overall bias, drops to one third of the original figure of 0.28, while its weighted version, B_{w}, is reduced even further, from 0.20 to 0.06, because it takes the small size of the ‘no qualification’ group into account.
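The size of the reduction can be read off the two result tables directly. A quick check with Stata's display command, using only the B and B_w values reported in the output above:

```stata
* Ratio of the survey-adjusted to the unadjusted overall bias (B)
display .0916863 / .2803337

* Same ratio for the weighted measure (B_w)
display .0565208 / .203609
```

The first ratio is about 0.33 and the second about 0.28, so the weighted measure shrinks slightly more than the unweighted one.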

With complex variance estimators, simple goodness-of-fit tests are not appropriate. They are replaced by the equivalent Wald test of the null hypothesis that all A_{i}^{\prime} (and, by implication, the overall measures B and B_{w}) jointly equal zero. With three degrees of freedom and a p-value of 0.72, this hypothesis cannot be rejected.
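The reported p-values can be reproduced from the test statistics and their degrees of freedom. A minimal sketch using Stata's chi2tail() function, which returns the upper-tail probability of the chi-squared distribution; the statistics are copied from the two outputs above.

```stata
* Upper-tail probability of a chi-squared(3) variate at the Wald statistic
* from the survey-adjusted analysis
display chi2tail(3, 1.3289546)

* For comparison: the Pearson statistic from the unadjusted analysis
display chi2tail(3, 63.802082)
```

The first value should agree with Pr (Wald) of roughly 0.72, the second with Pr (Pearson) of roughly 9e-14.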
