Aug 31 2009

Should you weight your survey data? Is it worth the effort? The short answer must be ‘maybe’ or ‘it depends’. A slightly longer and much more useful answer was given by Leslie Kish in his enormously helpful paper ‘Weighting: Why, when and how’. Today (well, actually I submitted the final manuscript 2.5 years ago – that’s scientific progress for you!), I have added my own two cents with a short chapter (in German) that looks at the effects and non-effects of common weighting procedures. The bottom line is that if you employ the usual weighting variables (age, gender, education and maybe class or region) as controls in your regression, weighting will make next to no difference to your estimates but might mess with your standard errors.
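To make the point about standard errors a little more concrete, here is a minimal sketch of a commonly used approximation attributed to Kish: the effective sample size under unequal weights, (sum of w)^2 / (sum of w^2). The function names and the example weights are hypothetical and not taken from the chapter. The sketch only illustrates how unequal weights can shrink the amount of information a sample effectively carries.

```python
def kish_effective_n(weights):
    """Approximate effective sample size: (sum w)^2 / sum(w^2)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


def design_effect(weights):
    """Approximate variance inflation due to unequal weights: n / n_eff."""
    return len(weights) / kish_effective_n(weights)


# Hypothetical weights: everyone counts the same ...
equal = [1.0] * 1000
# ... versus half the sample weighted down and the other half weighted up.
unequal = [0.26] * 500 + [1.74] * 500

n_eff_equal = kish_effective_n(equal)
n_eff_unequal = kish_effective_n(unequal)
deff_unequal = design_effect(unequal)

print(round(n_eff_equal), round(n_eff_unequal), round(deff_unequal, 2))
```

With equal weights the effective sample size equals the nominal one. With the unequal weights assumed here it drops to roughly two thirds of the 1,000 cases, which is the kind of precision loss that shows up as larger standard errors.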

[…] level of analysis still applied. Then I realised that instead of weighing by size, I could simply include the size of the electorate as an additional independent variable to correct for potential bi…. But this still left me exposed to the danger of extreme outliers (think small, poor, rural […]

My instincts are also not to weight, and I agree with your argument if all effects are additive, but what if you're worried about omitted variable bias not for main effects, but for interaction effects? For instance, for the sake of argument, let's assume that blacks get different income returns to education than whites (i.e., there's a race*edu interaction) and that your data have an oversample of blacks such that they are half the sample. If you control but do not weight for race, you're only controlling for the possibly different intercepts by race but not the interaction with education. You'll thus have an estimate of the grand slope that is the mean of the black slope and the white slope, when in reality it should be more similar to the white slope because whites are more numerous in the population. On the other hand, weighting should produce the correct grand slope. Maybe you should just specify the interaction effect, but a) interactions are a huge pain to interpret and b) it may not occur to you that a specific interaction has appreciable effects.
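The oversampling scenario in this comment can be sketched as a small simulation. Everything numeric here is an assumption made for illustration: the group slopes (1.0 and 0.5), the intercepts, the population shares (87% and 13%), and an identical distribution of education in both groups. The helper names are hypothetical. The model has a separate intercept per group but a single common slope, which corresponds to "control for race, omit the interaction".

```python
import random


def pooled_slope(rows, weights):
    """Weighted least-squares slope of y on x with one intercept per group.

    rows: list of (group, x, y); weights: dict mapping group -> case weight.
    Demeans x and y within each group (Frisch-Waugh), then regresses the
    demeaned y on the demeaned x using the weights.
    """
    sums = {}
    for g, x, y in rows:
        n, sx, sy = sums.get(g, (0, 0.0, 0.0))
        sums[g] = (n + 1, sx + x, sy + y)
    means = {g: (sx / n, sy / n) for g, (n, sx, sy) in sums.items()}
    num = den = 0.0
    for g, x, y in rows:
        mx, my = means[g]
        w = weights[g]
        num += w * (x - mx) * (y - my)
        den += w * (x - mx) ** 2
    return num / den


def simulate(n_per_group=5000, seed=1):
    """Draw a sample in which both groups are equally large (an oversample)."""
    rng = random.Random(seed)
    slopes = {"white": 1.0, "black": 0.5}        # assumed returns to education
    intercepts = {"white": 10.0, "black": 8.0}   # assumed group intercepts
    rows = []
    for g in ("white", "black"):
        for _ in range(n_per_group):
            edu = rng.gauss(12, 3)
            income = intercepts[g] + slopes[g] * edu + rng.gauss(0, 1)
            rows.append((g, edu, income))
    return rows


rows = simulate()
pop_share = {"white": 0.87, "black": 0.13}     # assumed population shares
sample_share = {"white": 0.5, "black": 0.5}    # shares in the oversample
design_weights = {g: pop_share[g] / sample_share[g] for g in pop_share}

unweighted = pooled_slope(rows, {"white": 1.0, "black": 1.0})
weighted = pooled_slope(rows, design_weights)
print(round(unweighted, 2), round(weighted, 2))
```

Under these assumptions the unweighted slope should land near 0.75, the midpoint of the two group slopes, while the weighted slope should land near 0.87 × 1.0 + 0.13 × 0.5 = 0.935, much closer to the majority group's slope. Neither number reveals that the two groups differ. Only a model with the interaction term would show that.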

You are obviously right about the interaction, though I feel that if there is an interaction in reality, it should be reflected in the model. The trouble is of course that you have to know it is there in the first place. Equivalently, you have to know which variables you would like to use in your weighting procedure. In an ideal world, we would do all our research with people who are absolutely randomly sampled and then randomly assigned to perfectly valid experiments. In lieu of that, we have to make do with kludges of different sorts. 🙁