Nov 22 2013

Measuring Survey Bias

In our recent Political Analysis paper (ungated authors’ version), Jocelyn Evans and I show how Martin, Traugott, and Kennedy’s two-party measure of survey accuracy can be extended to the multi-party case (which is slightly more relevant for comparativists and other people interested in the world outside the US). This extension leads to a series of party-specific measures of bias as well as to two scalar measures of overall survey bias.

Moreover, we demonstrate that our new measures are closely linked to the familiar multinomial logit model (just as the MTK measure is linked to the binomial logit). This demonstration is NOT an exercise in Excruciatingly Boring Algebra. Rather, it leads to a straightforward derivation of standard errors and facilitates the implementation of our methodology in standard statistical packages.
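For orientation, here is a rough sketch of the measures as I would summarise them. The notation ($p_i$ for the share of party $i$ in the poll, $v_i$ for its share in the election, $k$ for the number of parties) is an assumption made for this post, and the paper remains the authoritative source for the exact definitions. The two-party MTK measure compares the odds of one party against the other in the poll ($r/d$) with the same odds in the election ($R/D$):

$$A = \ln\left(\frac{r/d}{R/D}\right)$$

The multi-party extension does the same for each party against all other parties combined, and the two scalar measures average the absolute party-specific values, either unweighted or weighted by the true vote shares:

$$A'_i = \ln\left(\frac{p_i/(1-p_i)}{v_i/(1-v_i)}\right), \qquad B = \frac{1}{k}\sum_{i=1}^{k}\left|A'_i\right|, \qquad B_w = \sum_{i=1}^{k} v_i\left|A'_i\right|$$

A value of zero means that the poll reproduces the election result exactly; positive $A'_i$ means that party $i$ is overrepresented in the poll.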

Voter poll
Those Were the / CC BY-SA

An Update to Our Free Software

We have programmed such an implementation in Stata, and it should not be too difficult to implement our methodology in R (any volunteers?). Our Stata code has been on SSC for a couple of months now but has recently been significantly updated. The new version 1.0 includes various bug fixes to the existing commands surveybias.ado and surveybiasi.ado, slightly better documentation, two toy data sets that should help you get started with the methodology, and a new command surveybiasseries.ado.

surveybiasseries facilitates comparisons across a series of (pre-election) polls. It expects a data set in which each row corresponds to margins (predicted vote shares) from a survey. Such a data set can quickly be constructed from published sources. Access to the original data is not required. surveybiasseries calculates the accuracy measures for each poll and stores them in a set of new variables, which can then be used as dependent variable(s) in a model of poll accuracy.

Getting Started with Estimating Survey Bias

The new version of surveybias for Stata should appear on SSC over the next couple of weeks or so (double-check the version number, which was 0.65 and should now be 1.0, and the release date), but you can install it right now from this website:

net from 
net install surveybias

To see the new command in action, try this

use fivefrenchsurveys, clear

will load information from five pre-election polls taken during the French presidential campaign (2012) into memory. The vote shares refer to eight candidates that competed in the first round.

surveybiasseries in 1/3 , popvaria(*true) samplev(fh-other) nvar(N) gen(frenchsurveys)

will calculate our accuracy measures and their standard errors for the first three surveys over the full set of candidates.

surveybiasseries in 4/5, popvariables(fhtrue-mptrue) samplevariables(fh-mp) nvar(N) gen(threeparty)

will calculate bias with respect to the three-party vote (i.e. Hollande, Sarkozy, Le Pen) for surveys no. 4 and 5 (vote shares are automatically rescaled to unity, no recoding required). The new variable names start with “frenchsurveys” and “threeparty” and should be otherwise self-explanatory (i.e. threepartybw is $B_w$ for the three-party case, and threepartysebw the corresponding standard error). Feel free to plot and model to your heart’s content.
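A minimal sketch of how to inspect the results, assuming both surveybiasseries calls above have been run on the fivefrenchsurveys data. It relies only on the two prefixes passed to gen() and on the two variable names mentioned above; anything else the command creates will show up in the describe output.

```stata
* Show every variable created by the two calls (matched by their prefixes)
describe frenchsurveys* threeparty*

* B_w and its standard error for the three-party case, surveys 4 and 5
list threepartybw threepartysebw in 4/5
```

The wildcard in describe avoids guessing at names that are not documented here, and the in 4/5 range matches the surveys for which the three-party measures were calculated.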

Oct 29 2012

Like social networks, multilevel data structures are everywhere once you start thinking about it. People live in neighbourhoods, neighbourhoods are nested in municipalities, which make up provinces – well, you get the picture. Even if we have no substantive interest in their effects, it often makes sense to control for structures in our data to get more realistic standard errors.
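As a minimal sketch of what such a control can look like in Stata, assume a respondent-level data set with an outcome trust, predictors age and female, and a region identifier region. All four variable names are hypothetical and stand in for whatever your data contain.

```stata
* Hypothetical variable names: substitute your own
* Ordinary regression with standard errors clustered by region
regress trust age female, vce(cluster region)

* Alternative: random-intercept model, respondents nested in regions
mixed trust age female || region:
```

The clustered version leaves the coefficients untouched and only adjusts the standard errors, which is enough if the regional structure is a nuisance. The random-intercept model is the better choice once you want to say something about the regions themselves.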

Now the good folks over at the European Social Survey have reacted and spent the Descartes Prize money on compiling multilevel information and merging it with their own data. So far, the selection is a little bit disappointing in some respects. Homicide rates, for instance, are reported at the national level only. But there are some pleasant surprises (I guess due to Eurostat, who collect such things): We get unemployment, GDP growth and even student numbers at the NUTS-3 level. Since you asked, NUTS is the Nomenclature of Territorial Units for Statistics, and level 3 is the lowest level for which comparative data are normally published.

Regrettably, the size and number of level 3 units are not necessarily comparable across countries: For Germany, level 3 corresponds to about 400 local government districts, while France is divided into 96 European departments. But if you need to combine top-notch survey data with small(ish) regional data, it’s a start, and not a bad one.

Aug 31 2009

Should you weight your survey data? Is it worth the effort? The short answer must be ‘maybe’ or ‘it depends’. A slightly longer and much more useful answer was given by Leslie Kish in his enormously helpful paper ‘Weighting: Why, when and how’. Today (well, actually I submitted the final manuscript 2.5 years ago – that’s scientific progress for you!), I have added my own two cents with a short chapter that looks at the effects and non-effects of common weighting procedures (in German). The bottom line is that if you employ the usual weighting variables (age, gender, education and maybe class or region) as controls in your regression, weighting will make next to no difference but might mess with your standard errors.

Mar 07 2009

A friend sent me this link to Huber’s Sandwich Emporium yesterday.

Huber’s sandwich shop is within walking distance of the University of Vienna, and we spent a dreamy 10 minutes imagining how slightly anxious researchers who suffer from correlated disturbances shuffle into that shop and ask for the massive 18-centimetre sandwich estimator. If you think this is remotely funny, your life must be pretty sad.