In a recent paper, we derive various multinomial measures of bias in public opinion surveys (e.g. pre-election polls). Put differently, with our methodology, you may calculate a scalar measure of survey bias in multi-party elections.

Thanks to Kit Baum over at Boston College, our Stata add-on `surveybias.ado` is now available from Statistical Software Components (SSC).  The add-on takes as its argument the name of a categorical variable and said variable’s true distribution in the population. For what it’s worth, the program tries to be smart: `surveybias vote, popvalues(900000 1200000 1800000)`, `surveybias vote, popvalues(0.2307692 0.3076923 0.4615385)`, and `surveybias vote, popvalues(23.07692 30.76923 46.15385)` should all give the same result.

If you don’t have access to the raw data but want to assess survey bias evident in published figures, there is `surveybiasi`, an “immediate” command that lets you do stuff like this:  `surveybiasi , popvalues(30 40 30) samplevalues(40 40 20) n(1000)`. Again, you may specify absolute values, relative frequencies, or percentages.

If you want to go ahead and measure survey bias, install `surveybias.ado` and `surveybiasi.ado` on your computer by typing `ssc install surveybias` in your net-aware copy of Stata. And if you use and like our software, please cite our forthcoming Political Analysis paper on the New Multinomial Accuracy Measure for Polling Bias.

Update April 2014: New version 1.1 available

All surveys deviate from the true distributions of the variables, but some more so than others. This is particularly relevant in the context of election studies, where the true distribution of the vote is revealed on election night. Wouldn’t it be nice if one could quantify the bias exhibited by pollster X in their pre-election survey(s), with one single number? Heck, you could even model bias in polls, using RHS variables such as time to election, sample size or sponsor of the survey, coming up with an estimate of the infamous “house effect”,.

Jocelyn Evans and I have developed a method for calculating such a figure by extending Martin, Kennedy and Traugott’s measure $A$ to the multi-party case. Being the very creative chaps we are, we call this new statistic [drumroll] $B$. We also derive a weighted version of this measure $B_w$, and statistics to measure bias in favour/against any single party ($A'$). Of course, our measures can be applied to the sampling of any categorical variable whose distribution is known.

We fully develop all these goodies (and illustrate their usefulness by analysing bias in French pre-election polls) in a paper that
(to our immense satisfaction) has just been accepted for publication in Political Analysis (replication files to follow).

Our module survebias is a Stata ado file that implements these methods. It should become available from SSC over the summer, giving you convenient access to the new methods. I’ll keep you posted.

Sometimes, a man’s gotta do what a man’s gotta do. Which, in my case, might be a little simulation of a random process involving an unordered categorical variable. In R, sampling from a multinomial distribution is trivial.

`rmultinom(1,1000,c(.1,.7,.2,.1))`

gives me a vector of random numbers from a multinomial distribution with outcomes 1, 2, 3, and 4, where the probability of observing a ‘1’ is 10 percent, the probability of observing a ‘2’ is 70 per cent, and so on. But I could not find an equivalent function in Stata. Generating artificial data in R is not very elegant, so I kept digging and found a solution in section M-5 of the Mata handbook. Hidden in the entry on runiform is a reference to `rdiscrete(r,c,p)`, a Mata function which generates a r*c matrix of draws from a multinomial distribution defined by a vector p of probabilities.

That leaves but one question: Is wrapping a handful of lines around a Mata call to replace a non-existent Stata function more elegant than calling an external program?