All surveys deviate from the true distributions of the variables, but some more so than others. This is particularly relevant in the context of election studies, where the true distribution of the vote is revealed on election night. Wouldn’t it be nice if one could quantify the bias exhibited by pollster X in their pre-election survey(s) with one single number? Heck, you could even model bias in polls, using RHS variables such as time to election, sample size or sponsor of the survey, coming up with an estimate of the infamous “house effect”.

Jocelyn Evans and I have developed a method for calculating such a figure by extending Martin, Kennedy and Traugott’s measure $A$ to the multi-party case. Being the very creative chaps we are, we call this new statistic [drumroll] $B$. We also derive a weighted version of this measure $B_w$, and statistics to measure bias in favour/against any single party ($A'$). Of course, our measures can be applied to the sampling of any categorical variable whose distribution is known.
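
To give a flavour of the idea, here is a rough Python sketch (not the Stata implementation). It assumes that $A'$ for a party is the log of the ratio between that party's odds in the poll and its odds in the actual result, that $B$ is the plain average of the absolute $A'$ values, and that $B_w$ weights them by the true vote shares. Those formulas, the function names and the vote shares are assumptions made for illustration; the paper has the exact definitions.

```python
import math

def a_prime(poll_share, true_share):
    """Log odds ratio of one party's poll share versus its true share (assumed form of A')."""
    poll_odds = poll_share / (1.0 - poll_share)
    true_odds = true_share / (1.0 - true_share)
    return math.log(poll_odds / true_odds)

def b_measures(poll, result):
    """Return (B, B_w) for two dicts mapping party -> share, each summing to 1."""
    a = {party: a_prime(poll[party], result[party]) for party in result}
    # Assumed B: unweighted mean of the absolute party-wise biases
    b = sum(abs(value) for value in a.values()) / len(a)
    # Assumed B_w: the same, weighted by each party's true share
    b_w = sum(result[party] * abs(a[party]) for party in a)
    return b, b_w

# Made-up shares for three hypothetical parties
poll = {"X": 0.40, "Y": 0.35, "Z": 0.25}
result = {"X": 0.45, "Y": 0.35, "Z": 0.20}
b, b_w = b_measures(poll, result)
print(f"B = {b:.3f}, B_w = {b_w:.3f}")
```

A poll that reproduces the result exactly gets 0 on both measures; larger values mean more bias. A negative $A'$ means the party was underestimated, a positive one that it was overestimated.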

We fully develop all these goodies (and illustrate their usefulness by analysing bias in French pre-election polls) in a paper that (to our immense satisfaction) has just been accepted for publication in Political Analysis (replication files to follow).

Our module surveybias is a Stata ado file that implements these methods. It should become available from SSC over the summer, giving you convenient access to the new methods. I’ll keep you posted.

Today is clearly a day for statistical songs (are there any other days?), so here are some links to get you started.

To kick off the stat song roundup, here are some … interesting insights into the culture that is biostatistics, complete with some remarkably dreadful audio material.

Obviously, YouTube has a whole channel devoted to statistical songs, featuring, inter alia, Michael Greenacre, of Correspondence Analysis fame. To the true connoisseur, it might appear a bit overproduced, but this little gem on Singular Value Decomposition is very neat.

For the Structural Equation Modelling buffs, nothing compares to Alan Reifman’s annual reprise of “SEM – the Musical”.

But for the purists, there is only one thing, something that I have watched with awe (and slowly building shock) growing beyond all expectations. The conspiracy against Frequentism has its very own book of Bayesian praise, complete with LaTeX source, now comprising 40-odd songs, among them some “previously lost classic songs”, including “Bayesians in the night” (two versions, actually).

Every sentient and internet-enabled being in the Western world has by now noticed that Amazon’s “customers who bought this item” algorithm is one of the most successful exercises in machine learning. Like various algorithms used by Google, it is oftentimes accurate as well as slightly frightening.

A friend of mine (who is an engineer) told me that he bought an administrator’s guide to Cisco routers. Amazon concluded that he might also be interested in “Cooking for one”. I, on the other hand, recently browsed the excellent Cambridge “Dictionary of Statistics” and also had a look at “All of Statistics” (preposterous title, but an interesting book – incidentally, it tries to convey statistical basics to engineers interested in machine learning). Amazon suggested rounding off my order with – drum roll – “Fifty Shades of Grey”. I’m sure my students would agree that there is an intimate link between these three titles.

Radio 4 never fails to amaze me. This morning, just three minutes before the 9 o’clock news, they interviewed David Spiegelhalter. Spiegelhalter is obviously the man who gave us BUGS. But he is also Winton Professor of the Public Understanding of Risk at the University of Cambridge, and a man who can (within the 90 seconds they allocated him) explain to a lay public why a spate of knife crime (last summer, four people were killed in the space of just one day) is not totally unlikely and does not necessarily indicate an increase in the murder rate, illustrating the idea of clustered risks in passing. He even convinced the anchor that stats is actually fun, even if you look at 170 murders per year in a population of just 7 million Londoners. I was duly impressed (you can listen here to the interview with Spiegelhalter). In fact, I was so impressed that I googled him once I reached the office and came across his website understandinguncertainty.org, which has full coverage of the London murder mystery (that is solved by modelling a Poisson distribution of the incidents).
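
The back-of-the-envelope version of that argument fits in a few lines of Python. This is only a sketch under a strong assumption: murders arrive independently at a constant rate of 170/365 per day, so the daily count is Poisson. The analysis on understandinguncertainty.org may use different figures and a more careful model; the function name is made up.

```python
import math

def poisson_tail(rate, k):
    """P(X >= k) for X ~ Poisson(rate)."""
    below = sum(math.exp(-rate) * rate**i / math.factorial(i) for i in range(k))
    return 1.0 - below

murders_per_year = 170                 # figure quoted above
daily_rate = murders_per_year / 365    # assumed constant over the year

# Probability that any given day sees four or more murders
p_four_plus = poisson_tail(daily_rate, 4)

# Expected number of such days in a year
expected_days_per_year = 365 * p_four_plus

print(f"P(4+ murders on a given day) = {p_four_plus:.5f}")
print(f"Expected such days per year  = {expected_days_per_year:.2f}")
```

The probability for any single day is tiny, but there are 365 days in a year, so a day with four killings is something a constant murder rate will produce every now and then all by itself.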

Many hypotheses in the social sciences involve interaction: the effect of some variable x (say xenophobia) on some variable y (say support for the extreme right) is conditional on a third variable z (say ethnicity). Modelling interactive hypotheses looks straightforward on the surface: simply generate a third variable by multiplying x and z and plug all three into your regression. In Stata, this process can be automated by means of the built-in command xi or by desmat, which is available from SSC.
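
For readers who want to see the mechanics spelled out, here is a minimal Python sketch of the "multiply and plug in" step. It is not what xi or desmat do internally; the data are made up and noise-free, and the small `solve` and `ols` helpers are hypothetical stand-ins for a proper regression routine.

```python
def solve(a, b):
    """Solve the linear system a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta

def ols(rows, y):
    """Least squares via the normal equations X'X beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Made-up, noise-free data generated from y = 1 + 2x + 3z + 4xz
data = [(x, z) for x in (0.0, 1.0, 2.0, 3.0) for z in (0.0, 1.0)]
y = [1 + 2 * x + 3 * z + 4 * x * z for x, z in data]

# Design matrix: constant, x, z, and the product term x*z
design = [[1.0, x, z, x * z] for x, z in data]
b0, bx, bz, bxz = ols(design, y)
print(b0, bx, bz, bxz)
```

Because the data contain no noise, the fit recovers the coefficients used to generate them (up to floating-point error). With real data you would of course use your statistics package rather than hand-rolled normal equations.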

Unfortunately, the interpretation of the resulting coefficients is less straightforward. A recent review by Brambor, Clark and Golder (2006) suggests that even in top political science journals many interpretations of interaction effects are dubious if not plain wrong. The new book by Kam and Franzese has the potential to rectify this situation. Kam and Franzese start out from the proposition that in interactive models (as in a number of other models they discuss in passing), the effect of an interacted variable x does not equal its coefficient. Rather, one has to differentiate the model equation with respect to x (which requires a working knowledge of introductory calculus or a licence for Mathematica) or must calculate first differences (which is easy). The slim volume will appeal both to advanced students and to applied researchers who want to get it right. It is organised around a number of running examples of recent real-world political research and compares well with the older monographs in the QASS series (“the green Sage papers”) because “modern” issues such as multi-level models and standard errors for effects are addressed. The latter point is of particular importance because the very concise discussion in Kam and Franzese will save the reader the effort of skimming through pages and pages of highly technical econometric treatises. While the mathematical apparatus may look a little daunting at first, it is actually very helpful. Moreover, it is accompanied by clear instructions on how to perform the necessary calculations in Stata.
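
To make the "effect does not equal the coefficient" point concrete, here is a small Python sketch for the simplest linear-interactive model, y = b0 + b_x·x + b_z·z + b_xz·x·z. Differentiating with respect to x gives b_x + b_xz·z, and the usual variance formula for a linear combination of coefficients gives its standard error. The coefficient values and (co)variances below are hypothetical, chosen only for illustration, and the function names are mine, not the book's.

```python
import math

def conditional_effect(b_x, b_xz, z):
    """dy/dx in y = b0 + b_x*x + b_z*z + b_xz*x*z, evaluated at z."""
    return b_x + b_xz * z

def conditional_se(var_x, var_xz, cov_x_xz, z):
    """Standard error of b_x + b_xz*z from the coefficient (co)variances."""
    return math.sqrt(var_x + z**2 * var_xz + 2 * z * cov_x_xz)

def first_difference(b0, b_x, b_z, b_xz, x_low, x_high, z):
    """Change in predicted y when x moves from x_low to x_high at fixed z."""
    def predict(x):
        return b0 + b_x * x + b_z * z + b_xz * x * z
    return predict(x_high) - predict(x_low)

# Hypothetical estimates, not taken from any real model
b0, b_x, b_z, b_xz = 0.5, 0.8, 0.3, -0.6
var_x, var_xz, cov_x_xz = 0.04, 0.09, -0.03

for z in (0, 1):
    effect = conditional_effect(b_x, b_xz, z)
    se = conditional_se(var_x, var_xz, cov_x_xz, z)
    print(f"z={z}: effect of x = {effect:.2f} (SE {se:.2f})")
```

In this linear case the first difference for a one-unit change in x equals the derivative, so both routes give the same answer. The coefficient b_x on its own is the effect of x only where z = 0, which is exactly the kind of misreading the review complains about.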
