Jun 23 2013
All surveys deviate from the true distributions of the variables, but some more so than others. This is particularly relevant in the context of election studies, where the true distribution of the vote is revealed on election night. Wouldn’t it be nice if one could quantify the bias exhibited by pollster X in their pre-election survey(s) with one single number? Heck, you could even model bias in polls, using RHS variables such as time to election, sample size or sponsor of the survey, coming up with an estimate of the infamous “house effect”.

Jocelyn Evans and I have developed a method for calculating such a figure by extending Martin, Kennedy and Traugott’s measure A to the multi-party case. Being the very creative chaps we are, we call this new statistic [drumroll] B. We also derive a weighted version of this measure, B_w, and statistics to measure bias in favour of or against any single party (A'). Of course, our measures can be applied to the sampling of any categorical variable whose distribution is known.
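To give a rough feel for what such statistics look like, here is a minimal Python sketch. The definitions in it are assumptions and not taken from this post: A' for a party is treated as the log odds ratio of its poll share to its true share, B as the unweighted mean of the absolute A' values, and B_w as the same mean weighted by the true vote shares. The function names and the figures are made up for illustration; the paper remains the authoritative source for the actual formulas and their standard errors.

```python
import math

def a_prime(p, v):
    """Log odds ratio of a party's poll share p to its true share v (assumed definition)."""
    return math.log((p / (1 - p)) / (v / (1 - v)))

def bias_measures(poll, result):
    """Return per-party A' values, unweighted B and vote-share-weighted B_w."""
    a = [a_prime(p, v) for p, v in zip(poll, result)]
    b = sum(abs(x) for x in a) / len(a)
    b_w = sum(v * abs(x) for x, v in zip(a, result)) / sum(result)
    return a, b, b_w

# Hypothetical three-party example: shares must each lie strictly between 0 and 1.
poll = [0.30, 0.25, 0.45]    # made-up poll shares
result = [0.28, 0.27, 0.45]  # made-up election result

a, b, b_w = bias_measures(poll, result)
for party, value in zip("ABC", a):
    print(f"A' for party {party}: {value:+.3f}")
print(f"B = {b:.3f}, B_w = {b_w:.3f}")
```

Under these assumed definitions, a positive A' means the poll overstated a party, a negative one means it understated it, and a poll that hits every share exactly yields B = B_w = 0.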

We fully develop all these goodies (and illustrate their usefulness by analysing bias in French pre-election polls) in a paper that (to our immense satisfaction) has just been accepted for publication in Political Analysis (replication files to follow).

Our module surveybias is a Stata ado file that implements these methods. It should become available from SSC over the summer, giving you convenient access to the new methods. I’ll keep you posted.

Apr 26 2013
Today is clearly a day for statistical songs (are there any other days?), so here are some links to get you started.

To kick off the stat song roundup, here are some … interesting insights into the culture that is biostatistics, complete with some remarkably dreadful audio material.

Obviously, YouTube has a whole channel devoted to statistical songs, featuring, inter alia, Michael Greenacre, of Correspondence Analysis fame. To the true connoisseur, it might appear a bit overproduced, but this little gem on Singular Value Decomposition is very neat.

It had to be U - the SVD song

For the Structural Equation Modelling buffs, nothing compares to Alan Reifman’s annual reprise of “SEM – the Musical”.

But for the purists, there is only one thing, something that I have watched with awe (and slowly building shock) growing beyond all expectations. The conspiracy against Frequentism has its very own book of Bayesian praise, complete with LaTeX source, now comprising 40-odd songs, among them some “previously lost classic songs” such as “Bayesians in the night” (two versions, actually).


Jan 05 2013
Every sentient and internet-enabled being in the Western world has by now noticed that Amazon’s “customers who bought this item” algorithm is one of the most successful exercises in machine learning. Like various algorithms used by Google, it is oftentimes accurate as well as slightly frightening.

A friend of mine (who is an engineer) told me that he bought an administrator’s guide to Cisco routers. Amazon concluded that he might also be interested in “Cooking for one”. I, on the other hand, recently browsed the excellent Cambridge “Dictionary of Statistics” and also had a look at “All of Statistics” (preposterous title, but an interesting book – incidentally, it tries to convey statistical basics to engineers interested in machine learning). Amazon suggested rounding off my order with – drum roll – “Fifty Shades of Grey”. I’m sure my students would agree that there is an intimate link between these three titles.

Random Fun Fact of the Day: Machine Learning and Statistics
Mar 19 2009
Radio 4 never fails to amaze me. This morning, just three minutes before the 9 o’clock news, they interviewed David Spiegelhalter. Spiegelhalter is obviously the man who gave us BUGS. But he is also Winton Professor of the Public Understanding of Risk at the University of Cambridge, and a man who can (within the 90 seconds they allocated him) explain to a lay public why a spate of knife crime (last summer, four people were killed in the space of just one day) is not totally unlikely and does not necessarily indicate an increase in the murder rate, illustrating the idea of clustered risks in passing. He even convinced the anchor that stats is actually fun, even if you look at 170 murders per year in a population of just 7 million Londoners. I was duly impressed (you can listen here to the interview with Spiegelhalter). In fact, I was so impressed that I googled him once I reached the office and came across his website understandinguncertainty.org, which has full coverage of the London murder mystery (which is solved by modelling the incidents as Poisson distributed).
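For the curious, the gist of the Poisson argument can be sketched in a few lines of Python. This is a back-of-the-envelope illustration and not Spiegelhalter’s own calculation: it assumes that murders occur independently at a constant rate and simply takes the 170 per year quoted above as that rate. The function names are made up for this sketch.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def prob_at_least(k, lam):
    """Probability of k or more events: one minus the mass below k."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))

murders_per_year = 170          # figure quoted in the post
lam = murders_per_year / 365    # assumed constant daily rate

p_four_plus = prob_at_least(4, lam)
expected_days_per_year = 365 * p_four_plus

print(f"Daily rate: {lam:.3f}")
print(f"P(4 or more murders on a given day): {p_four_plus:.5f}")
print(f"Expected number of such days per year: {expected_days_per_year:.2f}")
```

The probability for any single day is tiny, but a year offers 365 chances, so under these assumptions a day with four or more killings is something one should expect to see every few years even if the underlying murder rate does not change at all.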


David Spiegelhalter on Risk, Knife-Crime and the Probability of Being Killed in London
Jul 01 2008
Via Simon Jackman’s blog: Chris Jordan found an intriguing way to visualise some very large, mostly scary national statistics, such as the number of plastic cups used on flights in the US every six hours (one million), or the number of cell phones retired every day (426,000). Amazing and aesthetically pleasing in a most disturbing way.

Technorati Tags: statistics, art, politics, USA
Mar 16 2008
Many hypotheses in the social sciences involve interaction: the effect of some variable x (say xenophobia) on some variable y (say support for the extreme right) is conditional on a third variable z (say ethnicity). Modelling interactive hypotheses looks straightforward on the surface: simply generate a third variable by multiplying x and z and plug all three into your regression. In Stata, this process can be automated by means of the built-in command xi or by desmat, which is available from SSC.


Unfortunately, the interpretation of the resulting coefficients is less straightforward: a recent review by Brambor, Clark and Golder (2006) suggests that even in top political science journals many interpretations of interaction effects are dubious if not plain wrong. The new book by Kam and Franzese has the potential to rectify this situation. Kam and Franzese start out from the proposition that in interactive models (as in a number of other models they discuss in passing), the effect of an interacted variable x does not equal its coefficient. Rather, one has to differentiate the model equation with respect to x (which requires a working knowledge of introductory calculus or a licence for Mathematica) or calculate first differences (which is easy).

The slim volume will appeal both to advanced students and to applied researchers who want to get it right. It is organised around a number of running examples of recent real-world political research and compares well with the older monographs in the QASS series (“the green Sage papers”) because “modern” issues such as multi-level models and standard errors for effects are addressed. The latter point is of particular importance because the very concise discussion in Kam and Franzese will save the reader the effort of skimming through pages and pages of highly technical econometric treatises. While the mathematical apparatus may look a little daunting at first, it is actually very helpful. Moreover, it is accompanied by clear instructions on how to perform the necessary calculations in Stata.
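To make the central point concrete, here is a minimal Python sketch (not taken from the book, which works in Stata). It assumes the simple linear model y = b0 + b_x·x + b_z·z + b_xz·x·z, for which differentiating with respect to x gives the conditional effect b_x + b_xz·z, with a standard error that follows from the usual variance formula for a linear combination of coefficients. All numbers and function names are hypothetical.

```python
import math

def conditional_effect(b_x, b_xz, z):
    """Effect of x on y at a given z: the derivative of the model with respect to x."""
    return b_x + b_xz * z

def conditional_se(var_bx, var_bxz, cov_bx_bxz, z):
    """Standard error of b_x + b_xz * z from the coefficients' variances and covariance."""
    variance = var_bx + z ** 2 * var_bxz + 2 * z * cov_bx_bxz
    return math.sqrt(variance)

# Hypothetical estimates, as one might read them off a regression output.
b_x, b_xz = 0.50, -0.30
var_bx, var_bxz, cov_bx_bxz = 0.04, 0.01, -0.015

for z in (0, 1, 2):
    effect = conditional_effect(b_x, b_xz, z)
    se = conditional_se(var_bx, var_bxz, cov_bx_bxz, z)
    print(f"z = {z}: effect of x = {effect:+.2f}, s.e. = {se:.3f}")
```

The coefficient b_x is the effect of x only where z equals zero; everywhere else both the effect and its standard error change with z, which is why a single coefficient and its t-value tell you very little in an interactive model.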

Technorati Tags: political science, statistics, interaction, stata, quantitative methods

Review: Modeling and Interpreting Interactive Hypotheses in Regression Analysis