All surveys deviate from the true distributions of the variables, but some more so than others. This is particularly relevant in the context of election studies, where the true distribution of the vote is revealed on election night. Wouldn’t it be nice if one could quantify the bias exhibited by pollster X in their pre-election survey(s) with one single number? Heck, you could even model bias in polls, using RHS variables such as time to election, sample size or sponsor of the survey, coming up with an estimate of the infamous “house effect”.

Jocelyn Evans and I have developed a method for calculating such a figure by extending Martin, Traugott and Kennedy’s measure A to the multi-party case. Being the very creative chaps we are, we call this new statistic [drumroll] B. We also derive a weighted version of this measure, B_w, and statistics that measure bias in favour of or against any single party (A'). Of course, our measures can be applied to the sampling of any categorical variable whose distribution is known.
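
To give a flavour of the idea, here is a minimal Python sketch (not our Stata implementation, and no substitute for the paper). It assumes that each party’s A' is the log of an odds ratio, the party’s odds in the poll over its odds in the actual result, that B is the unweighted mean of the absolute A' values, and that B_w weights them by the true vote shares. The function names and the vote shares are invented for illustration.

```python
import math


def a_prime(poll_share, true_share):
    """Log odds ratio for one party: odds in the poll over odds in the result."""
    poll_odds = poll_share / (1.0 - poll_share)
    true_odds = true_share / (1.0 - true_share)
    return math.log(poll_odds / true_odds)


def b_measures(poll, truth):
    """Return (B, B_w) for two dicts mapping party -> share.

    Assumed definitions (see the lead-in):
    B   = unweighted mean of |A'_i|
    B_w = sum of |A'_i| weighted by the true share of party i
    """
    a = {party: a_prime(poll[party], truth[party]) for party in truth}
    b = sum(abs(value) for value in a.values()) / len(a)
    b_w = sum(truth[party] * abs(a[party]) for party in truth)
    return b, b_w


# Invented shares, purely for illustration
poll = {"X": 0.40, "Y": 0.35, "Z": 0.25}
truth = {"X": 0.38, "Y": 0.37, "Z": 0.25}

b, b_w = b_measures(poll, truth)
print(f"B = {b:.4f}, B_w = {b_w:.4f}")
```

A positive A' means the poll overstates the party, a negative one that it understates it, and a poll that hits every share exactly yields zero for both summary measures.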

We fully develop all these goodies (and illustrate their usefulness by analysing bias in French pre-election polls) in a paper that (to our immense satisfaction) has just been accepted for publication in Political Analysis (replication files to follow).

Our module surveybias is a Stata ado file that implements these methods. It should become available from SSC over the summer, giving you convenient access to the new methods. I’ll keep you posted.

Today is clearly a day for statistical songs (are there any other days?), so here are some links to get you started.

To kick off the stat song roundup, here are some … interesting insights into the culture that is biostatistics, complete with some remarkably dreadful audio material.

Obviously, YouTube has a whole channel devoted to statistical songs, featuring, inter alia, Michael Greenacre, of Correspondence Analysis fame. To the true connoisseur, it might appear a bit overproduced, but this little gem on Singular Value Decomposition is very neat.

For the Structural Equation Modelling buffs, nothing compares to Alan Reifman’s annual reprise of “SEM – the Musical”.

But for the purists, there is only one thing, something that I have watched with awe (and slowly building shock) growing beyond all expectations. The conspiracy against Frequentism has its very own book of Bayesian praise, complete with LaTeX source, now comprising 40-odd songs, among them some “previously lost classic songs” such as “Bayesians in the night” (two versions, actually).


Every sentient and internet-enabled being in the Western world has by now noticed that Amazon’s “customers who bought this item” algorithm is one of the most successful exercises in machine learning. Like various algorithms used by Google, it is oftentimes accurate as well as slightly frightening.

A friend of mine (who is an engineer) told me that he bought an administrator’s guide to Cisco routers. Amazon concluded that he might also be interested in “Cooking for one”. I, on the other hand, recently browsed the excellent Cambridge “Dictionary of Statistics” and also had a look at “All of Statistics” (preposterous title, but an interesting book – incidentally, it tries to convey statistical basics to engineers interested in machine learning). Amazon suggested rounding off my order with – drum roll – “Fifty Shades of Grey”. I’m sure my students would agree that there is an intimate link between these three titles.

Statistics and Data links roundup for December 2010 through March 2011:

  • Discrete Choice Methods with Simulation, by Kenneth Train, Cambridge University Press, 2002 – Discrete Choice
  • Geodatenzentrum – Here you will find a wide range of information on the geospatial base data of the German federal states and the federal government. Use our services and interactive maps to order, download, search or process geoinformation.
  • Statistisches Bundesamt Deutschland – Statistik lokal – Statistik lokal 2010 is a database on DVD, jointly published by the statistical offices of the federal government and the federal states, that contains municipality-level data for the whole of Germany. With Statistik lokal 2010 you can analyse and compare more than 12,000 cities and municipalities across Germany, using selected results from all major areas of official statistics, currently covering around 330 characteristics. The DVD also contains the results for all districts (independent cities and rural districts), administrative districts/statistical regions, federal states and Germany as a whole.

Statistics and Data links roundup for January through September 2010:

  • Making Working and Publication Tables in Stata – Review of various commands for making tables in Stata
  • OS OpenData Supply – Download or order Ordnance Survey OpenData – open data
    * Outline of Great Britain
    * Overview of Great Britain
    * MiniScale®
    * 1:250 000 Scale Colour Raster
    * OS Street View®
    * Boundary-Line™
    * OS VectorMap™ District – New
  • European Network for the Analysis of Political Text – The European Network for the Analysis of Political Texts (ENAPT) is a newly established network of PhD students and early career researchers who share an interest in the qualitative and quantitative analysis of party manifestos and other political text. The objectives of ENAPT include:
    * To collect and analyse political text in a systematic fashion using a combination of qualitative and quantitative methodologies,
    * To organise meetings and workshops (primarily aimed at PhD students and early career researchers) regarding the coding of political text,
    * To facilitate the dissemination of the findings of its members through conferences, workshops and the world-wide-web.
    ENAPT operates a mailing list at the National Academic Mailing List Service. To join the network and the mailing list, email Kostas Gemenis.

Statistics and Data links roundup for November 23rd through December 29th:

  • The Data and Story Library – DASL (pronounced “dazzle”) is an online library of datafiles and stories that illustrate the use of basic statistics methods. We hope to provide data from a wide variety of topics so that statistics teachers can find real-world examples that will be interesting to their students. Use DASL’s powerful search engine to locate the story or datafile of interest.
  • Drawing graphs using tikz/pgf & gnuplot | politicaldata.org

Statistics and Data links roundup for November 14th through November 23rd:

It’s surprisingly difficult to find suitable datasets for an SNA (social network analysis) workshop that are relevant for political scientists.

Radio 4 never fails to amaze me. This morning, just three minutes before the 9 o’clock news, they interviewed David Spiegelhalter. Spiegelhalter is obviously the man who gave us BUGS. But he is also Winton Professor of the Public Understanding of Risk at the University of Cambridge, and a man who can (within the 90 seconds they allocated him) explain to a lay public why a spate of knife crime (last summer, four people were killed in the space of just one day) is not totally unlikely and does not necessarily indicate an increase in the murder rate, illustrating the idea of clustered risks in passing. He even convinced the anchor that stats is actually fun, even if you look at 170 murders per year in a population of just 7 million Londoners. I was duly impressed (you can listen here to the interview with Spiegelhalter). In fact, I was so impressed that I googled him once I reached the office and came across his website understandinguncertainty.org, which has full coverage of the London murder mystery (which is solved by modelling the incidents as Poisson-distributed).
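
For the curious, the back-of-the-envelope version of that argument fits in a few lines of Python. This is my own rough sketch, not Spiegelhalter’s actual analysis: it assumes that murders occur independently at a constant rate of 170 per year, so that the daily count follows a Poisson distribution, and the function names are made up.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def prob_at_least(k, lam):
    """Probability of k or more events: one minus the mass below k."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))


murders_per_year = 170        # figure quoted in the interview
lam = murders_per_year / 365  # assumed constant daily rate

p_four_or_more = prob_at_least(4, lam)
expected_days_per_year = 365 * p_four_or_more

print(f"P(4 or more murders on a given day) = {p_four_or_more:.5f}")
print(f"Expected number of such days per year = {expected_days_per_year:.2f}")
```

Under these assumptions, a day with four or more murders is rare on any given date, but over a whole year it is nothing extraordinary: you would expect one roughly every couple of years, without any change in the underlying murder rate.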


Via Simon Jackman’s blog: Chris Jordan found an intriguing way to visualise some very large, mostly scary national statistics, such as the number of plastic cups used on flights in the US every six hours (one million), or the number of cell phones retired every day (426,000). Amazing and aesthetically pleasing in a most disturbing way.