Thanks to Kit Baum over at Boston College, our Stata add-on surveybias.ado is now available from Statistical Software Components (SSC). The add-on takes as its argument the name of a categorical variable and said variable’s true distribution in the population. For what it’s worth, the program tries to be smart: surveybias vote, popvalues(900000 1200000 1800000), surveybias vote, popvalues(0.2307692 0.3076923 0.4615385), and surveybias vote, popvalues(23.07692 30.76923 46.15385) should all give the same result.
If you don’t have access to the raw data but want to assess survey bias evident in published figures, there is surveybiasi, an “immediate” command that lets you do stuff like this: surveybiasi , popvalues(30 40 30) samplevalues(40 40 20) n(1000). Again, you may specify absolute values, relative frequencies, or percentages.
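To see why counts, proportions, and percentages are interchangeable here, consider a minimal Python sketch. It is not the add-on's Stata code, and the helper names to_shares and share_gaps are made up for illustration. It rescales each list to shares that sum to one, then lists the plain percentage-point gaps between sample and population for the surveybiasi example above. These gaps are only a rough first look at the figures; they are not the bias statistics that surveybias itself reports.

```python
def to_shares(values):
    """Rescale counts, proportions, or percentages to shares summing to 1."""
    total = float(sum(values))
    if total <= 0:
        raise ValueError("values must have a positive sum")
    return [v / total for v in values]


def share_gaps(popvalues, samplevalues):
    """Sample share minus population share, in percentage points."""
    pop = to_shares(popvalues)
    sample = to_shares(samplevalues)
    if len(pop) != len(sample):
        raise ValueError("need one sample value per population value")
    return [100 * (s - p) for s, p in zip(sample, pop)]


# The three popvalues() specifications from the post
counts = to_shares([900000, 1200000, 1800000])
proportions = to_shares([0.2307692, 0.3076923, 0.4615385])
percentages = to_shares([23.07692, 30.76923, 46.15385])

# Figures from the surveybiasi example (hypothetical published numbers)
gaps = share_gaps([30, 40, 30], [40, 40, 20])
print([round(g, 1) for g in gaps])
```

After rescaling, the three specifications agree up to the rounding in the typed-in digits, which is all the "smartness" amounts to in this sketch.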
Like social networks, multilevel data structures are everywhere once you start thinking about it. People live in neighbourhoods, neighbourhoods are nested in municipalities, which make up provinces – well, you get the picture. Even if we have no substantive interest in their effects, it often makes sense to control for structures in our data to get more realistic standard errors.
Now the good folks over at the European Social Survey have reacted and spent the Descartes Prize money on compiling multilevel information and merging it with their own data. So far, the selection is a little bit disappointing in some respects. Homicide rates, for instance, are reported on the national level only. But there are some pleasant surprises (I guess due to Eurostat, who collect such things): We get unemployment, GDP growth and even student numbers at the NUTS-3 level. Since you asked, NUTS is the Nomenclature of Territorial Units for Statistics, and level 3 is the lowest level for which comparative data are normally published.
Regrettably, the size and number of level 3 units are not necessarily comparable across countries: For Germany, level 3 corresponds to about 400 local government districts, while France’s European territory is divided into 96 départements. But if you need to combine top-notch survey data with small(ish) regional data, it’s a start, and not a bad one.
I’m teaching a lecture course on Political Sociology at the moment, and because everyone is so excited about social capital and social network analysis these days, I decided to run a little online experiment with and on my students. The audience is large (at the beginning of this term, about 220 students had registered for this lecture series) and quite diverse, with some students still in their first year, others in their second, third or fourth and even a bunch of veterans who have spent most of their adult lives in university education.
Who knows whom in a large group of learners?
Fortunately, I had a list of full names plus email addresses for everyone who had signalled interest in the lecture before the beginning of term, so I created a short questionnaire in limesurvey and asked them a very simple question: whom do you know in this group? Given the significant overcoverage of my list – in reality, there are probably not more than 120 students who regularly turn up for the lecture – the response rate was somewhere in the high 70s. If you want to collect network data with limesurvey, the “array with flexible labels” question type is your friend, but keying in 220 names plus unique ids would have been a major pain. Thankfully, one can program the question with a single placeholder name, then export it as a CSV file. Next, simply load the file into Emacs and insert the complete list, then re-import it in limesurvey.
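The answers are then turned into a directed network with networkx and written out in Pajek format:

# Some boring stuff omitted (netreader and knoten come from there)
import networkx as nx

# Create network
Lecture = nx.DiGraph()
# Initialise
for i in range(1, 221):
    Lecture.add_node(i, stdg="0")
for line in netreader:
    sender = int(line[-1])  # Sender-ID at the very end
    edges = line[6:216]
    # Edges: answers '2' and '3' become ties with the matching weight
    for index, answer in enumerate(edges):
        if answer in ('2', '3'):
            # Pull the digits out of the label to get the receiver's id
            # (join is needed because filter returns an iterator in Python 3)
            target = int(''.join(filter(str.isdigit, repr(knoten[index]))))
            Lecture.add_edge(sender, target, weight=int(answer))
nx.write_pajek(Lecture, 'file.net')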
As it turns out, a lecture hall rebellion seems not very likely. About one third of all relationships are not reciprocated, and about a quarter of my students do not know a single other person in the room (at least not by name), so levels of social capital are pretty low. There is, however, a small group of 10 mostly older students who form a tightly knit core, and who know many of the suckers in the periphery. I need to keep an eye on these guys.
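Both figures in the paragraph above (the share of unreciprocated ties and the share of students who know nobody) are easy to compute once the ties are available as (sender, receiver) pairs. Here is a minimal sketch in plain Python, so it runs without networkx. The toy edge list and the function names are invented for illustration and are not the lecture data. "Knows nobody" is read here as "names no one", which is only one possible interpretation.

```python
def unreciprocated_share(ties):
    """Share of directed ties (a, b) for which (b, a) is missing."""
    ties = set(ties)
    if not ties:
        return 0.0
    one_way = [(a, b) for (a, b) in ties if (b, a) not in ties]
    return len(one_way) / float(len(ties))


def share_knowing_nobody(nodes, ties):
    """Share of nodes that name no one, i.e. have no outgoing tie."""
    nodes = list(nodes)
    senders = {a for (a, _) in ties}
    return sum(1 for n in nodes if n not in senders) / float(len(nodes))


# Hypothetical toy network: 1 and 2 know each other, 3 knows 1,
# and 4 names nobody.
toy_nodes = [1, 2, 3, 4]
toy_ties = [(1, 2), (2, 1), (3, 1)]

print(unreciprocated_share(toy_ties))              # 1 of 3 ties is one-way
print(share_knowing_nobody(toy_nodes, toy_ties))   # 1 of 4 names nobody
```

With a networkx DiGraph, the same functions could be fed from its nodes() and edges() methods.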
260 reciprocated ties within the same group
Finally, the second graph also shows that those relatively few students who are enrolled in our new BA programs (red, dark blue) are pretty much isolated within the larger group, which is still dominated by students enrolled in the old five-year programs (MA yellow, State Examination green) that are being phased out. Divide et impera.
Should one weight one’s survey data? Is it worth the effort? The short answer must be ‘maybe’ or ‘it depends’. A slightly longer and much more useful answer was given by Leslie Kish in his enormously helpful paper ‘Weighting: Why, when and how’. Today (well, actually I submitted the final manuscript 2.5 years ago; that’s scientific progress for you!), I have added my own two cents with a short chapter that looks at the effects and non-effects of common weighting procedures (in German). The bottom line is that if you employ the usual weighting variables (age, gender, education and maybe class or region) as controls in your regression, weighting will make next to no difference but might mess with your standard errors.