Which of my students are most likely to gang up against me?

I’m teaching a lecture course on Political Sociology at the moment, and because everyone is so excited about social capital and social network analysis these days, I decided to run a little online experiment with and on my students. The audience is large (at the beginning of this term, about 220 students had registered for this lecture series) and quite diverse, with some students still in their first year, others in their second, third or fourth and even a bunch of veterans who have spent most of their adult lives in university education.

Who knows whom in a large group of learners?

Fortunately, I had a list of full names plus email addresses for everyone who had signalled interest in the lecture before the beginning of term, so I created a short questionnaire in LimeSurvey and asked them a very simple question: whom do you know in this group? Given the significant overcoverage of my list – in reality, there are probably not more than 120 students who regularly turn up for the lecture – the response rate was somewhere in the high 70s. If you want to collect network data with LimeSurvey, the “array with flexible labels” question type is your friend, but keying in 220 names plus unique IDs would have been a major pain. Thankfully, one can program the question with a single placeholder name and export it as a CSV file. Next, simply load the file into Emacs, insert the complete list, and re-import it into LimeSurvey.
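If you would rather not type the option list in Emacs either, a few lines of Python can generate it from the roster. This is only a sketch of the idea, not LimeSurvey's import format: the codes SQ001, SQ002 and so on are an assumption modelled on LimeSurvey's default subquestion codes, the exact column layout of the exported CSV is not reproduced here, and the names are made up.

```python
# Sketch: number the students on a roster so that each one becomes an
# answer option with a unique code. The "SQ001" code format is an
# assumption about LimeSurvey's default subquestion codes.

def make_subquestion_rows(names):
    """Return (code, label) pairs, one per student, numbered from 1."""
    return [(f"SQ{i:03d}", name) for i, name in enumerate(names, start=1)]


if __name__ == "__main__":
    roster = ["Anna Schmidt", "Ben Meyer", "Clara Weber"]  # hypothetical names
    for code, label in make_subquestion_rows(roster):
        print(f"{code};{label}")
```

Because the student's number is embedded in the code, it can later be recovered by keeping only the digits of the column label, which is what the `filter(str.isdigit, ...)` trick in the script below does.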

Getting a data matrix from Stata into Pajek is not necessarily a fun exercise, so I decided to give the networkx module for Python a go, and it is simply superb. Networkx has data types for representing social networks, so you can read in a rectangular data matrix (again as CSV), construct the network in Python and export the whole lot to Pajek with a few lines of code:


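# Some boring stuff omitted: reading the CSV export. It defines
# netreader (a csv.reader over the rows of the data matrix) and
# knoten (German for "nodes"; presumably the labels of the edge columns).
import networkx as nx

# Create the network
Lecture = nx.DiGraph()
# Initialise all 220 nodes with a default degree scheme
for i in range(1, 221):
    Lecture.add_node(i, stdg="0")
for line in netreader:
    sender = int(line[-1])  # Sender-ID at the very end
    edges = line[6:216]  # Edges
    Lecture.nodes[sender]['stdg'] = line[-8]  # Degree scheme
    for index in range(len(edges)):
        # Only answer codes 2 and 3 count as ties; the code becomes the weight
        if edges[index] in ('2', '3'):
            # Keep only the digits of the column label: the receiver's ID
            receiver = int(''.join(filter(str.isdigit, str(knoten[index]))))
            Lecture.add_edge(sender, receiver, weight=int(edges[index]))
nx.write_pajek(Lecture, 'file.net')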

As it turns out, a lecture hall rebellion seems rather unlikely. About one third of all relationships are not reciprocated, and about a quarter of my students do not know a single other person in the room (at least not by name), so levels of social capital are pretty low. There is, however, a small group of 10 mostly older students who form a tightly knit core, and who know many of the suckers in the periphery. I need to keep an eye on these guys.
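Figures like these are easy to compute once the network sits in networkx. The sketch below is one way to get them, not necessarily how the numbers above were obtained: it takes the share of unreciprocated arcs from `nx.reciprocity`, counts nodes that name nobody, and uses the main k-core of the mutual-ties graph as a rough stand-in for a "tightly knit core". Non-respondents also have no outgoing ties, so with real data you would want to restrict the second measure to respondents; `nx.reciprocity` also refuses graphs without any arcs.

```python
import networkx as nx


def describe_network(g):
    """Summarise a directed "who knows whom" network.

    Returns the share of arcs that are not reciprocated, the share of
    nodes with out-degree 0 (they named nobody), and the node set of the
    main k-core of the graph of reciprocated ties.
    """
    not_reciprocated = 1 - nx.reciprocity(g)
    silent = sum(1 for _, d in g.out_degree() if d == 0) / g.number_of_nodes()
    # Keep only mutual ties, as an undirected graph
    mutual = g.to_undirected(reciprocal=True)
    # k_core does not accept self-loops
    mutual.remove_edges_from(list(nx.selfloop_edges(mutual)))
    core = set(nx.k_core(mutual).nodes())
    return not_reciprocated, silent, core
```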

Figure: 260 reciprocated ties within the same group

Finally, the second graph also shows that the relatively few students who are enrolled in our new BA programs (red, dark blue) are pretty much isolated within the larger group, which is still dominated by students enrolled in the old five-year programs (MA yellow, State Examination green) that are being phased out. Divide et impera.
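How isolated a group is can be checked by counting mutual ties by pairs of degree schemes, which the script above stores in the `stdg` node attribute. A minimal sketch, assuming `stdg` holds one code per programme; which code corresponds to which colour in the graphs is not known here.

```python
from collections import Counter

import networkx as nx


def mutual_ties_by_group(g, attr="stdg"):
    """Count reciprocated ties by the (unordered) pair of group codes."""
    mutual = g.to_undirected(reciprocal=True)  # node attributes are copied
    counts = Counter()
    for u, v in mutual.edges():
        a, b = mutual.nodes[u][attr], mutual.nodes[v][attr]
        counts[tuple(sorted((a, b)))] += 1
    return counts


def within_group_share(counts):
    """Share of mutual ties that link two members of the same group."""
    total = sum(counts.values())
    same = sum(n for (a, b), n in counts.items() if a == b)
    return same / total if total else 0.0
```

A group whose members have few entries outside their own `(a, a)` cell, and few ties overall, is isolated in the sense described above.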

How to get from Stata to Pajek

I’m teaching an introductory SNA class this year. Following a time-honoured tradition, I conducted a small network survey at the beginning of the class using LimeSurvey. Getting the data from LimeSurvey to Stata via CSV was easy enough. Here is the data set. But how does one get the data from Stata to Pajek for analysis? Actually, it’s quite easy.

First, we need to change the layout of the data. In the data set, there is one record for each of the 13 respondents. Each record has 13 variables, one for each (potential) arc connecting the respondent to other students in the class. This is equivalent to Stata’s “wide” form. Stata’s reshape command will happily rearrange the data into the “long” form, with one record for each arc, which is the layout Pajek requires.
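For readers without Stata, the same wide-to-long step can be sketched in plain Python. This assumes that answers are stored as integers, with 0 or a missing value meaning "no tie", and that column j refers to the student with ID j; both are assumptions about the data set, not facts about it.

```python
# Minimal sketch of the "reshape long" step in plain Python.
# Assumptions: answers are integers, 0 or None means "no tie", and
# column j (counting from 1) refers to the student with ID j.

def wide_to_long(records):
    """Turn {respondent_id: [answer_1, ..., answer_n]} into arcs.

    Returns a list of (sender, receiver, value) tuples, one per tie,
    ordered by sender and then receiver.
    """
    arcs = []
    for sender in sorted(records):
        for receiver, value in enumerate(records[sender], start=1):
            if not value:           # 0 or None: no arc
                continue
            if receiver == sender:  # drop self-nominations
                continue
            arcs.append((sender, receiver, value))
    return arcs


if __name__ == "__main__":
    wide = {1: [0, 1, 0], 2: [1, 1, None], 3: [0, 0, 0]}  # toy data
    for arc in wide_to_long(wide):
        print(*arc)
```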

Second, we need to save the data as an ASCII file that Pajek can read. This is most easily done using Roger Newson’s listtex, which can be tweaked to write the main chunks of a Pajek file. Here is the code, which should be easy to adapt to your own problems.
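The file listtex is tweaked to produce is plain text in Pajek's .net format: a `*Vertices` line giving the number of vertices, one line per vertex, then an `*Arcs` line followed by one "sender receiver weight" line per arc. Here is a Python sketch of that output; it is a stand-in for the Stata code, not a translation of it, and the labels are made up.

```python
# Sketch of a Pajek .net writer: a *Vertices section followed by an
# *Arcs section. Vertex IDs are 1-based, as Pajek expects.

def to_pajek(labels, arcs):
    """labels[i - 1] is the label of vertex i; arcs holds (sender, receiver, weight)."""
    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{i} "{label}"' for i, label in enumerate(labels, start=1)]
    lines.append("*Arcs")
    lines += [f"{s} {r} {w}" for s, r, w in arcs]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_pajek(["Anna", "Ben", "Clara"], [(1, 2, 1), (2, 1, 1)]), end="")
```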

If you are interested, you can get the whole package from within Stata: net from http://www.kai-arzheimer.com/stata/

Statistics and Data links roundup for November 23rd through December 29th

  • The Data and Story Library – DASL (pronounced “dazzle”) is an online library of datafiles and stories that illustrate the use of basic statistics methods. We hope to provide data from a wide variety of topics so that statistics teachers can find real-world examples that will be interesting to their students. Use DASL’s powerful search engine to locate the story or datafile of interest.
  • Drawing graphs using tikz/pgf & gnuplot | politicaldata.org

Statistics and Data links roundup for November 14th through November 23rd

It’s surprisingly difficult to find datasets that are suitable for an SNA workshop and relevant to political scientists.