Every remotely relevant reference I came across during the last 15 years or so resides in a single bibtex file. That is not a problem. The problem is that I’m moving into a shiny new, but somewhat smaller, office, together with hundreds of copies of journal articles and hundreds of PDFs. Wouldn’t it be good to know which physical copies are effectively redundant (unreadable comments in the margins aside) and can therefore stay behind?

The trouble is that bibtex files have a rather flexible, human-readable format. Each entry begins with the @ sign, followed by a type (book, article etc.), a reference name, lots of key/value pairs (fields) in arbitrary order, and even more curly braces.
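To make the format concrete, here is a hypothetical entry (the key, author, and all field values are invented for illustration; `binder` and `file` are the custom fields discussed below):

```shell
# Write a minimal, invented bibtex entry to a file and show it.
cat > example.bib <<'EOF'
@article{doe2001,
  author  = {Doe, Jane},
  title   = {An Invented Example},
  journal = {Journal of Examples},
  year    = {2001},
  binder  = {3},
  file    = {doe2001.pdf}
}
EOF
cat example.bib
```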

`grep @ full.bib|wc -l` tells me that I have 2914 references in total. `grep binder full.bib|wc -l` (binder is a custom field that I use to keep track of the location of my copies) shows that I have printed out/copied 712 texts over the years, and `grep file full.bib|wc -l` indicates that there are 504 PDFs residing on my filesystem. But what is the magnitude of the intersection?
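One caveat with counting via `grep @`: it also matches a stray @ inside a field value (an email address, say). Assuming each entry starts with @ at the beginning of a line, anchoring the pattern is slightly more robust; a small sketch with an invented two-entry file:

```shell
# Build a tiny demo file; both entries are invented.
cat > demo.bib <<'EOF'
@article{a1,
  author = {Someone},
  binder = {1}
}
@book{b1,
  author = {Someone Else <someone@example.org>},
  file   = {b1.pdf}
}
EOF

# Anchor the match to the start of the line, so the @ in the
# email address is not counted as an entry. grep -c replaces
# the grep | wc -l pipe.
grep -c '^@' demo.bib        # prints 2
grep -c '@' demo.bib         # prints 3 -- overcounts

# Count entries carrying a binder field
grep -c 'binder' demo.bib    # prints 1
```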

My first inclination was to look for a suitable Python parser/library. Pybtex looked good in principle but is underdocumented and had trouble reading full.bib, because that is encoded in Latin-1. So endless hours of amateurish coding and procrastination lay ahead. Then I remembered the old “do one thing, and do it really well” mantra. Enter bibtool, a fast and reasonably stable bibtex file filter and pretty printer. Bibtool reads “resource files”, which are really just short scripts containing filtering/formatting directives. `select = {binder ".+"}` keeps those references whose “binder” field contains at least one character (`.+` is a regular expression that matches any non-empty string). `select = {file ".+"}` selects all references for which I have a PDF. But bibtool applies a logical OR to these conditions, while I’m interested in the references that meet both criteria.

The quick solution is to store each statement in a file of its own and apply bibtool twice, using a pipeline for extra efficiency: `bibtool -r find-binder.rsc full.bib|bibtool -r find-pdf >intersection.bib` does the trick and solves my problem in under a minute, without any coding.
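The whole setup can be sketched in a few lines of shell. I give both resource files an `.rsc` extension here for symmetry (the post invokes the second one as plain `find-pdf`); since bibtool and full.bib have to be present for the pipeline itself, it is shown as a comment rather than executed:

```shell
# Recreate the two one-line bibtool resource files from the post.
cat > find-binder.rsc <<'EOF'
select = {binder ".+"}
EOF

cat > find-pdf.rsc <<'EOF'
select = {file ".+"}
EOF

# The intersection pipeline (requires bibtool and full.bib):
#   bibtool -r find-binder.rsc full.bib | bibtool -r find-pdf.rsc > intersection.bib
```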

As it turns out, there were just 65 references in both groups. Apparently, I stopped printing (or at least filing away) some time ago. Eventually, I binned two copies, but it is the principle that matters.

2019 Update

I still use bibtool for quick filtering/reformatting tasks at the command line, but for more complex jobs involving programmatic access to bibtex files from R, RefManageR is a wonderful package. I have used it here in a bibliometric study of the Radical/Extreme Right literature. And my nifty RRResRobot also relies heavily on RefManageR. If you are at all interested in RefManageR, here is a short and sweet introduction.

My default for writing anything that is longer than a page is LaTeX (possibly via org-mode, if it is short and simple). In fact, the bond that ties me to the LaTeX/Emacs combo is so strong that I want to use it even for texts that are exactly one page long, i.e. conference posters.

CTAN lists a lot of packages and frameworks for posters, but I found most of them too heavy/complex. I don’t create a lot of conference posters and did not want to spend ages putting a few words and graphs on a sheet of glossy paper. At the end of the day, I decided to give beamerposter a spin. Beamerposter is an add-on that transforms my favourite presentation package into a poster printing machine. I did not really like the default themes, but Rob Hyndman has created a very nice alternative template that I adapted slightly.

[Figure: Political Geography Conference Poster]

I rather like the result and will go back to the package for the next poster.

I use Emacs/LaTeX for all my text processing needs, and for the last four or five years I have created all my slides with Till Tantau’s excellent “beamer” class. At the moment, I’m teaching a 2nd-year stats course (imagine doing this with PowerPoint – the horror! the horror!), so I sometimes use graphs from the assigned text, like this one from Long & Freese that illustrates the latent variable/threshold interpretation of the binary logit model. The message should be fairly clear: $y^{*}$ depends on $x$ and follows a standard logistic distribution around its conditional mean.

But the fact that the bell curve lies flat in the $x-y^{*}$ plane confused my students no end. So I wasted half a day on creating a nice 3D plot for them. After trying several options, I settled on pgfplots.sty, which builds on tikz/pgf, the comprehensive, portable graphics package designed by Tantau (here’s a gallery with the most amazing examples of what you can do with this little gem). Plotting data and functions with pgfplots in 2D or 3D is a snap, so that was not too hard. Eventually.

Finally, in a desperate attempt to drive the message home, I enlisted the help of animate.sty, yet another amazing package, which creates a JavaScript-based inline animation from my LaTeX source (requires Acrobat Reader). So the bell curves pop out of the plane, in slow motion. Did it help the students to see the light? I have no idea. Here is the source.

A couple of weeks ago, I posted an article on how make and Makefiles can help you organise your Stata projects. If you are working in a Unix environment, you’ll already have make installed. If you work under Windows, install GNU make – it’s free, and it can make your Stata day. Rather unsurprisingly, make is also extremely useful if you have a large or medium-sized LaTeX project, or if you want to include tables and/or graphs produced by Stata in a LaTeX document. For instance, it comes in handy if you have EPS figures and use pdflatex. pdflatex produces PDF files instead of DVI files. If you produce slides, this can save you a lot of time because you don’t have to go through the latex – dvips – ps2pdf cycle. However, pdflatex cannot read EPS files: you have to convert them with pstoedit to the MetaPost format, then use MetaPost to convert them to MPS (which pdflatex can read). With this Makefile snippet, everything happens automagically:

```make
# New implicit rules for the conversion eps -> mp -> mps
# Change the path if you have installed pstoedit in some other place
# (note: recipe lines must be indented with a literal tab)
%.mp : %.eps
	c:/pstoedit/pstoedit.exe -f mpost $*.eps $*.mp

%.mps : %.mp
	mpost $*.mp
	mv $*.1 $*.mps
	rm $*.mp

# Now specify a target
presentation.pdf: presentation.tex mytab1.tex myfig.mps
	pdflatex presentation.tex

# Optional: if you want to create x.eps, run x.do
# Stata must be in your path
%.eps : %.do
	wstata -e do $<
```

Now type `make presentation.pdf`, and make will call Stata, pstoedit, MetaPost and pdflatex as required. If you need more figures, just write the do-file and add a dependency.
