Wednesday, November 02, 2005

 

Text Editors and Scripting for Presenting Measurements

I write academic computer science papers for a living (well, partly, at least). Usually that means I need to run measurements of some program and then massage the raw numbers into an understandable form. By now I've pretty much automated everything, and the workflow might be interesting to read about.

First of all, all my papers are written in LaTeX using GNU Emacs. BibTeX is used for bibliographies and I have a single file (in my own FubML format) containing all the references I (or the project) have collected.

The starting point for formatting the measurements is the program I'm measuring. I set it up so that it prints some form of header information for every set of measurements, and each individual measurement is printed as something like "Input: 2532 (100)": a name, a measured value, and a replication count. This is typically even human-readable.
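Parsing a line like that is straightforward; here's a rough sketch in Python (the real script is Perl, and the exact line format shown is just this post's example):

```python
import re

# Matches lines of the form "Input: 2532 (100)":
# a name, a measured value, and a replication count.
LINE = re.compile(r"^(\w+):\s+(\d+)\s+\((\d+)\)$")

def parse_measurement(line):
    """Return (name, value, replications) or None for non-matching lines."""
    m = LINE.match(line.strip())
    if not m:
        return None
    name, value, reps = m.groups()
    return name, int(value), int(reps)
```

Header lines and anything else that doesn't match the pattern simply fall through as None, so the same loop can skim past them.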

Time for processing. I have some generic Perl code that reads this kind of data and generates a large array of MetaPost definitions containing all the measurements.
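Roughly, the conversion step takes the parsed triples and prints MetaPost assignments; a Python sketch, with array names made up for illustration:

```python
# Hypothetical emitter: turn parsed (name, value, replications) triples
# into MetaPost numeric-array assignments for the graphing code to use.
def emit_metapost(measurements):
    lines = ["numeric val[], reps[];"]
    for i, (name, value, reps) in enumerate(measurements):
        lines.append(f"% {name}")
        lines.append(f"val[{i}] := {value}; reps[{i}] := {reps};")
    return "\n".join(lines)
```

The point is only that the output is plain MetaPost source, so the graphing side never needs to know anything about the raw format.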

After some generic MetaPost code, I can get all sorts of aggregate numbers out of the data, such as averages and deviations. The graph package of MetaPost is then used to make neat graphs out of this data. Again, most of the code is generic and reusable from one paper to the next; the only paper-specific part is typically setting a few variables that control which measurements to include.
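The aggregates themselves are just the usual mean and sample standard deviation; a quick sketch, shown here in Python rather than MetaPost:

```python
import math

# Mean and sample standard deviation over one measurement's replications.
def mean_and_deviation(values):
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1) if n > 1 else 0.0
    return mean, math.sqrt(var)
```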

Sometimes the data are best presented in tabular form. When they are, the Perl script that reads the raw data also prints out the tables, in LaTeX tabular format, and these get included as such in the LaTeX document. In fact, pretty much every number I present from the measurements has been automatically generated from the raw data.
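Emitting the tables is equally mechanical; a hypothetical sketch of the tabular output (column names and number formatting invented for illustration):

```python
# Render rows of (name, mean, replications) as a LaTeX tabular
# that the paper can include directly.
def to_latex_table(rows):
    out = [r"\begin{tabular}{lrr}",
           r"Input & Mean & Replications \\ \hline"]
    for name, mean, reps in rows:
        out.append(rf"{name} & {mean:.1f} & {reps} \\")
    out.append(r"\end{tabular}")
    return "\n".join(out)
```

Since the paper never contains hand-typed numbers, regenerating this file is all it takes to keep the tables current.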

Of course, all of this gets tied together with Make rules, so that a single command first processes the data and then compiles the paper into a PDF. I have a generic Makefile for LaTeX processing, so I only need to get the dependencies right; these ensure that only the parts affected by a change get reprocessed.
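The dependency chain looks roughly like this; the file and script names here are hypothetical:

```makefile
# Raw data -> generated figures and tables -> PDF.
figures.mp tables.tex: measurements.raw process.pl
	perl process.pl measurements.raw

paper.pdf: paper.tex tables.tex figures.mp
	mpost figures.mp
	pdflatex paper
```

With this in place, touching only the .tex file skips the data processing entirely, while new measurements trigger the whole chain.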

This system has several upsides (well, for me anyway). First, everything in it has a text-based command language, so I can prepare everything with Emacs (well, nearly everything; I haven't yet automated extracting information from network packet dumps). Second, whenever I take new measurements (say, because I forgot something or just need more replications), the system rebuilds all the figures without any additional work on my part. Third, since all the numbers are generated from the same data, they are consistent. And finally, it's easy to write the Perl scripts so that they don't care how much data I have, so changing what I present is typically easy.

There's still one problem: when I take new measurements and update all the figures and tables, the scripts do not yet update any conclusions I've made based on the data...

