Scatterplot matrices with gnuplot

Tags: howtos, visualization

Scatterplot matrices are one of the most practical direct visualizations of unstructured data sets, provided that the dimensionality does not become too large. Despite their prominence in the visualization community, few non-commercial tools can produce publication-ready scatterplot matrices.

In this post, I want to briefly explain how to use a recent version of gnuplot, one of my favourite plotting tools, to create something that resembles a scatterplot matrix. In this example, I will use “the famous Iris data set”, which I am sure everyone is familiar with by now.

I prepared a slightly modified version of the data set that is easier to process with gnuplot: I shortened the labels and inserted newlines after each block of identical labels. This makes using colours much easier.

The basic idea of the plot is to use the multiplot mode of gnuplot and create all plots in a nested loop. The diagonal elements are filled with a histogram of the corresponding attribute, which is of course also created using gnuplot. The only thing I am not proud of is the use of labels. I am using the x2label feature to put axis labels on top of a plot, but as gnuplot does not (yet) support a simpler mechanism for placing labels in multiple plots, I had to resort to a truly horrid way of getting this to work…
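To make the idea concrete, here is a minimal sketch of the nested-loop approach. It is not the script from the gist, only an illustration under a few assumptions: the file name `iris.txt`, the output name `iris_matrix.png`, the bin width, and the column layout (four whitespace-separated numeric columns followed by a label, with a single blank line between blocks of identical labels) are all hypothetical. The labelling is also deliberately naive and does not reproduce the workaround mentioned above.

```gnuplot
# Assumed input (hypothetical): iris.txt with four numeric columns and a
# label column, whitespace-separated, one blank line between label blocks.
datafile = "iris.txt"
n        = 4
names    = "SepalLength SepalWidth PetalLength PetalWidth"

set terminal pngcairo size 1000,1000
set output "iris_matrix.png"

# Histogram helper: map each value to the centre of its bin.
binwidth = 0.25
bin(x)   = binwidth * (floor(x / binwidth) + 0.5)
set boxwidth binwidth
set style fill solid 0.5

unset key
set multiplot layout n,n

do for [i=1:n] {
  do for [j=1:n] {
    # Naive labelling: attribute names above the top row only.
    if (i == 1) { set x2label word(names, j) } else { unset x2label }

    if (i == j) {
      # Diagonal cell: histogram of attribute i.
      plot datafile using (bin(column(i))):(1.0) smooth frequency with boxes
    } else {
      # Off-diagonal cell: attribute j against attribute i. column(-1)
      # counts blocks separated by single blank lines, so each label
      # block gets its own colour.
      plot datafile using (column(j)):(column(i)):(column(-1) + 1) \
           with points pt 7 ps 0.5 lc variable
    }
  }
}

unset multiplot
```

Because the top row carries an extra label, its plots end up slightly smaller than the others unless the margins are fixed explicitly. If the data file is comma-separated instead, `set datafile separator ","` would be needed before plotting.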

Without further ado, here’s the code:

And this is what the output looks like:

Scatterplot matrix of the Iris data set

You can find the code for this script as a gist on GitHub.