Scatterplot matrices with gnuplot
Tags: howtos, visualization
Scatterplot matrices are one of the most practical ways of directly visualizing unstructured data sets, provided that the dimensionality does not become too large. Despite their prominence in the visualization community, few non-commercial tools are able to produce publication-ready scatterplot matrices.
In this post, I want to briefly explain how to use a recent version of gnuplot, one of my favourite plotting tools, to create something that resembles a scatterplot matrix. As an example, I will use the famous Iris data set, which I am sure everyone is familiar with by now.
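I prepared a slightly modified version of the data set that is easier to process with gnuplot. To this end, I shortened the labels and inserted blank lines after each block of identical labels. This makes using colours much easier: gnuplot can address blank-line-separated blocks of a data file individually, so each species can be selected and coloured on its own.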
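To make the preparation step concrete, here is a minimal Python sketch of such a conversion. It assumes the input follows the layout of the UCI `iris.data` file (four comma-separated measurements followed by a label such as `Iris-setosa`, with records already grouped by species). The helper name `prepare_iris`, the file names, and the choice of shortening labels by stripping the `Iris-` prefix are illustrative assumptions, not necessarily how the modified data set was actually produced. Species are separated by two blank lines because gnuplot treats a double blank line as the end of a data set that the `index` keyword can select.

```python
import sys


def prepare_iris(lines):
    """Rewrite UCI-style Iris records into a gnuplot-friendly layout.

    Hypothetical helper; assumptions, not the author's exact format:
    - every non-empty input line looks like '5.1,3.5,1.4,0.2,Iris-setosa'
    - labels are shortened by dropping the 'Iris-' prefix
    - records are already grouped by species, as in the UCI file
    """
    out = []
    previous = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. the trailing one in iris.data
        *values, label = line.split(",")
        if label.startswith("Iris-"):
            label = label[len("Iris-"):]
        if previous is not None and label != previous:
            # Two blank lines end a gnuplot data set, so `index k`
            # can later select the k-th species on its own.
            out.extend(["", ""])
        # Whitespace-separated columns, the default gnuplot expects.
        out.append(" ".join(values + [label]))
        previous = label
    return "\n".join(out) + "\n"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python prepare_iris.py iris.data iris.dat
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.write(prepare_iris(src))
```

With a file prepared this way, a command such as `plot for [k=0:2] 'iris.dat' index k using 1:2` should draw each of the three species in its own colour, since every iteration of `plot for` advances to the next line type.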
The basic idea of the plot is to use the multiplot mode of gnuplot and create all plots in a nested loop. The diagonal cells are filled with a histogram of the corresponding attribute, which is of course also created using gnuplot. The only thing I am not proud of is the labelling. I use the x2label setting to put axis labels on top of a plot, but as gnuplot does not (yet) support a simpler mechanism for placing labels across multiple plots, I had to resort to a truly horrid way of getting this to work…
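The nested-loop structure can be sketched as follows. This is a hypothetical Python helper, not the script from the gist: it unrolls both loops in Python and emits plain gnuplot commands, whereas gnuplot 4.6 and later can run such loops natively with `do for`. It assumes a whitespace-separated data file with one numeric column per attribute (in the order of `names`) and groups separated by double blank lines. The names `scatterplot_matrix_script`, `n_groups`, and `binwidth`, the 0.25 bin width, and the use of `ylabel` on the left column are illustrative assumptions; only the `x2label` trick for the top row comes from the text above.

```python
def scatterplot_matrix_script(datafile, names, n_groups, binwidth=0.25):
    """Emit a gnuplot script that draws an n-by-n scatterplot matrix.

    Hypothetical helper. Cell (row, col) plots attribute `col` (x axis)
    against attribute `row` (y axis); diagonal cells get a histogram.
    """
    n = len(names)
    lines = [
        f"set multiplot layout {n},{n}",  # gnuplot fills the grid row by row
        f"bin(x) = {binwidth}*floor(x/{binwidth})",  # maps a value to its bin
        "unset key",
    ]
    for row in range(n):
        for col in range(n):
            # Label only the outer cells: attribute names on top of the
            # first row, y-axis names left of the first column.
            if row == 0:
                lines.append(f'set x2label "{names[col]}"')
            else:
                lines.append("unset x2label")
            if col == 0:
                lines.append(f'set ylabel "{names[row]}"')
            else:
                lines.append("unset ylabel")
            if row == col:
                # Diagonal: count values per bin, pooled over all groups.
                lines.append(
                    f"plot '{datafile}' using (bin(${col + 1})):(1.0) "
                    "smooth frequency with boxes"
                )
            else:
                # Off-diagonal: one colour per group via plot iteration,
                # where `index k` selects the k-th blank-line-separated data set.
                lines.append(
                    f"plot for [k=0:{n_groups - 1}] '{datafile}' index k "
                    f"using {col + 1}:{row + 1} with points pointtype 7"
                )
    lines.append("unset multiplot")
    return "\n".join(lines) + "\n"
```

For example, `print(scatterplot_matrix_script("iris.dat", ["sepal length", "sepal width", "petal length", "petal width"], 3))` prints a script for a 4×4 grid that can be piped into gnuplot after choosing a terminal and output file. Generating the commands this way keeps the per-cell decisions (histogram or scatter plot, which labels to show) in one readable place, at the cost of a longer emitted script.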
Without further ado, here’s the code:
And this is what the output looks like:
You can find the code for this script as a gist on GitHub.