Underestimating Users

Tags: academia, research, musings


Back when I was still doing research in visualization, a field concerned with making complex data sets understandable, our field was haunted by the spectre of an ostensibly misunderstood tool: the Rainbow colour map.

Assuming that most of my readers are unfamiliar with this, here is a brief and incomplete introduction to colour maps: The goal of a colour map is to assign colours to data, so that patterns can emerge. For instance, suppose you measure the temperature at some places in your country. To show these temperatures on a map, you might pick colours that range from ‘rather cold’ to ‘rather hot.’ In the parlance of our research field, this would call for a continuous colour map, since temperature is (hopefully) a continuous quantity.

Next to aesthetic choices and questions of colour-blindness, our field used to be convinced (and probably still is) that colour maps should be perceptually uniform: equal steps in the data should appear as equal steps in colour. In other words, your perception of differences in colours should roughly match the differences of their underlying values. So far, so good! It turns out that algorithmically-generated colour maps, i.e. colour maps that can be easily generated, are often not perceptually uniform. Even worse (or so I believed): The default choice of colour map in some common software packages was our enemy, the notorious Rainbow colour map! Myriads of papers use(d) this colour map, even though our research showed that the Rainbow colour map introduced patterns where none exist. This led to all kinds of nice papers following the ‘X considered harmful’ style, such as work by Borland and Taylor.
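To make the complaint a bit more tangible, here is a minimal sketch in Python. It assumes a naive rainbow map built from a hue sweep (blue to red at full saturation), which is my own stand-in and not the default of any particular software package. It uses CIE L* lightness as a rough proxy for perception; lightness is only one ingredient of perceptual uniformity, so treat this as an illustration rather than a proper evaluation. The function names are made up for this example.

```python
import colorsys


def srgb_to_linear(c):
    """Undo the sRGB transfer curve for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def lightness(rgb):
    """CIE L* (0 = black, 100 = white) of an sRGB colour."""
    r, g, b = (srgb_to_linear(c) for c in rgb)
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b  # relative luminance
    delta = 6 / 29
    f = y ** (1 / 3) if y > delta ** 3 else y / (3 * delta ** 2) + 4 / 29
    return 116 * f - 16


def rainbow(t):
    """Hypothetical rainbow map: hue sweeps from blue (t=0) to red (t=1)."""
    return colorsys.hsv_to_rgb((1 - t) * 2 / 3, 1.0, 1.0)


def grey(t):
    """Plain grey ramp, used only for comparison."""
    return (t, t, t)


def lightness_profile(cmap, n=9):
    """Lightness at n evenly spaced data values between 0 and 1."""
    return [lightness(cmap(i / (n - 1))) for i in range(n)]


def is_monotonic(values):
    """True if the values never decrease from one step to the next."""
    return all(b >= a for a, b in zip(values, values[1:]))


rainbow_profile = lightness_profile(rainbow)
grey_profile = lightness_profile(grey)

print([round(v) for v in rainbow_profile])
print("rainbow monotonic:", is_monotonic(rainbow_profile))
print("grey monotonic:", is_monotonic(grey_profile))
```

The lightness of the rainbow rises and falls as the data value increases, with bright bands around cyan and yellow, whereas the grey ramp only ever gets lighter. Those bright bands are the kind of thing that can look like structure in the data even when the underlying values change smoothly.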

At the time, I thought this was a pretty big deal,[1] and I would use my signature look of superiority whenever I encountered such a paper. In my head, I imagined scientists using this colour map to be uninformed, being led astray to search for patterns where none exist.

How wrong and presumptuous of me, although one might show some clemency and chalk it up to the usual exuberance and arrogance often found in over-educated but not particularly wise persons. Now, with some additional hindsight and having actually talked to users of these colour maps, my stance has changed. It turns out that users are aware, at least on an intuitive level, of the fact that these colour maps might misrepresent things. Instead of treating the visualization as a sacrosanct thing that showed truth (as I was taught), they just treated it as another tool that helped them arrive at certain hypotheses. However, they did not stop there, and would actually test any particular hypothesis about their data.

In my arrogance, I had underestimated the users of our tools.

Now, with less hair but more wisdom, I see similar things happen in my new field of research, viz. applications of machine learning. Here, there is a vicious[2] debate on the use of dimensionality-reduction techniques like UMAP or t-SNE.

Such techniques make it possible to visualise some aspects of complex high-dimensional data sets, and they have received enormous attention in computational biology. A priori, when going from a high-dimensional space to a low-dimensional one, it is clear that not everything can be easily preserved. Thus, a two-dimensional plot cannot possibly preserve all distances or all clusters in a meaningful fashion—and some computational biologists take umbrage at the fact that their colleagues nevertheless use UMAP & friends without discussing their limitations.
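A tiny example shows why something has to give. The sketch below takes the four corners of a regular tetrahedron, which are all equally far apart in three dimensions, and flattens them to two dimensions by simply dropping a coordinate. That projection is a deliberately crude stand-in of my own choosing; UMAP and t-SNE are non-linear and far more sophisticated, but no method can avoid the underlying problem, since at most three points in the plane can be mutually equidistant.

```python
from itertools import combinations
from math import dist

# Corners of a regular tetrahedron: every pair is equally far apart in 3D.
points_3d = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]


def project(point):
    """Toy 'dimensionality reduction': drop the last coordinate."""
    return point[:2]


def pairwise(points):
    """Distances between all pairs of points, in a fixed order."""
    return [dist(a, b) for a, b in combinations(points, 2)]


high = pairwise(points_3d)
low = pairwise([project(p) for p in points_3d])

# Ratio of 2D distance to 3D distance for each pair; 1.0 means preserved.
ratios = [l / h for l, h in zip(low, high)]

for (i, j), ratio in zip(combinations(range(len(points_3d)), 2), ratios):
    print(f"pair ({i}, {j}): distance ratio {ratio:.3f}")
```

In 3D all six distances are identical, so there is no cluster structure at all. After the projection, some pairs keep their distance while others shrink, and a reader who only sees the 2D picture could be tempted to see two kinds of relationships that the data do not contain.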

That is only part of the truth, though: It turns out that computational biologists know quite well that dimensionality reduction distorts the truth; they use the visualisations as a tool for hypothesis-generation and also as a way to showcase some overall aspects of their data, all the while knowing full well that some aspects are not preserved.

The debate on whether or not to use dimensionality reduction reminds me very vividly of my righteous fight against rainbow colour maps, and it appears to me that the vicious, sometimes even polemical, critiques of UMAP and other dimensionality-reduction methods are as unwarranted as my critiques of rainbow colour maps.[3]

Do not underestimate practitioners. They also know their stuff! Moreover, as an epilogue, consider the saga of rainbow colour maps. The wind appears to have shifted, and maybe, just maybe, rainbow colour maps are not that bad after all.

I hope we can say the same thing about dimensionality-reduction methods at some point.


  1. As the saying goes: Everything is such a pretty big deal in academia because the stakes are so low…

  2. Just go on Twitter.

  3. Although, truth be told, I always just felt smug on the inside and never started fights on social media.