# Using Topology to Classify Labelled Graphs

## Tags: research

I have written at length about certain aspects of topological data analysis, but I have neglected to discuss one of its main applications: the classification of graphs. In this post, I will therefore take you on a quick tour of our ICML paper A Persistent Weisfeiler–Lehman Procedure for Graph Classification.

Let us assume that we are given a graph with node label information. The
graph could, for instance, be a molecule, whose nodes are atoms such as
carbon or oxygen, and whose edges indicate chemical bonds. The goal
could now be to classify a given molecule into a set of classes, such as
‘toxic’, ‘carcinogen’, etc. How can we achieve such a classification?
One of the simplest techniques dates back to the 1960s and involves
calculating an *iterative fingerprint* of the graph! This procedure was
suggested by Boris Weisfeiler
and Andrei Lehman (sometimes also transliterated as ‘Leman’) in
their seminal article The reduction of a graph to canonical form and the
algebra which appears therein.

At its core, the algorithm is an iteration scheme that works like this:

- For a node $v$, collect its label and the labels of adjacent nodes in a multiset.
- Assign this multiset a new label by hashing it, with the proviso that the hashing function is *perfect*, i.e. it maps distinct multisets to distinct labels with no collisions.
- Replace all node labels by their multiset hashes.
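The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from our paper; the 'perfect hash' is realised simply by handing out a fresh integer for each distinct signature:

```python
def wl_iteration(adjacency, labels):
    """One Weisfeiler-Lehman relabelling step.

    adjacency: dict mapping each node to a list of its neighbours
    labels: dict mapping each node to its current label
    """
    # Collect, for every node, its own label and the multiset of
    # neighbour labels (represented as a sorted tuple).
    signatures = {
        v: (labels[v], tuple(sorted(labels[u] for u in adjacency[v])))
        for v in adjacency
    }
    # 'Perfect hashing': assign a fresh integer to each distinct
    # signature, so distinct multisets never collide.
    compression = {}
    new_labels = {}
    for v, sig in sorted(signatures.items()):
        if sig not in compression:
            compression[sig] = len(compression)
        new_labels[v] = compression[sig]
    return new_labels
```

Nodes whose own label and neighbour multiset coincide receive the same new label, while any difference in the neighbourhood yields a different one.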

Intuitively, each step of the algorithm accumulates more information from nodes that are further removed from the current node. The hashed multiset label is thus an expression of the neighbourhood around a node—and after a sufficiently large number of iterations, the hashed labels will not change any more.

For example, suppose you are dealing with this simple graph (to
prevent confusion of node labels and node IDs, I used *colours* to
indicate node labels in this example):

Tabulating the neighbourhood of each node then results in the following table:

Now for the hashing step. In this example, *perfect hashing* means
choosing a set of colours that is distinct for every distinct
combination of neighbourhood labels and vertex labels:

Notice how nodes A, B, and G are hashed to the same colour—because
in the first iteration of the algorithm, they cannot be distinguished.
How can we use the information about the hashed labels in a subsequent
comparison task? The answer lies in *counting* them in a histogram
vector, which is indexed by the unique labels; this is where our
requirement of a perfect hashing function is helpful. For the
previously-shown graph, it looks like this:

The *fingerprint* of this graph, according to the first iteration of the
Weisfeiler–Lehman scheme, is therefore $(3, 1, 2, 1)$. Further iterations
just make the feature vector longer (technically, the initial
labels already give rise to a feature vector of counts). This procedure
can now be repeated for higher-order iterations and the resulting
feature vectors can be compared across graphs by evaluating, for
example, their dot product. A more formal treatment of this idea
resulted in the very successful Weisfeiler–Lehman Graph
Kernels
publication. This method arguably constitutes the basis for graph neural
networks—in fact, these networks can be seen as a parametrised
version of the Weisfeiler–Lehman iteration scheme. But I digress—if
you are interested in these aspects, please read Michael Bronstein’s
article on going beyond graph
isomorphism
for more details.
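Turning the counts into feature vectors and comparing them is then a one-liner each. The sketch below is a hypothetical illustration of this idea, assuming a fixed ordering of the unique labels across all graphs:

```python
from collections import Counter

def wl_feature_vector(labels, label_space):
    """Histogram of node labels, indexed by a fixed label ordering.

    labels: dict mapping each node to its (hashed) label
    label_space: ordered list of all unique labels across the graphs
    """
    counts = Counter(labels.values())
    return [counts[l] for l in label_space]

def wl_kernel(vec_a, vec_b):
    """Compare two graphs via the dot product of their feature vectors."""
    return sum(a * b for a, b in zip(vec_a, vec_b))
```

For the example graph from the post, with three nodes of the first colour, one of the second, two of the third, and one of the fourth, this reproduces the fingerprint $(3, 1, 2, 1)$.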

Now, despite its great practical utility, this feature vector lacks
some *structural* information about the graph. It does not know whether
a certain label contributes much to the topological structure of
a graph, such as a ring of carbon atoms in a molecule, or not. To
this end, we introduced a notion of topological relevance for each node
label! Briefly put, we first developed a distance metric that would
permit us to turn any *labelled* graph into a *weighted* graph. We then
calculate a persistence barcode, a topological descriptor of the graph.
This descriptor assesses the relevance of a topological feature created
by some node label. We use the topological relevance of each feature as
an additional *weight* for the previously-shown feature vector. In
essence, labels that contribute a large amount of topological structure
in a graph are assigned a higher weight than labels that only contribute
a meagre amount!
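The weighting step itself is conceptually simple. The sketch below only illustrates the final reweighting; the actual distance metric and barcode computation are described in the paper, and the `relevance` mapping here is a hypothetical stand-in for the per-label weights one would derive from the barcode:

```python
def weighted_fingerprint(counts, relevance):
    """Scale a WL label histogram by per-label topological weights.

    counts: dict mapping label -> occurrence count (the WL histogram)
    relevance: dict mapping label -> topological weight; labels
    without an entry keep a neutral weight of 1.0 (hypothetical
    interface, not the paper's actual API)
    """
    return {label: count * relevance.get(label, 1.0)
            for label, count in counts.items()}
```

Labels that create a lot of topological structure receive a large weight and thus dominate the resulting feature vector, while topologically inert labels are damped.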

Here is a graphical depiction of our process:

If you want to brush up your understanding of the barcode calculation, head on over to Christian’s website; he has an excellent article on persistent homology.

The neat thing about our approach is that we can easily integrate
information about *cycles* into the feature vector—this is
a functionality that the original Weisfeiler–Lehman graph kernels
framework lacks. Moreover, these cycles turn out to be crucial for
improving classification performance: on some data sets, taking them
into account increases classification accuracy by more than 3%!

If this has whetted your appetite, I invite you to read our paper or take a look at the code. If you want to learn more about graph classification using graph kernels, take a look at our recent survey on graph kernels, which will hopefully be officially announced in time for NeurIPS 2020. In the best tradition of Fermat, I would very much like to cover the content of the survey here, but this blog is too small to contain all of it; maybe a subsequent post?

Until next time!