Using Topology to Classify Labelled Graphs
Tags: research
I have written at length about certain aspects of topological data analysis, but I have neglected to discuss one of its main applications, namely the classification of graphs. In this post, I will therefore take you on a quick tour of our ICML paper A Persistent Weisfeiler–Lehman Procedure for Graph Classification.
Let us assume that we are given a graph with node label information. The graph could, for instance, be a molecule, whose nodes are atoms such as carbon or oxygen, and whose edges indicate chemical bonds. The goal could now be to classify a given molecule into one of several classes, such as ‘toxic’, ‘carcinogenic’, etc. How can we achieve such a classification? One of the simplest techniques dates back to the 1960s and involves calculating an iterative fingerprint of the graph! This procedure was suggested by Boris Weisfeiler and Andrei Lehman (sometimes also transliterated as ‘Leman’) in their seminal article The reduction of a graph to canonical form and the algebra which appears therein.
At its core, the algorithm is an iteration scheme that works like this (a brief code sketch follows the list):
- For a node $v$, collect its label and the labels of adjacent nodes in a multiset.
- Assign this multiset a new label by hashing it—with the proviso that the hashing function is perfect, i.e. it maps distinct multisets to distinct values, without any collisions.
- Replace all node labels by their multiset hashes.
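Here is a minimal sketch of one such iteration in Python. The function name and the dictionary-based hash table are my own choices for illustration, not the paper's implementation; a plain dictionary acts as a perfect hash because every previously unseen multiset receives a fresh label:

```python
def wl_iteration(adjacency, labels, hash_table):
    """Perform one Weisfeiler--Lehman relabelling step.

    `adjacency` maps each node to its neighbours, and `labels` maps each
    node to its current label. The shared `hash_table` dictionary acts as
    a perfect hash: every unseen multiset gets a fresh label.
    """
    new_labels = {}
    for node, neighbours in adjacency.items():
        # Collect the node's own label plus the multiset of neighbour
        # labels; sorting makes the multiset representation canonical.
        multiset = (labels[node], tuple(sorted(labels[n] for n in neighbours)))
        if multiset not in hash_table:
            hash_table[multiset] = len(hash_table)  # fresh, collision-free label
        new_labels[node] = hash_table[multiset]
    return new_labels

# Toy usage: a triangle with two label classes.
adjacency = {'A': ['B', 'C'], 'B': ['A', 'C'], 'C': ['A', 'B']}
labels = {'A': 'red', 'B': 'red', 'C': 'blue'}
hash_table = {}
labels = wl_iteration(adjacency, labels, hash_table)
# A and B share a label (red with neighbours {red, blue}); C differs.
```

Note that the hash table has to be shared across all graphs (and across iterations) if the resulting labels are to remain comparable later on.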
Intuitively, each step of the algorithm accumulates more information from nodes that are further removed from the current node. The hashed multiset label is thus an expression of the neighbourhood around a node—and after a sufficiently large number of iterations, the hashed labels will not change any more.
For example, suppose you are dealing with this simple graph (to avoid confusing node labels with node IDs, I use colours to indicate node labels in this example):
Tabulating the neighbourhood of each node then results in the following table:
Now for the hashing step. In this example, perfect hashing means choosing a set of colours that is distinct for every distinct combination of neighbourhood labels and vertex labels:
Notice how nodes A, B, and G are hashed to the same colour—because in the first iteration of the algorithm, they cannot be distinguished. How can we use the information about the hashed labels in a subsequent comparison task? The answer lies in counting them in a histogram vector, which is indexed by the unique labels—this is where our requirement of a perfect hashing function pays off. For the previously-shown graph, it looks like this:
The fingerprint of this graph, according to the first iteration of the Weisfeiler–Lehman scheme, is therefore $(3, 1, 2, 1)$. Further iterations just make the feature vector longer (technically, the initial labels already give rise to a feature vector of counts). This procedure can now be repeated for higher-order iterations, and the resulting feature vectors can be compared across graphs by evaluating, for example, their dot product. Formalisations of this idea have resulted in the very successful Weisfeiler–Lehman Graph Kernels publication. This method arguably constitutes the basis for graph neural networks—in fact, these networks can be seen as a parametrised version of the Weisfeiler–Lehman iteration scheme. But I digress—if you are interested in these aspects, please read Michael Bronstein’s article on going beyond graph isomorphism for more details.
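To make the counting step concrete, here is a small continuation of the sketch from above. The label assignment is hypothetical; I picked it so that it reproduces the fingerprint $(3, 1, 2, 1)$, with nodes A, B, and G sharing the same hashed label:

```python
from collections import Counter

def wl_feature_vector(labels, alphabet):
    """Count label occurrences, indexed by a fixed label alphabet."""
    counts = Counter(labels.values())
    return [counts[label] for label in alphabet]

# Hypothetical hashed labels after one iteration; A, B, and G coincide.
labels = {'A': 0, 'B': 0, 'C': 1, 'D': 2, 'E': 2, 'F': 3, 'G': 0}
phi = wl_feature_vector(labels, alphabet=[0, 1, 2, 3])
print(phi)  # [3, 1, 2, 1]

# Two graphs are compared via the dot product of their feature vectors.
similarity = sum(x * y for x, y in zip(phi, phi))
```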
Now, despite its great practical utility, this feature vector is lacking some structural information about the graph. It does not know whether a certain label contributes much to the topological structure of a graph—such as a ring of carbon atoms would in a molecule—or not. To this end, we introduced a notion of topological relevance for each node label! Briefly put, we first develop a distance metric that permits us to turn any labelled graph into a weighted graph. We then calculate a persistence barcode, a topological descriptor of the graph. This descriptor assesses the relevance of a topological feature created by some node label. We use the topological relevance of each feature as an additional weight for the previously-shown feature vector. In essence, labels that contribute a large amount of topological structure in a graph are assigned a higher weight than labels that only contribute a meagre amount!
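The distance metric and the barcode computation are described in the paper; the following fragment only illustrates the weighting idea itself, with made-up relevance values:

```python
# Sketch of the weighting step only. In the actual method, `relevance`
# is derived from the persistence barcode of the weighted graph; the
# values below are made up purely for illustration.
phi = [3, 1, 2, 1]                # WL histogram from above
relevance = [0.9, 0.2, 0.7, 0.4]  # hypothetical topological relevance per label
weighted_phi = [w * x for w, x in zip(relevance, phi)]
```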
Here is a graphical depiction of our process:
If you want to brush up your understanding of the barcode calculation, head on over to Christian’s website; he has an excellent article on persistent homology.
The neat thing about our approach is that we can easily integrate information about cycles into the feature vector—this is a functionality that the original Weisfeiler–Lehman graph kernels framework lacks. Moreover, these cycles turn out to be crucial in improving classification performance—we get an increase of more than 3% in classification accuracy by considering them on some data sets!
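To see why cycles matter, consider a classic example (my own toy illustration, using networkx; the paper derives cycle information from the persistence barcode in dimension one, not from a cycle basis):

```python
import networkx as nx

# Classic example: 1-WL cannot distinguish two disjoint triangles from a
# hexagon. Both graphs are 2-regular, so every node receives the same
# label in every WL iteration and the label histograms coincide.
two_triangles = nx.disjoint_union(nx.cycle_graph(3), nx.cycle_graph(3))
hexagon = nx.cycle_graph(6)

# Cycle information tells the two graphs apart.
print(len(nx.cycle_basis(two_triangles)))  # 2
print(len(nx.cycle_basis(hexagon)))        # 1
```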
If this has whetted your appetite, I invite you to read our paper or take a look at the code. If you want to learn more about graph classification using graph kernels, have a look at our recent survey on graph kernels, which will hopefully be officially announced in time for NeurIPS 2020. In the best tradition of Fermat, I would very much like to cover the content of the survey here, but this blog is too small to contain all of it—maybe in a subsequent post?
Until next time!