A round-up of topological data analysis papers at ICML 2019
Even though this year’s International Conference on Machine Learning (ICML) is not quite over yet, I wanted to put down some thoughts about the many inspiring papers I encountered here. Specifically, I will briefly introduce all papers that deal with methods from topological data analysis (TDA) as this is one of the topics of my research.
Caveat lector: I might have missed some promising papers. Any suggestions for additions are more than welcome. Please reach out to me via Twitter or e-mail.
Hofer et al.: Connectivity-Optimized Representation Learning via Persistent Homology
Connectivity-Optimized Representation Learning via Persistent Homology by Christoph Hofer, Roland Kwitt, Marc Niethammer, and Mandar Dixit deals with controlling the latent space of an autoencoder by imposing additional connectivity restrictions. This is achieved by calculating topological features using persistent homology. The paper describes a novel connectivity loss term that is differentiable, thereby very elegantly removing the biggest obstacle that prevented persistent homology being used in a continuous setting. Finally, backpropagation through the persistent homology calculation is possible!
Similar ideas along these lines (but placed in different contexts) can be found in A Topological Regularizer for Classifiers via Persistent Homology and Topological Function Optimization for Continuous Shape Matching.
Kunin et al.: Loss Landscapes of Regularized Linear Autoencoders
In their paper Loss Landscapes of Regularized Linear Autoencoders, Daniel Kunin, Jonathan Bloom, Aleksandrina Goeva, and Cotton Seed analyse the loss landscape of a specific class of autoencoders. Their paper is extremely readable and interesting but the real kicker (for me) is that they use Morse theory to prove properties of the loss function! Moreover, their proofs are very accessible and instructive—the supplementary materials of this paper provide one of the best panoramic overviews of a big chunk of algebraic topology that I am currently aware of.
Mémoli et al.: The Wasserstein Transform
In The Wasserstein Transform, Facundo Mémoli, Zane Smith, and Zhengchao Wan present a general transformation for data sets (or metric spaces, to be more precise) that performs a denoising operation based on optimal transport theory. Connections to the venerable mean shift algorithm are also drawn.
While not directly employing TDA techniques, this paper makes the very interesting case that its transformation could be used to potentially strengthen the ’topological signal’ that is underlying a data set. I feel reminded of Topological De-Noising: Strengthening the Topological Signal by Jennifer Kloke and Gunnar Carlsson, in which a modified variant of mean shift is used (not based on optimal transport theory, though) to perform thresholding.
Nguyen: On Connected Sublevel Sets in Deep Learning
In his paper On Connected Sublevel Sets in Deep Learning, Quynh Nguyen analyses sublevel sets of the loss function. While not directly employing TDA techniques, sublevel sets are of course an integral part of TDA and crop up all over the place—see my article on Morse theory, for example. The paper proves that sublevel sets of a certain class of deep networks are connected and unbounded. This, in turn, implies that no ‘bad’ (in terms of training performance) local valleys exist in the loss landscape. A very interesting theoretical paper that I definitely want to read in much, much detail.
A loosely related paper from this year’s ICLR is Deep, skinny neural networks are not universal approximators by Jesse Johnson that employs a characterisation of level sets of neural networks to disprove the universal approximator property.
Ramamurthy et al.: Topological Data Analysis of Decision Boundaries with Application to Model Selection
The paper Topological Data Analysis of Decision Boundaries with Application to Model Selection presents a very nice approximation of the decision boundary of a classifier by means of a labelled Vietoris–Rips complex (and a corresponding Čech complex). Using this approximation, the complexity of the decision boundary can be modelled. This, in turn, facilitates model selection from a ‘marketplace’: the model that most closely matches the topological features of the decision boundary can be picked.
This is very exciting because it is one of the first papers to work with a topological approximation of the decision boundary of a network. There are some preprints, such as the work by Guss and Salakhutdinov, that deals with a similar setting. Plus, there is of course our own work, also from this year’s ICLR, that deals with assessing the complexity of a neural network, albeit in terms of the weight space itself and not the decision boundary.
Rieck et al: A Persistent Weisfeiler–Lehman Procedure for Graph Classification
In A Persistent Weisfeiler–Lehman Procedure for Graph Classification, we (that is, Christian Bock, Karsten Borwardt, and myself) propose a novel way to integrate topological features into the Weisfeiler–Lehman graph kernel framework. We manage this by defining a metric on the aggregated multisets of each node neighbourhood. In turn, this makes it possible for us to define a filtration on the graph. The persistent homology of that filtration, together with the hashed graph labels, permits us to derive a set of feature vectors that incorporate both topological information (in the form of connected components and cycles), as well as label information.
I am happy and excited about seeing that TDA and its related methods are thriving and starting to carve out their space in machine learning. Let’s see what the future holds! I sincerely hope that this will not be the last post of this sort.