# Towards topological machine learning

## Tags: research

I am incredibly grateful for how my academic year has started: four preprints were at least conditionally accepted for publication in a forthcoming book on topological methods in data visualization, and another publication of my new lab was accepted as a poster at ICLR 2019.

The underlying theme of all these publications is to shift the focus of
machine learning towards *topological methods*, i.e. methods that focus
on connectivity properties of input data. I am convinced that thinking
about these types of properties is worthwhile, as the resulting shift
in perspective often leads to novel insights.

This *spring of papers* follows two themes: in the first, topology is
used directly to drive algorithms, for example to classify data or to
elucidate their properties. In the second, topology is used
indirectly to learn something about the behaviour of *other* algorithms.

## Topology as a driver for algorithms

In *Persistent Intersection Homology for the Analysis of Discrete
Data*, Markus Banagl, Filip Sadlo, Heike Leitte, and I describe how to
use persistent intersection homology, an extension of persistent
homology, to describe spaces that do not consist of a *single* manifold
but of *multiple* ones. It turns out that, as long as these manifolds
intersect in somewhat controlled ways, we are able to extract more
information than “ordinary” persistent homology does.
This has the potential to offer a new view on traditional manifold
learning algorithms, because it seems highly unlikely that input data
come from a single manifold only, despite our best efforts to pretend
that this is the case.
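
Persistent intersection homology itself is too involved for a short
snippet, but the concept it extends is not. The following minimal
sketch computes *zero-dimensional* persistent homology of a
Vietoris–Rips filtration, i.e. the connectivity information of a point
cloud, using a Kruskal-style union-find. All names are illustrative
choices of mine; this is not code from the paper.

```python
import numpy as np

def persistence_0d(points):
    """Zero-dimensional persistence pairs of a Vietoris-Rips filtration.

    Every point starts as its own connected component (born at scale 0);
    a component dies at the distance value of the edge that merges it
    into another component.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)

    # Pairwise Euclidean distances act as filtration values of the edges.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Process edges in order of increasing distance (Kruskal's algorithm).
    edges = sorted(
        (dist[i, j], i, j) for i in range(n) for j in range(i + 1, n)
    )

    parent = list(range(n))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    pairs = []  # one component survives forever and yields no pair
    for value, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            pairs.append((0.0, value))
    return pairs

print(persistence_0d([[0, 0], [0, 1], [5, 0]]))
# [(0.0, 1.0), (0.0, 5.0)]: two merge events, one surviving component
```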

Moreover, in *Topological Machine Learning with Persistence Indicator
Functions*, Filip Sadlo, Heike Leitte, and I describe a functional
summary of persistence diagrams (topological descriptors that typically
arise during the calculation of persistent homology) that is easy to
calculate, can be used for statistical hypothesis testing, and gives
rise to a *kernel* function.
Without going into the details, the last property is the most exciting
one, at least for me, because it permits powerful machine learning
algorithms, such as kernel support vector machines, to use these
summaries for classification. In the paper, we demonstrate that we are
capable of outperforming a state-of-the-art method for non-attributed
graph analysis, but this is just the beginning. I hope that the
availability of our method will encourage more research in this
direction.
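
To make the idea a little more tangible, here is a minimal sketch of
a persistence indicator function, which counts at every threshold how
many persistence intervals are active, together with one way of
deriving a kernel from it. The exponential form and the bandwidth
parameter `lam` below are illustrative assumptions on my part, not
necessarily the exact kernel from the paper.

```python
import numpy as np

def pif(diagram, t):
    """Persistence indicator function: the number of intervals
    [b, d] in the diagram that are active at threshold t."""
    return sum(1 for b, d in diagram if b <= t <= d)

def pif_l1_distance(diagram_a, diagram_b):
    """Exact L1 distance between two persistence indicator functions.

    Both PIFs are piecewise constant, so integrating |f - g| reduces
    to a sum over intervals between consecutive critical points.
    """
    breaks = sorted({t for b, d in diagram_a + diagram_b for t in (b, d)})
    total = 0.0
    for lo, hi in zip(breaks, breaks[1:]):
        mid = 0.5 * (lo + hi)  # both PIFs are constant on (lo, hi)
        total += abs(pif(diagram_a, mid) - pif(diagram_b, mid)) * (hi - lo)
    return total

def pif_kernel(diagram_a, diagram_b, lam=1.0):
    """Exponential kernel on the L1 distance between PIFs
    (an illustrative construction, not the paper's exact kernel)."""
    return np.exp(-lam * pif_l1_distance(diagram_a, diagram_b))

# Two toy diagrams as lists of (birth, death) tuples:
D = [(0.0, 2.0), (1.0, 3.0)]
E = [(0.5, 2.5)]
print(pif_kernel(D, E))  # similarity between the two diagrams
```

Since both indicator functions are step functions, the distance can be
evaluated exactly rather than on a sampled grid, which is what makes
such summaries cheap to compare.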

## Topology to understand deep learning

Following the second theme, in *Neural Persistence: A Complexity
Measure for Deep Neural Networks Using Algebraic Topology*, Matteo
Togninalli, Christian Bock, Michael Moor, Max Horn, Thomas Gumbsch,
Karsten Borgwardt, and I developed a novel topology-based method for
analysing (deep) fully-connected networks. Our method determines the
amount of topological activity of the network during the training
process and relates it to a theoretical maximum that depends on the
corresponding architecture. We demonstrate that our measure (dubbed
*neural persistence*) is capable of assessing the complexity of
different (simple) networks. Moreover, we show how it can be used as
a valid criterion for early stopping that does *not* rely on additional
validation data. Thanks to a thorough review process, we added many
additional experiments to the supplementary materials, including an
extremely detailed analysis of early stopping scenarios on different
data sets, compared with a standard criterion that employs the
validation loss. Of course, the code is publicly available.
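
For readers who want a feeling for the computation, here is a
simplified sketch of a per-layer score in the spirit of neural
persistence: normalise the absolute weights of a fully-connected layer,
add edges in order of decreasing weight, and record a persistence value
whenever two components of the induced bipartite graph merge. The
filtration and normalisation details follow my reading of the method;
this is not the reference implementation.

```python
import numpy as np

def neural_persistence(weights, p=2):
    """Simplified per-layer neural persistence score.

    `weights` is the (n_in, n_out) weight matrix of a fully-connected
    layer, viewed as a bipartite graph on n_in + n_out vertices.
    """
    n_in, n_out = weights.shape
    w = np.abs(weights)
    w = w / w.max()  # normalise filtration values to [0, 1]

    # Edges sorted by decreasing weight: strong connections enter first.
    edges = sorted(
        ((w[i, j], i, n_in + j) for i in range(n_in) for j in range(n_out)),
        reverse=True,
    )

    parent = list(range(n_in + n_out))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    persistences = []
    for value, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            # Components are born at filtration value 1 and die when
            # merged, so the persistence of this pair is 1 - value.
            persistences.append(1.0 - value)
    return np.linalg.norm(persistences, ord=p)

rng = np.random.default_rng(42)
print(neural_persistence(rng.normal(size=(20, 10))))
```

Tracking this score over the course of training is what lets the
measure serve as an early stopping criterion without a validation set.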

All in all, this was an extremely productive *spring of papers*. I am
grateful to my collaborators, teammates, and friends, in particular
Matteo and Christian, who were instrumental in the conception, planning,
and execution of this project.

Happy reading, until next time!