Metric Spaces and How To Compare Them
In this post, I want to briefly discuss the Hausdorff distance, its uses, and its extensions. Prior to delving into this, we need to understand our setting here. We are dealing with a metric space $(X, \operatorname{d})$ and its subsets. A metric space is, in some sense, one of the most fundamental mathematical objects that everyone is working with, either consciously or unconsciously. It is defined as a space, i.e. a set of points (or objects in general, but for the sake of intuition, I will only refer to points), together with a distance function $\operatorname{d}\colon X \times X \to \mathbb{R}$, the eponymous metric of the space. This function must satisfy three properties for all points $x, y, z \in X$:
 $\operatorname{d}(x, y) = 0$ if and only if $x = y$
 $\operatorname{d}(x, y) = \operatorname{d}(y, x)$
 $\operatorname{d}(x, z) \leq \operatorname{d}(x, y) + \operatorname{d}(y, z)$
These properties are usually also known as the ‘identity of indiscernibles’, ‘symmetric’, and the ‘triangle inequality’, respectively. Property 1 just tells us that we can always distinguish between points that are not the same—the metric is zero if and only if the points are identical. Property 2 is a symmetry condition, telling us that the order in which we evaluate the arguments of the function does not matter. Finally, Property 3 indicates that the distance between two points is never larger than adding a third point inbetween. It is called the triangle inequality because the sides of a triangle can be seen to satisfy this. We can also reformulate it as stating that the sum of the lengths of any two sides in a triangle must be greater than or equal to the length of the last side.
The quintessential example of a metric space is the $d$dimensional Euclidean space $\mathbb{R}^d$ with the usual Euclidean distance between points, i.e. $\operatorname{d}(x, y) := \sqrt{\sum_{i=1}^d(x_i  y_i)^2}$. While there are all kinds of interesting relaxations of the three aforementioned properties, leading to things such as pseudometrics, I want to focus on a very simple question here: given two finite subsets $X_1, X_2 \subset X$, possibly of different sizes, can we measure their distance from each other?
It turns out that we can and this is where we need the Hausdorff distance! Before giving definitions, let us look at an example and see how we could define such a distance ourselves. To this end, assume that these are our two subsets $X_1$ (in red) and $X_2$ (in blue):
To motivate our first definition for the Hausdorff distance, note that if $X_1 \subset X_2$ (or vice versa), it would be easier to calculate the distance, because we would only have to account for the parts that are not part of the subset, and we would have to measure how far it is to every other point of the smaller subset. With this in hand, let’s pretend that $X_1$ and $X_2$ can be turned into subsets of each other! To this end, we define the $\epsilon$thickening for $\epsilon \in \mathbb{R}$ as the space we obtain by replacing every point $x$ in the sample by an $\epsilon$ball, i.e. the set of all points in the metric space $X$ that are at most $\epsilon$ units away from $x$. This has the effect of ‘squinting’ at a subsample. Let’s illustrate this for the circle:
We can of course increase $\epsilon$ to ‘squint even more’:
How do we know when to stop, though? Well, we can stop once both $X_1$ and $X_2$ are subsets of an $\epsilon$thickening of the respective other space! Using the notation $X_1^{(\epsilon)}$ for the $\epsilon$thickening of $X_1$ (and analogously for $X_2$), this leads to our first definition of the Hausdorff distance as $\operatorname{d}_{\mathrm{H}}(X_1, X_2) := \inf \left\{ \epsilon > 0 \mid X_1 \subset X_2^{(\epsilon)} \text{ and } X_2 \subset X_1^{(\epsilon)} \right\}$.
Let’s briefly disentangle this! We define the distance between $X_1$ and $X_2$ as the smallest $\epsilon$ that we need such that both $X_1$ and $X_2$ are subsets of an $\epsilon$thickening of the other space. As a simple example, consider thickening both $X_1$ and $X_2$ with increasing values:
We can see that it takes large values of $\epsilon$ for $X_1 \subset X_2^{(\epsilon)} \text{ and } X_2 \subset X_1^{(\epsilon)}$ to hold. Let’s take a look at an alternative definition. To this end, assume that an adversary can place you on one point of one of the sets. Your goal is to travel as quickly as possible to the other set. The Hausdorff distance is now the longest distance the adversary can force you to travel. We can formulate this as follows:
$$d_{\mathrm{H}}(X_1, X_2) := \max\left\{\sup_{x_1 \in X_1} \inf_{x_2 \in X_2} d(x_1, x_2), \sup_{x_2 \in X_2} \inf_{x_1 \in X_1} d(x_1,x_2)\right\}$$
Note the inherent symmetry of the two terms in the distance here. The first one calculates the distance from the perspective of $X_1$, looking for the point $x_1 \in X_1$ that maximises the minimum distance to $X_2$. The second term works analogously.
The two definitions yield the same distance.^{1} This is intuitive to see when you consider that if we follow the first definition, we end up with a value for $\epsilon$ that is precisely the worstcase bound for the distance one needs to travel to go from set to the other set. On the other hand, following the second definition, we can construct the $\epsilon$thickening and it will be guaranteed—thanks to the symmetry condition—that there is no point in either one of the sets to lie outside the thickened space.
This covers the case for subsets from the same metric space $X$. But what can we do if we have two different compact^{2} metric spaces $X$ and $Y$ (or subsets thereof), each one with their own metric? To solve this case, we will use a trick and use an intermediate space. This leads to the Gromov–Hausdorff distance $d_\mathrm{GH}(X, Y)$, which is defined as the infimum of all Hausdorff distances $d_\mathrm{H}\left(f(X), g(Y)\right)$, where $f\colon X \to Z$ and $g\colon Y \to Z$ are isometric embeddings into a shared metric space $Z$.
In other words: we are looking for two functions $f$ and $g$ that leave the distances in $X$ and $Y$ intact—hence the term isometric—so that we can measure the Hausdorff distance in the shared space $Z$ instead. As you can imagine from the definition, the Gromov–Hausdorff distance is tough to calculate because finding an appropriate $f$ and $g$ is not easy. Nevertheless, the Gromov–Hausdorff distance has some fascinating properties and is one of the ‘workhorses’ in geometrical data analysis. If you want to learn more, I can highly recommend reading the paper Comparing Point Clouds by Facundo Mémoli and Guillermo Sapiro.
Happy distance calculations, until next time!

Interestingly, I have never seen a textbook that states both definitions and proves them to be equivalent! If you know of one, please tell me. ↩︎

Roughly speaking, this property generalises the notion of a ‘closed’ subset. For example, the open unit interval is not compact, but the closed unit interval is compact. ↩︎