How to improve your mathematical writing
Tags: musings, research
Having reviewed for numerous machine learning conferences by now, I want to point out some recurring issues in the way authors describe mathematical concepts in their papers. I hope that my suggestions will serve in particular those who are new to machine learning and for whom the task of reading a paper with a ‘bad’ mathematical style might seem a daunting prospect. Without further ado, here are some suggestions for improving your research article.^{1}
Build intuition
Your primary objective when writing mathematics should be to build intuition for readers. This does not mean that you should sacrifice any rigour or precision; I am merely asking you to consider that a reader of your paper may not yet be an expert in the subject matter. If you want readers to take away some wisdom from the paper, you have to address their intuition first. No one except an expert in your field will be swayed by long-winded proofs.^{2}
I am not advocating for sloppy thinking, sloppy writing, or ‘handwaving’ in your proofs. I am just saying that the way you lay out your overall argument should be based on intuition, such that even reviewers who are not experts in your domain can follow along. If you write a paper geared only towards a very small audience, you are reducing its impact and making your life harder, because it will be almost impossible to find good reviewers for it in the conference cycle.
Strive for consistent notation and terminology
Whenever you introduce notation, try to remain consistent. For example, if you want to name three objects, prefer calling them $a$, $b$, and $c$, instead of $\alpha$, $\mathbf{B}$, and $\mathfrak{c}^\prime$.
Whenever possible, follow established terminology—such as calling loss terms $\mathcal{L}$—but do not assume that the reader is aware of these conventions. Always take the time to at least briefly introduce your notation.
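As a small illustration of both points, the following LaTeX fragment contrasts mixed and uniform names for three objects of the same kind, and then introduces the loss symbol before using it. The model $f_\theta$, the labelled examples $(x_i, y_i)$, and the pointwise loss $\ell$ are assumptions made up for this sketch; the fragment also assumes that `amsmath` and `amssymb` are loaded.

```latex
% Harder to follow: three objects of the same kind, three different styles.
Let $\alpha$, $\mathbf{B}$, and $\mathfrak{c}^\prime$ be points in $\mathbb{R}^d$.

% Easier to follow: one style for one kind of object.
Let $a$, $b$, and $c$ be points in $\mathbb{R}^d$.

% Introduce established notation briefly instead of assuming it is known.
% (The model f_\theta, the data (x_i, y_i), and the pointwise loss \ell
% are hypothetical and only serve this example.)
We write $\mathcal{L}$ for the loss of a model $f_\theta$ on $n$ labelled
examples $(x_1, y_1), \dots, (x_n, y_n)$, i.e.
\begin{equation}
  \mathcal{L}(\theta) := \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(f_\theta(x_i), y_i\bigr),
\end{equation}
where $\ell$ denotes a pointwise loss such as the squared error.
```

A single sentence of introduction is usually enough; the aim is that no symbol appears in a formula before the reader has been told what it stands for.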
This point is in fact closely related to the next issue, namely…
Less can be more
Some people aim to be extremely precise and always employ index-based notation. For example, instead of referring to objects $a$ and $b$, it is always $a_1$ and $a_2$ or something similar. There is nothing wrong with this in principle, but I would suggest using as few indices as possible. If only two objects of the same type are needed, why not use a $\prime$ symbol (\prime in LaTeX) instead of indices?
This can often increase readability and makes you less error-prone because you no longer have to keep track of the indices. For example, in our survey on graph kernels, we try to always speak of $G$ and $G^\prime$ when we need to refer to two graphs. This leaves the indices free for when you actually have to index something, for example because you are aggregating over objects arising from a certain set.
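To make the contrast concrete, here is a hedged LaTeX sketch of one formula written in both styles. The kernel $k$, the vertex sets $V$ and $V^\prime$, and the base kernel $\kappa$ are illustrative assumptions and are not taken from the survey mentioned above.

```latex
% Index-heavy version: every symbol carries an index, so the reader has to
% keep track of which index belongs to which graph.
\begin{equation}
  k(G_1, G_2) := \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \kappa(v_{1,i}, v_{2,j})
\end{equation}

% Primed version: two graphs G = (V, E) and G' = (V', E'). No indices are
% needed, since the sums range directly over the two vertex sets.
\begin{equation}
  k(G, G^\prime) := \sum_{v \in V} \sum_{v^\prime \in V^\prime} \kappa(v, v^\prime)
\end{equation}
```

Both versions describe the same double sum over pairs of vertices, but the second one can be read aloud without naming a single index.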
In addition, try to adhere to ‘less can be more’ in other places as well, by removing unnecessary variables, unnecessary concepts, and unnecessary definitions. It does you credit if you want to explain everything about a certain field in your paper, but if you introduce a concept X as a special case of a concept Y, there is no need to go to great lengths to describe any superconcept Z, unless you actually need to refer to its properties. This does not mean that you should leave out salient information; I am suggesting that you focus on pertinent information only. For example, there is no need to expand on properties of metric spaces, topologies, and so on, if all you need is Lipschitz continuity. Aim to get to the point quickly and without too many redirections (unless you are writing a book, but I am assuming that the reader of this post is interested in writing research papers).
Try also not to fall prey to the tendency to make your formulas more complex than they have to be. For example, peppering your formulas with logical quantifiers such as $\forall$ or $\exists$ is more appropriate when you are publishing in logic. Many people, however, treat these symbols as shortcuts. Please do resist that temptation. As Knuth remarks in Mathematical Writing, the overuse of such symbols just decreases readability.
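As an illustration, consider a definition of Lipschitz continuity for a function $f\colon \mathbb{R} \to \mathbb{R}$, written once with quantifiers as shorthand and once in prose. The example is a sketch chosen for illustration, not one taken from Mathematical Writing; it assumes that `amsmath` and `amssymb` are loaded.

```latex
% Quantifiers used as shorthand: correct, but the reader has to decode it.
$f$ is Lipschitz continuous $:\Leftrightarrow$
$\exists L \geq 0 \; \forall x, y \in \mathbb{R} \colon |f(x) - f(y)| \leq L \, |x - y|$.

% The same statement in words: the formula carries only the inequality.
A function $f\colon \mathbb{R} \to \mathbb{R}$ is \emph{Lipschitz continuous}
if there is a constant $L \geq 0$ such that
\begin{equation}
  |f(x) - f(y)| \leq L \, |x - y|
\end{equation}
for all $x, y \in \mathbb{R}$.
```

The second version is longer, but it reads as a sentence, and the displayed formula contains only the part that actually benefits from symbols.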
As a corollary to the preceding rule, do not include ‘cosmetic mathematics’ in your paper. Every equation, lemma, proposition, or theorem should be there because it needs to be there. I know that there is a tendency for reviewers to look favourably on a paper whose mathematics seems ‘sufficiently complex’, but this is not the right way. If you want to publish good papers, your first duty is to ensure that readers of that paper can glean some insights from it. Showing off will not help in the long run.
Conclusion
Similar to my previous post on writing in general, the main thing to remember is to know your audience. In machine learning, this audience can come from highly different backgrounds. Make them appreciate the ideas in your paper by guiding them gently into the mathematics. Reserve the ‘heavy stuff’ for an expert audience that can give you some actual feedback on it instead of just skimming over it.
Write well, until next time!

^{1} I am assuming throughout this article that your research article is written for a machine learning conference. You might have to adjust your writing considerably if you are a mathematician publishing for other mathematicians. In machine learning, however, many people are not exclusively trained in mathematics, which makes writing research papers a little more challenging if you want to do it well.

^{2} Whether proofs should be included in machine learning papers at all is still a matter of debate. I think that a rigorous, principled exposition in a paper is to be welcomed. Yet, the more our community expands, the more we have to think about how to successfully get the point across in our papers! At some point, there will not be a sufficient number of expert reviewers for highly specific papers, I fear.