There is Fun in the Fundamentals

Tags: research, musings

As much as I hate to admit it, I am slowly becoming a more senior member of the academic community.1 Part of this involves providing guidance and addressing things that are rarely talked about openly but should be. Here, then, is my answer to the unspoken question of ‘Given the predominance of large language models, is there value in focusing on other things and, if so, what should one focus on?’

First, let me stipulate that working on LLMs is great, but the field feels overcrowded at the moment. Hence, my advice to new graduate students is simple: Focus on the fundamentals. Learning the fundamentals of any field, be it learning theory, graph learning, or really any other branch of machine learning, is not only quite fun but also creates the necessary ‘wetware,’ that is, the deep structures in your brain that will ultimately help you learn other things. Resist the trap of seeking only the knowledge that is relevant to the topic du jour, because this topic might change very quickly. If you invest in a strong foundation, you will be able to adapt or, ideally, even define your own topic at some point.

I know that this is easier said than done, in particular in machine learning, where one of the first essays one gets to read is probably The Bitter Lesson. Everyone heeds Sutton’s (very wise) words, but there is a subtlety in his warning that is often misunderstood. To wit, I do not read Sutton’s lesson as stating that there is no need for human ingenuity or creativity. On the contrary! Of course, general-purpose computation methods will generally beat hand-crafted features, provided there is sufficient data available, but not all constraints can—or, for that matter, should—be gleaned from data. For example, convolutional neural networks, rightfully heralded as the harbingers of the deep learning revolution, have locality as one of their core architectural concepts.2 They still have their role to play and are often more efficient and effective than other methods.
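To make the efficiency point a bit more concrete, here is a rough back-of-the-envelope sketch in Python. The layer shapes are invented purely for illustration and bias terms are ignored; the numbers are not meant to describe any particular model.

```python
# Back-of-the-envelope comparison: mapping a 224x224 RGB image to 64
# feature maps of the same spatial size. Shapes are illustrative only;
# bias terms are ignored.

height, width, in_channels, out_channels = 224, 224, 3, 64

# A fully connected layer connects every input unit to every output unit.
dense_params = (height * width * in_channels) * (height * width * out_channels)

# A 3x3 convolution only looks at a local 3x3 neighbourhood, and the same
# kernel weights are reused at every spatial position.
conv_params = 3 * 3 * in_channels * out_channels

print(f"fully connected: {dense_params:,} parameters")  # ~4.8e11
print(f"3x3 convolution: {conv_params:,} parameters")   # 1,728
```

The exact numbers do not matter; what matters is that locality and weight sharing shrink the hypothesis space drastically, and that this constraint is built into the architecture rather than gleaned from data.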

Thus, when it comes to ‘more data’ versus ‘more inductive biases,’ I am firmly in the ‘why-not-both’ camp. Philosophically, I am an opponent of physicalism,3 and, by analogy, I believe that it is not exclusively about the data. Data is important, and more data will certainly help in some settings, but the role of scientists of all ilks does not stop at collecting data, even though data collection and curation are central to the scientific endeavour per se.

To return to the idea of fundamental knowledge: If you understand how certain things work at a low level, you will inevitably be drawn to small cracks or imperfections in other settings. Maybe there are some erroneous assumptions lurking somewhere; maybe you find a glaring oversight in the theory behind a model…4

You can only recognise such things by putting in the hard work of understanding something very deeply. So, have fun studying the fundamentals of something—anything—you like, for it will serve you well.


  1. I certainly do not feel like it, but my misgivings about hierarchies in academia will have to wait for another time. ↩︎

  2. Similar things could be said about transformers, although their inductive biases are more general. Hence, in some sense, transformers might be closer to Sutton’s idea of a general-purpose architecture, but this also comes at the price of different compute and data requirements. ↩︎

  3. Reading Chalmers changed my views considerably. ↩︎

  4. Glaring to you, that is! ↩︎