AI: From Homer to ChatGPT
AI is omnipresent. With speculation about its capabilities running rampant, it is time to slow down a bit and take a broader perspective. This is an article I wrote for a general audience, so if you are already in the know, please forgive me for taking some shortcuts for the sake of exposition.
Automatons In The Iliad
As it turns out, AI has been with us for much, much longer. Already in the Iliad, Homer describes attendants of Hephaestus, the god of blacksmiths and artisans:
Grasping a thick staff he limped from the forge, supported by servants made of gold, fashioned like living girls, who attended swiftly on their master. As well as the use of their limbs they had intellect, and the immortals gave them skill in subtle crafts.
Somewhat tellingly, we would refer to such contraptions as “automatons,” which comes from the Ancient Greek word αὐτόματος (automatos), composed of “self-” and “thinking.” This already provides some insight into what we humans hope AI to be.
The Mechanical Turk And The Difference Engine
Moving from myth to mechanics, we find the 18th century to be the heyday of all types of clever automatons, including filigree music boxes (which our modern mind would probably not directly associate with AI) and contraptions like the infamous “Mechanical Turk.” Created by a certain Wolfgang von Kempelen, this machine was advertised as being able to play chess against a human player. Unfortunately, though perhaps not surprisingly, it turned out to be a cleverly designed fake: the box essentially had enough space to hide a human player. Before people found out, the “Mechanical Turk” was a sensation (and rightfully so), since it appeared to mechanize that which used to be solely under the purview of humans.
All in all, however, the creativity of that epoch was restricted by the available technology. Thus, most automatons from that era, regardless of how cleverly designed and crafted they might have been, are just too specific to be considered AI. Nevertheless, there is an interesting anecdote by none other than Charles Babbage, creator of the difference engine, i.e., one of the precursors of the modern computer. Babbage wanted to tabulate the values of polynomial functions faster and more reliably; regarding the presentation of his machine, he later recalled the following exchange in his memoirs:1
On two occasions I have been asked, “Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?” […] I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
Overpromising & Underdelivering?
Arriving finally in our own times, we find AI and its promises being rediscovered. In 1956, leading scientists met at the Dartmouth Workshop, widely considered to be the birth of AI as a modern research field. Much like our own research proposals, the initial brief was wildly optimistic and assumed that about two months (!) would be sufficient to reach the following aims:2
[…] The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. […] We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.
Despite bringing about many new ideas (including expert systems and ideas for the first “artificial” neural networks), the workshop did not manage to bring AI into the public eye. This changed briefly in 1997, when Garry Kasparov played chess against “Deep Blue” and lost. However, as much as this was a milestone for rule-based systems and AI in general, “Deep Blue” was a research cul-de-sac; modern neural networks operate on different paradigms, and “Deep Blue” was not even close to being a general-purpose device.
The Revolution Has Good Graphics
Skipping ahead a few years, the breakthrough finally happened around 2012. Mostly driven by the availability of modern graphics cards, i.e., Graphics Processing Units (GPUs), neural networks essentially became feasible overnight.3 The core idea of such networks is to combine many small computational units, dubbed “neurons,” each of which receives an input signal, modifies it slightly, and passes it on to other neurons via trainable weights (more about that later). By arranging neurons hierarchically in different layers, an input signal (such as an image) can be iteratively transformed into an output signal (such as the probability that the image contains a dog).
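For readers who enjoy looking under the hood, here is a minimal sketch in Python of such a layered network transforming an input into an output. The layer sizes, weights, and input values are purely illustrative and chosen at random; real networks have millions or billions of trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Trainable weights connecting the layers (here: random, i.e., untrained).
W1 = rng.normal(size=(4, 8))  # input (4 values) -> hidden layer of 8 neurons
W2 = rng.normal(size=(8, 1))  # hidden layer -> a single output neuron

def forward(x):
    """Transform an input signal into an output signal, layer by layer."""
    hidden = np.maximum(0.0, x @ W1)               # each neuron modifies its input
    output = 1.0 / (1.0 + np.exp(-(hidden @ W2)))  # squash the result into [0, 1]
    return output

x = np.array([0.2, -1.3, 0.7, 0.1])  # a stand-in for an image or other input
print(forward(x))  # e.g., the "probability" that the image contains a dog
```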
To train such neural networks, we require vast amounts of data. Ideally, this data should be of high quality and diverse (whatever that might mean for a given application). A very simplistic approximation of what happens during training is this: the neural network is shown data together with the expected output, and its internal weights are adjusted so that the actual output more closely matches the expected one (a toy example of such a training loop follows at the end of this section). By repeating this countless times on a sufficiently large dataset, the hope is that at some point the network will have picked up the underlying pattern and will generalize to hitherto-unseen data. Despite decades of progress, training such a network remains something of an art, and numerous interesting and unexpected phenomena arise along the way. This did not stop the empirical progress, and after several “victories” in the field of computer vision, neural networks started transforming other fields. Already in 2018, three visionary AI researchers of the 1980s (Yann LeCun, Yoshua Bengio, and Geoffrey Hinton) received the Turing Award, one of the highest honors in computer science. Together with other pioneers like Jürgen Schmidhuber and Fei-Fei Li, these researchers could be considered the godparents of modern AI.
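Here is the toy training loop promised above. Everything in it is invented for illustration: a single trainable weight, synthetic data whose underlying pattern is simply “multiply by three,” and an arbitrary learning rate.

```python
import numpy as np

rng = np.random.default_rng(1)
inputs = rng.normal(size=100)
targets = 3.0 * inputs  # the underlying pattern we hope the model picks up

w = 0.0              # the single trainable weight, initially uninformed
learning_rate = 0.1

for step in range(50):
    predictions = w * inputs                   # show the model the data
    error = predictions - targets              # compare with the expected output
    gradient = 2.0 * np.mean(error * inputs)   # how to nudge w to reduce the error
    w -= learning_rate * gradient              # adjust the internal weight slightly

print(w)  # after enough repetitions, w approaches the underlying pattern (3.0)
```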
Pay Attention!
The next revolution happened thanks to the Attention Mechanism, which provides a neural network with a way to assign different weights to different parts of its input, denoting their importance for a specific task (a small sketch of this computation follows below). The resulting architecture, the transformer, turned out to be of general use, enabling, among other things, improved translation engines. This made neural networks capable of working with inputs in the form of natural language, thus encroaching on humanity’s primary domain, viz., our capability of wielding language. Again, through vast amounts of data, often painstakingly annotated by humans, OpenAI managed to create one of the first generally usable large language models, dubbed ChatGPT. The true genius of this model is that it makes a plethora of tools available through language. No more arcane command-line inputs or scripts, but instead incantations, i.e., prompts, that remind me very much of magic spells…
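At its core, the attention mechanism is a surprisingly small computation. The following sketch shows a simplified version of the scaled dot-product attention used in transformers; the three “tokens” are random vectors, chosen purely for illustration.

```python
import numpy as np

def attention(Q, K, V):
    """Weigh the values V by how relevant their keys K are to the queries Q."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # similarity of queries and keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: importance of each part
    return weights @ V                               # blend the inputs by importance

rng = np.random.default_rng(2)
tokens = rng.normal(size=(3, 4))          # three "tokens", each a 4-dimensional vector
print(attention(tokens, tokens, tokens))  # self-attention of the tokens over themselves
```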
Whatever our own opinions on AI, the transformation is already happening, and large language models are ubiquitous. The use of language as the primary means of communication makes these models appear “intelligent,” even though the underlying mechanisms are essentially the same as in previous decades, involving “just” a larger amount of data and compute. Whether this is a difference in degree or a difference in kind remains to be shown.

However, there is at least one major distinction from known special-purpose machines like the aforementioned difference engine: such machines typically make no mistakes in the calculations for which they were constructed (setting aside the fact that floating-point calculations are still hard to do with finite memory). This precision comes at the price of being fundamentally limited. Large language models do not exhibit such limitations a priori, but all their cognition is based on their inputs. If ChatGPT reads that people write about Switzerland having mountains, it will “learn” said fact, but it would be equally “happy” to learn that Switzerland is an almost-planar country. Hence, a lot of human intervention is required to teach AI the “right” things, and we should not be content with that type of training happening behind closed doors!

Moreover, given the fundamentally stochastic nature of these models, it is not guaranteed that a model will always describe the right things (a toy illustration of this sampling follows below). Unwittingly, it may hallucinate new “facts,” including references to non-existent articles or books. Some researchers thus believe that modern large language models will ultimately suffer the fate of “Deep Blue” and turn out to be too specific to be of long-term general use. In a recent WSJ article, Yann LeCun even discourages aspiring PhD students from working on large language models!
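To illustrate what “fundamentally stochastic” means here, consider the following toy sketch. The vocabulary and the probabilities are entirely made up; the point is merely that the model assigns probabilities to possible continuations and then samples one of them, so repeated runs may give different answers.

```python
import numpy as np

rng = np.random.default_rng()
next_words = ["mountains", "lakes", "plains"]  # invented vocabulary
probabilities = [0.7, 0.2, 0.1]                # invented model "beliefs"

for _ in range(5):
    # The continuation is sampled, not looked up, so it may vary between runs.
    print(rng.choice(next_words, p=probabilities))
```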
The Future Is Here And (Sort Of) Evenly Distributed
Whatever the future may hold, “modern” AI is obviously here now, and it is a technology that demands a lot from us. It challenges our beliefs, our systems, and our ways of living. Like any other technology, it has the potential for good and for evil. Unlike other technologies, however, AI is much more seductive, since it promises short-term cognitive shortcuts that may result in long-term deficits.4 Moreover, modern large language models pretend to be persons, but at least for now, this is most likely5 just a facade: there is a lot of cognition but no consciousness. This does not prevent many users from treating their large language model as a person, though, and from trusting its utterances. Such trust can be misplaced and turn out to be fatal. As a society, we thus have to ask ourselves how we want to deal with AI. Do we treat it like something out of Pandora’s box, or do we rather want to consider it something straight from the horn of Amalthea? Regardless of our stance, everyone should at least understand the basics of this technology, lest the new relationship between human and machine turn out to be a toxic one.
(This is an extended and translated version of my essay on AI, appearing in UNIVERSITAS.)
1. “Passages from the Life of a Philosopher,” 1864 ↩︎
2. This makes me feel less bad when reflecting upon my own research proposals. ↩︎
3. Readers familiar with neural networks will have to pardon me for glossing over a lot of details here. ↩︎
4. While AI is famously compared to mechanical calculators, I think this is a half-baked argument at best. We still need to teach students basic arithmetic skills so that they may hopefully understand what calculations are all about. Even a TI-82 will not solve your homework if all you do is mash buttons at random. ↩︎
5. At least I hope so, because I find the idea of an enslaved conscious entity to be loathsome. ↩︎