I Don’t Understand AI Alignment
Tags: academia, musings
Recently, it occurred to me that I don’t understand AI alignment researchers. The goal of AI alignment is to teach AI our values, thus making AI systems ostensibly safer and more reliable for everyone to use. Proponents of artificial general intelligence, also known as AGI, believe that alignment is critical to prevent an AI-based extinction event.
Setting aside these fears for now, my main issue with alignment is that the current approach strikes me as infeasible, mainly because even humans cannot agree on what human values should be, i.e. we are not aligned with one another. I do not mean to equivocate here: I am of course aware that there are also more feasible alignment goals, such as preventing the spread of misinformation, but the main purpose of alignment is always explained as having our machine-learning models adopt human values.
What might these values be, though? Even if we were able to agree on a universal set of principles such as ‘Thou shalt not lie,’ practical philosophy and our own experience tell us that there are always grey areas or dilemmas. While not everything is a trolley problem or a prisoner’s dilemma, we are not aligned when it comes to most—if not all—issues of societal relevance.
Of course, knowing the shortcomings of our own species, we have developed a system that partially mitigates some alignment problems: democracy. We vote on relevant issues, have discussions, and try to reach consensus. Typically, there are also safeguards in place to ensure that the rights of underrepresented groups are protected, thereby preventing a tyranny of the majority.
In light of this, I find it astonishing that alignment researchers do not draw more upon the wealth of knowledge held by political scientists, social scientists, psychologists, ethicists, philosophers, and scholars from many other disciplines in the humanities.
Alignment is a problem that concerns everyone, so why not involve everyone?