From RB Wiki

Alignment is the problem of designing the objective of a maximization algorithm so that this objective reflects what we really want to optimize. It has been argued that aggregating humans' volitions Yudkowsky04 through social choice NGAD+18 could be the best approach to alignment HoangElmhamdi19FR.

Framing other approaches as alignment

Cite Hadfield. Interruptibility.


Alignment is widely recognized as a difficult problem. Take the case of companies for example: a company instructed to maximize profit may do so through means that neither its shareholders nor society ever intended, such as externalizing costs or exploiting regulatory loopholes.

In fact, alignment is so hard that even in the very restricted yet widely studied framework of supervised learning, there is still no satisfactory theory of which function should be optimized. Indeed, because of overfitting, it has become common to add regularization terms, whose usefulness has recently been called into question by the discovery of double descent. From a Bayesian viewpoint, this all boils down to noting that learning is not naturally described as an optimization problem, and that it must invoke some prior.
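As a minimal sketch of such a regularization term (the toy data and the regularization weight below are arbitrary choices, not from any cited work), consider ridge regression, which adds an L2 penalty to the squared loss and thereby shrinks the learned weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: 20 samples, 15 features. With so few samples
# per feature, unregularized least squares overfits easily.
X = rng.normal(size=(20, 15))
true_w = rng.normal(size=15)
y = X @ true_w + rng.normal(scale=0.5, size=20)

def ridge(X, y, lam):
    """Minimize ||X w - y||^2 + lam * ||w||^2, solved in closed form:
    w = (X^T X + lam I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_unreg = ridge(X, y, lam=0.0)  # plain least squares
w_reg = ridge(X, y, lam=1.0)    # regularized fit

# The penalty pulls the weights toward zero.
print(np.linalg.norm(w_reg), "<", np.linalg.norm(w_unreg))
```

Note that the choice of the weight lam is exactly the kind of design decision for which no principled theory exists: it trades data fit against a preference for small weights, i.e., it encodes a prior.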

Goodhart's law formalizes this difficulty: when a measure becomes a target, it ceases to be a good measure.
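As a toy illustration of this effect (all numbers below are made up for the sketch), one can model the optimized measure as the true value plus independent noise, and then apply strong selection pressure on that proxy:

```python
import numpy as np

rng = np.random.default_rng(1)

# Goodhart toy model: the measure we can optimize is only a noisy
# proxy of the value we actually care about.
n = 100_000
value = rng.normal(size=n)                     # what we truly want
proxy = value + rng.normal(scale=2.0, size=n)  # what we can measure

top = 1_000  # strong selection pressure: keep only the top 1%
value_selecting_on_value = np.sort(value)[-top:].mean()
value_selecting_on_proxy = value[np.argsort(proxy)[-top:]].mean()

# Under heavy selection, the proxy-selected candidates achieve a
# substantially lower true value: the extreme proxy scores are
# mostly due to lucky noise.
print(value_selecting_on_value, ">", value_selecting_on_proxy)
```

The harder the proxy is optimized, the more its correlation with the true value breaks down, which is one precise sense in which a slightly misspecified objective can be badly misaligned.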

Eckersley18 exploits the repugnant conclusion and its variants to argue against hard-coding a utility function.
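For concreteness, the arithmetic behind the repugnant conclusion can be made explicit (the population sizes and welfare levels below are arbitrary): under total utilitarianism, a vast population of lives barely worth living outweighs a small flourishing one.

```python
# Total utilitarianism sums welfare over the whole population.
def total_utility(population_size, welfare_per_person):
    return population_size * welfare_per_person

world_a = total_utility(1_000, 100.0)  # small, flourishing world
world_z = total_utility(10**9, 0.001)  # huge world, barely positive welfare

# A maximizer implementing this sum prefers world Z to world A.
print(world_z > world_a)
```

Patching the utility function (e.g., averaging instead of summing) merely trades this paradox for another, which is the core of the argument against committing to any fixed utility function.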