Alignment

Alignment is the problem of designing the goal of a maximization algorithm so that this goal reflects what we really want to optimize. It has been argued that the social-choice aggregation NGAD+18 of humans' volitions Yudkowsky04 could be the best approach to alignment HoangElmhamdi19FR.

Framing other approaches as alignment

To do: cite Hadfield; discuss interruptibility.

Pitfalls

Alignment is widely recognized as a difficult problem. Companies illustrate this: they can arguably be seen as optimization-like systems pointed at an easily measured goal, profit, and what they end up maximizing often diverges from what their stakeholders and society actually want.

In fact, alignment is so hard that even in the very restricted yet widely studied framework of supervised learning, there is still no satisfactory theory of which function should be optimized. Because of overfitting, it has become common to add regularization terms, whose usefulness has lately been questioned by the discovery of double descent. From a Bayesian viewpoint, this all boils down to noting that learning is not naturally described by an optimization framework, and that it must invoke some prior.
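
To make the ambiguity concrete, here is the objective usually minimized in regularized supervised learning, written in generic notation that is not taken from any of the cited works:

L(\theta) = \sum_i \ell(f_\theta(x_i), y_i) + \lambda R(\theta)

When the loss \ell is read as a negative log-likelihood, minimizing L is maximum a posteriori estimation under a prior p(\theta) \propto \exp(-\lambda R(\theta)). Choosing the regularizer R and its weight \lambda therefore amounts to choosing a prior, which is precisely the design decision that the pure optimization framing leaves unjustified.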

Goodhart's law captures this difficulty: when a measure becomes a target, it ceases to be a good measure, so an algorithm that maximizes a proxy for what we want tends to drive the proxy apart from the quantity it was meant to track.
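
As a toy illustration (a sketch not taken from the cited references), the following assumes candidates whose measurable proxy score equals their true value plus independent noise; selecting the candidate with the best proxy over more and more options makes the proxy increasingly overstate the true value:

import random

random.seed(0)

def sample_candidate():
    true_value = random.gauss(0, 1)          # what we actually care about
    proxy = true_value + random.gauss(0, 1)  # measurable but noisy stand-in
    return true_value, proxy

def selection_gap(n_candidates):
    # Pick the candidate with the highest proxy score, i.e. optimize the proxy.
    best_true, best_proxy = max((sample_candidate() for _ in range(n_candidates)),
                                key=lambda c: c[1])
    return best_proxy - best_true            # how much the proxy overstates reality

for pressure in (10, 100, 10_000):
    gaps = [selection_gap(pressure) for _ in range(200)]
    print(f"candidates={pressure:>6}  mean (proxy - true) gap = {sum(gaps)/len(gaps):.2f}")

The gap grows with the number of candidates considered: the harder the proxy is optimized, the more the selected outcome owes to noise rather than to the value we actually wanted.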

Eckersley18 exploits the repugnant conclusion and its variants to argue against implementing a utility function.
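
For concreteness, a standard toy instance of the repugnant conclusion under a plain total-utilitarian objective (the numbers are purely illustrative): a population of 1,000 people each at welfare level 100 has total utility 100,000, whereas a population of 1,000,000 people each at welfare level 1 has total utility 1,000,000, so a maximizer of total utility prefers the second world even though every life in it is barely worth living.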