ABCDE roadmap

The ABCDE roadmap refers to a decomposition proposed by Hoang19a to better highlight the key challenges of AI ethics and safety. It is also discussed at greater length in Hoang19b and HoangElmhamdi19FR.

The decomposition

The decomposition consists of 5 steps: Alice, Bob, Charlie, Dave and Erin.

Erin's goal is quality data collection and certification, relying on all sorts of sensors and user inputs, as well as on cryptography. Techniques related to blockchains may also be useful to guarantee the traceability of data.
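
As an illustration of the traceability requirement, here is a minimal Python sketch (not from the source) of a hash-chained record log, one simple way to make later tampering with collected data detectable; the field names and sensor payload are hypothetical.

```python
import hashlib
import json
import time

def record_entry(prev_hash: str, payload: dict) -> dict:
    """Append-only record whose hash chains to the previous entry,
    so that tampering with earlier data becomes detectable."""
    entry = {
        "timestamp": time.time(),
        "payload": payload,
        "prev_hash": prev_hash,
    }
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    return entry

# Hypothetical usage: chain two sensor readings together.
genesis = record_entry("0" * 64, {"sensor": "thermometer", "value": 21.3})
nxt = record_entry(genesis["hash"], {"sensor": "thermometer", "value": 21.5})
```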

Dave is in charge of world model inference from Erin's data. In particular, Dave should correct for sampling biases and account for the uncertainty due to data incompleteness. To do so, Dave may rely on heuristic forms of Bayesianism, such as representation learning with GAN-like architectures.
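
For concreteness, here is a minimal sketch (under simplified, hypothetical assumptions, not taken from the source) of a Bayesian update that reweights observations to correct a known sampling bias, while keeping a posterior variance to reflect the uncertainty left by incomplete data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setting: estimate the mean of a world-state variable from
# data that over-samples part of the population (a simple sampling bias).
observations = rng.normal(loc=1.0, scale=1.0, size=200)
# Importance weights: how much each observation is over/under-represented
# relative to the population of interest (assumed known here).
weights = rng.uniform(0.5, 1.5, size=200)

# Weighted Bayesian update for a Gaussian mean with known noise variance
# and a N(0, 10^2) prior; the posterior variance tracks remaining uncertainty.
prior_mean, prior_var = 0.0, 10.0 ** 2
noise_var = 1.0
effective_n = weights.sum()
posterior_var = 1.0 / (1.0 / prior_var + effective_n / noise_var)
posterior_mean = posterior_var * (
    prior_mean / prior_var + np.dot(weights, observations) / noise_var
)
print(posterior_mean, posterior_var)
```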

Charlie must compute human preferences. In particular, she should probably implement some social choice mechanism to combine incompatible preferences, and she should probably distinguish volition from instinctive preferences. Combining techniques like inverse reinforcement learning and active learning is probably critical to designing Charlie.
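
A toy sketch of one possible social choice mechanism, shown only as an illustration (the voters, candidate states and scores are hypothetical): a per-alternative median of scores, which is robust to a single extreme voter.

```python
import numpy as np

# Three hypothetical people score four candidate states of the world;
# Charlie aggregates them with a per-state median.
scores = np.array([
    [0.9, 0.1, 0.5, 0.3],   # voter 1
    [0.2, 0.8, 0.6, 0.3],   # voter 2
    [0.7, 0.2, 0.4, 0.9],   # voter 3
])
aggregate = np.median(scores, axis=0)
preferred_state = int(np.argmax(aggregate))
print(aggregate, preferred_state)
```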

Bob would design incentive-compatible rewards to be given to Alice. By combining Erin, Dave and Charlie's computations, Bob could send to Alice humans' preferences for different states of the world, including the (likely) current state of the world and the probable future states of the world. But to avoid Goodhart's law and wireheading, it would likely be dangerous to do so directly. Instead, Bob could enable and incentivize the corrigibility of Erin, Dave and Charlie's computations, by feeding Alice larger rewards when Erin, Dave and Charlie perform more accurate computations. Designing Bob may be called the programmed corrigibility problem.
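
A hypothetical sketch of the kind of reward shaping described above: Alice's reward combines Charlie's preference estimate with a bonus that grows when Erin, Dave and Charlie report more accurate computations. All names, signatures and weights here are illustrative assumptions, not the source's specification.

```python
def bob_reward(charlie_score: float,
               erin_accuracy: float,
               dave_accuracy: float,
               charlie_accuracy: float,
               corrigibility_weight: float = 1.0) -> float:
    """Reward sent to Alice: Charlie's estimate of human preferences for the
    current state, plus a bonus for keeping Erin, Dave and Charlie accurate,
    so Alice is incentivized to preserve (rather than hack) their computations."""
    accuracy_bonus = (erin_accuracy + dave_accuracy + charlie_accuracy) / 3.0
    return charlie_score + corrigibility_weight * accuracy_bonus
```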

Finally, Alice would perform reinforcement learning using Bob's rewards.
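
A minimal tabular Q-learning sketch of Alice learning from whatever reward signal Bob emits; the environment dynamics and the placeholder reward are purely illustrative.

```python
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
gamma, alpha, epsilon = 0.9, 0.1, 0.1
rng = np.random.default_rng(0)

def bob_reward(state: int, action: int) -> float:
    # Placeholder for the signal Bob would actually compute.
    return float(state == n_states - 1)

state = 0
for _ in range(10_000):
    # Epsilon-greedy action selection.
    action = rng.integers(n_actions) if rng.random() < epsilon else int(Q[state].argmax())
    # Toy dynamics: action 1 moves forward, action 0 resets.
    next_state = min(state + 1, n_states - 1) if action == 1 else 0
    reward = bob_reward(state, action)
    # Standard Q-learning update with Bob's reward.
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state
```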

Motivation and justification

The fundamental assumption of the ABCDE roadmap is that tomorrow's most powerful algorithms will be performing within the reinforcement learning framework. From a theoretical perspective, this assumption is strongly backed by the AIXI framework and, for instance, Solomonoff's completeness theorem.

From an empirical perspective, it is also backed by the numerous recent successes of reinforcement learning, for instance in Go, Chess and Shogi SHSAL+17 SAHSS+19, in video games like Atari games MKSGA+13 or StarCraft VBCMD+19, in combinatorial problems like protein folding EJKSG+18 SEJKS+20, or in arguably today's most influential algorithm, namely YouTube's recommendation system IJWNA+19.

Reinforcement learning algorithms are probably going to keep improving. The only way to make sure that these algorithms will be robustly beneficial is arguably to make sure that they optimize a desirable goal. This is known as the alignment problem.

In the case of reinforcement learning, the goal is the sum of discounted future rewards. As a result, the reward function is critical to AI ethics and safety. The ABCDE roadmap highlights this by further decomposing the computation of the reward function.
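
For concreteness, the discounted return that a reinforcement learning agent maximizes can be written in standard notation (not taken from the source) as:

```latex
% Discounted return at time t, with reward R_{t+k+1} received k+1 steps
% in the future and discount factor 0 \le \gamma < 1:
G_t = \sum_{k=0}^{\infty} \gamma^{k} \, R_{t+k+1}
```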

It is argued that the ABCDE roadmap is likely to be useful, because it decomposes the alignment problem into numerous subproblems which are both (hopefully) independent enough to be tackled separately, and complementary enough for solutions of the subproblems to be easily combined into a solution to the global alignment problem.