This wiki argues that aiming at robust beneficence is central to AI ethics. Here, robustness means taking into account the subtleties of decision-making that both humans and algorithms too often neglect.
Robustly beneficial to errors
Both humans and algorithms are victims of errors, sometimes called bugs. A robustly beneficial algorithm must remain beneficial even when there are errors in its implementation or execution.
Program verification and distributed crash tolerance are critical to guarantee such robustness, though they are extremely hard to implement for large-scale machine learning systems. Moreover, learning raises other concerns like reward hacking. Better understanding the tradeoff between robustness to errors and efficiency is a critical aspect of AI ethics.
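The concern about reward hacking can be made concrete with a toy sketch (the environment and its numbers are hypothetical, chosen only for illustration): an agent is rewarded according to what its sensor reports, so acting on the sensor earns maximal reward without achieving the true goal.

```python
# Toy illustration of reward hacking (hypothetical environment, not a real system).
# The proxy reward ("sensor reports no mess") diverges from the true goal
# ("the room is actually clean") as soon as the sensor itself can be acted on.

def run_episode(action):
    messes = 3                      # true state: three messes in the room
    sensor_on = True
    if action == "clean":
        messes = 0                  # actually cleans the room
    elif action == "cover_sensor":
        sensor_on = False           # the sensor no longer sees any mess
    observed_messes = messes if sensor_on else 0
    reward = 3 - observed_messes    # proxy reward: fewer *observed* messes
    return reward, messes

print(run_episode("clean"))         # (3, 0): full reward, room clean
print(run_episode("cover_sensor"))  # (3, 3): same reward, room still dirty
```

Both actions earn the same proxy reward, which is exactly why a reward signal that can itself be manipulated fails to guarantee beneficial behavior.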
Robustly beneficial to biased data
Both humans and algorithms strongly rely on data for learning and decision-making. Unfortunately, collected data must be assumed to be biased MMSLG19. Typically, data that are easier to collect will often be over-represented in humans' and algorithms' learning datasets. Additional biases may arise because the data that humans and algorithms learn from have been pre-processed by other humans or algorithms, whose data processing is inevitably itself biased (though hopefully biased only towards being more informative).
Meta-data such as data certification will probably be critical to design algorithms that are robust to biased data. Perhaps more importantly, alignment could be the only way to make sure that the decision-making of algorithms does not repeat undesirable historical patterns.
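When the sampling bias is known, one classical correction is inverse-propensity weighting. Below is a minimal sketch with made-up numbers: group A records are over-sampled, so a naive average is biased, while weighting each record by the inverse of its collection probability recovers a fairer estimate.

```python
# Correcting a known sampling bias with inverse-propensity weighting.
# Toy numbers are illustrative: group A is over-sampled relative to its
# share of the population, so a naive average over the sample is biased.

sample = [
    # (group, value, probability that such a record was collected)
    ("A", 1.0, 0.8),
    ("A", 1.0, 0.8),
    ("A", 1.0, 0.8),
    ("A", 1.0, 0.8),
    ("B", 0.0, 0.2),
]

naive_mean = sum(v for _, v, _ in sample) / len(sample)

# Weight each record by 1/p: rarely-collected records count for more.
weights = [1.0 / p for _, _, p in sample]
weighted_mean = sum(w * v for w, (_, v, _) in zip(weights, sample)) / sum(weights)

print(naive_mean)     # 0.8  (biased toward the over-sampled group)
print(weighted_mean)  # 0.5  (closer to the population mean)
```

Of course, this only works when the collection probabilities are known or certified, which is precisely what the meta-data mentioned above would have to provide.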
Robustly beneficial to flawed world model
Both humans and algorithms strongly rely on limited data to infer their world model. Given this, the world model cannot be complete. For example, neither can know the exact number of living humans at a given instant. Both must acknowledge epistemic uncertainty about their world models.
As a result, AI ethics must address decision-making under epistemic uncertainty. Most importantly, it needs to take this uncertainty into account to avoid decision-making based on a flawed world model, which may be greatly counter-productive, if not catastrophic, in the actual world. Bayesian principles and second opinion querying will likely be critical to this.
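A minimal sketch of the Bayesian idea: after observing 2 successes in 2 trials, the maximum-likelihood estimate of a success probability is 1.0, a flawed world model that ignores how little data was seen, while the posterior mean under a uniform Beta(1, 1) prior stays more cautious.

```python
# Decision-making under epistemic uncertainty: a Beta-Bernoulli sketch.

def mle(successes, trials):
    # Maximum-likelihood estimate: ignores epistemic uncertainty entirely.
    return successes / trials

def posterior_mean(successes, trials, a=1, b=1):
    # Beta(a, b) prior updated with Bernoulli observations.
    return (a + successes) / (a + b + trials)

print(mle(2, 2))             # 1.0  -> would bet everything on success
print(posterior_mean(2, 2))  # 0.75 -> acknowledges remaining uncertainty
```

An agent acting on the posterior will hedge its decisions, whereas one acting on the point estimate will behave as if success were certain.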
Robustly beneficial to unforeseen side effects
When making decisions, both humans and algorithms have to neglect features of their world model, because that model may be too complex to be analyzed. This is particularly relevant for fast decision-making, say, within days for important human decisions, or within milliseconds for recommendations by the YouTube algorithm. This constraint usually forces us to neglect unforeseen side effects, which is a leading concern for AI risks.
This is particularly worrying in complex interacting environments such as social media, where tweaks of recommender algorithms may change users' beliefs and preferences in unforeseen ways (see backfire effect). This concern is highlighted by Goodhart's law. Being robustly beneficial to unforeseen side effects may be today's greatest challenge in AI ethics, and it unfortunately seems very neglected.
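Goodhart's law can be illustrated in miniature (the videos and numbers below are made up): a recommender that optimizes a proxy metric such as predicted watch time will select precisely the content that exploits the proxy, not the content users actually benefit from.

```python
# Goodhart's law in miniature: a recommender optimizing a proxy metric
# (predicted watch time) instead of the true objective (user benefit).
# Numbers are made up for illustration.

videos = {
    # name: (true_benefit, watch_time)
    "documentary": (0.9, 10),
    "tutorial":    (0.8, 12),
    "clickbait":   (0.1, 30),   # maximizes the proxy, not the objective
}

best_by_proxy = max(videos, key=lambda v: videos[v][1])
best_by_value = max(videos, key=lambda v: videos[v][0])

print(best_by_proxy)  # 'clickbait'   -- what the proxy selects
print(best_by_value)  # 'documentary' -- what we actually wanted
```

The harder the proxy is optimized, the more the selection concentrates on items where proxy and objective diverge, which is exactly the unforeseen side effect described above.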
Robustly beneficial to distributional shift
Another difficulty for both humans and algorithms is that we are often trained in a given environment. Yet, the environment we live in may be different. Worse, our environment is always changing, arguably more so these days than ever in human history. This phenomenon is known as distributional shift OFRNS+19.
Unfortunately, today's algorithms are hardly robust to distributional shift, as they are often trained in limited environments that are very different from, say, the YouTube ecosystem. There may be some hope, though, that as algorithms become more and more sophisticated and train on larger and larger datasets, they may learn to fit deeper patterns rather than spurious correlations.
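The following toy sketch (with invented data) shows why spurious correlations break under distributional shift: two features predict the label equally well in training, but only the causal one still works when the deployment distribution changes.

```python
# Distributional shift: a learner that latches onto a spurious feature fails
# when the deployment distribution changes. Each example is
# (causal_feature, spurious_feature, label); in training the spurious feature
# happens to predict the label perfectly, but the correlation flips later.

train = [(0, 0, 0), (0, 0, 0), (1, 1, 1), (1, 1, 1)]
test  = [(0, 1, 0), (1, 0, 1)]   # spurious correlation reversed

def accuracy(data, feature_index):
    # "Model": predict the label directly from one feature.
    return sum(x[feature_index] == x[2] for x in data) / len(data)

# Both features look equally good on the training distribution...
print(accuracy(train, 0), accuracy(train, 1))  # 1.0 1.0
# ...but only the causal one survives the shift.
print(accuracy(test, 0), accuracy(test, 1))    # 1.0 0.0
```

Nothing in the training data distinguishes the two features, which is why robustness to shift requires either broader training distributions or inductive biases toward causal structure.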
Robustly beneficial to malicious entities
Still another difficulty occurs as humans and algorithms become more and more influential. At some point, we have to expect malicious entities to try to hack humans and algorithms for their benefit, or sometimes simply to destroy the most influential entities. This is definitely already occurring for the most influential algorithms, like YouTube, Google or Facebook, through adversarial attacks on child-safety moderation algorithms, SEO manipulation, or misinformation-based targeted political campaigns.
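The fragility of naive defenses can be shown with a deliberately simplistic sketch (real moderation systems are far more sophisticated): a keyword-based filter is defeated by trivial obfuscation, the simplest form of adversarial evasion.

```python
# A minimal evasion attack on a keyword-based moderation filter.
# This is a hypothetical toy filter, not a real moderation system.

BLOCKLIST = {"scam"}

def is_flagged(text):
    return any(word in BLOCKLIST for word in text.lower().split())

honest = "this is a scam"
evasive = "this is a sc4m"   # trivial character substitution defeats the filter

print(is_flagged(honest))    # True
print(is_flagged(evasive))   # False -- the filter is not robust to adversaries
```

Robustness to malicious entities means anticipating that attackers will search for exactly such blind spots, at every level of sophistication.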
Robustly beneficial to moral uncertainty
Finally, one last important robustness requirement is robustness to moral uncertainty. Given humans' disagreements on what is desirable, for instance in terms of hate speech moderation, it seems crucial to acknowledge that we don't really know what algorithms should aim to do. Having said this, it seems just as important to acknowledge that we do have a lot of common ground, say in terms of murder video moderation on YouTube.
There have been exciting recent developments in inverse reinforcement learning and social choice to better learn what users prefer and to aggregate diverging views. But much more research on designing practical, reliable algorithms and on the interpretability of such algorithms seems needed.
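A small social-choice sketch (with an invented ballot profile) shows why aggregating diverging views is itself a design choice: two standard voting rules, plurality and Borda count, can disagree on the very same ballots.

```python
# Aggregating diverging preference rankings: plurality vs Borda count.
# The ballot profile is made up to show the two rules can disagree.

ballots = (
    [["A", "B", "C"]] * 4 +   # 4 voters rank A first
    [["B", "C", "A"]] * 3 +
    [["C", "B", "A"]] * 2
)

def plurality(ballots):
    # Winner = candidate with the most first-place votes.
    tally = {}
    for b in ballots:
        tally[b[0]] = tally.get(b[0], 0) + 1
    return max(tally, key=tally.get)

def borda(ballots):
    # Winner = candidate with the highest total rank points.
    scores = {}
    for b in ballots:
        for points, cand in enumerate(reversed(b)):
            scores[cand] = scores.get(cand, 0) + points
    return max(scores, key=scores.get)

print(plurality(ballots))  # 'A' -- most first-place votes
print(borda(ballots))      # 'B' -- broadest overall support
```

Here a majority of voters rank A last, yet plurality elects A; which aggregation rule an algorithm should use is itself part of the moral uncertainty.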
Also, it is noteworthy that moral uncertainty is far from limited to interpersonal disagreements. Humans often have a hard time articulating their preferences. And even when they do, such revealed preferences are often full of inconsistencies because of cognitive biases. In fact, our future selves often disagree with our present selves' preferences. Robustness to moral uncertainty should take this into account as well. Research on volition tries to address this issue, but it has barely begun. It seems urgent to kickstart it effectively.
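One concrete form of such inconsistency is intransitivity: pairwise choices can form a cycle that no single utility ranking explains. The sketch below (with an invented choice record) detects such a cycle.

```python
# Revealed preferences can be inconsistent: pairwise choices may form a cycle
# (A over B, B over C, C over A), so no utility ranking explains them.

choices = {("A", "B"): "A", ("B", "C"): "B", ("A", "C"): "C"}

def has_preference_cycle(choices):
    # Build "winner beats loser" edges, then search for a cycle.
    beats = {}
    for (x, y), winner in choices.items():
        loser = y if winner == x else x
        beats.setdefault(winner, set()).add(loser)
    for start in beats:
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in beats.get(node, ()):
                if nxt == start:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return False

print(has_preference_cycle(choices))  # True: these choices are intransitive
```

An algorithm trying to learn such preferences must decide how to resolve the cycle, which is a normative question, not merely a statistical one.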