The backfire effect occurs when a treatment idea turns out to be counterproductive. It is particularly likely in complex environments that we misunderstand. This makes it a dangerous pitfall, and thus a major challenge, for AI ethics.
Backfire due to cognitive biases
Perhaps the most important backfire risk concerns how human cognition reacts to treatment attempts. Here are some examples.
Muller08 Veritasium15 showed that exposure to a correct physics video increases many students' confidence in their flawed conceptions of physics. He showed that an effective treatment needs to explicitly highlight likely misconceptions.
BABBC+18 studied a 1-week treatment in which Democrat and Republican Twitter users were paid to follow a bot retweeting tweets from leading figures of the opposing party. The treatment increased polarization, mostly among Republicans. While the paper (brilliantly) points out the limits of the generalizability of its findings, this line of study is clearly critical for the design of robustly beneficial algorithms.