# Newcomb's paradox

Newcomb's paradox is a fundamental problem in decision theory on which there is no consensus Nozick69 Galef15 Weirich16. It has been connected to the fundamental problem of defining counterfactuals Everitt18.

## The problem

There are two boxes in front of you. In one of them, there is 1,000$. In the other, there is an unknown quantity x. You have to choose between taking both boxes (two-boxer strategy) or taking only the box with unknown quantity (one-boxer strategy).

Now, the catch is that the value of x is chosen by some predictor. If the predictor predicts that you will take the two boxes, then it sets x=0 (empty box). If it predicts that you will take only one box, then it sets x=1,000,000$.

Some variants assume that the predictor is always right. But the problem arguably becomes more realistic and tricky if we assume that the predictor only has, say, a 90% accuracy (for both one-boxer and two-boxer predictions). This is not so far-fetched, as an algorithm that learns from your data (say, your Facebook data) might eventually find patterns that allow it to distinguish one-boxers from two-boxers.

The two-boxer argues that the content of the box is already fixed by the predictor; you won't change it by choosing to be a one-boxer. Thus you are guaranteed to gain 1,000$ more by taking the two boxes.

The one-boxer argues that by taking only one box, they greatly increase the probability that this box contains 1,000,000$. Moving from a 10% to a 90% probability increases the expected content of that box by 800,000$. This is far more than the content of the other box.
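The one-boxer's arithmetic can be sketched in a few lines of Python. This is only an illustration of the expected-value computation described above, using the 90% accuracy from the text; the function names are made up for this example.

```python
SMALL = 1_000      # content of the visible box
BIG = 1_000_000    # content of the other box if one-boxing was predicted

def expected_gain(one_box: bool, accuracy: float = 0.9) -> float:
    """Expected payoff under the one-boxer's reasoning.

    The choice is treated as evidence about the prediction: the box is
    full with probability `accuracy` for a one-boxer, and with
    probability `1 - accuracy` for a two-boxer.
    """
    p_full = accuracy if one_box else 1 - accuracy
    bonus = 0 if one_box else SMALL  # two-boxers also get the visible box
    return p_full * BIG + bonus

def break_even_accuracy() -> float:
    """Accuracy at which both strategies have the same expected payoff.

    Solves a * BIG = (1 - a) * BIG + SMALL for a.
    """
    return (BIG + SMALL) / (2 * BIG)

if __name__ == "__main__":
    print("one-box:", round(expected_gain(True)))
    print("two-box:", round(expected_gain(False)))
    print("break-even accuracy:", break_even_accuracy())
```

With these amounts, one-boxing is worth about 900,000$ in expectation against about 101,000$ for two-boxing, and the one-boxer's reasoning favors one box as soon as the predictor's accuracy exceeds 50.05%.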

There is no consensus as to which strategy ought to be chosen.

## Why this matters to AI ethics

At first glance, this problem may seem far-fetched. But it can actually be argued to be relevant to AI ethics DemskiGarrabrant19. Below is an example proposed by Lê.

First note that the problem is isomorphic to the psychological twin prisoner's dilemma. You can choose to cooperate or to betray, and so can some psychological twin of yours. This psychological twin is not exactly you, but he turns out to reason almost always as you do. In game theoretical terms, betrayal is a dominant strategy: whatever your twin does, you gain by betraying. But the catch is that your twin is likely to do the same thing as you. Betraying is then similar to two-boxing, while cooperating is like one-boxing.
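The tension between dominance and correlation can be made concrete with a small sketch. The payoff values below are hypothetical (any standard prisoner's dilemma payoffs would do), and `similarity` is an assumed parameter standing for the probability that the twin makes the same move as you.

```python
# Hypothetical payoffs to me, indexed by (my move, twin's move).
PAYOFF = {
    ("cooperate", "cooperate"): 3,
    ("cooperate", "betray"): 0,
    ("betray", "cooperate"): 5,
    ("betray", "betray"): 1,
}
MOVES = ("cooperate", "betray")

def other(move: str) -> str:
    """Return the opposite move."""
    return "betray" if move == "cooperate" else "cooperate"

def betrayal_dominates() -> bool:
    """True if betraying pays more whatever the twin does."""
    return all(PAYOFF[("betray", t)] > PAYOFF[("cooperate", t)] for t in MOVES)

def expected_payoff(move: str, similarity: float) -> float:
    """Expected payoff if the twin mirrors my move with probability `similarity`."""
    same = PAYOFF[(move, move)]
    different = PAYOFF[(move, other(move))]
    return similarity * same + (1 - similarity) * different

if __name__ == "__main__":
    print("betrayal dominates:", betrayal_dominates())
    for s in (0.5, 0.9):
        print(s, {m: expected_payoff(m, s) for m in MOVES})
```

With these assumed payoffs, betrayal is dominant, yet cooperating has the higher expected payoff once the twin mirrors you 90% of the time. With an uncorrelated twin (`similarity = 0.5`), betraying is better on both counts.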

Today's algorithms are distributed. And because of machine learning, different versions of, say, the YouTube algorithm may be learning from local data, which means that they are sort of psychological twins. Now, assume that the YouTube algorithm notes that if all YouTube algorithms recommend a video A, then this video may go viral and user engagement may skyrocket. However, if only one YouTube algorithm recommends this video A, then it will be a flop, and user engagement will decrease. Instead, each YouTube algorithm can recommend some other video B that, while not viral, will create some user engagement. Arguably, recommending video A is like one-boxing and cooperating, while recommending video B is like two-boxing and betraying.

## The crux of the problem

It seems that the crux of the problem is the computation of counterfactuals. In particular, one-boxing corresponds to assuming that the probability of the box containing 1,000,000$ is simply conditioned on the "I take one box" event, i.e. [math]\mathbb P[1,000,000$|one-box][/math]. In other words, the one-boxer computes the probability of the content of the box "in a world where I am a one-boxer". The counterfactual is then the likely content of the box "in a world where I am a two-boxer", which is [math]\mathbb P[1,000,000$|two-box][/math].

On the other hand, the two-boxer assumes that the predictor's choice is already "set in stone". As a result, the two-boxer's decision cannot affect what's already out there, because there is no causal path from his decision to the content of the box. This was formalized by Pearl95 Pearl12 through "do-calculus". The "do" operator makes it possible to encode causal structure (usually described by a *causal graph*). Thus, the two-boxer's counterfactuals are [math]\mathbb P[1,000,000$|do(one-box)][/math] and [math]\mathbb P[1,000,000$|do(two-box)][/math].
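The difference between conditioning and intervening can be sketched on a toy model. The causal graph assumed here (a hidden disposition that drives both the prediction and the action) and the 50% prior on being a one-boxer are illustrative assumptions, not part of the problem statement.

```python
from itertools import product

CHOICES = ("one-box", "two-box")

def joint(prior_one_boxer: float = 0.5, accuracy: float = 0.9):
    """Yield (disposition, prediction, probability).

    Assumed causal graph: disposition -> prediction -> box content,
    and disposition -> action (the agent acts on its disposition).
    """
    for disposition, prediction in product(CHOICES, repeat=2):
        p_d = prior_one_boxer if disposition == "one-box" else 1 - prior_one_boxer
        p_pred = accuracy if prediction == disposition else 1 - accuracy
        yield disposition, prediction, p_d * p_pred

def p_full_given(action: str, prior_one_boxer: float = 0.5, accuracy: float = 0.9) -> float:
    """P[box full | action]: the action is evidence about the disposition.

    Requires 0 < prior_one_boxer < 1, otherwise one of the events has
    probability zero and cannot be conditioned on.
    """
    rows = [(pred, p) for d, pred, p in joint(prior_one_boxer, accuracy) if d == action]
    total = sum(p for _, p in rows)
    return sum(p for pred, p in rows if pred == "one-box") / total

def p_full_do(action: str, prior_one_boxer: float = 0.5, accuracy: float = 0.9) -> float:
    """P[box full | do(action)]: the intervention cuts disposition -> action,

    so in this graph the action carries no information about the box
    and the argument is deliberately unused.
    """
    return sum(p for _, pred, p in joint(prior_one_boxer, accuracy) if pred == "one-box")

if __name__ == "__main__":
    for a in CHOICES:
        print(a, "conditional:", p_full_given(a), "do:", p_full_do(a))
```

In this toy graph, conditioning gives 90% and 10% as in the one-boxer's argument, while both "do" quantities equal the prior probability that the box is full. This is why, under this graph, the two-boxer sees only the extra 1,000$.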

The problem with the "do" operator, though, is that it is model-dependent. Indeed, different causal graphs will yield different values of [math]\mathbb P[1,000,000$|do(one-box)][/math] and [math]\mathbb P[1,000,000$|do(two-box)][/math]. Yet, the problem of determining the "true" causal graph is arguably hopeless (see Bayesianism). One might be tempted to consider the expectation over causal graphs, weighted by their respective credences. But weirdly, this seems to force us to only consider causal models. This seems hard to justify from a Bayesian standpoint.
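To illustrate what averaging over causal graphs might look like, here is a minimal sketch. The two models, their names, and the probabilities they assign are purely hypothetical: one model says the box content is fixed whatever you do, the other says your choice is effectively linked to the prediction.

```python
BIG, SMALL = 1_000_000, 1_000

# Hypothetical causal models: each maps an action to P[box full | do(action)].
MODELS = {
    "box_fixed": {"one-box": 0.5, "two-box": 0.5},
    "choice_linked": {"one-box": 0.9, "two-box": 0.1},
}

def averaged_p_full(action: str, credences: dict) -> float:
    """Credence-weighted average of P[box full | do(action)] over the models."""
    return sum(credences[name] * MODELS[name][action] for name in MODELS)

def expected_gain_under_credences(action: str, credences: dict) -> float:
    """Expected payoff of an action under the averaged "do" probabilities."""
    bonus = SMALL if action == "two-box" else 0
    return averaged_p_full(action, credences) * BIG + bonus

if __name__ == "__main__":
    for c in (0.01, 0.001):
        credences = {"box_fixed": 1 - c, "choice_linked": c}
        gains = {a: expected_gain_under_credences(a, credences)
                 for a in ("one-box", "two-box")}
        print(c, gains)
```

Under these assumed numbers, a credence of only 1% in the "linked" model is enough to favor one-boxing, while a credence of 0.1% is not. The recommendation thus hinges on which causal models are considered and how they are weighted.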

## Could Blockchain help?

This may sound far-fetched, but Blockchain (and variants) could help to solve Newcomb's paradox. Indeed, one way of solving it could be to force future versions of ourselves to take one box, or to force our psychological twins and ourselves to cooperate. Smart contracts on a Blockchain seem to do the trick!

Indeed, by outsourcing the execution of some code to a Blockchain, an individual can force this code to run, even if, in the future, the individual wants to stop it. This seems particularly fit to solve Parfit's hitchhiker paradox Parfit84 Galef15.

But there's more! Using *secret sharing* Shamir79 Blackley79 standupmaths19, one could make the code run if and only if at least k psychological twins declare they will cooperate.
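As a rough illustration of the threshold idea, here is a minimal sketch of Shamir-style k-out-of-n secret sharing over a prime field. It only shows the splitting and recovery of a secret; how a share would be tied to a twin's declaration, or to a smart contract, is not modeled. The prime and the function names are choices made for this example, and the code is not meant for production use.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the field size is an illustrative choice

def split_secret(secret: int, k: int, n: int) -> list:
    """Split `secret` into n shares such that any k of them recover it."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if not 0 <= secret < PRIME:
        raise ValueError("secret must lie in [0, PRIME)")
    # Random polynomial of degree k - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]

    def poly(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, poly(x)) for x in range(1, n + 1)]

def recover_secret(shares: list) -> int:
    """Recover the secret by Lagrange interpolation at x = 0.

    Expects at least k shares with distinct x values.
    """
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

if __name__ == "__main__":
    shares = split_secret(123456789, k=3, n=5)
    print(recover_secret(shares[:3]) == 123456789)
```

In the scenario above, each twin would hold one share, and the code could only be unlocked once k of them have revealed theirs. Fewer than k shares reveal nothing about the secret.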