Distributional shift is the problem of achieving good performance despite a change in the data distribution. Typically, if an algorithm learns from a distribution [math]\mathcal D[/math], can we guarantee that it will perform well when tested on a different distribution [math]\mathcal D'[/math]?
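As a rough illustration of the question above, here is a minimal sketch in Python (using NumPy). Everything in it is an assumption made for illustration: the quadratic target, the uniform input ranges standing in for [math]\mathcal D[/math] and [math]\mathcal D'[/math], the noise level, and the helper names `sample` and `mse`. A linear model is fitted on inputs drawn from one range and then evaluated both on fresh data from that range and on data from a shifted range.

```python
import numpy as np

def sample(rng, n, low, high):
    """Draw n points with x ~ Uniform(low, high) and y = x^2 + small noise.

    The input range plays the role of the data distribution (an assumption
    of this toy example); the relation between x and y never changes.
    """
    x = rng.uniform(low, high, size=n)
    y = x**2 + rng.normal(0.0, 0.05, size=n)
    return x, y

def mse(coeffs, x, y):
    """Mean squared error of the polynomial with coefficients `coeffs`."""
    pred = np.polyval(coeffs, x)
    return float(np.mean((pred - y) ** 2))

rng = np.random.default_rng(0)

# Learn from D: inputs in [0, 1], fitted with a straight line.
x_train, y_train = sample(rng, 1000, 0.0, 1.0)
coeffs = np.polyfit(x_train, y_train, deg=1)

# Test on fresh data from D, and on D': inputs in [2, 3].
x_same, y_same = sample(rng, 1000, 0.0, 1.0)
x_shift, y_shift = sample(rng, 1000, 2.0, 3.0)

mse_same = mse(coeffs, x_same, y_same)
mse_shifted = mse(coeffs, x_shift, y_shift)

print(f"MSE on D : {mse_same:.4f}")
print(f"MSE on D': {mse_shifted:.4f}")
```

In this toy setting the line is a reasonable approximation on the training range but extrapolates poorly, so the error on the shifted inputs is far larger. Real distributional shifts, such as the one discussed for ImageNet, are much subtler than this.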
[shankar roelofs mania fang recht imagenet citation needed] argue that significant generalization errors can still be caused by distributional shift. In particular, they designed their own labeling and showed that, while humans achieve similar accuracy on ImageNet and on this alternative labeling, state-of-the-art algorithms failed to generalize to it. Distributional shift thus appears to be a major challenge.
Note: this failure to generalize does not seem to be caused by over-optimization.