Preference learning from comparisons

It has been argued that we humans are much more effective at comparing alternatives than at scoring them MaystrePhD18. Moreover, implicit observations of humans mostly provide such choice data, in the form of like/no-like, share/no-share, click/no-click, and so on. Preference learning from comparisons therefore seems to be an important approach to preference learning.

Classical models

The classical models for preference learning are due to Thurston27, Zermelo29, DavidBook63, BradleyTerry52, LuceBook59.

In these models, in a choice between alternatives 1 and 2, the human implicitly compares the scores [math]\theta_1[/math] and [math]\theta_2[/math] that she assigns to each alternative. But her computation of [math]\theta_1 - \theta_2[/math] is noisy: it is perturbed by some noise [math]\varepsilon[/math], yielding [math]x_{12} = \theta_1-\theta_2+\varepsilon[/math]. In the above models, the sign of [math]x_{12}[/math] then determines the choice of the human.

This approach makes it possible to explain some of the inconsistencies in humans' decision-making. Equivalently, it corresponds to saying that the probability that the human chooses 1 over 2 is a function of the intrinsic difference [math]\theta_1-\theta_2[/math], which we can write [math]\mathbb P[1 \succ 2] = \mathbb P[x_{12}\gt 0] = \phi(\theta_1-\theta_2)[/math]. Intuitively, the greater the intrinsic difference between 1 and 2, the less likely it is for the human to say she prefers 2 to 1.

Different models assume different noise models for [math]\varepsilon[/math], or, equivalently, different functions [math]\phi[/math]. Thurstone's model assumes that [math]\varepsilon[/math] is normally distributed, which is equivalent to saying that [math]\phi(z) = \Phi(z/\sigma)[/math], where [math]\sigma^2[/math] is the variance of [math]\varepsilon[/math] (which may depend on the choice of 1 and 2), and where [math]\Phi[/math] is the cumulative distribution function of the standard normal distribution.
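As an illustration, here is a minimal Python sketch of Thurstone's choice probability (the function name is our own, chosen only for illustration):

<syntaxhighlight lang="python">
# Minimal sketch (illustrative only): Thurstone choice probability
# P[1 > 2] = Phi((theta_1 - theta_2) / sigma), with Phi the standard normal CDF.
import math

def thurstone_prob(theta_1: float, theta_2: float, sigma: float = 1.0) -> float:
    """Probability of choosing alternative 1 over alternative 2 under Thurstone's model."""
    z = (theta_1 - theta_2) / sigma
    # Standard normal CDF expressed via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# A larger intrinsic difference makes the stated preference more reliable.
print(thurstone_prob(1.0, 0.0))   # ~0.84
print(thurstone_prob(3.0, 0.0))   # ~0.999
</syntaxhighlight>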

The Bradley-Terry model assumes that each alternative's perceived score is perturbed by independent Gumbel noise, so that the noise [math]\varepsilon[/math] on the difference follows a logistic distribution. Equivalently, it sets [math]\phi(z)= \frac{1}{1+\exp(-z)}[/math] (up to variance scaling). Luce's model generalizes the Bradley-Terry model by considering choices among numerous alternatives and setting [math]\phi(z_2, z_3, \ldots)= \frac{1}{1+\exp(-z_2)+\exp(-z_3)+\ldots}[/math], where this quantity is the probability of choosing option 1, and where [math]z_i = \theta_1-\theta_i[/math]. Interestingly, Luce proved that this framework is equivalent to demanding independence of irrelevant alternatives.
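The corresponding choice probabilities can be computed directly; here is a minimal sketch (again with illustrative function names), which reduces to Bradley-Terry when there are only two alternatives:

<syntaxhighlight lang="python">
# Minimal sketch (illustrative only): Bradley-Terry and Luce choice probabilities.
import math

def bradley_terry_prob(theta_1: float, theta_2: float) -> float:
    """P[1 > 2] = 1 / (1 + exp(-(theta_1 - theta_2)))."""
    return 1.0 / (1.0 + math.exp(-(theta_1 - theta_2)))

def luce_prob(thetas: list[float], chosen: int) -> float:
    """Probability of choosing alternative `chosen` among all alternatives in `thetas`."""
    weights = [math.exp(t) for t in thetas]
    return weights[chosen] / sum(weights)

# With two alternatives, the Luce model reduces to Bradley-Terry.
print(bradley_terry_prob(1.0, 0.0))      # ~0.731
print(luce_prob([1.0, 0.0], chosen=0))   # same value
print(luce_prob([1.0, 0.0, 2.0], 0))     # choosing alternative 0 among three alternatives
</syntaxhighlight>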

The inference problem is then that of inferring the values of the parameters [math]\theta_i[/math] given observational data [math]\mathcal D[/math] of choices [math]i[/math] out of a set of alternatives [math]\mathcal A[/math]. Bayesian inference would suggest computing [math]\mathbb P[\theta|\mathcal D][/math]. But, as is often the case, this approach may be too computationally costly in practice.

Markov chain for preference learning

Note that in the Luce model, the log-likelihood is [math]\log \mathbb P[\mathcal D|\theta] = \sum_{(i, \mathcal A) \in \mathcal D} \left( \theta_i - \log \sum_{j \in \mathcal A} \exp \theta_j \right)[/math], where each observation in [math]\mathcal D[/math] records the choice of alternative [math]i[/math] out of a proposed set of alternatives [math]\mathcal A[/math]. This function is concave in [math]\theta[/math], and its maximizer is unique up to adding a common constant to all [math]\theta_i[/math] whenever the comparison graph is strongly connected (for any [math]i,j[/math], there exist [math]k_1, \ldots, k_m[/math] such that the data contains the comparisons [math]i \succ k_1 \succ k_2 \succ \ldots \succ k_m \succ j[/math]).
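For pairwise data, this log-likelihood can be evaluated directly. Below is a minimal sketch for the Bradley-Terry special case, assuming (for illustration only) that the data is given as a list of (winner, loser) index pairs:

<syntaxhighlight lang="python">
# Minimal sketch (illustrative only): log-likelihood of pairwise comparison data
# under the Bradley-Terry special case of the Luce model.
import math

def log_likelihood(theta: list[float], data: list[tuple[int, int]]) -> float:
    total = 0.0
    for winner, loser in data:
        # log P[winner > loser] = theta_winner - log(exp(theta_winner) + exp(theta_loser))
        total += theta[winner] - math.log(math.exp(theta[winner]) + math.exp(theta[loser]))
    return total

# Example: alternative 0 usually beats 1, which usually beats 2.
data = [(0, 1), (0, 1), (1, 0), (1, 2), (1, 2), (2, 1)]
print(log_likelihood([1.0, 0.5, 0.0], data))
</syntaxhighlight>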

One approach to computing the maximum-likelihood estimator ([math]\arg\max_\theta \mathbb P[\mathcal D|\theta][/math]) consists of computing the stationary distribution of a Markov chain on the comparison graph. This is the graph whose nodes are the alternatives, and which contains a directed edge from [math]i[/math] to [math]j[/math] if the user has at least once asserted that she prefers [math]i[/math] to [math]j[/math].
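Here is a minimal sketch of one such construction. It is an illustrative choice of chain, not necessarily the exact one used in the literature: probability mass flows from the loser to the winner of each observed comparison, with a small smoothing term to keep the chain irreducible, so that the stationary distribution concentrates on frequently preferred alternatives.

<syntaxhighlight lang="python">
# Minimal sketch (one possible construction, illustrative only): score alternatives
# via the stationary distribution of a Markov chain built from pairwise comparisons.
import numpy as np

def stationary_scores(n: int, data: list[tuple[int, int]], epsilon: float = 1e-2) -> np.ndarray:
    """n alternatives; data is a list of (winner, loser) pairs."""
    # Transition counts: from the loser to the winner of each observed comparison.
    counts = np.full((n, n), epsilon)   # small smoothing keeps the chain irreducible
    for winner, loser in data:
        counts[loser, winner] += 1.0
    transition = counts / counts.sum(axis=1, keepdims=True)   # row-stochastic matrix
    # Power iteration to approximate the stationary distribution.
    pi = np.full(n, 1.0 / n)
    for _ in range(1000):
        pi = pi @ transition
    return pi / pi.sum()

data = [(0, 1), (0, 1), (1, 0), (1, 2), (1, 2), (2, 1)]
print(stationary_scores(3, data))   # highest mass on the most frequently preferred alternative
</syntaxhighlight>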

Gaussian process

Connection to sports

These models underpin rating systems in sports, e.g. chess Elo78 and football MKFG16.
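For instance, the Elo rating system updates chess ratings online using a logistic win probability, much like the Bradley-Terry model rescaled to base 10. A minimal sketch of the update:

<syntaxhighlight lang="python">
# Minimal sketch (illustrative only): an Elo-style online update, which uses the
# same logistic choice probability as the Bradley-Terry model (rescaled to base 10).
def elo_update(rating_winner: float, rating_loser: float, k: float = 32.0) -> tuple[float, float]:
    expected_win = 1.0 / (1.0 + 10.0 ** ((rating_loser - rating_winner) / 400.0))
    rating_winner += k * (1.0 - expected_win)
    rating_loser -= k * (1.0 - expected_win)
    return rating_winner, rating_loser

print(elo_update(1600.0, 1500.0))   # the winner gains fewer points when already favored
</syntaxhighlight>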