The Elo system

Jose Leiva



My favorite moment of the film The Social Network takes place when Mark Zuckerberg asks his roommate Eduardo Saverin to explain the Elo system to him. This algorithm is used to rank chess players. At that time, Zuckerberg was developing the just-for-fun website Facemash, an early predecessor of Facebook targeted at Harvard students. In this website, visitors would be shown pairs of pictures of female students and asked about which one is hotter. “Do you think that is such a good idea?”, asks Eduardo in the film. Then he grabs a pen and writes the equations in a window. The reason why this is my favorite moment is that, unlike other films in which the equations don’t make any sense (consider Hitchcock’s Torn Curtain), these ones are correct!

Eduardo Saverin (Andrew Garfield) explains the Elo system to Mark Zuckerberg (Jesse Eisenberg) in The Social Network.

Eduardo Saverin (Andrew Garfield) explains the Elo system to Mark Zuckerberg (Jesse Eisenberg) in The Social Network.

This question was relevant for Mark Zuckerberg because he needed to build an absolute ranking of people only based on pairwise comparisons, in the same way as the chess rankings are built only from the results of individual games. The answer is also relevant to us because the strength of an asset might be given by pairwise comparisons or even matches against others. Did you know that Manchester United is listed in the NYSE under the ticker MANU?

The Elo system

The system was developed by the Hungarian-American physicist and chess master Arpad Elo in 1960 and immediately adopted by the American chess federation (USFC). The International Federation (FIDE) adopted it in 1970. The system is based on two basic premises:

  1. The difference between two players’ ratings must be as predictive as possible about the result of a game between them. In other words: the higher the difference between players A and B, the more likely A will beat B.
  2. The system must be self-correcting: as a player improves, his/her rating should improve accordingly. If a player’s performance in a tournament is poor, the rating should decrease to reflect his/her bad shape.

If the reader has seen the movie, he/she surely recognized the equations and shouted “That looks like a pair of logistic functions!” in the middle of the quiet theater. You were right! Let’s write them in a more readable format:

$$E_a = \frac{1}{1+10^{\frac{R_b – R_a}{400}}} \tag{1}$$

$$E_b = \frac{1}{1+10^{\frac{R_a – R_b}{400}}} \tag{2}$$

Here, \(E_a\) and \(E_b\) are the expected results for players \(A\) and \(B\), ranked with \(R_a\) and \(R_b\) points respectively. The logistic transformation is a good choice because it maps the ranking difference  \(R_a – R_b\)  into a range between 0 and 1, which allows for its interpretation as a probability. Also, it has a nice symmetry property that guarantees that \(E_a + E_b = 1\).  The choice of 10 as the basis of the exponential and 400 as the denominator are just arbitrary: they were chosen by Elo so that a difference of 400 points corresponds to a winning probability of 90% for the stronger player.

Here’s a personal confession: my FIDE rating is around 1500. As a comparison, the rating of Magnus Carlsen (the World Champion) is around 2800. What are my chances if I were given the opportunity of playing against him? This is left as an exercise (hint: my chances are embarrassingly low).

Updating the rating

FIDE updates the rating after each game according to the following rule:

$$R_a’ = R_a + K (O_a – E_a) \tag{3}$$

$$R_b’ = R_b + K(O_b – E_b) \tag{4}$$

where \(R_a’\) is the updated ranking for \(A\) after playing against \(B\) with real outcome \(O_A\) (1 for win, 0 for loss) and expected outcome \(E_a\) as computed in Eq. (1) .  \( K\) is a constant established to \( K=40\) for new players (for which updates might need to be more aggresive given the initial uncertainty) and \( K=20\) for more consolidated ones. These equations tell us three things:

  1.  A win always increases the rating, because \(O_a – E_a \ge 0\) in that case. For the same reason, a loss always decreases the rating.
  2. The increase is low for the winner if he/she was the favorite (if \(E_a\) is close to 1, then \(O_a-E_a\) is close to 0) and high if he/she was the underdog (if \(E_a\) is close to 1, then \(O_a-E_a\) close to 1) .
  3. Because of the system’s symmetry, the points won by the winner and those lost by the loser are equal, because \(O_b = 1 – O_a\) and \(E_b = 1-E_a\) so that \((O_b – R_b) = – (O_a – R_a)\).

Where do these update equations come from? They can be easily derived by using a stochastic gradient descent algorithm to maximise the log-likelihood of the model described by Eqs. (1) and (2).

A practical application

The Elo system is used by Nate Silver in his popular site FiveThirtyEight to rank the NBA teams and perform predictions on their future games. In the following, we apply the system to the results obtained by the football teams playing in the Spanish league at the current instant (31st round of the 2017/18 season). The data can be downloaded from in a machine-friendly format. We have initialized the values of all teams to 1500 (similarly to FIDE and FiveThirtyEight) and used an update factor \(K=40\).  The results of applying the Eqs. (1) to (4) to the 310 games are shown in the Figure below.

The Elo values are in agreement with the official ranking, even when they are built by giving 3, 1 and 0 points to the win, draw and loss, respectively (in contrast, we use 1, 0.5 and 0 points to compute the Elo values). The graph suggests that four teams are ahead on top, securing their qualification for the Champion’s League (although Real Madrid is 3rd in Elo while 4th in the table). They also suggest that there are three teams sunk in the bottom, which probably will descend to the Second division next season. However, the Elo suggests that Betis is stronger than suggested by the official table: in both cases they are 5th, but with a clear trend upwards in the Elo graph after 4 consecutive wins.

The (roughly) 400 points between the 1st and the 20th teams (Barcelona and Málaga, respectively) would correspond to a winning probability of 90% for Barcelona if the draw were not considered. This difference is narrower than bookmakers usually assign to a first-vs-last game. The reason for this underestimation is that our approach has over-simplified the problem by assigning the same initial Elo rating to all teams, instead of performing a multi-season estimation, that would lead to higher differences. In other words, we have assumed that all teams are equally strong at the beginning of the current season instead of inheriting the rating from the previous one, which would have made the rating gap wider.

Will these probability estimations, combined with the clever use of the Kelly criterion, make you rich? Probably not, but Elo rating can be used as an additional feature included in a more sophisticated Machine Learning algorithm.

add a comment