Forecasting the market or the outcome of a gamble is important. Deciding how much to invest or bet, given how confident you are in the prediction, is just as important. But don’t let the pressure get to you; the Kelly criterion is here to help us make this decision.
Betting with the Kelly criterion
Imagine you are invited to place bets on an indefinite sequence of coin tosses with fair odds (2:1, i.e. a winning bet returns twice the stake). Also imagine you have the opportunity to test the coin in advance, and it turns out to be slightly loaded with \(P(head)=0.53\). Given this “inside information”, what strategy would you follow? Common sense tells us a couple of things:
- If the bookmaker (or bookie) is offering a payoff of 2:1 for both heads and tails, it means that he is assuming an implicit probability of 1/2 for both options. We should never bet on tails since the implicit probability is higher than the real one: \(P(tail)=0.47\).
- Given that only betting on heads makes sense, we should do it with caution and avoid betting our full bankroll, since on every toss we would have a 47% probability of going bankrupt.
So, what fraction f of our wealth should we bet on each trial? Let’s do the maths.
Let \(X_t\) be our wealth after the t-th bet and let \(g_t= X_{t} / X_{t-1} \) be the gain obtained on that bet. A reasonable criterion would be to maximise the compound gain at the end of the sequence.
$$ G_{\infty} = \frac{X_{\infty}}{X_0} = \prod_{t=0}^{\infty} \frac{X_{t+1}}{X_t} = \prod_{t=1}^{\infty} g_t $$
Equivalently, we can take the logarithm to transform the product into a sum.
$$ \log G_{\infty} = \sum_{t=1}^{\infty} \log g_t $$
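Let us assume the bet is a binary event that pays c:1, meaning that a winning unit stake returns \(c\) in total (a net profit of \(c-1\)), while a losing stake is forfeited. Let us also assume we are certain that the probability of winning the bet is p. If we bet a fraction f of our wealth, the expected log-gain is given by:
$$ E\{\log g\} = p \log(1+(c-1)f) + (1-p)\log(1-f) $$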
where we have removed the t from \(g_t\) since this expectation is the same for all trials (probabilities and payoffs are constant along time). To maximise \(G_{\infty}\), we can maximise this expectation: by the law of large numbers, the average of \(\log g_t\) over many bets converges to \(E\{\log g\}\), so the fraction with the largest expected log-gain also yields the largest long-run compound gain. The problem boils down to finding the optimal fraction \(f^*\) for all bets. To do so, we merely use pre-school maths: we search for the point where the derivative of \(E\{\log g\}\) w.r.t. \(f\) is null. The result is given by the expression (this is left as an exercise):
$$f^* = \begin{cases} \frac{pc-1}{c-1} & p>1/c \\ 0 & p \leq 1/c\end{cases} \tag{1}$$
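For readers who would rather skip the exercise, here is a sketch of the derivation. It assumes that \(c\) is the total amount returned per unit staked, so that a win multiplies our wealth by \(1+(c-1)f\); this is the convention that reproduces Eq. (1). Setting the derivative of the expected log-gain to zero gives:

$$ \frac{d}{df} E\{\log g\} = \frac{p(c-1)}{1+(c-1)f} - \frac{1-p}{1-f} = 0 $$

$$ p(c-1)(1-f) = (1-p)\left(1+(c-1)f\right) \;\Rightarrow\; pc-1 = (c-1)f \;\Rightarrow\; f^* = \frac{pc-1}{c-1} $$

The expected log-gain is concave in \(f\), so this stationary point is a maximum. At \(f=0\) the derivative equals \(pc-1\), so when \(p \leq 1/c\) the expectation can only decrease as we bet more, and the best choice is \(f^*=0\).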
This result is easy to interpret and agrees with the previous common sense statements: if \(p \leq 1/c\), the true probability of winning is no higher than the implicit probability \(1/c\), so \(f^*=0\) (the odds don’t pay off the risk). On the other hand, if \(p=1\), we are absolutely certain about winning and should therefore bet our whole stack \(f^*=1\).
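As a quick illustration, Eq. (1) can be written as a small Python function. This is only a minimal sketch: the names `kelly_fraction` and `expected_log_growth` are made up for this example, and \(c\) is assumed to be the total amount returned per unit staked (so the net profit on a win is \(c-1\)).

```python
import math


def kelly_fraction(p, c):
    """Kelly fraction from Eq. (1).

    p: probability of winning the bet.
    c: total amount returned per unit staked (assumed convention, c > 1).
    """
    if p <= 1.0 / c:
        # The odds do not compensate the risk: do not bet
        return 0.0
    return (p * c - 1.0) / (c - 1.0)


def expected_log_growth(f, p, c):
    """Expected log-gain per bet when staking a fraction f (requires 0 <= f < 1)."""
    return p * math.log(1.0 + (c - 1.0) * f) + (1.0 - p) * math.log(1.0 - f)


# Loaded coin paying 2:1 with P(head) = 0.53
f_star = kelly_fraction(0.53, 2.0)
print('Kelly fraction: {:.2f}'.format(f_star))
```

Evaluating `expected_log_growth` slightly below and slightly above `f_star` is an easy way to confirm that the expectation peaks at the Kelly fraction.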
In the example at hand, \(c=2\) and \(p=0.53\), so by applying Eq (1) we obtain \(f^*=0.06\). In other words, we should always bet 6% of our budget on heads no matter what, as long as the coin doesn’t change. In order to check this, let us perform a set of experiments where we flip the loaded coin thousands of times and bet the Kelly fraction in each trial. We include other values of \(f\) for comparison.
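An experiment of this kind can be sketched in a few lines of NumPy. This is not the original simulation: the number of flips, the random seed and the alternative fractions are arbitrary choices made for illustration, and `simulate_log_growth` is a hypothetical helper.

```python
import numpy as np


def simulate_log_growth(f, wins):
    """Average log-gain per bet when staking a fraction f on each toss at 2:1."""
    # Wealth is multiplied by (1 + f) on a win and by (1 - f) on a loss
    factors = np.where(wins, 1.0 + f, 1.0 - f)
    return np.mean(np.log(factors))


rng = np.random.default_rng(0)          # arbitrary seed, for reproducibility
wins = rng.random(100_000) < 0.53       # loaded coin: heads with probability 0.53

fractions = [0.02, 0.06, 0.12, 0.25]    # 0.06 is the Kelly fraction
growth = {f: simulate_log_growth(f, wins) for f in fractions}
best = max(growth, key=growth.get)

for f in fractions:
    print('f = {:.2f}: average log-gain per bet = {:+.5f}'.format(f, growth[f]))
print('Best fraction among those tested: {:.2f}'.format(best))
```

All fractions are evaluated on the same sequence of tosses, so differences in the outcome come from the bet size alone.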
Here we can see that the Kelly fraction is indeed the one that maximises the long-term compound return.
The story behind the Kelly criterion
In 1948, the American mathematician Claude Shannon published A Mathematical Theory of Communication: one of the most influential papers of the 20th century. Eight years later, John Kelly, who was Shannon’s colleague at Bell Labs, wrote A New Interpretation of Information Rate: the paper that introduced the Kelly criterion and gave Shannon’s information theory another meaning from the perspective of gambling. After moving to MIT, Shannon met Edward Thorp, to whom he introduced the Kelly criterion. The excellent book, “Fortune’s Formula: The Untold Story of the Scientific Betting System That Beat the Casinos and Wall Street” by W. Poundstone tells the story of Shannon and Thorp’s trip to Las Vegas, where they tested a winning method for Blackjack and measured the bias of roulette with a hidden, portable computer (in the sixties!). Soon after that, Thorp turned his attention to the stock market and became one of the most successful hedge fund managers ever (and perhaps the first quant to deserve the name). In his 1998 paper “The Kelly Criterion in Blackjack, Sports Betting, and the Stock Market”, he wrote:
It is now May, 1998, twenty eight and a half years since the investment program began. The partnership and its continuations have compounded at approximately 20% annually with a standard deviation of about 6% and approximately zero correlation with the market. Ten thousand dollars, tax-exempt, would now be worth 18 million dollars.
One of the reasons why Ed Thorp had such great success is that he was using (and profiting from) the Black-Scholes equation three years before Fischer Black and Myron Scholes published it. Another reason is that he systematically applied the Kelly criterion to the stock market. He gives some clues as to how he went about this in his paper, “The Kelly Criterion and the Stock Market”, which we summarise below.
The stock market
We have previously studied the case of gambling with discrete outcomes. But what about a game, such as the stock market, where the outcomes are continuous? In this case, the expectation is given by an integral instead of a summation:
$$ E\{\log g\} = \int \log(1 + fr) P(r) dr \tag{2} $$
where r is the excess return of the asset in which we’d like to invest (its return minus that of Treasury bills or another risk-free reference). This return is distributed with probability density \(P(r)\). Again, the optimal fraction is the one that makes the derivative null:
$$ \frac{d}{d f} E\{\log g\} = \int_{-\infty}^{+\infty} \frac{r}{1+fr} P(r) dr = 0 \tag{3}$$
OK, I admit that it takes a bit more than pre-school maths to solve the problem. This is a good excuse to introduce some nice tools for numerical computation in Python with the SciPy package!
There are two options here:
1. Create a function that computes the integral in Eq. (2) and maximise the function w.r.t. f :
$$ f^* = \arg \max_f E\{\log g\} $$
2. Create a function that computes the derivative of the integral as in Eq. (3) and use a numerical solver to find a zero:
$$ \left. \frac{d}{d f} E\{\log g\} \right|_{f=f^*} = 0 $$
Thorp focuses on annual returns and suggests modeling \(P(r)\) as a normal distribution truncated at \(\pm 3\sigma\). The reported statistics for the 1926-1984 period are \(\mu = 0.058\) and \(\sigma = 0.216\). Here is the Python snippet that enables us to solve the problem:
```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar, newton
from scipy.stats import norm


def norm_integral(f, m, st):
    # Negative of E{log g} in Eq. (2), with the normal density truncated at m +/- 3*st
    val, er = quad(lambda s: np.log(1 + f * s) * norm.pdf(s, m, st), m - 3 * st, m + 3 * st)
    return -val


def norm_dev_integral(f, m, st):
    # Derivative of E{log g} w.r.t. f, as in Eq. (3), over the same truncated range
    val, er = quad(lambda s: (s / (1 + f * s)) * norm.pdf(s, m, st), m - 3 * st, m + 3 * st)
    return val


# Reference values from Edward Thorp's article
m = .058
s = .216

# Option 1: minimize the negative of the expectation integral
sol = minimize_scalar(norm_integral, args=(m, s), bounds=[0., 2.], method='bounded')
print('Optimal Kelly fraction: {:.4f}'.format(sol.x))

# Option 2: take the derivative of the expectation and make it null
x0 = newton(norm_dev_integral, .1, args=(m, s))
print('Optimal Kelly fraction: {:.4f}'.format(x0))
```
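The result is \(f = 1.197\), which is slightly different from the value reported in the article (\(1.17\)). The gap most likely comes from the way the truncation is handled. Simply rescaling the truncated density so that it integrates to one would not move the optimum, since multiplying the objective by a constant leaves its maximiser unchanged; however, truncating at \(\pm 3\sigma\) also changes the variance of the distribution, so matching the reported statistics requires adjusting the parameters of the underlying normal, which we haven’t done here. In any case, since \(f^* > 1\), the results of this experiment ultimately recommend leveraged investing in the S&P 500.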
This raises an interesting question: how would the strategy (i.e. this particular fraction) have performed in recent years?
In an initial experiment, we assume that borrowing money is cheap, at the official one-month interest rate. The result of applying the previous statistics to the period 1993-2016 is shown in the Figure:
Note that under the unrealistic assumption that money is cheap, the higher the leverage (even above the Kelly fraction), the better. Let’s see what happens when we apply a margin of 2.5% to the official interest rate.
In this case, \(f=1.19\) achieves the best result in terms of both return and risk-adjusted return. Also, due to the higher cost of borrowing money, profits are not so spectacular and leveraging beyond the Kelly fraction is not such a good idea.
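To make the role of the borrowing cost concrete, here is a minimal sketch of how the wealth of a leveraged position could be computed. It is not the code behind the figures: `leveraged_wealth` is a hypothetical helper, the numbers are made-up toy values rather than market data, and it assumes that cash not invested earns the reference rate while borrowed money (when \(f>1\)) costs that rate plus a margin expressed per period.

```python
import numpy as np


def leveraged_wealth(asset_returns, cash_rates, f, margin=0.0):
    """Final wealth per unit invested when holding a fraction f in the asset.

    asset_returns, cash_rates: per-period simple returns (same length).
    margin: extra per-period cost of borrowing, applied only when f > 1.
    """
    asset_returns = np.asarray(asset_returns, dtype=float)
    cash_rates = np.asarray(cash_rates, dtype=float)
    # Borrowed money (f > 1) costs the cash rate plus the margin;
    # uninvested money (f <= 1) simply earns the cash rate
    financing = cash_rates + margin if f > 1 else cash_rates
    period_returns = f * asset_returns + (1.0 - f) * financing
    return np.prod(1.0 + period_returns)


# Toy, made-up monthly figures (NOT market data)
toy_asset = [0.02, -0.01, 0.03]
toy_cash = [0.002, 0.002, 0.002]

cheap = leveraged_wealth(toy_asset, toy_cash, f=1.2)
costly = leveraged_wealth(toy_asset, toy_cash, f=1.2, margin=0.025 / 12)
print('Cheap borrowing: {:.4f}, with margin: {:.4f}'.format(cheap, costly))
```

With the same returns, adding a margin can only lower the final wealth of a leveraged position, which is why fractions above one look less attractive once borrowing is no longer cheap.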