# Estimating the probability of something that never happened

### pmercatoris

#### 21/11/2018

Have you ever needed to estimate the probability of a rare event? So rare that you haven’t been able to encounter it in real data? Well, what if I told you that there is a way to calculate a statistically sound approximation? Oh, and you won’t even need a calculator!

I recently heard of the statistical “Rule of three”. No, this is not the mathematical “Rule of three” where if:

$$\frac{a}{b} = \frac{c}{x}$$

then:

$$x = \frac{b \cdot c}{a}$$

Nor is it the rule whereby, in literature, a trio of events is always more humorous or satisfying than any other number. And it is not the “Rule of thirds” used in photographic composition either.

No, this one states that it is possible to estimate the probability that an event which has never been observed will actually occur. For example, imagine you are looking for the probability of a typo occurring on a page of a book. Ideally, you would count all the typos in the book and divide by the number of pages. However, what if you can only read the first 30 pages and still need to give an estimate? Unfortunately, you haven’t found a single typo in those pages. Would you then estimate that probability to be 0? There must surely be a typo somewhere. There has to be!

Following an easy and useful rule of thumb, you can give a statistically sound estimate with 95% confidence, simply by dividing 3 by n (the number of observations counted, in none of which the event occurred). So in our previous example, having read 30 pages without a typo, we can say with 95% confidence that the real proportion lies between 0 and 3/30 = 0.1. As with most statistical inference rules, though, it is recommended to have n > 30 (see Wikipedia for more details).
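Where does the 3 come from? If the event has a constant probability p per observation and observations are independent, the chance of seeing none in n observations is (1 - p)^n. The 95% upper bound is the largest p for which that chance is still at least 5%:

$$(1-p)^n = 0.05 \;\Rightarrow\; p = 1 - 0.05^{1/n} \approx \frac{-\ln 0.05}{n} \approx \frac{3}{n}$$

using ln(1 - p) ≈ -p for small p and -ln(0.05) ≈ 3.0. A minimal sketch comparing the exact bound with the shortcut, under those independence assumptions (the function names are hypothetical, chosen for illustration):

```python
import math

def exact_upper_bound(n, confidence=0.95):
    """Largest p for which zero events in n independent trials still has
    probability >= 1 - confidence, i.e. the solution of (1 - p)**n = 1 - confidence."""
    return 1 - (1 - confidence) ** (1 / n)

def approx_upper_bound(n, confidence=0.95):
    """Small-p approximation -ln(1 - confidence) / n, which is about 3 / n at 95%."""
    return -math.log(1 - confidence) / n

# Compare the exact bound, its approximation and the plain 3/n rule.
for n in (30, 100, 500):
    print(n, round(exact_upper_bound(n), 4), round(approx_upper_bound(n), 4), round(3 / n, 4))
```

For the 30-page example, the exact bound is about 0.095, so 3/30 = 0.1 is slightly conservative; the gap shrinks as n grows.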

However, I wanted to know whether this method could be applied to finance, and more particularly to estimating the probability of a big loss in a series. To do this, I counted the months in which the S&P 500 Index had a drawdown worse than -5%, -10% and -15% since the 1st of January 1965. Then, for each month, I calculated the proportion of past months in which such an event had occurred, without using the “Rule of three”. As we can see, if we were to use that proportion as the probability of such an event occurring in the future, we would greatly underestimate the risk, as the probability would be 0 for more than one year (for a drawdown worse than -15%). That isn’t very reassuring when all we want is to control the risk…
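The naive running proportion can be sketched as follows. This assumes monthly returns given as decimal fractions (e.g. -0.05 for -5%) and counts a month as an event when its return is below the threshold; the function name and the toy data are hypothetical, not the S&P 500 data used in this article.

```python
def running_proportion(returns, threshold):
    """For each month t, the share of months 1..t whose return fell below threshold."""
    proportions = []
    events = 0
    for t, r in enumerate(returns, start=1):
        if r < threshold:
            events += 1
        proportions.append(events / t)
    return proportions

# Toy monthly returns (hypothetical): no loss worse than -5% until month 4.
toy = [0.01, -0.02, 0.03, -0.06, 0.01]
print(running_proportion(toy, -0.05))  # [0.0, 0.0, 0.0, 0.25, 0.2]
```

Every value before the first event is exactly 0, which is the underestimation problem described above.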

Applying the “Rule of three” instead, we get an estimate that decreases rapidly at first but never reaches 0. In the graph below, we can see the probability estimated by the rule before the first event was registered, and how the real proportion then stays below that prediction.
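The rule-of-three path before the first event can be sketched in the same hypothetical setting (decimal monthly returns, event = return below the threshold). As a design choice, months with n ≤ 30 are reported as `None`, since the rule is only recommended beyond that; set `min_n=0` to see the estimate from the very first month.

```python
def rule_of_three_path(returns, threshold, min_n=30):
    """Rule-of-three estimate 3/n for each event-free month n before the first event.

    Values for n <= min_n are None because the rule is only recommended
    once more than min_n observations are available.
    """
    path = []
    for n, r in enumerate(returns, start=1):
        if r < threshold:
            break  # first event registered: stop producing rule-of-three estimates
        path.append(3 / n if n > min_n else None)
    return path

# Hypothetical series: 40 quiet months, then a -10% month.
toy = [0.0] * 40 + [-0.10]
path = rule_of_three_path(toy, -0.05)
print(len(path), path[30], path[-1])
```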

Another way I wanted to test it was to use generated (but realistic) financial time series. Below, we can see 20 of the 1000 series generated over the same period (from January 1965).

The simple test used was to calculate the proportion of the 1000 time series for which the “Rule of three” estimate was higher than the real proportion of events over the whole series. This was repeated for varying event severities (from 0% to -15% loss in a month).
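The coverage test can be sketched like this. The article does not say how its realistic series were generated, so this sketch assumes, purely for illustration, i.i.d. normal monthly returns with a hypothetical mean of 0.7% and volatility of 4.5% over 647 months (roughly January 1965 to late 2018); real returns have fatter tails. For each series, n is the number of event-free months before the first event (the whole length if none occurs), and only series with n > 30 are kept.

```python
import math
import random

def simulate_returns(n_months, mu, sigma, rng):
    """Hypothetical generator: i.i.d. normal monthly returns (no fat tails)."""
    return [rng.gauss(mu, sigma) for _ in range(n_months)]

def rule_of_three_coverage(threshold, n_series=1000, n_months=647,
                           mu=0.007, sigma=0.045, min_n=30, seed=0):
    """Share of accepted series whose 3/n estimate exceeds the realized event rate."""
    rng = random.Random(seed)
    accepted = hits = 0
    for _ in range(n_series):
        events = [r < threshold for r in simulate_returns(n_months, mu, sigma, rng)]
        # n = number of event-free months before the first event (whole series if none)
        n = events.index(True) if True in events else n_months
        if n <= min_n:
            continue  # too few observations to apply the rule of three
        accepted += 1
        realized = sum(events) / n_months
        if 3 / n > realized:
            hits += 1
    return hits / accepted if accepted else math.nan

for threshold in (-0.15, -0.10, -0.05):
    print(threshold, rule_of_three_coverage(threshold))
```

Swapping `simulate_returns` for a fat-tailed or volatility-clustering generator would bring the sketch closer to realistic series; the coverage logic stays the same.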

As we can see, the proportion does not quite reach 95%, but that might be because 1000 series aren’t enough to estimate the real distribution, and because I was accepting any estimate from the “Rule of three” with n > 30 (very close to the minimum required). The proportion then drops as the threshold goes from -8% towards 0%, but that is most likely because the frequency of events is simply too high.
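The drop for frequent events can also be seen without simulation. Under a simplified model (each month is an independent trial with a constant event probability p, an assumption rather than a property of real returns), the event-free run n before the first event is geometric, and 3/n exceeds p exactly when n < 3/p. Keeping only runs with n > 30 then gives a coverage of 1 - (1 - p)^(m - 31) with m = ceil(3/p), which is 0 as soon as p ≥ 3/31, roughly 10%. Note that this compares against the true p rather than the realized proportion used in the test, so it only approximates that test.

```python
import math

def conditional_coverage(p, min_n=30):
    """P(3/n > p | n > min_n) when n, the number of event-free months before the
    first event, is geometric: P(n >= k) = (1 - p) ** k (simplified i.i.d. model)."""
    m = math.ceil(3 / p)   # 3/n > p  <=>  n <= m - 1
    k = m - (min_n + 1)    # admissible run lengths: min_n + 1, ..., m - 1
    if k <= 0:
        return 0.0         # every accepted run already gives 3/n <= p
    return 1 - (1 - p) ** k

for p in (0.001, 0.01, 0.05, 0.1):
    print(p, round(conditional_coverage(p), 3))
```

For rare events the coverage is close to 95%, while for events more frequent than about one month in ten, no accepted run can produce an estimate above p.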

Overall, I could not really reject the validity of the “Rule of three”, but I have certainly shown that it is only an estimate. It is better to use it than not when estimating the future probability of events that have not happened yet, but it only gives 95% confidence: there is still a 5% chance of being wrong…

Let me know in the comments if you have managed to use this rule in any other meaningful way!