# On the origins of Bayesian statistics

### Javier Cárdenas

#### 23/02/2023

Bayesian statistics is a powerful field of mathematics that has wide-ranging applications in many fields, including finance, medical research, and information technology. It allows us to combine prior beliefs with evidence to obtain new posterior beliefs, thereby enabling us to make more informed decisions.

In this post we will have a brief look at some of the main mathematicians that gave birth to this field.

## Before Bayes

To better understand Bayesian statistics, we need to go back to the 18th century and the mathematician Abraham de Moivre and his work “The Doctrine of Chances” [1].

In this work, De Moivre wrote the solution to many of the problems related to probability and gambling of his time. As you may know, his solution to one of these problems led to the origins of the Normal distribution, but that is another story.

One of the simplest problems you can find in his work is:

“To find the probability of obtaining three heads in a row with a fair coin.”

Reading the problems described in “The Doctrine of Chances”, you may notice that most of them start with an assumption, from which De Moivre then calculates the probability of a given event. For example, the problem stated above assumes that the coin is fair, and thus that the probability of obtaining a head in a single toss is 0.5.

Nowadays this is expressed in mathematical terms as:

$$P(X|\theta)$$
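For the three-heads problem above, computing this probability under the fairness assumption is a one-liner (a minimal sketch in Python):

```python
# Probability of three heads in a row, under the assumption that the coin
# is fair (theta = 0.5): P(X | theta) = theta ** 3.
theta = 0.5
p_three_heads = theta ** 3
print(p_three_heads)  # 0.125
```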

But what happens if we don’t know whether this coin is fair? What happens if we don’t know $$\theta$$?

## Thomas Bayes and Richard Price

Nearly fifty years later, in 1763, “An Essay towards solving a Problem in the Doctrine of Chances” [2] was published in the Philosophical Transactions of the Royal Society of London.

The first pages of the document contain an introduction written by the mathematician Richard Price, summarizing the content of the essay, which his friend Thomas Bayes had written some years before his death. In this introductory note, Price explains the importance of the discoveries Bayes had made, which De Moivre had not addressed in “The Doctrine of Chances”.

Indeed, he was referring to a specific problem:

“The number of times an unknown event has happened and failed being given, to find the chance that the probability of its happening should lie somewhere between any two named degrees of probability.”

In other words: having observed a certain event, what is the probability that our unknown parameter $$\theta$$ lies between two given degrees of probability? This is, in fact, one of the first problems of statistical inference in history, and it gave birth to the name inverse probability. In mathematical terms:

$$P(\theta|X)$$

This, of course, is what we today call the posterior distribution in Bayes’ theorem.

#### The uncaused cause

It is really interesting to understand what was driving the research of these two Presbyterian ministers, Thomas Bayes and Richard Price. But to do so, we need to set aside some of our knowledge of statistics.

We are in the 18th century, when probability was becoming a growing area of interest among mathematicians. Mathematicians such as De Moivre and Bernoulli had shown that some events occur with a certain degree of randomness but are nevertheless governed by fixed rules. For example, if you throw a die many times, it will land on six about one sixth of the time. It is as if there were a hidden rule that determined the destiny of chance.
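This “hidden rule” is easy to see in a quick simulation (a sketch; the seed and throw counts are arbitrary):

```python
import random

random.seed(7)

# The relative frequency of rolling a six approaches the "hidden rule"
# of 1/6 as the number of throws grows.
for n in (100, 10_000, 1_000_000):
    sixes = sum(random.randint(1, 6) == 6 for _ in range(n))
    print(f"{n:>9} throws: frequency of six = {sixes / n:.4f}")
```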

Now, imagine that you are a mathematician and a religious person living during this time period. You would probably be interested in understanding the relation between this hidden rule and God.

And this is indeed what Bayes and Price were asking themselves. They hoped that the solution to this problem would be directly applicable to proving that “the world must be the effect of the wisdom and power of an intelligent cause; and thus to confirm the argument taken from final causes for the existence of the Deity” [2] – that is, the uncaused cause.

## Laplace

Surprisingly, just over a decade later, in 1774, and apparently without having read Thomas Bayes’ essay, the French mathematician Laplace wrote “Mémoire sur la probabilité des causes par les évènements” [3], an essay on the problem of inverse probability. On its first page one can read the main principle:

“If an event can be produced by a number n of different causes, the probabilities of these causes given the event are to each other as the probabilities of the event given the causes, and the probability of the existence of each of these is equal to the probability of the event given a cause, divided by the sum of all the probabilities of the event given each of these causes.”

This is Bayes’ theorem as we know it today:

$$P(\theta|X)= \frac{P(X|\theta) P(\theta)}{P(X)}$$

Here, $$P(\theta)$$ is taken to be a uniform distribution.
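Laplace’s principle can be sketched numerically: discretize $$\theta$$ on a grid of candidate “causes”, give each cause equal prior probability, and divide each likelihood by the sum over all causes. The data here (3 heads in 5 tosses) are purely illustrative:

```python
from math import comb

# Laplace's principle on a discrete grid of candidate values of theta,
# with a uniform prior and hypothetical data: 3 heads in 5 tosses.
thetas = [i / 100 for i in range(1, 100)]               # candidate "causes"
prior = [1 / len(thetas)] * len(thetas)                 # uniform prior P(theta)
likelihood = [comb(5, 3) * t**3 * (1 - t)**2 for t in thetas]  # P(X | theta)

unnormalized = [l * p for l, p in zip(likelihood, prior)]
evidence = sum(unnormalized)                            # P(X): sum over all causes
posterior = [u / evidence for u in unnormalized]        # P(theta | X)

# The posterior mode coincides with the maximum-likelihood estimate, 3/5.
mode = max(zip(posterior, thetas))[1]
print(mode)  # 0.6
```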

## Coin experiment

We will finish this post by bringing Bayesian statistics to the present with a simple experiment in Python using the library PyMC.

Let’s suppose a friend comes to you with a coin and asks you whether you think it is fair. Since he is in a hurry, he tells you that you can only toss the coin 10 times. In this problem we have an unknown parameter $$p$$, the probability of obtaining a head in a coin toss, and we would like to estimate its most probable values.

(Note: we are not saying that the parameter $$p$$ is a random variable, but rather that it is fixed and we want to know between which values it is most likely to lie.)

To have different views of the problem we will solve it under two different prior beliefs:

1. You have no previous information about the fairness of the coin, so you assign equal probability to every value of $$p$$. In this case we use what is called an uninformative prior, since you aren’t adding any information to your beliefs.
2. You know from experience that, even though the coin could be unfair, it is difficult to make it very unfair, so you think it is highly unlikely that the parameter $$p$$ is lower than 0.3 or higher than 0.7. In this case we use an informative prior.

For each of these two cases, our prior beliefs would look like this:

After tossing the coin 10 times, you obtain 2 heads. Given this evidence, where would we most probably find our parameter $$p$$?
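The figures discussed below come from PyMC, but the same posteriors can be approximated with a self-contained conjugate Beta-Binomial sketch using only the standard library. Note that the informative prior here is an assumption: the post does not specify its exact form, and Beta(10, 10) is just one choice that puts most of its mass between roughly 0.3 and 0.7:

```python
import random
import statistics

random.seed(42)

# Conjugate Beta-Binomial sketch of the coin experiment.
# Data: 2 heads in 10 tosses.
heads, tails = 2, 8

def posterior_summary(a_prior, b_prior, n=200_000):
    """Sample the Beta posterior and return its mean and a 95% equal-tailed interval."""
    a, b = a_prior + heads, b_prior + tails
    draws = sorted(random.betavariate(a, b) for _ in range(n))
    lo, hi = draws[int(0.025 * n)], draws[int(0.975 * n)]
    return statistics.mean(draws), (lo, hi)

# Case 1: uninformative prior, Beta(1, 1), i.e. uniform over [0, 1].
mean1, ci1 = posterior_summary(1, 1)

# Case 2: informative prior centred on fairness; Beta(10, 10) is an assumed form.
mean2, ci2 = posterior_summary(10, 10)

print(f"uninformative: mean={mean1:.2f}, 95% interval=({ci1[0]:.2f}, {ci1[1]:.2f})")
print(f"informative:   mean={mean2:.2f}, 95% interval=({ci2[0]:.2f}, {ci2[1]:.2f})")
```

With the uniform prior the posterior is Beta(3, 9), whose mode sits at the MLE of 0.2; with the assumed Beta(10, 10) prior it is Beta(12, 18), with mean 0.4.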

As you can see, in the first case, where we had no previous information about where our parameter $$p$$ might be, the posterior distribution is centered around the maximum likelihood estimate (MLE) $$p=0.2$$, which is what the analogous frequentist approach to statistical inference would give us. And the true unknown parameter lies, with a 95% credible interval, between 0.04 and 0.48.

On the other hand, with a high degree of confidence that the parameter $$p$$ should lie between 0.3 and 0.7, the posterior distribution is centered around 0.4, noticeably higher than what the MLE gives us. In this case, the true unknown parameter lies, with a 95% credible interval, between 0.23 and 0.57.

So in the first case you would tell your friend that you are fairly confident the coin is unfair, while in the second you would tell him that you are not sure whether the coin is fair or not.

As you can see, the results differ under different prior beliefs, even though we have the same evidence (2 heads in 10 tosses). This is one of the strengths of Bayesian statistics: similar to the scientific method, it allows us to update our beliefs by combining prior beliefs with new observations and evidence.

## Conclusions

In today’s post we have seen the origins of Bayesian statistics and its main contributors.
Since then there have been many other important contributions to this field of statistics (Jeffreys, Cox, Shannon…), but we will leave those for future posts.

Thanks for reading and see you in future posts!

## References

1. De Moivre, Abraham. The doctrine of chances: or, A method of calculating the probability of events in play. W. Pearson, 1718.
2. Bayes, Thomas. An essay towards solving a problem in the doctrine of chances. 1763.
3. Laplace, Pierre Simon. Mémoire sur la probabilité des causes par les évènements. 1774.