The normal distribution is used in a wide range of fields. In finance, it was first used by Louis Bachelier in 1900 to describe the movements of a stock price [1], and chances are you have heard of it if you work in this field. But where does it come from?
Today we will talk about the origins of this mathematical function, which was not always called Normal.
Brief history
If I have seen further it is by standing on the shoulders of Giants
Newton
In mathematics it is often hard to trace all the components and origins that lead to a discovery. This happens because many achievements are not made by one particular person, but rather by a large number of people over long periods of time.
Such is the case with the normal distribution. Our story starts with De Moivre (1667-1754), who was the first person to describe this function as we know it today.
De Moivre was raised in France, but he moved to England when he was around 20 to escape the persecution of Huguenots. Not long after he arrived in England, he became friends with other great mathematicians such as Newton and Stirling [3], both of whom contributed decisively to the first derivations of the normal distribution.
Interestingly, later in his life, Newton would refer mathematical questions to De Moivre, arguing that “He knows all these things better than I do”. [3]
In 1733 he published his result in Latin, and in 1738 he translated it into English and included it in a new edition of his book The Doctrine of Chances. He presents his doctrine as being “far from encouraging Play, that it is rather a Guard against it, by setting in a clear Light, the Advantages and Disadvantages of those Games wherein Chances are concerned.” [2]
In his book you can find the solution to questions such as “to find the probability of throwing an Ace in three throws”. But the topic that concerns us today is:
“A Method of approximating the Sum of the Terms of the Binomial \((a+b)^n\) expanded into a Series, from whence are deduced some practical Rules to estimate the Degree of Assent which is to be given to Experiments.”
The problem
As he states, the solutions to some probability problems require adding several terms of the binomial \((a+b)^n\). But computing this sum gets more and more laborious as n increases.
But… why are these terms useful?
To answer this question we will look at an example using the coefficients of the binomial expansion. Notice that those coefficients give us Pascal’s triangle.
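If you want to reproduce the first rows of the triangle yourself, here is a minimal Python sketch. The function name `binomial_coefficients` is just an illustrative choice, and the code relies only on `math.comb` from the standard library (Python 3.8+).

```python
from math import comb


def binomial_coefficients(n):
    """Coefficients of (a+b)^n, i.e. row n of Pascal's triangle."""
    return [comb(n, k) for k in range(n + 1)]


# Print rows n = 0..4 of the triangle.
for n in range(5):
    print(n, binomial_coefficients(n))
```

Each row sums to \(2^n\), which is the sum of the terms of \((1+1)^n\).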

Let’s suppose now that we have a biased coin and the chances of getting heads “a” are 4 times those of getting tails “b”. If we toss the coin twice, what is the probability of getting two heads?
Having:
$$
a=4 ; \ b=1 ; \ n=2
$$
The solution to this problem is given by the ratio of the first term to the sum of all the terms, that is:
$$
\frac{a^2}{(a+b)^2}= \frac{4^2}{(5)^2}=0.64
$$
As you can see, we can get the probabilities of any event in the binomial distribution just by knowing its terms!
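As a rough illustration of that idea, the sketch below computes every term of \((a+b)^n\) divided by the sum of all the terms for the biased coin above. The helper name `outcome_probabilities` is made up for this post, and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction
from math import comb


def outcome_probabilities(a, b, n):
    """Each term C(n, k) * a^(n-k) * b^k of (a+b)^n over the sum of all terms.

    Entry k is the probability of getting k tails (weight b)
    and n - k heads (weight a) in n tosses.
    """
    total = (a + b) ** n
    return [Fraction(comb(n, k) * a ** (n - k) * b ** k, total)
            for k in range(n + 1)]


probs = outcome_probabilities(a=4, b=1, n=2)
for tails, p in enumerate(probs):
    print(f"{2 - tails} heads, {tails} tails: {p} = {float(p):.2f}")
```

The first entry is \(\frac{16}{25}=0.64\), the value obtained above, and the three entries add up to 1.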
Furthermore, De Moivre had a special interest in finding an expression for the ratio of the middle term to the sum of the terms as n increases.
Why the ratio of the middle term?
The middle term is the special case where both events “a” and “b” occur the same number of times. When “a” and “b” have equal probabilities, the middle term is the largest one, so it gives the height of the distribution, and the ratio gives us the probability of getting the mean.

Solving the problem
It took him no less than 12 years to find a solution to this problem.
He started by considering the expansion of \((1+1)^n\), which gives equal weights to both outcomes, as in a fair coin toss, and focused on the ratio of the middle term to the sum of the coefficients.
As an example, for n=4 with a=1 and b=1, this ratio is \(\frac{6}{16}\). Notice in the table that 6 is the middle coefficient when n=4: it is the number of ways of getting equal numbers of heads and tails in 4 tosses, and 16 is the sum of the terms.
He found that this ratio could be expressed as a function of n by the fraction:
$$
R=\frac{2A(n-1)^n}{n^n \sqrt{n -1}}= \frac{2A}{ \sqrt{n -1}}\frac{(n-1)^n}{ n^n}
$$
Where:
$$
A = e^{\frac{1}{12} - \frac{1}{360} + \frac{1}{1260} - \frac{1}{1680} + \dots}
$$
Luckily for De Moivre, Jacob Bernoulli had discovered not long before that:
$$
\lim_{n \to \infty} \left(1 - \frac{1}{n}\right)^n = e^{-1}
$$
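A quick numerical check can make this limit more tangible. The snippet below is only a sketch: the name `bernoulli_term` is a label chosen for this post, and the values of n are arbitrary.

```python
import math


def bernoulli_term(n):
    """Value of (1 - 1/n)^n for a given n."""
    return (1 - 1 / n) ** n


# The values should get closer to e^{-1} as n grows.
for n in (10, 100, 10_000, 1_000_000):
    print(f"n={n:>9}: {bernoulli_term(n):.8f}   e^-1 = {math.exp(-1):.8f}")
```

This is what allows the factor \(\frac{(n-1)^n}{n^n} = (1 - \frac{1}{n})^n\) in R to be replaced by \(e^{-1}\) when n is large.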
So the ratio R could be written as:
$$
\frac{2e^{-1 + \frac{1}{12} - \frac{1}{360} + \dots}}{\sqrt{n-1}} =
\frac{2}{e^{1 - \frac{1}{12} + \frac{1}{360} - \dots} \sqrt{n-1}} =
\frac{2}{B\sqrt{n-1}}
$$
At this point you may not see many similarities between De Moivre’s expression and the normal distribution. But this is when his friend James Stirling came up with an elegant result: the value of B is “the Square-root of the Circumference of a Circle whose Radius is Unity”, that is, \(\sqrt{2\pi}\). So the final ratio can be expressed as:
$$
\frac{2}{\sqrt{2\pi}\sqrt{n-1}}
$$
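We can get a feeling for Stirling's result by evaluating B numerically. The sketch below uses only the terms of the exponent quoted in this post, with the signs flipped as in the definition of B. Keep in mind that this series is asymptotic rather than convergent, so the truncation is an approximation and adding many more terms would not keep improving it.

```python
import math

# Exponent of B: 1 - 1/12 + 1/360 - 1/1260 + 1/1680 (truncated here).
exponent_terms = [1, -1 / 12, 1 / 360, -1 / 1260, 1 / 1680]

B_truncated = math.exp(sum(exponent_terms))
sqrt_two_pi = math.sqrt(2 * math.pi)

print(f"B (truncated series): {B_truncated:.4f}")
print(f"sqrt(2*pi):           {sqrt_two_pi:.4f}")
```

The two numbers agree to about three significant figures, which is consistent with B being \(\sqrt{2\pi}\).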
For large values of n it can be written as:
$$
\frac{2}{\sqrt{2\pi}\sqrt{n}}
$$
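To see how good this approximation is, we can compare it with the exact ratio of the middle term to the sum of the coefficients, \(\binom{n}{n/2} / 2^n\), for even n. This is a minimal sketch: the function names are illustrative, and the values of n are arbitrary.

```python
import math


def exact_middle_ratio(n):
    """Middle coefficient of (1+1)^n divided by the sum 2^n (n even)."""
    return math.comb(n, n // 2) / 2 ** n


def de_moivre_approx(n):
    """Large-n approximation 2 / (sqrt(2*pi) * sqrt(n))."""
    return 2 / (math.sqrt(2 * math.pi) * math.sqrt(n))


for n in (4, 10, 100, 1000):
    exact = exact_middle_ratio(n)
    approx = de_moivre_approx(n)
    print(f"n={n:>4}: exact={exact:.6f}  approx={approx:.6f}  "
          f"relative error={(approx - exact) / exact:.4%}")
```

For n=4 the exact value is \(\frac{6}{16}=0.375\), and the relative error shrinks as n grows.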
You may have noticed that by using \((1 + 1)^n\) De Moivre is approximating a binomial distribution with \(p=\frac{1}{2}\), whose standard deviation is:
$$
\sigma = \sqrt{np(1-p)} = \frac{\sqrt{n}}{2}
$$
Solving for n we have that:
$$
\sqrt{n} = 2\sigma
$$
So our ratio would be:
$$
\frac{1}{\sqrt{2\pi}\sigma}
$$
As you may remember, we were calculating the ratio of the middle term, that is, the case \(x=\mu\), for which the normal density gives:
$$
\frac{1}{\sqrt{2\pi}\sigma}e^{\frac{-(x - \mu)^2}{2\sigma^2}}=
\frac{1}{\sqrt{2\pi}\sigma}e^0=
\frac{1}{\sqrt{2\pi}\sigma}
$$
This gives us the probability of getting the mean.
To conclude
Since De Moivre, other great mathematicians, such as Gauss and Laplace, have contributed to the formula to arrive at the function as we know it today.
Not to mention all the mathematicians who contributed indirectly before and after De Moivre, such as Newton, Stirling and Jacob Bernoulli.
Thanks for reading and see you in the following post!
References
- Bachelier, Louis. “Théorie de la spéculation.” Annales scientifiques de l’École normale supérieure. Vol. 17. 1900.
- De Moivre, Abraham. The doctrine of chances: or, A method of calculating the probability of events in play. W. Pearson, 1718.
- Abraham de Moivre Wikipedia
- Gélinas, Jacques. “Original proofs of Stirling’s series for log (n!).” arXiv preprint arXiv:1701.06689 (2017).
- History of statistics 2. Origin of the normal curve