Thought you knew everything about correlation? Think there’s no fooling you with the question of correlation with financial prices or returns? Well maybe, just maybe, this post will enlighten you.

**Correlation: the debate is on**

Correlation can be a controversial topic. Things can go awry when two seemingly unrelated variables appear to move in a similar pattern and are found to be correlated. Take a look here at some unusual examples. My personal favourite is the clear relationship between the age of Miss America winners and the number of murders by hot things. There’s no denying it folks, just take a look for yourselves…

Although there must be similar cases with financial series (and I’d be interested to know of any) this post focuses on another tricky aspect of correlation in finance. We take a look at a typical mistake made by most finance newbies: **calculating correlation with prices instead of returns**. We’ve all been there.

You’ve just begun your quant career and been made aware of your mistake: “you should use returns, not prices, for correlation”. And you accept it without a second thought and continue with your research, right? Well, now is your chance to take a closer look at that pesky correlation and prepare to be amazed.

But hold on a second, why are we even interested in correlation?

**Correlation is the key to diversity**

Who hasn’t heard the phrase “diversify your portfolio”? Diversification is pretty much number one priority in financial management (after making money, of course). The concept of **not putting all your eggs in one basket** is not new and it makes complete sense to control risk by spreading investments.

Diversification methods range from selecting different asset classes (funds, bonds, stocks, etc.) to combining industries or varying the risk levels of investments. And the most common and direct measure of diversification used in these methods is **correlation**.

**A simple decision**

From the point of view of an investor, what would you do given these possible asset investments?

Your first reaction is probably “invest in assets A and B, because C doesn’t look as good”. Then after a moment, you think “but A and B look highly correlated, so maybe A and C would be better”.

But how would you feel if I told you that in fact **A and B are** **perfectly negatively correlated** and **A and C perfectly positively correlated**? A little confused, maybe? Not buying it?

Let’s put the returns in a scatter plot:

That’s what I said: A and B have negative correlation and A and C positive correlation (and the points lie on exact straight lines). But you’re thinking: “the prices look positively correlated”. Yes, something strange is going on here.

**Misconceptions**

Don’t worry; you’re not the only one confused. Correlation, despite its apparent simplicity, is often misinterpreted even by experienced academics and investors.

One misconception is that extreme values of correlation imply the movements of two series are in exact opposite directions (for -1) or the same direction (for +1). But this is *not* correct.

Assets A and C are perfectly positively correlated. You would then often hear people say “A and C move up and down together”. But not so fast… for small positive returns of asset A (less than 1%) asset C has negative returns. Hmmm…

Not as common is the belief that the magnitude of the movements is the same for series with ±1 correlation. This is also *not* correct.

Assets A and B are perfectly negatively correlated. Some may say “B moves the same amount as A but in the opposite direction”. Nope again. When A moves 4% B moves close to 0%.

Wait, so what did we miss? Let’s go back to basics.

**What is correlation?**

Correlation describes how closely variables are related. The Pearson correlation coefficient is its most common statistic and it measures **the degree of linear relationship between two variables**. Its values range between -1 (perfect negative correlation) and 1 (perfect positive correlation). Zero correlation implies no linear relationship between the variables, although they may still be related in a non-linear way.

It is defined as the covariance between two variables, say \(X\) and \(Y\), divided by the product of the standard deviations of each. Covariance is an unbounded statistic of how the variables change together, while standard deviation is a measure of data dispersion from its average.

$$\rho^{}_\mathrm{X,Y} = \frac{\mathrm{cov(X,Y)}}{\sigma^{}_\mathrm{X}\sigma^{}_\mathrm{Y}}$$

This formula can be estimated for a sample by:

$$\hat{\rho}^{}_{X,Y} = \frac{\sum^T_{t=1}(x_t-\bar{x})(y_t-\bar{y})}{\sqrt{\sum^T_{t=1}(x_t-\bar{x})^2\sum^T_{t=1}(y_t-\bar{y})^2}}$$ where \(x_t\) and \(y_t\) are the values of \(X\) and \(Y\) at time \(t\). The sample means of \(X\) and \(Y\) are \(\bar{x}\) and \(\bar{y}\) respectively.
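As a rough illustration, the sample formula can be written out term by term. This is only a sketch: the function name `pearson` and the toy data are made up for the example, and NumPy's built-in `np.corrcoef` is used purely as a cross-check.

```python
import numpy as np

def pearson(x, y):
    """Sample Pearson correlation, following the sample formula term by term."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its sample mean
    dy = y - y.mean()  # deviations of y from its sample mean
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

# Toy data (illustrative only): y and w are exact linear functions of x.
x = np.array([0.01, 0.03, -0.02, 0.04, 0.00])
y = 2.0 * x + 0.05    # increasing in x, so correlation should be +1
w = -0.5 * x + 0.01   # decreasing in x, so correlation should be -1

print(pearson(x, y))
print(pearson(x, w))
print(np.corrcoef(x, y)[0, 1])  # NumPy's estimate, for comparison
```

Note that the slope and intercept of the linear relationship do not affect the result, only its sign does.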

**Uncovering the mystery**

Looking carefully at this last formula we see all the bracketed terms are differences to the variable average, so correlation is a comparison of the deviations from the means and not of the variations in the raw data itself. Hence, Pearson actually **measures whether the variables are above or below their average at the same time**. The term \((x_t-\bar{x})(y_t-\bar{y})\) is positive if both series are above (or below) their average together (and note the denominator is always positive).

So a correct statement of perfect positive correlation would be “the upward deviations from the mean of asset A returns are simultaneous to upward deviations from the mean of asset B returns, and similarly with downward deviations”.

This isn’t as intuitive as the typical “asset B goes up and down with asset A” and it is certainly not as easy to visualise. It’s no wonder correlation can be misleading.

**Removing the mean**

Let’s go back to our example. The asset prices were created to follow geometric Brownian motions with a trend component and an irregular component. All three series have strong, positive, constant trend components, hence their upward random walks (A and B have the same magnitude and C has half). The irregular components are generated with the same series of random numbers, but their signs have been inverted for B. These settings ensure the extreme correlations between the series.
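A construction along these lines can be sketched as follows. This is not the original simulation: the seed, the number of periods, the values of `mu` and `sigma`, and the use of simple compounded returns instead of a continuous-time geometric Brownian motion are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
T = 250                          # number of periods (assumed)
mu, sigma = 0.02, 0.01           # assumed trend and noise size per period
z = rng.standard_normal(T)       # one shared series of random numbers

r_a = mu + sigma * z             # asset A: trend plus noise
r_b = mu - sigma * z             # asset B: same trend, noise sign inverted
r_c = mu / 2 + sigma * z         # asset C: half the trend, same noise as A

def to_prices(returns, start=100.0):
    """Compound simple returns into a price path starting near `start`."""
    return start * np.cumprod(1.0 + returns)

p_a, p_b, p_c = to_prices(r_a), to_prices(r_b), to_prices(r_c)

def corr(x, y):
    """Pearson correlation via NumPy."""
    return np.corrcoef(x, y)[0, 1]

print("returns A,B:", corr(r_a, r_b))   # -1 up to rounding
print("returns A,C:", corr(r_a, r_c))   # +1 up to rounding
print("prices  A,B:", corr(p_a, p_b))
print("prices  A,C:", corr(p_a, p_c))
```

With these assumed parameters the returns satisfy `r_b = 0.04 - r_a` and `r_c = r_a - 0.01`, so C has a negative return whenever A returns less than 1%, and B barely moves when A moves 4%.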

If we create two new series E and F with **trend components set to zero** then the upward bias is removed in the prices but the correlation on the returns stays the same. This is because the trend component doesn’t matter in the correlation calculation since it compares deviations from the mean returns, or in other words, from the trend.

The difference is that all upward returns in asset E do correspond to downward returns in asset F, and vice versa. This is like shifting the axes in the first scatter plot and centring them on the means of the series of A and B.
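To see why the trend drops out, here is a short worked derivation under a simplifying assumption that is not stated above: the returns of two assets are written as a constant trend plus a shared irregular term with opposite signs, \(r^A_t = \mu_A + \sigma\varepsilon_t\) and \(r^B_t = \mu_B - \sigma\varepsilon_t\). The symbols \(\mu_A\), \(\mu_B\), \(\sigma\) and \(\varepsilon_t\) are introduced here only for illustration. Subtracting the sample means removes the trends:

$$r^A_t-\bar{r}^A = \sigma(\varepsilon_t-\bar{\varepsilon}), \qquad r^B_t-\bar{r}^B = -\sigma(\varepsilon_t-\bar{\varepsilon})$$

Substituting into the sample formula gives

$$\hat{\rho}^{}_{A,B} = \frac{-\sigma^2\sum^T_{t=1}(\varepsilon_t-\bar{\varepsilon})^2}{\sqrt{\sigma^2\sum^T_{t=1}(\varepsilon_t-\bar{\varepsilon})^2\,\sigma^2\sum^T_{t=1}(\varepsilon_t-\bar{\varepsilon})^2}} = -1$$

for any values of \(\mu_A\) and \(\mu_B\), including zero, as long as \(\sigma > 0\) and the \(\varepsilon_t\) are not all equal.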

This shifting concept can be applied to the correlation calculation by **removing the means** from the formula: $$\hat{\mathrm{dq}}^{}_\mathrm{X,Y} = \frac{\sum^T_{t=1}x_ty_t}{\sqrt{\sum^T_{t=1}x_t^2\sum^T_{t=1}y_t^2}}$$

Instead of comparing deviations from the series’ averages we are directly comparing the values themselves. Using this QuantDare formula, we have the following correlations on the asset returns:

Well, it kind of makes more sense looking at the price series, but they’re very different to the Pearson coefficients.
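The mean-free formula is easy to try out. The sketch below uses a hypothetical function name, `uncentered_corr`, and assumed return parameters (trend 2%, noise 1% per period, 250 periods), so the printed numbers are not the ones from the original example.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation via NumPy (deviations from the means)."""
    return np.corrcoef(x, y)[0, 1]

def uncentered_corr(x, y):
    """Mean-free version of the formula: compares the raw values directly."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum(x * y) / np.sqrt(np.sum(x**2) * np.sum(y**2))

rng = np.random.default_rng(0)
mu, sigma = 0.02, 0.01            # assumed trend and noise size
z = rng.standard_normal(250)      # shared random numbers

r_a, r_b = mu + sigma * z, mu - sigma * z   # trending pair, opposite noise
r_e, r_f = sigma * z, -sigma * z            # same pair with zero trend

print("A,B  Pearson:", pearson(r_a, r_b), " mean-free:", uncentered_corr(r_a, r_b))
print("E,F  Pearson:", pearson(r_e, r_f), " mean-free:", uncentered_corr(r_e, r_f))
```

Under these assumptions the trending pair keeps a Pearson coefficient of -1 but gets a positive mean-free value, because most returns of both assets are positive. For the zero-trend pair the two measures agree at -1.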

But hold on a second, wasn’t this post about correlation of prices and returns?

**Prices vs returns**

Yes, let’s get back to that. Thinking about Pearson’s formula, prices are more likely to sit above or below their averages at the same time, since financial series usually share an upward bias. Due to this, price correlations tend to be positive.

Also, successive prices are not independent of one another. Let \(P_t\) be the price of an asset at time \(t\), so the time series can be written as: $$P_0, P_1, P_2, \dots, P_T.$$ Let \(R_t\) be the return at time \(t\), taken here as the simple price change: \(R_t = P_t-P_{t-1}\). Then we can rewrite the price series as: $$P_0, P_0+R_1, P_0+R_1+R_2, \dots, P_0+R_1+\dots+R_T.$$

Imagine correlation calculated over these prices. The first return \(R_1\) contributes to all the following entries and impacts every data point. On the other hand, the last return \(R_T\) only contributes to one. In this way, early changes in the prices have more weight than later changes in the correlation calculation whereas with the returns each one has equal importance. For this reason, correlation with prices is more sensitive to the number of time periods it’s calculated over.
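The unequal weighting can be checked by counting how many prices each return feeds into. The numbers below are arbitrary toy values, used only to illustrate the bookkeeping.

```python
import numpy as np

p0 = 100.0
r = np.array([1.0, -0.5, 2.0, 0.5])   # toy price changes R_1..R_T
T = len(r)

# Prices P_0..P_T rebuilt as P_0 plus the running sum of the changes.
prices = p0 + np.concatenate(([0.0], np.cumsum(r)))

# Lower-triangular matrix of ones: row t, column s is 1 when R_(s+1)
# is part of the price P_(t+1).
contrib = np.tril(np.ones((T, T)))
rebuilt = p0 + contrib @ r            # same as prices[1:]

# Column sums = number of prices each return contributes to.
counts = contrib.sum(axis=0)
print(prices)   # [100.  101.  100.5 102.5 103. ]
print(counts)   # [4. 3. 2. 1.]
```

The first change appears in all four later prices while the last one appears in only one, which is the imbalance described above.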

Using our asset examples, the Pearson correlation coefficients over prices are more in line with the visual perception. The magnitudes are different, but the signs coincide with those of the QuantDare formula over returns. This QD formula, however, doesn’t work with prices. It needs stationary series that fluctuate around zero, such as returns; prices are always positive, so over prices it always produces positive correlations.
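A quick check of the mean-free formula on prices is sketched below. The price paths are simulated under assumed parameters (they are not the original series), and `uncentered_corr` is a hypothetical helper name.

```python
import numpy as np

def uncentered_corr(x, y):
    """Mean-free correlation: raw values instead of deviations from the mean."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum(x * y) / np.sqrt(np.sum(x**2) * np.sum(y**2))

rng = np.random.default_rng(0)
sigma = 0.01                       # assumed noise size per period
z = rng.standard_normal(250)       # shared random numbers

# Zero-trend pair whose returns are exact mirror images of each other.
r_e, r_f = sigma * z, -sigma * z
p_e = 100.0 * np.cumprod(1.0 + r_e)
p_f = 100.0 * np.cumprod(1.0 + r_f)

print("mean-free, returns:", uncentered_corr(r_e, r_f))   # -1
print("mean-free, prices :", uncentered_corr(p_e, p_f))   # positive
print("Pearson,   prices :", np.corrcoef(p_e, p_f)[0, 1])
```

Because every price is positive, every product in the numerator is positive, so the mean-free value over prices cannot be negative even for a pair whose returns are mirror images.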

**Which correlation calculation convinces us more?**

Well, it all depends on the relationship you’re interested in comparing. Short-term changes are better interpreted from returns correlations, whilst assessments of long-term evolution may be improved using prices. And if what you really want is to analyse whether two series move up and down together, then you should replace the Pearson coefficient with the QuantDare formula over the return series.

The most important thing with correlation is to really **understand what is being measured and give the correct interpretation**. It is such a common statistic used by professionals and laymen alike in all kinds of fields; it is easy to build a false confidence around its meaning and make inaccurate statements or misleading conclusions.

But maybe, just maybe, this post will help to avoid future confusion and misinterpretation of this useful measure of relationship.