
Quantitative clustering with Machine Learning

Pablo Sánchez

28/03/2018


Today is QuantDare’s 4th birthday. I want to celebrate by building a quantitative clustering of stocks, as an alternative to the traditional clustering by sectors. To do so, let’s use some of the Machine Learning techniques covered on this blog over the years.

What have we learned previously?

First, I want to congratulate the QuantDare writers who keep this blog growing. Here you can find explanations and applications of several well-known techniques (both within and outside of Machine Learning) which you can use in finance or extrapolate to other areas.

Over these 4 years, several posts have focused on clustering methods for grouping a large set of assets, such as K-Means (for investment), K-Medoids, or Hierarchical Clustering (for ETFs). I want to use what we learned in those posts to give an example of a new application.

In this case, we are going to build an unsupervised clustering that serves as an alternative or complement in our investment strategy. The objective is to group the 500 stocks of the S&P 500 by the behavior of their weekly returns, so that stocks which behave in a similar way end up in the same group. The result is 11 clusters which are as diversified as possible.

Each stock is described by 52 dimensions: its last 52 weekly returns (Friday to Friday). Clustering on that data groups the S&P 500’s stocks according to the Euclidean distance between their return vectors.
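As a rough illustration of the data described above, the following sketch builds a stocks-by-weeks matrix of Friday-to-Friday returns and the pairwise Euclidean distances between stocks. The prices are synthetic and the tickers (`AAA`, `BBB`, ...) are hypothetical placeholders; the post does not show its data source or code, so this is only one possible way to prepare the input, assuming pandas and NumPy.

```python
import numpy as np
import pandas as pd

# Synthetic daily closing prices for 5 hypothetical tickers
# (placeholder data, not real S&P 500 prices).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2017-01-02", "2018-03-23")
tickers = ["AAA", "BBB", "CCC", "DDD", "EEE"]
daily_log_returns = rng.normal(0.0, 0.01, size=(len(dates), len(tickers)))
prices = pd.DataFrame(
    100 * np.exp(daily_log_returns.cumsum(axis=0)),
    index=dates,
    columns=tickers,
)

# Friday-to-Friday returns: take the last price of each week ending
# on Friday, then compute the percentage change between weeks.
weekly_prices = prices.resample("W-FRI").last()
weekly_returns = weekly_prices.pct_change().dropna()

# Keep the last 52 weeks and transpose so that each row is a stock
# and each column is one weekly return (one dimension).
X = weekly_returns.tail(52).T.to_numpy()  # shape: (n_stocks, 52)

# Pairwise Euclidean distances between the stocks' return vectors.
diff = X[:, None, :] - X[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))
distances = pd.DataFrame(dist, index=tickers, columns=tickers)
```

With real data, `prices` would hold the closing prices of the 500 constituents, and `distances` would be a 500 x 500 symmetric matrix with zeros on the diagonal.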

The main difference between quantitative and qualitative clustering is what drives the grouping. Quantitative clustering methods group the stock universe by the behavior of the returns, instead of the “financial label” used in qualitative clustering. It’s easy to do with Machine Learning, and it gives us a different approach.
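To give an idea of how little code this takes, here is a minimal sketch that splits 500 return vectors into 11 clusters. The post does not say which algorithm produced its clusters, so K-Means (one of the methods mentioned earlier) is an assumption here, as is the use of scikit-learn; the returns matrix is random placeholder data.

```python
import numpy as np
from sklearn.cluster import KMeans

n_stocks, n_weeks, n_clusters = 500, 52, 11

# Placeholder returns matrix: one row per stock, one column per week.
# Real weekly returns would replace this random data.
rng = np.random.default_rng(42)
X = rng.normal(0.0, 0.02, size=(n_stocks, n_weeks))

# K-Means assigns each stock to the nearest centroid in Euclidean
# distance. The choice of K-Means is an assumption of this sketch.
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Number of stocks assigned to each cluster.
cluster_sizes = np.bincount(labels, minlength=n_clusters)
```

The `labels` array plays the role that the sector label plays in a qualitative clustering: one group identifier per stock, derived here from the returns alone.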

Making it readable

But, wait a moment… Seeing 52 dimensions is impossible for humans, so how can I show the differences?

Following other posts on this blog, such as the one on Principal Component Analysis (PCA), we can reduce the 52 dimensions to as many as we want. In the reduced space, stocks are located close to one another if their returns are related and far apart if they aren’t:

Figure: Clustering Experiment

As we can see in the plot, there are some differences between the groupings. These differences come from the different approaches used to form the groups, and they can help us add value to our investment algorithms through new ways of diversification.
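The PCA projection described above can be sketched as follows. The data is synthetic: two hypothetical groups of stocks, each driven by its own common factor, stand in for real weekly returns, and scikit-learn’s `PCA` is assumed as the implementation. The point is only to show that stocks with related returns land close together in the reduced space.

```python
import numpy as np
from sklearn.decomposition import PCA

n_stocks, n_weeks = 500, 52
rng = np.random.default_rng(7)

# Placeholder data: two groups of 250 stocks, each following its own
# common factor plus a small amount of stock-specific noise.
factor_a = rng.normal(0.0, 0.02, size=n_weeks)
factor_b = rng.normal(0.0, 0.02, size=n_weeks)
X = rng.normal(0.0, 0.005, size=(n_stocks, n_weeks))
X[:250] += factor_a
X[250:] += factor_b

# Reduce the 52 dimensions to 2 so the stocks can be drawn on a plane.
pca = PCA(n_components=2)
coords = pca.fit_transform(X)  # shape: (n_stocks, 2)

# Related stocks sit near their group centroid, while the two groups
# sit far from each other.
centroid_a = coords[:250].mean(axis=0)
centroid_b = coords[250:].mean(axis=0)
within = np.linalg.norm(coords[:250] - centroid_a, axis=1).mean()
between = np.linalg.norm(centroid_a - centroid_b)
```

Each row of `coords` is a point that can be drawn in a scatter plot and colored by its cluster or sector label to compare the two groupings visually.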