We’ve covered different clustering methods many times in previous posts: K-Means, Hierarchical Clustering, and so on. However, there is more to explore. In this post, I will look at how **K-Means clustering works in an investment solution**.

# K-Means Clustering

The K-Means algorithm **partitions** the points in a data set **into clusters**. This partition minimises the sum, across the clusters, of the within-cluster sums of **point-to-cluster-centroid distances** (you can look here for further information).
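Written as a formula, the objective described above looks roughly like this. The notation is mine, not taken from the linked reference: $C_1, \dots, C_k$ are the clusters, $\mu_j$ is the centroid of cluster $C_j$, and $d$ is the chosen distance, which in the usual formulation is the squared Euclidean distance.

$$
\min_{C_1, \dots, C_k} \; \sum_{j=1}^{k} \sum_{x_i \in C_j} d(x_i, \mu_j),
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
$$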

As I did in a previous post, “Hierarchical clustering, using it to invest”, I will use this clustering method to invest in a set of assets. I only have to follow these straightforward steps:

- Characterising data with the return and volatility from the previous six months.
- Standardising the variables.
- Applying the K-Means algorithm, looking for 4 clusters.
- Selecting the cluster which has the maximum performance and the minimum volatility (if two clusters are tied in the same position, I select the one with higher performance).
- Investing in each asset that the cluster is composed of, equally weighted, every day.
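The steps above can be sketched for a single rebalancing date as follows. This is a minimal illustration, not the code behind the results in this post. It assumes NumPy and scikit-learn, takes 126 trading days as "six months", measures performance as the total return over the window and volatility as the standard deviation of daily returns, and reads "maximum performance and minimum volatility" as a combined rank over clusters. All function names are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def characterise(prices, window=126):
    """Return and volatility per asset over the last `window` daily returns.

    `prices` is a (days, assets) array; 126 days is an assumed six months.
    """
    recent = prices[-(window + 1):]
    daily = recent[1:] / recent[:-1] - 1.0
    total_return = recent[-1] / recent[0] - 1.0
    volatility = daily.std(axis=0, ddof=1)
    return np.column_stack([total_return, volatility])


def pick_cluster(features, labels):
    """Pick the cluster with the best combined return and volatility rank.

    Assumed rule: rank clusters by mean return (higher is better) and by
    mean volatility (lower is better), add both ranks, and break ties in
    favour of the higher return.
    """
    ids = np.unique(labels)
    ret = np.array([features[labels == k, 0].mean() for k in ids])
    vol = np.array([features[labels == k, 1].mean() for k in ids])
    ret_rank = (-ret).argsort().argsort()  # 0 = highest return
    vol_rank = vol.argsort().argsort()     # 0 = lowest volatility
    score = ret_rank + vol_rank
    # lexsort uses the last key as primary: combined rank, then higher return
    best = np.lexsort((-ret, score))[0]
    return ids[best]


def equal_weights(prices, n_clusters=4, seed=0):
    """Portfolio weights for one date: equal weight on the chosen cluster."""
    features = characterise(prices)
    scaled = StandardScaler().fit_transform(features)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(scaled)
    chosen = pick_cluster(features, labels)
    weights = np.where(labels == chosen, 1.0, 0.0)
    return weights / weights.sum()
```

Calling `equal_weights` on each day's price history would reproduce the daily investment step. Standardising matters here because return and volatility live on different scales, and K-Means distances would otherwise be dominated by whichever variable has the larger spread.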

The universe is composed of fixed income and equity assets from all countries. I assume a currency hedge, so a good benchmark could be the MSCI World Local Currency.

The result is not good, as the strategy does not outperform the benchmark over the whole period. It offers **protection during the most important market losses**, in 2008 and 2011 (marked with a blue circle), but it underperforms the benchmark in 2009 and 2015 (marked with a red circle).

# Helping K-Means

K-Means clustering does not always work, as we’ve just discovered in the previous test. Moreover, the same point is made in this post.

In my previous post, I said that the result of Hierarchical clustering could be used as a robust method of initialising other clustering methods. Thus, I will use Hierarchical clustering to **initialise the K-Means algorithm**.

We repeat the previous simulation process, adding a new step:

- Characterising data with the return and volatility from the previous six months.
- Standardising the variables.
- Applying the Hierarchical clustering algorithm (Ward) with Euclidean distance.
- Applying the K-Means algorithm, looking for 4 clusters, initialised with the clusters reached in Ward clustering.
- Selecting the cluster which has the maximum performance and the minimum volatility (if two clusters are tied in the same position, we select the one with higher performance).
- Investing in each asset that the cluster is composed of, equally weighted, every day.
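The new step can be sketched like this. It is one possible reading of "K-Means with the clusters reached in Ward clustering": the centroids of the Ward clusters are passed to K-Means as starting points. It assumes scikit-learn, where Ward linkage uses Euclidean distance, and the function name is hypothetical.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans


def ward_initialised_kmeans(scaled, n_clusters=4):
    """Run Ward clustering, then start K-Means from the Ward centroids.

    `scaled` is the (assets, features) array of standardised variables.
    Returns the K-Means labels and the initial centroids that were used.
    """
    ward = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward")
    ward_labels = ward.fit_predict(scaled)

    # One starting centroid per Ward cluster: the mean of its members
    centroids = np.vstack(
        [scaled[ward_labels == k].mean(axis=0) for k in range(n_clusters)]
    )

    # n_init=1 because the starting centroids are fixed, not random
    kmeans = KMeans(n_clusters=n_clusters, init=centroids, n_init=1)
    return kmeans.fit_predict(scaled), centroids
```

With fixed starting centroids the K-Means run is deterministic, which removes the dependence on random initialisation from the simulation.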

In this case, the **result is better over the whole period**, but it still does not outperform the benchmark.

# Conclusion

K-Means clustering can be a good method to separate the data set into groups, but we need to look for better characteristics to describe the data. In addition, initialising the centroids, as we did with Ward clustering, improves the results.

In conclusion, **K-Means is an easy and useful algorithm,** but it needs help.