The Machine Learning world is quite big. In this blog you can find posts in which the authors explain different machine learning techniques. One of them is clustering, and here is another method: **Hierarchical Clustering**, in particular Ward’s method.

You can find some examples in ‘Reproducing the S&P500 by clustering’ by fuzzyperson, ‘Returns clustering with K-Means algorithm’ by psanchezcri or ‘”K-means never fails”, they said…’ by fjrodriguez2.

There are several clustering approaches, such as partitional clustering and hierarchical clustering, among others. Partitional clustering is simply a division of the data set into non-overlapping clusters, such that each object is in exactly one cluster. However, **hierarchical clustering permits clusters to have subclusters, as in a tree**. Each node (cluster) is the union of its children (subclusters), and the root of the tree is the cluster containing all the objects. This post focuses on hierarchical clustering.

One of the differences between hierarchical methods and some partitional ones, such as K-Means, is that hierarchical methods are greedy: **once two clusters are merged, they cannot be split again**, and a merging decision cannot be revisited in later steps.

## How does hierarchical clustering work?

This is one of the easiest methods, and there are two types of hierarchical clustering: agglomerative and divisive. **The agglomerative case starts with every object being a cluster itself**; at each step it merges the two closest clusters, and **the process finishes with every object in one jolly cluster**. The divisive algorithm, in turn, starts with all the objects in a single cluster and ends with each object in its own cluster.

In any case, the steps to follow are very straightforward:

- Deciding which **variables** to use as characteristics to measure similarity, and **standardizing** them. This point is very important, as variables with large values could contribute more to the distance measure than variables with small values.
- Establishing the criterion to determine **similarity or distance** between objects.
- Selecting the criterion for determining which clusters to merge at successive steps, that is, which **hierarchical clustering algorithm** to use.
- Setting the **number of clusters** needed to represent the data.
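The four steps above can be sketched with scipy. This is a minimal example on made-up toy data, not the post’s actual pipeline:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import zscore

# Toy data: two variables (columns) for six objects (rows)
X = np.array([[1.0, 2.0],
              [1.2, 1.9],
              [5.0, 6.1],
              [5.2, 5.8],
              [9.0, 0.5],
              [9.3, 0.4]])

# 1. Standardize the variables so no single one dominates the distance
Xs = zscore(X, axis=0)

# 2-3. Euclidean distance between objects, Ward's method to merge clusters
Z = linkage(Xs, method="ward", metric="euclidean")

# 4. Cut the tree at the chosen number of clusters
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)
```

Here the three natural pairs of points end up in three separate clusters.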

One of the ways to represent this technique is by plotting a **dendrogram**, as you can see below. It shows the links between each element of the data, and the links between the clusters themselves. To see how the data is divided, you can draw a horizontal line: the objects hanging from each vertical line it crosses form one cluster. For example, the first horizontal line splits the data into two clusters: the green one and the red one.
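A dendrogram like this can be built with scipy. The sketch below uses two made-up blobs of points; with `no_plot=True` the function returns the tree layout instead of drawing it, so you can pass the linkage matrix straight to matplotlib (or drop `no_plot`) to see the figure:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
# Two well-separated toy blobs of five points each
X = np.vstack([rng.normal(0, 0.3, (5, 2)),
               rng.normal(5, 0.3, (5, 2))])

Z = linkage(X, method="ward")

# Compute the tree layout without drawing it
tree = dendrogram(Z, no_plot=True)
print(tree["ivl"])  # leaf labels in left-to-right plotting order
```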

## Ward’s method

There are a great many hierarchical clustering algorithms, but this post uses one of them: Ward’s method. In this method, the proximity between two clusters is defined as the increase in the squared error that results when they are merged.
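That definition can be checked numerically. The toy example below computes the increase in the squared error (distances to the cluster centroid) when two small, made-up clusters are merged:

```python
import numpy as np

def sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    return ((points - points.mean(axis=0)) ** 2).sum()

A = np.array([[0.0, 0.0], [0.0, 2.0]])
B = np.array([[4.0, 0.0], [4.0, 2.0]])

# Ward's proximity: how much the total squared error grows if A and B merge
merge_cost = sse(np.vstack([A, B])) - sse(A) - sse(B)
print(merge_cost)  # 16.0
```

This matches the closed-form expression for Ward’s merge cost, n_A·n_B/(n_A+n_B) times the squared distance between the two centroids: 2·2/4 · 16 = 16.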

## Asset management by clustering

You can think of **a lot of ways to use clustering**. I propose one of them; maybe not the most intuitive or the best one, but just one. I encourage you to try different options.

As I have mentioned, in this application I follow the straightforward steps for clustering:

- Characteristics: the return and volatility over the previous six months.
- Standardizing the variables.
- Distance between objects: Euclidean.
- Hierarchical clustering algorithm: Ward.
- Number of clusters: four.

And now, what do we do with these clusters? We select the cluster with the maximum performance and the minimum volatility: we sort the clusters by performance and by volatility, and we choose the one at the top of both rankings. If two clusters tie for the same position, we select the one with the higher performance. Then we invest, equally weighted, in each asset the cluster is composed of, and we do so every day. The universe is composed of fixed income and equity from all countries, assuming a currency hedge, so a good benchmark could be the MSCI World Local Currency.
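One day of that selection step could look like the sketch below. The asset names and characteristics are made up, and ranking clusters by mean return with mean volatility as the tie-break is my reading of the rule described above, not the author’s exact code:

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import zscore

rng = np.random.default_rng(42)
assets = [f"asset_{i}" for i in range(20)]  # hypothetical universe

# Hypothetical six-month characteristics for each asset
features = pd.DataFrame({
    "return": rng.normal(0.03, 0.05, len(assets)),
    "volatility": rng.uniform(0.05, 0.30, len(assets)),
}, index=assets)

# Standardize, cluster with Ward, cut into four clusters
Z = linkage(zscore(features[["return", "volatility"]].values, axis=0),
            method="ward")
features["cluster"] = fcluster(Z, t=4, criterion="maxclust")

# Rank clusters: highest mean return first, lowest mean volatility as tie-break
stats = features.groupby("cluster")[["return", "volatility"]].mean()
best = stats.sort_values(["return", "volatility"],
                         ascending=[False, True]).index[0]

# Equal weight within the chosen cluster
picks = features.index[features["cluster"] == best]
weights = pd.Series(1.0 / len(picks), index=picks)
print(weights)
```

In the full strategy this would be re-run every day on the rolling six-month window.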

The result is quite good, as it outperforms the benchmark over the whole period. The major benefit of this strategy is the **protection during the most important market losses** in 2008 and 2011.

## Other uses

As agglomerative hierarchical clustering algorithms tend to make good local decisions about combining clusters, they can be used as **a robust way of initializing other clustering methods**, such as K-Means, which then refines the clusters until it finds the best division. Combining machine learning techniques is the way!
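One possible sketch of that idea, on made-up blobs: run a Ward pass first, take the centroids of its clusters, and hand them to K-Means as starting points instead of a random initialization:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Three toy blobs centred at (0,0), (4,4) and (8,8)
X = np.vstack([rng.normal(c, 0.4, (30, 2)) for c in (0, 4, 8)])

k = 3
# Agglomerative (Ward) pass to get a first partition
labels = fcluster(linkage(X, method="ward"), t=k, criterion="maxclust")

# Use the hierarchical centroids as K-Means starting points
centroids = np.vstack([X[labels == i].mean(axis=0) for i in range(1, k + 1)])
km = KMeans(n_clusters=k, init=centroids, n_init=1).fit(X)
print(km.cluster_centers_)
```

With a good hierarchical start, a single K-Means run (`n_init=1`) is usually enough.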
