The Machine Learning world is quite big. In this blog you’ll find posts in which the authors explain different machine learning techniques. One such technique is clustering, and this post presents another clustering method: **Hierarchical Clustering**, in particular Ward’s method.

You can find some examples in ‘Reproducing the S&P500 by clustering’ by fuzzyperson, ‘Returns clustering with K-Means algorithm’ by psanchezcri or ‘“K-means never fails”, they said…’ by fjrodriguez2.

There are several families of clustering methods, such as partitional clustering and hierarchical clustering, among others. Partitional clustering simply divides the data set into non-overlapping clusters, so that each object belongs to exactly one cluster. **Hierarchical clustering, in contrast, allows clusters to have subclusters, forming a tree**. Each node (cluster) is the union of its children (subclusters), and the root of the tree is the cluster containing all the objects. This post focuses on hierarchical clustering.

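One of the differences between hierarchical methods and some partitional ones, such as K-Means, is that in the hierarchical method **once two clusters are merged (or a cluster is split), that step cannot be undone**: objects are never moved to another cluster afterwards, whereas K-Means keeps reassigning objects until the clusters stop changing.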

## How does hierarchical clustering work?

This is one of the easiest clustering methods to understand, and it comes in two types: agglomerative and divisive. **The agglomerative case starts with every object being a cluster of its own** and, at each step, merges the two closest clusters. **The process finishes with every object in one big cluster**. The divisive algorithm works the other way round: it starts with every object in one cluster, splits a cluster at each step, and ends with every object in its own cluster.
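The agglomerative loop can be sketched in a few lines of plain Python. This is a minimal illustration rather than an efficient implementation: it measures the distance between two clusters as the Euclidean distance between their centroids (centroid linkage, chosen here only for simplicity), and `agglomerate` is a hypothetical helper name.

```python
import math


def centroid(cluster):
    """Mean point of a cluster given as a list of equal-length tuples."""
    n = len(cluster)
    return tuple(sum(p[i] for p in cluster) / n for i in range(len(cluster[0])))


def agglomerate(points):
    """Merge the two closest clusters until one remains; return the merge history."""
    clusters = [[p] for p in points]  # every object starts as its own cluster
    history = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] + clusters[j]
        history.append((d, merged))
        del clusters[j]  # j > i, so delete j first to keep index i valid
        del clusters[i]
        clusters.append(merged)
    return history


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
for distance, cluster in agglomerate(points):
    print(round(distance, 3), sorted(cluster))
# 1.0 [(0.0, 0.0), (0.0, 1.0)]
# 1.0 [(5.0, 5.0), (5.0, 6.0)]
# 7.071 [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
```

Each pass of the `while` loop is one step of the process: the two nearby pairs are merged first, and the last merge joins everything into a single cluster.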

In any case, the steps to follow are very straightforward:

- Deciding which **variables** to use as characteristics to check the similarity.
- **Standardising** the variables. This point is very important, as variables with large values could contribute more to the distance measure than variables with small values.
- Establishing the criterion to determine **similarity or distance** between objects.
- Selecting the criterion for determining which clusters to merge at successive steps, that is, which **hierarchical clustering algorithm** to use.
- Setting the **number of clusters** needed to represent the data.
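To see why standardising matters, here is a small Python sketch with made-up numbers: one variable (a hypothetical market capitalisation in millions) lives on a much larger scale than the other (a return in percent). The helpers `zscore` and `second_var_share` are illustrative; the second one reports how much of the squared Euclidean distance between two objects comes from the large-scale variable.

```python
import statistics


def zscore(values):
    """Standardise a list of numbers to mean 0 and population standard deviation 1."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


def second_var_share(a, b):
    """Fraction of the squared Euclidean distance between a and b due to the 2nd variable."""
    dx, dy = a[0] - b[0], a[1] - b[1]
    return dy * dy / (dx * dx + dy * dy)


# Hypothetical objects: (return in %, market cap in millions).
returns = [5.0, 6.0, 20.0]
market_cap = [1000.0, 3000.0, 1100.0]

raw = list(zip(returns, market_cap))
std = list(zip(zscore(returns), zscore(market_cap)))

print(round(second_var_share(raw[0], raw[2]), 3))  # 0.978: market cap dominates
print(round(second_var_share(std[0], std[2]), 3))  # 0.002: the return gap now drives it
```

Objects 0 and 2 differ a lot in return relative to how returns vary, yet on the raw scale almost all of their distance comes from market cap; after standardising, each variable is measured in units of its own spread.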

One way to represent this technique is to plot a **dendrogram**, as you can see below. It shows the links between the individual data elements and the links between the clusters themselves. To see how the data is divided, draw a horizontal line across the dendrogram: each vertical line it crosses corresponds to one cluster, made up of all the objects linked below it. For example, the first horizontal line splits the data into two clusters: the green one and the red one.
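Cutting a dendrogram with a horizontal line can be sketched in plain Python: keep every merge that happens below the line and undo every merge above it. The merge list in the example is a hand-written, hypothetical one (four objects joined in pairs at height 1.0, then together at height 7.07), numbered the way SciPy’s `linkage` numbers new clusters; `cut_dendrogram` is an illustrative helper. It assumes merge heights never decrease, which holds for Ward’s method.

```python
def cut_dendrogram(n, merges, height):
    """Flat cluster labels obtained by cutting a dendrogram at `height`.

    n       -- number of original objects, labelled 0..n-1
    merges  -- list of (a, b, h): merge number k joins clusters a and b at
               height h and creates cluster n + k
    height  -- y-coordinate of the horizontal line
    """
    parent = list(range(n + len(merges)))

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for k, (a, b, h) in enumerate(merges):
        if h <= height:  # merges below the line stay joined
            new = n + k
            parent[find(a)] = new
            parent[find(b)] = new

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    labels, ids = [], {}
    for i in range(n):
        labels.append(ids.setdefault(find(i), len(ids)))
    return labels


merges = [(0, 1, 1.0), (2, 3, 1.0), (4, 5, 7.07)]
print(cut_dendrogram(4, merges, 3.0))   # [0, 0, 1, 1]
print(cut_dendrogram(4, merges, 10.0))  # [0, 0, 0, 0]
```

A line at height 3.0 crosses two vertical lines, so it yields two clusters; raising it above the last merge leaves everything in one cluster.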

## Ward’s method

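There are many hierarchical clustering algorithms, but this post uses just one of them: Ward’s method. In this method, the proximity between two clusters is defined as the increase in the squared error (the sum of squared distances from each object to its cluster’s centroid) that results when the two clusters are merged. At each step, the pair of clusters with the smallest increase is merged.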
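The Ward merge cost can be computed straight from its definition, or with the standard closed form Δ(A, B) = n_A · n_B / (n_A + n_B) · ‖c_A − c_B‖², where n is the cluster size and c the centroid. The Python sketch below computes both on a tiny made-up example to show they agree; the helper names are illustrative.

```python
def centroid(cluster):
    """Mean point of a cluster given as a list of equal-length tuples."""
    dim = len(cluster[0])
    return [sum(p[i] for p in cluster) / len(cluster) for i in range(dim)]


def sse(cluster):
    """Sum of squared Euclidean distances from each point to the cluster centroid."""
    c = centroid(cluster)
    return sum(sum((x - ci) ** 2 for x, ci in zip(p, c)) for p in cluster)


def ward_cost_direct(a, b):
    """Increase in squared error when clusters a and b are merged (the definition)."""
    return sse(a + b) - sse(a) - sse(b)


def ward_cost_closed_form(a, b):
    """Same quantity via n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2."""
    na, nb = len(a), len(b)
    sq = sum((x - y) ** 2 for x, y in zip(centroid(a), centroid(b)))
    return na * nb / (na + nb) * sq


A = [(0.0, 0.0), (0.0, 2.0)]
B = [(4.0, 1.0)]
print(ward_cost_direct(A, B), ward_cost_closed_form(A, B))  # both about 10.667
```

The closed form is what makes Ward’s method cheap to run: the cost of a merge only needs cluster sizes and centroids, not a pass over every object.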

## Asset management by clustering

**There are a lot of ways to use clustering**. Here, I propose just one of them; maybe not the most intuitive or the best, but just one. I encourage you to try different options.

As I have mentioned, in this application I follow the straightforward steps for clustering:

- Characteristics: the return and volatility over the previous six months.
- Standardising the variables.
- Distance between objects: Euclidean.
- Hierarchical clustering algorithm: Ward.
- Number of clusters: four.
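The five choices above can be combined in a short, self-contained Python sketch. Everything in it is an assumption for illustration, not the author’s code: six months is taken as 126 trading days, volatility is the standard deviation of daily returns, the prices are synthetic random walks rather than market data, and `features`, `zscore_columns` and `ward_clusters` are hypothetical helpers (the last one is a naive, slow Ward implementation). In practice a library routine such as SciPy’s `scipy.cluster.hierarchy.linkage(X, method="ward")` followed by `fcluster(Z, 4, criterion="maxclust")` would handle the clustering step.

```python
import math
import random
import statistics

WINDOW = 126  # assumption: roughly six months of trading days


def features(prices):
    """Return and volatility of daily returns over the last WINDOW days."""
    window = prices[-(WINDOW + 1):]
    daily = [window[t] / window[t - 1] - 1 for t in range(1, len(window))]
    return window[-1] / window[0] - 1, statistics.stdev(daily)


def zscore_columns(rows):
    """Standardise each column (variable) to mean 0 and standard deviation 1."""
    scaled = []
    for col in zip(*rows):
        mu, sigma = statistics.mean(col), statistics.pstdev(col)
        scaled.append([(v - mu) / sigma for v in col])
    return [tuple(r) for r in zip(*scaled)]


def ward_clusters(points, k):
    """Naive agglomerative clustering with Ward's criterion, stopped at k clusters."""
    clusters = [[i] for i in range(len(points))]  # clusters of point indices
    dim = len(points[0])

    def centroid(c):
        return [statistics.fmean(points[i][d] for i in c) for d in range(dim)]

    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                # Ward cost: increase in squared error if a and b are merged.
                cost = na * nb / (na + nb) * math.dist(centroid(clusters[a]), centroid(clusters[b])) ** 2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so delete b first
        del clusters[a]
        clusters.append(merged)

    labels = [0] * len(points)
    for label, c in enumerate(clusters):
        for i in c:
            labels[i] = label
    return labels


# Hypothetical universe: 12 synthetic random-walk price series (not real data).
random.seed(0)
prices = {}
for n in range(12):
    drift, vol = random.uniform(-0.001, 0.001), random.uniform(0.002, 0.02)
    series = [100.0]
    for _ in range(WINDOW):
        series.append(series[-1] * (1 + random.gauss(drift, vol)))
    prices[f"asset_{n}"] = series

names = list(prices)
X = zscore_columns([features(prices[name]) for name in names])
labels = ward_clusters(X, k=4)
for name, label in zip(names, labels):
    print(name, label)
```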

And now, what do we do with these clusters? We select the cluster with the highest performance and the lowest volatility. To do so, we rank the clusters by performance and by volatility and choose the one at the top. If two clusters share the same position, we select the one with higher performance. We then invest in every asset in that cluster with equal weights, and we repeat the process every day. The universe is composed of fixed income and equity from all countries, assuming a currency hedge, so a good benchmark could be the MSCI World Local Currency.
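The selection rule can be sketched in Python. The text does not say exactly how the two rankings are combined, so this sketch assumes a rank sum: rank the clusters by performance (best first) and by volatility (lowest first), add the two positions, keep the smallest total, and break ties in favour of higher performance. The cluster statistics and asset labels are made-up numbers, and `pick_cluster` and `equal_weights` are hypothetical helpers.

```python
def pick_cluster(stats):
    """Choose a cluster from {label: (performance, volatility)}.

    Assumed rule: rank-sum of performance (highest first) and volatility
    (lowest first); ties go to the cluster with higher performance.
    """
    by_perf = sorted(stats, key=lambda c: -stats[c][0])
    by_vol = sorted(stats, key=lambda c: stats[c][1])
    score = {c: by_perf.index(c) + by_vol.index(c) for c in stats}
    return min(stats, key=lambda c: (score[c], -stats[c][0]))


def equal_weights(labels, chosen):
    """Equal weight for every asset whose cluster label is `chosen`."""
    members = [asset for asset, label in labels.items() if label == chosen]
    return {asset: 1 / len(members) for asset in members}


cluster_stats = {0: (0.04, 0.10), 1: (0.08, 0.06), 2: (0.10, 0.15), 3: (-0.02, 0.05)}
asset_labels = {"A": 1, "B": 0, "C": 1, "D": 2, "E": 3}

chosen = pick_cluster(cluster_stats)
print(chosen, equal_weights(asset_labels, chosen))  # 1 {'A': 0.5, 'C': 0.5}
```

Cluster 1 is neither the best performer nor the least volatile, but it is second on both lists, so under this rank-sum reading it wins.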

The result is quite good, as the strategy outperforms the benchmark over the whole period. Its major benefit is the **protection during the largest market losses**, in 2008 and 2011.

## Other uses

As agglomerative hierarchical clustering algorithms tend to make good local decisions about combining clusters, they can be used as **a robust method of initializing other clustering methods**, such as K-means, which iteratively reassigns objects between clusters until the assignment stops changing. Combining machine learning techniques is the way!
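Here is a minimal Python sketch of that idea: take the labels produced by a hierarchical step, use their centroids as the starting centres, and run Lloyd’s K-means iterations from there. The points and the starting labels are a hypothetical example, and `centroids` and `kmeans_from_labels` are illustrative helpers; the starting partition is assumed to use every label at least once.

```python
import math


def centroids(points, labels, k, fallback=None):
    """Mean of the points assigned to each of the k labels."""
    dim = len(points[0])
    out = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            out.append([sum(p[d] for p in members) / len(members) for d in range(dim)])
        else:
            out.append(list(fallback[j]))  # empty cluster: keep its previous centre
    return out


def kmeans_from_labels(points, labels, k, max_iter=100):
    """Lloyd's K-means, started from the centroids of an existing partition."""
    centers = centroids(points, labels, k)
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: a local optimum
        labels = new_labels
        centers = centroids(points, labels, k, fallback=centers)
    return labels, centers


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (2.0, 2.0)]
start = [0, 0, 1, 1, 1]  # hypothetical output of a hierarchical step
labels, centers = kmeans_from_labels(points, start, k=2)
print(labels)  # [0, 0, 1, 1, 0]
```

Unlike the hierarchical step, K-means is free to move the point (2, 2) to the other cluster once it turns out to be closer to that centroid.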