Hierarchical clustering, using it to invest

T. Fuertes

22/06/2016

The Machine Learning world is quite big. In this blog you'll find posts in which the authors explain different machine learning techniques. Clustering is one of them, and this post looks at another way of doing it: hierarchical clustering, in particular Ward's method.

You can find some examples in 'Reproducing the S&P500 by clustering' by fuzzyperson, 'Returns clustering with K-Means algorithm' by psanchezcri, or '"K-means never fails", they said…' by fjrodriguez2.

There are several families of clustering methods, among them partitional and hierarchical clustering. A partitional method simply divides the data set into non-overlapping clusters, such that each object is in exactly one cluster. A hierarchical method, by contrast, permits clusters to have subclusters, arranged as a tree: each node (cluster) is the union of its children (subclusters), and the root of the tree is the cluster containing all the objects. This post focuses on hierarchical clustering.

[Figure: partitional vs. hierarchical clustering]

One difference between hierarchical methods and some partitional ones, such as K-Means, is that in a hierarchical method, once two clusters are merged (or one is split), that decision is never revisited, whereas K-Means keeps reassigning objects between clusters.

How does hierarchical clustering work?

This is one of the easiest methods to grasp, and it comes in two flavours: agglomerative and divisive. The agglomerative case starts with every object as a cluster of its own and, at each step, merges the two closest clusters; the process finishes with every object in one jolly cluster. The divisive algorithm, in turn, starts with all objects in a single cluster and ends with each object in its own cluster.

[Figure: agglomerative vs. divisive hierarchical clustering]

In any case, the steps to follow are very straightforward:

  • Deciding which variables to use as characteristics to measure similarity.
  • Standardising the variables. This point is very important, as variables with large values could contribute more to the distance measure than variables with small values.
  • Establishing the criterion to determine similarity or distance between objects.
  • Selecting the criterion for deciding which clusters to merge at successive steps; that is, which hierarchical clustering algorithm to use.
  • Setting the number of clusters needed to represent the data.

One way to represent this technique is to plot a dendrogram, as you can see below. It shows the links between each data element, and between the clusters themselves. To see how the data is divided, draw a horizontal line: the objects hanging from each vertical line it crosses form one cluster. For example, the first horizontal line splits the data into two clusters: the green one and the red one.

[Figure: dendrogram]
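Putting the steps above together, here is a minimal sketch in Python using SciPy (synthetic data; the variable names and the four-cluster cut are illustrative assumptions, not code from this post):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Synthetic data: 50 objects described by 2 characteristics
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))

# Standardise so both variables contribute equally to the distance
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Euclidean distance with Ward's merging criterion (agglomerative)
Z = linkage(X_std, method="ward")

# Cut the tree into a chosen number of clusters
labels = fcluster(Z, t=4, criterion="maxclust")

# The dendrogram described above
dendrogram(Z)
plt.show()
```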

Ward’s method

There’s a great deal of hierarchical clustering algorithms, but this uses just one of them: the Ward’s method. For this method, the proximity between two clusters is defined as the increase in the squared error that results when two clusters are merged.

Asset management by clustering

There are a lot of ways to use clustering. Here, I propose just one of them; maybe not the most intuitive or the best, but just one. I encourage you to try different options.

As I have mentioned, in this application I follow the straightforward steps for clustering:

  • Characteristics: the return and volatility over the previous six months.
  • Standardising the variables.
  • Distance between objects: Euclidean.
  • Hierarchical clustering algorithm: Ward.
  • Number of clusters: four.

And now, what do we do with these clusters? We select the cluster with the maximum performance and the minimum volatility. To do so, we rank the clusters by performance and by volatility and choose the one that comes out on top; if two clusters tie, we select the one with the higher performance. Then we invest, equally weighted, in each asset that the cluster is composed of, and we repeat this every day. The universe is composed of fixed income and equity from all countries, assuming a currency hedge, so a good benchmark is the MSCI World Local Currency index.
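As a rough sketch of this daily rule (a hypothetical `pick_assets` helper operating on a `prices` DataFrame of daily asset prices; the 126-day lookback and the ranking implementation are my assumptions, not the author's code):

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster

def pick_assets(prices: pd.DataFrame, lookback: int = 126, n_clusters: int = 4):
    """Return the assets of the best-ranked Ward cluster.

    prices: one column per asset; lookback ~ 6 months of trading days.
    """
    rets = prices.pct_change().iloc[-lookback:]
    feats = pd.DataFrame({
        "ret": (1 + rets).prod() - 1,          # cumulative 6-month return
        "vol": rets.std() * np.sqrt(252),      # annualised volatility
    })
    z = (feats - feats.mean()) / feats.std()   # standardise the characteristics
    labels = fcluster(linkage(z.values, method="ward"),
                      t=n_clusters, criterion="maxclust")

    stats = feats.groupby(labels).mean()
    # Rank clusters: best = highest return and lowest volatility;
    # break ties by preferring the higher return, as described in the text.
    rank = stats["ret"].rank(ascending=False) + stats["vol"].rank(ascending=True)
    best = stats.assign(rank=rank).sort_values(
        ["rank", "ret"], ascending=[True, False]).index[0]
    return list(feats.index[labels == best])   # equal-weight these assets
```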

The result is quite good, as the strategy outperforms the benchmark over the whole period. Its major benefit is the protection it provides during the largest market losses, in 2008 and 2011.

[Figure: strategy performance vs. the MSCI World Local Currency benchmark]

Other uses

As agglomerative hierarchical clustering algorithms tend to make good local decisions about combining clusters, they can be used as a robust way of initialising other clustering methods, such as K-Means, which then keeps refining the clusters until it finds the best division. Combining machine learning techniques is the way!
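For instance (a sketch of this initialisation idea using scikit-learn's `KMeans` on hypothetical data; not from the original post):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))                  # hypothetical data set

# Ward's local merging decisions provide sensible starting centroids...
labels = fcluster(linkage(X, method="ward"), t=4, criterion="maxclust")
centroids = np.vstack([X[labels == k].mean(axis=0) for k in range(1, 5)])

# ...which K-Means then refines by reassigning points until convergence
km = KMeans(n_clusters=4, init=centroids, n_init=1).fit(X)
```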
