QuantDare

“K-Means never fails”, they said…

fjrodriguez2

28/04/2016


Data mining algorithms are not perfect and can fail under certain conditions. K-Means is a case in point, but there is a good alternative: K-Medoids.

In a previous post, “Machine Learning: A Brief Breakdown”, we already mentioned that K-Means is the cluster analysis algorithm par excellence and one of the most important data mining and machine learning techniques; psanchezcri even used it to analyse the direction of a financial time series in his post “Returns clustering with K-Means algorithm”.

Nevertheless, it is difficult to find discussions of the algorithm’s unexpected results in certain cases. Documentation of the algorithm itself is plentiful on the Internet, so this post focuses instead on a financial example of the problem. With this in mind, we follow four steps:

1. First, we select 6 stocks from the STOXX Europe 600 composition, three pairs from different sectors:

– Financials: Banco Bilbao Vizcaya Argentaria S.A. & Banco Santander S.A.

– Consumer Discretionary: LVMH Moet Hennessy Louis Vuitton SA & Christian Dior SA.

– Energy: BP PLC & Galp Energia SGPS SA.

2. We get the prices between 2013/01/01 and 2015/12/31:

[Figure: price series of the six stocks]

3. Using daily returns, we compute the “1 − correlation” distance between each pair of series. Next, we apply a dimensionality reduction to the distance matrix in order to plot the points in Euclidean space. The stocks end up grouped by sector.

[Figure: stocks plotted in 2-D after dimensionality reduction, grouped by sector]
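Step 3 can be sketched as follows. The returns here are synthetic stand-ins (one latent factor per sector) since the post's actual price data isn't reproduced; the ticker column names are purely illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.manifold import MDS

# Hypothetical daily returns: rows are days, columns are the six stocks.
# Each sector pair shares a latent factor, so pairs are correlated.
rng = np.random.default_rng(0)
base = rng.normal(size=(750, 3))                   # one factor per sector
noise = rng.normal(scale=0.5, size=(750, 6))
returns = pd.DataFrame(np.repeat(base, 2, axis=1) + noise,
                       columns=["BBVA", "SAN", "MC", "CDI", "BP", "GALP"])

# Distance = 1 - correlation: highly correlated series end up close together.
dist = 1 - returns.corr()

# Reduce the 6x6 distance matrix to 2-D coordinates for plotting.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dist.values)
print(coords.shape)
```

Multidimensional scaling (MDS) is one choice of dimensionality reduction that accepts a precomputed distance matrix directly; any embedding that preserves the distances would serve the same purpose here.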

4. Finally, we apply K-Means with 3 clusters to the distance matrix. We expect each cluster to match one sector. Since K-Means starts from random initial points, we run the algorithm 15 times.

In about 80% of the runs, K-Means obtains the expected result:

[Figure: correct clustering, one cluster per sector]

In the remaining 20%, we get faulty results such as:

[Figure: a faulty clustering that mixes stocks from different sectors]

However, a very similar technique, K-Medoids, provides the expected result 100% of the time. It works like K-Means, but its centres are actual data points (medoids) rather than means of points.
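A minimal K-Medoids sketch is below; it works directly on a precomputed distance matrix, which is convenient given the 1 − correlation distances from step 3. The distance matrix here is a toy example (three tight pairs of points), and this is a basic Lloyd-style variant, not a full PAM implementation:

```python
import numpy as np

def k_medoids(dist, k, seed=0, n_iter=100):
    """Basic K-Medoids: like K-Means, but each centre is a real data point."""
    n = dist.shape[0]
    rng = np.random.default_rng(seed)
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        # Assign each point to its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        # In each cluster, pick the member minimising total distance to the rest.
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if members.size:
                costs = dist[np.ix_(members, members)].sum(axis=1)
                new_medoids[j] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return labels, medoids

# Toy symmetric distance matrix: three tight pairs, far from each other.
pts = np.array([0.0, 0.1, 5.0, 5.1, 10.0, 10.1])
dist = np.abs(pts[:, None] - pts[None, :])
labels, medoids = k_medoids(dist, k=3)
print(labels, medoids)
```

Like K-Means, this variant still depends on the random initial medoids, so in practice one would run it from several seeds and keep the lowest-cost solution.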
