risk management

Returns clustering with k-Means algorithm

psanchezcri

14/10/2015


Do you know how a fireman and the direction of a financial time series are related? If your answer is no, you’re reading the right post.
 

An introduction to k-Means: Voronoi diagram

Suppose that you work in a city's emergency center and your job is to tell the pilots of the firefighter helicopters when to take off. You receive an emergency call: a fire has broken out somewhere in the city and a helicopter is needed to put it out. You have to choose which pilot will do the job. Obviously, the helicopters that are farther away (grey helicopters) will arrive later than the closer ones (red helicopters), but you don’t know which one is the closest.

[Figure: farther (grey) and closer (red) helicopters]

Georgy Voronoi (a mathematician born in the Russian Empire in 1868) defined the Voronoi diagram (also called Thiessen polygons or Dirichlet tessellation, in honor of Alfred Thiessen and Gustav Lejeune Dirichlet) to answer this kind of problem. It associates each firefighter helicopter with a polygonal cell (called a Voronoi cell or Voronoi region) in which every point is closer to that helicopter than to any other. Using the Euclidean distance and focusing on a particular helicopter h_i, for each pair of helicopters (h_i, h_j) with i ≠ j, the set of points that are closer to h_i than to h_j is defined as:

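$$H(h_i, h_j) = \{\, x \in \mathbb{R}^2 : d(x, h_i) \le d(x, h_j) \,\}$$

where d is the Euclidean distance:

$$d(x, h) = \sqrt{(x_1 - h_1)^2 + (x_2 - h_2)^2}$$

The Voronoi cell of h_i is the intersection of all the half-planes that contain h_i:

$$\mathrm{Vor}(h_i) = \bigcap_{j \ne i} H(h_i, h_j)$$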

If you calculate the Voronoi cell of every helicopter, you can tell which helicopter is the closest to any point:

[Figure: Voronoi diagram of the helicopters]
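Saying that a fire lies inside the Voronoi cell of a helicopter is the same as saying that this helicopter is the nearest one in Euclidean distance. A minimal Python sketch of that lookup follows; the function name and the coordinates are hypothetical and chosen only for illustration.

```python
import math


def closest_helicopter(fire, helicopters):
    """Return the index of the helicopter nearest to the fire.

    The fire lies in the Voronoi cell of exactly this helicopter
    (ties are broken in favour of the lowest index).
    """
    return min(range(len(helicopters)),
               key=lambda i: math.dist(fire, helicopters[i]))


# Hypothetical positions on a city map (arbitrary units).
helicopters = [(1.0, 1.0), (4.0, 5.0), (8.0, 2.0)]
fire = (5.0, 4.0)

print("Send helicopter number", closest_helicopter(fire, helicopters))
```

For a single call there is no need to build the whole diagram; computing the cells in advance pays off when many calls have to be answered for the same set of helicopters.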

You can find many Voronoi-like patterns in nature, such as:

  • Giraffe skin:

[Image: giraffe skin]

  • Wings of a dragonfly:

[Image: dragonfly wings]

  • Dry desert ground, and more:

[Image: cracked dry ground]
 

k-Means algorithm in finance

The Voronoi diagram is used in some machine learning techniques, such as clustering. In this post, you can learn how k-means (a clustering algorithm) works.

Given a set of n-dimensional points (two-dimensional in our example), you need to define the number of classes (k classes) that you want to obtain. With both inputs, the algorithm performs the following steps:

1. Set the initial points. There are two options:

1.1. Choose k (two-dimensional) points yourself.

1.2. Generate k random points.

2. Calculate the Voronoi cells of the initial points.

3. Calculate the centroid (the mean) of all the points included in each region.

4. Calculate the Voronoi cells of the centroids.

5. Repeat steps 3 and 4 while the centroids keep changing.
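The steps above can be sketched in plain Python as follows. This is an illustrative implementation, not the code used for the figures in this post: the function names, the `seed` and `max_iter` parameters, and the choice to draw random initial points uniformly inside the bounding box of the data are all assumptions. Assigning each point to its nearest center is equivalent to finding the Voronoi cell it falls in.

```python
import math
import random


def nearest(point, centers):
    """Index of the center whose Voronoi cell contains the point."""
    return min(range(len(centers)), key=lambda c: math.dist(point, centers[c]))


def mean_point(points):
    """Centroid (coordinate-wise mean) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def random_centers(points, k, rng):
    """Option 1.2 (assumption): k random points inside the data's bounding box."""
    lows = [min(coord) for coord in zip(*points)]
    highs = [max(coord) for coord in zip(*points)]
    return [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
            for _ in range(k)]


def k_means(points, k, initial_centers=None, max_iter=100, seed=0):
    """Return (centers, labels) after iterating steps 3 and 4."""
    rng = random.Random(seed)
    if initial_centers is None:
        centers = random_centers(points, k, rng)        # option 1.2
    else:
        centers = [tuple(c) for c in initial_centers]   # option 1.1
    for _ in range(max_iter):
        labels = [nearest(p, centers) for p in points]  # Voronoi cells
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty region keeps its previous center (assumption).
            new_centers.append(mean_point(members) if members else centers[c])
        if new_centers == centers:                      # centroids stopped moving
            break
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    return centers, labels


# Toy data: two obvious groups of two-dimensional points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = k_means(points, k=2, initial_centers=[(0, 0), (10, 10)])
print(centers)
print(labels)
```

The result depends on the initial points, which is why the two initialization options can lead to different clusters on the same data.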

The k-means algorithm can be applied to a financial time series. In finance it is important to identify which kinds of returns and trends a time series contains. As a first approach, using the weekly percentage returns of a financial time series (such as the Standard & Poor’s 500 Index), you can draw a two-dimensional scatter plot (weekly returns vs. the same weekly returns shifted by one week) from the following matrix (the x axis is the first column and the y axis is the second column):

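[Figure: matrix of weekly returns and one-week-shifted weekly returns]

[Figure: scatter plot of S&P 500 weekly returns against the one-week-shifted weekly returns]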
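A minimal sketch of how such a two-column matrix could be built from weekly closing prices is shown here. The prices are made up, the function names are hypothetical, and the column order (previous week first, current week second) is an assumption, since the original matrix is not reproduced in this text.

```python
def weekly_returns(prices):
    """Percentage returns between consecutive weekly closes."""
    return [100.0 * (curr / prev - 1.0) for prev, curr in zip(prices, prices[1:])]


def lagged_pairs(returns):
    """Rows of the matrix: (return of week i-1, return of week i).

    Assumption: the first column is the shifted (previous) week.
    """
    return list(zip(returns, returns[1:]))


# Hypothetical weekly closing prices, for illustration only.
weekly_closes = [100.0, 102.0, 101.0, 103.0, 104.0]

returns = weekly_returns(weekly_closes)
pairs = lagged_pairs(returns)
for previous_week, current_week in pairs:
    print(f"{previous_week:7.3f} {current_week:7.3f}")
```

Each row is one point of the scatter plot, so a series of n prices yields n - 1 returns and n - 2 points.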

Now you need to decide how many classes you want. My objective is to cluster weekly returns into three kinds of return movements, called Upward, Downward and No trend. If I select only 3 classes the results are not coherent, so I try selecting more classes: Upward, Low Upward, No trend, Low Downward and Downward. I test the k-means algorithm with 5 classes and both initialization options:

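Option 1: Predefined initial points. I test the k-means function using as the five initial points the middle point of each of the four quadrants and the intersection of the axes, for example:

[Figure: screenshot of the example initial points]

[Figure: clusters obtained with predefined initial points]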

Option 2: Random initial points. Using 5 random points, the centers found by the k-means algorithm seem to be worse:

[Figure: clusters obtained with random initial points]

The trend separation produced by the predefined initial points is better than the one produced by the random initial points: the final Voronoi cells in option 1 are more coherent than those in option 2, and there is no need to inspect the points afterwards to associate each cell with a movement. I assign each class (produced by the k-means algorithm from the returns of week i and week i-1) to week i, because I don’t want to use information before it is available.
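One way to build the five predefined initial points of option 1 is sketched below. The exact coordinates used in the post are not given in the text, so the rule here is an assumption: the plot is taken to span the largest absolute return in each direction, and each quadrant's middle point sits halfway along both axes. The function name is hypothetical.

```python
def predefined_centers(pairs):
    """Origin plus the middle point of each quadrant (assumed layout).

    `pairs` is a list of (previous week return, current week return).
    """
    half = max(abs(value) for pair in pairs for value in pair) / 2.0
    return [
        (0.0, 0.0),      # intersection of the axes
        (half, half),    # both weeks positive
        (-half, half),   # negative week followed by a positive week
        (-half, -half),  # both weeks negative
        (half, -half),   # positive week followed by a negative week
    ]


# Hypothetical pairs of weekly percentage returns.
pairs = [(2.0, -1.0), (-4.0, 3.0), (0.5, 0.2)]
for center in predefined_centers(pairs):
    print(center)
```

These points can then be passed as the starting centers to any k-means implementation that accepts user-supplied initial centers.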

To finish the clustering, I only need to rename the classes and colour them on the S&P 500 Index spot series:

1. Downward = Downward + Low Downward

2. Upward = Upward + Low Upward

[Figure: S&P 500 Index coloured by Upward, Downward and No trend classes]
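The renaming step can be sketched as a simple lookup from the five class names to the three final ones. The dictionary and function names are hypothetical, and the sample labels are invented for illustration.

```python
# Mapping from the five k-means classes to the three final movements.
MERGED_CLASS = {
    "Upward": "Upward",
    "Low Upward": "Upward",
    "No trend": "No trend",
    "Low Downward": "Downward",
    "Downward": "Downward",
}


def merge_classes(weekly_classes):
    """Replace each five-class label with its three-class equivalent."""
    return [MERGED_CLASS[name] for name in weekly_classes]


# Hypothetical labels for five consecutive weeks.
weeks = ["Low Upward", "Upward", "No trend", "Low Downward", "Downward"]
print(merge_classes(weeks))
```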

The trend separation doesn’t look perfect and the trends are very short. I need to keep in mind that I’m using only two consecutive returns in the clustering, which leaves the following question open:

Could I get better results if the dimension of the clustering (the number of past weekly returns used) is increased?
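As a starting point for exploring that question, the input could be extended from pairs to vectors of several consecutive returns. The sketch below only builds such vectors; it says nothing about whether the clusters improve. The function name and the `dim` parameter are hypothetical.

```python
def lagged_vectors(returns, dim):
    """Vectors of `dim` consecutive returns, each ending at week i.

    With dim=2 this gives the pairs (week i-1, week i) used above.
    """
    return [tuple(returns[i - dim + 1:i + 1])
            for i in range(dim - 1, len(returns))]


# Hypothetical weekly percentage returns.
returns = [1.0, 2.0, 3.0, 4.0, 5.0]
print(lagged_vectors(returns, dim=3))
```

Every vector ends at the week it is assigned to, so the class of week i still uses only returns known by week i.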


Why k-means? Why not a moving average?

Hi Shyam!

Thanks for reading this post, I hope you liked it.

The main objective of this post is to show how to apply the k-means algorithm to a financial time series. For that reason, I only show how to define the trend separation using this algorithm. However, a moving average is another technique that you could use for this. It could be a good topic to write about.
