Do you know how a fireman and the direction of a financial time series are related? If your answer is no, you’re reading the right post.

### An introduction to k-Means: Voronoi diagram

Suppose that you a work in an emergency center, and your job is to tell the pilots of firefighter helicopters to take off. You receive an emergency call because there’s a point of the city on fire and a helicopter is necessary to put it out. You need to choose which pilot has to do this work. It’s obvious that the farther helicopters (* grey helicopters*) will arrive later than the closer helicopters (

**), but you don’t know exactly which is the closest.**

*red helicopters**Georgy Voronoi* (a mathematician born in the Russian Empire in 1868), defined the Voronoi diagram (also called *Thiessen polygons* or *Dirichlet Tesselation* in honor of *Alfred Thiessen* and *Gustav Lejeune Dirichlet*) to find the answer to this kind of problems. It associates all the helicopters to a polygonal cell (called *Voronoi* cell or *Voronoi* region), where all points included are closer to one helicopter than the others. Using the Euclidean distance, if you keep your eyes on a particular helicopter (h_{i}), for each pair of helicopters (h_{i}, h_{j}) i<>j, the points set which is closer to h_{i} than h_{j} is defined as:

h_{i} Voronoi cell is the intersection of all half-planes where h_{i} is inside:

If you calculate all *Voronoi* cells (for each helicopter) you could discover which helicopter is the closest:

You can find a lot of *Voronoi diagrams* in nature like:

*Giraffe skin:*

*W**ings of a dragonfly:*

*Dry desert and more:*

## k-Means algorithm in finance

The* Voronoi diagram* is used in some machine learning techniques like clustering. In this post, you can learn how the *k-means algorithm* works (a clustering algorithm).

Given a *n*-dimensional points set (*two*-dimensional points set in our example), you need to define the number of classes (*k* classes) that you want to get. Using both things, the algorithm makes the following steps:

1. At first, there are two options to set initial points:

1.1. Choose *k* (two-dimensional) points.

1.2. Generate *k* random points.

2. Calculate *Voronoi* cells for the initial points.

3. Calculate midpoint of all points included in each region.

4. Calculate *Voronoi* cells for the midpoints.

5. Repeat steps 3 and 4 while midpoints are changing.

*K-means algorithm* could be applied to a financial time series. In finance it’s really important to define *which *kind of returns and trends are in a time serie. As a first approach, using weekly percentage returns of a financial time serie (like *Standard & Poor’s 500 Index*), you could show *two*-dimensional scatter plot (weekly returns vs shift one week weekly returns) using the following matrix (*x* axis is the first column and *y* axis is the second column):

At this point you need to look how many classes you must get. My objective is to cluster weekly returns in three kinds of return movements called * Upward*,

*and*

**Downward***. If I only select these 3 classes, the results are not coherent. So I try it by selecting more classes:*

**No trend***,*

**Upward***,*

**Low Upward***,*

**No trend***and*

**Low Downward***. If I test the*

**Downward***k-means*

*algorithm*using both initial options and 5 classes:

* Option 1*: Predefined initial points. I test the k-means function using like five initial points the four quadrants middle point and axis intersection, for example:

* Option 2*: Random initial points. Using 5 random points, the centers set by

*k-means algorithm*seem to be worse:

The trend separation produced by predefined initial points is better than the separation produced by random initial points, because the final voronoi cells produced in *option 1* are more coherent than *option 2,* and it’s not necessary to look at the subsequent points to associate it to a movement. I associate the classes (produced by *k-means algorithm* using week_{i} return and week_{i-1} return) to week_{i} because I don’t want to use information early.

To finish the clustering, I only need to rename classes and colour it in *S&P500 Index *spot:

1. * Downward* =

*+*

**Downward**

**Low downward**2. * Upward* =

*+*

**Upward**

**Low Upward**Trends separation doesn’t look perfect, and the trends themselves are short. I need to keep in mind that I’m using only two consecutive returns in clustering. It’s therefore neccesary to leave you with this next question:

*Could I get better results if the dimension (weekly returns before present) is increased in clustering?*

## related posts

### add a comment

[…] Returns clustering with K-means algorithm [Quant Dare] […]

Why k-means? Why not a moving average?

Hi Shyam!

Thanks for reading this post, I hope you liked it.

How to apply the k-means algorithm in financial time series is the main objetive in this post. For that reason, I only show how to define the trend separation using this algorithm. However, moving average is another technique that you could use to do it. It could be a good topic to write about.

[…] Returns clustering with K-means algorithm [Quant Dare] Do you know how a fireman and the direcion of a financial time series are related? If your answer is no, youre reading the right post. Voronoi diagram Suppose that you are a worker in an emergency center in a city and your job is to tell the pilots of firefighter helicopters to take off. You receive an emergency call because there is a point of the city on fire and a helicopter is necessary to […]

Hi Mike:

I’m so grateful for the reference in your blog. I hope that all the readers like it.

See you soon in a new post.