Do you know how a fireman and the direction of a financial time series are related? If your answer is no, you’re reading the right post.
An introduction to k-Means: Voronoi diagram
Suppose that you a work at an emergency center, and your job is to tell the pilots of firefighter helicopters to take off. You receive an emergency call because there’s a point of the city on fire and a helicopter is necessary to put it out. You need to choose which pilot has to do this work. It’s obvious that the farther helicopters (grey helicopters) will arrive later than the closer helicopters (red helicopters), but you don’t know exactly which is the closest.
Georgy Voronoi (a mathematician born in the Russian Empire in 1868), defined the Voronoi diagram (also called Thiessen polygons or Dirichlet Tesselation in honor of Alfred Thiessen and Gustav Lejeune Dirichlet) to find the answer to this kind of problems. It associates all the helicopters to a polygonal cell (called Voronoi cell or Voronoi region), where all points included are closer to one helicopter than the others. Using the Euclidean distance, if you keep your eyes on a particular helicopter (\(h_{i}\)), for each pair of helicopters (\(h_{i}\), \(h_{j}\)) \(i \neq j\), the points set which is closer to \(h_{i}\) than \(h_{j}\) is defined as:
\(H_{i}^{j} = \{x \in X \; | \; d\left(x, h_{i}\right) \leq d\left(x, h_{j}\right), \; i \neq j \}\)
\(h_{i} \in H_{i}^{j}; \; h_{j} \notin H_{i}^{j}\)
\(h_{i}\) Voronoi cell is the intersection of all half-planes where \(h_{i}\) is inside:
\(vor\left(h_{i}\right) = \bigcap\limits_{j=1;j \neq i}^{n} H_{i}^{j}\)
If you calculate all Voronoi cells (for each helicopter) you could discover which helicopter is the closest:
You can find a lot of Voronoi diagrams in nature like:
- Giraffe skin:
- Wings of a dragonfly:
- Dry desert and more:
k-Means algorithm in finance
The Voronoi diagram is used in some machine learning techniques like clustering. In this post, you can learn how the k-means algorithm works (a clustering algorithm).
Given a n-dimensional points set (two-dimensional points set in our example), you need to define the number of classes (k classes) that you want to get. Using both things, the algorithm makes the following steps:
1. At first, there are two options to set initial points:
1.1. Choose \(k\) (two-dimensional) points.
1.2. Generate \(k\) random points.
2. Calculate Voronoi cells for the initial points.
3. Calculate midpoint of all points included in each region.
4. Calculate Voronoi cells for the midpoints.
5. Repeat steps 3 and 4 while midpoints are changing.
K-means algorithm could be applied to a financial time series. In finance it’s really important to define which kind of returns and trends are in a time serie. As a first approach, using weekly percentage returns of a financial time serie (like Standard & Poor’s 500 Index), you could show two-dimensional scatter plot (weekly returns vs shift one week weekly returns) using the following matrix (x axis is the first column and y axis is the second column):
\( \left( \begin{array}{cc} Week_{2} & Week_{1} \\ Week_{3} & Week_{2} \\ \cdots & \cdots \\ Week_{i} & Week_{i-1} \\ \cdots & \cdots \\ Week_{n} & Week_{n-1} \end{array} \right) \)
At this point you need to look how many classes you must get. My objective is to cluster weekly returns in three kinds of return movements called Upward, Downward and No trend. If I only select these 3 classes, the results are not coherent. So I try it by selecting more classes: Upward, Low Upward, No trend, Low Downward and Downward. If I test the k-means algorithm using both initial options and 5 classes:
Option 1: Predefined initial points. I test the k-means function using like five initial points the four quadrants middle point and axis intersection, for example:
\( \begin{array}{cccc} + & + & \longrightarrow & \text{Upward} \\ – & + & \longrightarrow & \text{Low upward} \\ + & – & \longrightarrow & \text{Low downward} \\ – & – & \longrightarrow & \text{Downward} \\ 0 & 0 & \longrightarrow & \text{No trend} \\\end{array}\)
Option 2: Random initial points. Using 5 random points, the centers set by k-means algorithm seem to be worse:
The trend separation produced by predefined initial points is better than the separation produced by random initial points, because the final voronoi cells produced in option 1 are more coherent than option 2, and it’s not necessary to look at the subsequent points to associate it to a movement. I associate the classes (produced by k-means algorithm using weeki return and weeki-1 return) to weeki because I don’t want to use information early.
To finish the clustering, I only need to rename classes and colour it in S&P500 Index spot:
1. Downward = Downward + Low downward
2. Upward = Upward + Low Upward
Trends separation doesn’t look perfect, and the trends themselves are short. I need to keep in mind that I’m using only two consecutive returns in clustering. It’s therefore neccesary to leave you with this next question:
Could I get better results if the dimension (weekly returns before present) is increased in clustering?
related posts
[…] Returns clustering with K-means algorithm [Quant Dare] […]
Why k-means? Why not a moving average?
Hi Shyam!
Thanks for reading this post, I hope you liked it.
How to apply the k-means algorithm in financial time series is the main objetive in this post. For that reason, I only show how to define the trend separation using this algorithm. However, moving average is another technique that you could use to do it. It could be a good topic to write about.
[…] Returns clustering with K-means algorithm [Quant Dare] Do you know how a fireman and the direcion of a financial time series are related? If your answer is no, youre reading the right post. Voronoi diagram Suppose that you are a worker in an emergency center in a city and your job is to tell the pilots of firefighter helicopters to take off. You receive an emergency call because there is a point of the city on fire and a helicopter is necessary to […]
Hi Mike:
I’m so grateful for the reference in your blog. I hope that all the readers like it.
See you soon in a new post.