When you’re working with a huge data set, knowing how to handle it is as important as representing it properly. In the machine learning world, there are a great many ways to represent data, and most of them are visually striking. In this post, we introduce graph theory as one way of representing data.
This technique is useful for representing many points and the relationships between them. The relationships can follow different rules: for example, a single statistic that relates each point directly to the others, or a combination of several statistics reduced to a single value.
In a graph, you should distinguish the following parts:
- Node: Each point to be represented.
- Link: Represents the connection between two nodes. The stronger the connection, the thicker the link.
Links can be drawn in different ways, depending on what they show. If the relationship runs in only one direction, an arrow indicates that direction, and the graph is called a directed graph. The strength of the relationship can also be shown by the thickness of the line.
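The parts described above can be sketched as a small data structure. This is a minimal illustration using only the Python standard library; the `Graph` class and its method names are hypothetical and are not taken from any particular library.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Minimal weighted graph (illustrative names, not a library API)."""
    directed: bool = False
    # adjacency[a][b] holds the weight of the link from a to b
    adjacency: dict = field(default_factory=dict)

    def add_node(self, name):
        self.adjacency.setdefault(name, {})

    def add_link(self, source, target, weight=1.0):
        self.add_node(source)
        self.add_node(target)
        self.adjacency[source][target] = weight
        if not self.directed:
            # An undirected link is stored in both directions.
            self.adjacency[target][source] = weight

    def links(self):
        """Return each link once as (source, target, weight)."""
        seen = set()
        result = []
        for source, targets in self.adjacency.items():
            for target, weight in targets.items():
                if self.directed:
                    key = (source, target)
                else:
                    key = frozenset((source, target))
                if key in seen:
                    continue
                seen.add(key)
                result.append((source, target, weight))
        return result


# Undirected graph: the weight could later be drawn as line thickness.
undirected = Graph()
undirected.add_link("A", "B", weight=0.9)
undirected.add_link("B", "C", weight=0.3)

# Directed graph: the link goes from A to B only.
directed = Graph(directed=True)
directed.add_link("A", "B", weight=0.9)
```

Storing the weight on each link keeps the strength of the relationship available for whatever drawing tool is used afterwards.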
A toy example
To illustrate graph theory, we start with a simple example. We take six MSCI indices: Asia, Europe, World, Emerging, Japan and U.S.A. Then we calculate the correlation between every pair of indices to define the relationship between them. Note that in this example the links follow a rule based on a single statistic.
We represent these indices and their links as a graph. Note that this is not a directed graph, because the relationship runs in both directions: the correlation of one index with another is the same as the correlation of the second with the first. The strength of each link is the correlation calculated from 2005 until October 2016.
The following interactive graph shows the six indices and the strength of the links between them. If you hover your mouse over a node, you will see the index name. If you click on a node and drag it, you can see how the indices move while keeping their links.
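As a rough sketch of how such link strengths might be computed, the snippet below calculates the Pearson correlation for every pair of series and stores it as the weight of an undirected link. The return series are synthetic placeholders generated with a fixed seed, not real MSCI data, and the helper names `pearson` and `correlation_links` are assumptions made for this illustration.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_links(returns):
    """Map each unordered pair of names to the correlation of their series."""
    return {
        (a, b): pearson(returns[a], returns[b])
        for a, b in combinations(sorted(returns), 2)
    }


# Synthetic placeholder data, NOT real MSCI returns: a shared "market"
# factor plus index-specific noise, so the series tend to move together.
rng = random.Random(0)
names = ["Asia", "Europe", "World", "Emerging", "Japan", "U.S.A."]
market = [rng.gauss(0, 1) for _ in range(250)]
returns = {name: [m + rng.gauss(0, 1) for m in market] for name in names}

links = correlation_links(returns)
for (a, b), weight in sorted(links.items(), key=lambda kv: -kv[1]):
    print(f"{a:>8} - {b:<8} {weight:.2f}")
```

Because correlation is symmetric, each pair is stored once, which matches the undirected graph described above. Six nodes give 15 possible links.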
More data in a graph
When you have little data, graph theory is not very useful, or at least you do not take advantage of all its power.
Now we take the S&P 500 index and its components. We will use a graph to show how the relationships between the components have changed over the years.
To simplify the illustration, we choose only 100 of the 500 components each year. This subset is selected by market capitalisation: we choose the 100 assets with the largest market cap. If the correlation between two assets is over 0.6, we link them; otherwise, there is no link between them.
As you can see in the following interactive graphs, the relationships between the most important companies in the U.S.A. have changed in recent years.
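The selection and linking rule can be sketched as follows. This is only an illustration of the rule as described, not the code behind the graphs in this post: the tickers, returns and market caps are made up, and `threshold_graph` is a hypothetical helper name.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def threshold_graph(returns, market_cap, top_n=100, threshold=0.6):
    """Keep the top_n assets by market cap and link the pairs whose
    correlation is strictly above the threshold."""
    selected = sorted(market_cap, key=market_cap.get, reverse=True)[:top_n]
    links = [
        (a, b)
        for a, b in combinations(selected, 2)
        if pearson(returns[a], returns[b]) > threshold
    ]
    return selected, links


# Made-up tickers and numbers, for illustration only.
returns = {
    "AAA": [0.010, -0.020, 0.015, 0.000, -0.010],
    "BBB": [0.012, -0.018, 0.020, 0.001, -0.008],   # moves with AAA
    "CCC": [-0.010, 0.020, -0.015, 0.000, 0.010],   # moves against AAA
    "DDD": [0.020, -0.040, 0.030, 0.000, -0.020],   # moves with AAA, tiny cap
}
market_cap = {"AAA": 500, "BBB": 400, "CCC": 300, "DDD": 10}

# top_n=3 here plays the role of the 100 largest companies.
selected, links = threshold_graph(returns, market_cap, top_n=3)
print(selected)
print(links)
```

In this toy data, "DDD" is dropped by the market cap filter before any correlation is computed, so it gets no link even though it moves with "AAA".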
Relationship in 2005
Relationship in 2015
Although we have focused on the visualisation side of graph theory, it can also be used in finance to analyse how strong the relationships between variables are. It also lets us see how those relationships are distributed among the assets.
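As one possible starting point for that kind of analysis, the sketch below computes two simple measures on a list of links: the degree of each node (how many links it has) and the density of the graph (the share of possible links that are present). The node names are placeholders, and these two measures are just an example of what could be examined, not the analysis used in this post.

```python
def degree(nodes, links):
    """Number of links attached to each node in an undirected graph."""
    counts = {node: 0 for node in nodes}
    for a, b in links:
        counts[a] += 1
        counts[b] += 1
    return counts


def density(nodes, links):
    """Share of all possible undirected links that are actually present."""
    n = len(nodes)
    possible = n * (n - 1) / 2
    return len(links) / possible if possible else 0.0


# Placeholder graph: "A" is linked to every other node.
nodes = ["A", "B", "C", "D"]
links = [("A", "B"), ("A", "C"), ("A", "D")]

print(degree(nodes, links))
print(density(nodes, links))
```

Comparing the density of the graphs for two different years would give a single number for how much more, or less, connected the assets have become.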