
Understanding the shape of data (II)

jdporras

03/04/2019


Topology can be used to gain insight into the shape of our data, as we explained in our last post. Today, we will put this theory into practice by analyzing the 2008 financial crisis.

Persistence diagrams

We will start by giving an equivalent representation of the persistence barcode that we saw previously: the persistence diagram. First, let’s recall what the persistence barcode looked like:


Persistence Barcode: The blue lines represent the lifespans of 1-dimensional holes. For instance, hole number 2 is born at t=0.75 and dies at t=0.83. The red line represents the lifespan of the 2-dimensional hole, i.e. a cavity.

The persistence diagram gives us exactly the same information as the barcode, represented in a different way. For each hole \(h_i\), we take its time of birth, \(t_{i,0}\), and its time of death, \(t_{i,1}\), and plot the point \((t_{i,0}, t_{i,1})\) in the diagram. As simple as that. Since a hole always dies after it is born, all the points lie above the identity line \(y=x\), and the farther a point is from this line, the more persistent the hole is: its persistence is \(t_{i,1} - t_{i,0}\). For the barcode above, the associated persistence diagram looks like this:


Persistence Diagram
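The barcode-to-diagram conversion is simple enough to sketch in a few lines of Python. This is a minimal illustration, not the code behind the figures: each bar \((t_{i,0}, t_{i,1})\) becomes a point, and points are ranked by persistence \(t_{i,1} - t_{i,0}\), i.e. by distance from the identity line. The function name `barcode_to_diagram` is hypothetical; the bar (0.75, 0.83) is hole number 2 from the barcode caption, and the other two bars are made-up values for illustration only.

```python
from typing import List, Tuple

Bar = Tuple[float, float]  # (birth, death) of one hole


def barcode_to_diagram(bars: List[Bar]) -> List[Tuple[float, float, float]]:
    """Map each bar to a diagram point (birth, death, persistence).

    Points are returned from most to least persistent, i.e. from farthest
    to closest to the identity line y = x.
    """
    points = []
    for birth, death in bars:
        if death < birth:
            raise ValueError("a hole cannot die before it is born")
        points.append((birth, death, death - birth))
    return sorted(points, key=lambda p: p[2], reverse=True)


# Hole number 2 from the barcode, plus two hypothetical bars.
bars = [(0.75, 0.83), (0.10, 0.60), (0.40, 0.42)]
diagram = barcode_to_diagram(bars)
```

Plotting `diagram` with birth on the x axis and death on the y axis gives exactly the persistence diagram; the persistence column is just the vertical distance to \(y=x\).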

Now, let’s go for the example.

Evolution of the persistence diagrams during the 2008 financial crisis

We have taken several mutual funds that represent all the regions of the world and all the risk families. Our purpose is to see how the relations between these assets change during the crisis. Remember that we needed some kind of similarity or distance measure to build our Čech complex? Taking one minus the correlation matrix, \(1 - \rho\), will do just fine. This way, highly correlated assets will be closer to each other and will be connected in the complex sooner than less correlated ones.
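Here is a minimal sketch of that distance in NumPy, assuming the returns are stored as a (weeks × assets) array. The helper name `correlation_distance` and the toy returns are hypothetical, chosen so that the extreme cases are easy to check: an asset that moves with another ends up at distance 0, and one that moves against it ends up at distance 2.

```python
import numpy as np


def correlation_distance(returns: np.ndarray) -> np.ndarray:
    """Turn a (weeks x assets) array of returns into a 1 - correlation matrix.

    Perfectly correlated assets end up at distance 0, uncorrelated ones at 1,
    and perfectly anti-correlated ones at 2.
    """
    corr = np.corrcoef(returns, rowvar=False)  # one column per asset
    dist = 1.0 - corr
    np.fill_diagonal(dist, 0.0)  # an asset is at distance 0 from itself
    return np.clip(dist, 0.0, 2.0)  # absorb floating-point round-off


# Toy weekly returns (hypothetical): asset 1 moves with asset 0, asset 2 against it.
a = np.array([0.01, -0.02, 0.015, 0.003, -0.007])
returns = np.column_stack([a, 2 * a, -a])
D = correlation_distance(returns)
```

Note that \(1 - \rho\) is a dissimilarity rather than a true metric, since it can violate the triangle inequality; \(\sqrt{2(1 - \rho)}\) is a common alternative that is a metric.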

For the correlations, we have used the weekly returns of the assets over a 1-year window. We have only calculated the homology groups up to the second one, \(H_2\), as calculating higher groups is computationally very costly. Just to remind you, \(H_0\) tells us how long an asset remains isolated, \(H_1\) shows the persistence of 1-dimensional holes (loops), and \(H_2\) tells us about the persistence of 2-dimensional holes (cavities).
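To make the \(H_0\) part concrete, here is a minimal sketch assuming a Vietoris–Rips filtration, in which the edge between two assets appears once the filtration value \(t\) reaches their distance (Ripser, from Scikit-TDA, computes this kind of filtration). Every asset is born at \(t = 0\), and a component dies whenever an edge merges it into another one, so the \(H_0\) death times are the edge lengths picked by Kruskal's minimum-spanning-tree algorithm. The function name `h0_death_times` and the toy distance matrix are hypothetical.

```python
from typing import List, Sequence


def h0_death_times(dist: Sequence[Sequence[float]]) -> List[float]:
    """0-dimensional persistence of a Vietoris-Rips filtration.

    All points are born at t = 0. Edges enter in order of length; an edge
    that joins two different components kills one of them at that length.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    deaths = []
    for t, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge merges two components: one of them dies
            parent[ri] = rj
            deaths.append(t)
    return deaths  # n - 1 finite deaths; one component never dies


# Hypothetical 1 - correlation distances between three assets.
toy = [[0.0, 0.1, 0.9],
       [0.1, 0.0, 0.5],
       [0.9, 0.5, 0.0]]
deaths = h0_death_times(toy)  # asset pairs merge at t = 0.1, then t = 0.5
```

An asset that is weakly correlated with everything else only gets connected late, so it contributes a long \(H_0\) bar: that is what "remaining isolated" means here.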

The state of the market

So first, let’s keep the following picture in mind. It shows the evolution of the MSCI World index between July 2006 and January 2010. This period covers all the phases of the 2008 crisis and gives us a glimpse of what happened before and after.


Evolution of the MSCI World Index between June 2006 and January 2010. Notice that we reach the peak in July 2007 and the bottom in March 2009. In red, the dates at which we calculate the persistence diagrams.

The persistence diagrams

Now, let’s see what the persistence diagrams look like.

This is our persistence diagram before reaching the peak:


Persistence diagram in January 2007.

Observe that almost all the 1- and 2-dimensional holes lie close to the diagonal, but a few 1D and 2D holes lie farther from it, which means that the assets are forming some loops and cavities. The loops are also more prominent than the cavities.

In June 2007, as the peak is being reached, the 1-dimensional holes seem much weaker:


June 2007. Almost all the significant loops have disappeared. The cavities all lie very close to the identity line.

Next, here are some snapshots of the bear market between July 2007 and March 2009.


Persistence diagrams in November 2007 and October 2008.

During this period, all the points lie close to the diagonal, which means there are no significant loops or cavities.
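"Close to the diagonal" can be turned into a number by counting, for each dimension, the points whose persistence (death minus birth) exceeds some threshold. The sketch below is one hypothetical way to do it; the function name `significant_holes`, the threshold, and the toy diagrams are assumptions, and the input follows the layout Ripser uses for its diagrams: one (k × 2) array of (birth, death) pairs per dimension, with an infinite death for the \(H_0\) component that never dies.

```python
import numpy as np


def significant_holes(dgms, threshold):
    """Count, per homology dimension, points with persistence > threshold.

    dgms[k] holds the (birth, death) pairs of dimension k. Points with an
    infinite death are skipped, since they never leave the diagram.
    """
    counts = []
    for dgm in dgms:
        dgm = np.asarray(dgm, dtype=float).reshape(-1, 2)
        finite = dgm[np.isfinite(dgm[:, 1])]
        persistence = finite[:, 1] - finite[:, 0]  # distance above y = x
        counts.append(int(np.sum(persistence > threshold)))
    return counts


# Hypothetical diagrams for H_0, H_1 and H_2.
toy_dgms = [
    np.array([[0.0, 0.1], [0.0, np.inf]]),
    np.array([[0.2, 0.5], [0.3, 0.31]]),
    np.empty((0, 2)),
]
counts = significant_holes(toy_dgms, threshold=0.05)  # [1, 1, 0]
```

Tracking these counts over the rolling windows would give a rough numeric summary of the visual pattern described here, although the choice of threshold is a judgment call.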

Finally, look at how much the diagram changes just a few weeks after the bottom (March 2009) is reached:


March 2009, before the bottom, and April 2009, after the bottom.

As you can see, in just a few weeks’ time many 1D holes have emerged, as well as some small 2D holes.

In conclusion

By applying Topological Data Analysis, we have learnt a great deal from our data. Using the correlations between assets as a similarity measure and focusing on the 2008 crisis period revealed the following:

  • Before the peak of the bubble was reached, the assets formed some loops.
  • These loops grew weaker as we approached the peak.
  • During the bear market period, no significant loops were found.
  • Just as the assets started bouncing back, new loops emerged abruptly.
  • Outside the bear market period, there were some small cavities, which disappeared during the bear market.

It is not surprising that no holes formed during the bear market. As all the assets fell, their correlations grew higher, so the assets were closer together under our distance measure, and there was little room for persistent holes to form.
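The "higher correlations, smaller distances" step can be checked with a toy simulation. This is a sketch under a hypothetical one-factor model, not the fund data used above: each asset's weekly return is a common market shock plus independent unit-variance noise, and a larger market volatility stands in for a crisis in which everything moves together. The function name and all parameters are assumptions.

```python
import numpy as np


def mean_correlation_distance(market_vol: float, n_weeks: int = 52,
                              n_assets: int = 10, seed: int = 0) -> float:
    """Average off-diagonal 1 - correlation under a hypothetical one-factor model."""
    rng = np.random.default_rng(seed)
    market = rng.normal(0.0, market_vol, size=(n_weeks, 1))  # common shock
    noise = rng.normal(0.0, 1.0, size=(n_weeks, n_assets))   # asset-specific
    returns = market + noise  # broadcast the market shock to every asset
    dist = 1.0 - np.corrcoef(returns, rowvar=False)
    off_diag = dist[~np.eye(n_assets, dtype=bool)]  # drop the zero diagonal
    return float(off_diag.mean())


calm = mean_correlation_distance(market_vol=0.5)
stressed = mean_correlation_distance(market_vol=3.0)
```

When the common shock dominates, all assets are squeezed into a small region of the distance space, which is consistent with the absence of persistent loops and cavities during the bear market.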

What is surprising is how rapidly the loops formed, only a few weeks after the recovery had started, and how the cavities moved away from the identity line.

The so-called bottleneck distance lets us measure how gradually the persistence diagrams change, but we will talk about it next time.

Before finishing, I’d like to note two things. First, using the correlations of assets is not mandatory; for example, you could use fundamental data of stocks, as proposed in this post. Second, we have only tried the technique on the 2008 crisis. We would have to check whether we reach the same conclusions for other financial crises.

And that is all for today. In case you are wondering, I used the Scikit-TDA package for the persistence diagrams. It works with Python, is easy to use, and has a lot of useful tools. I recommend you play with it, and then you can tell us how it went.
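If you want a starting point, here is a minimal sketch of how the pieces could fit together with `ripser`, the Scikit-TDA library that computes (Vietoris–Rips) persistence. It is not the code used for this post: the 52-week window standing in for "1 year" of weekly returns and the helper names `rolling_windows` and `window_diagrams` are assumptions. `ripser(dist, maxdim=2, distance_matrix=True)` treats its input as a precomputed distance matrix and returns a dictionary whose `"dgms"` entry holds one diagram per dimension, \(H_0\) to \(H_2\).

```python
import numpy as np

WINDOW = 52  # assumption: about one year of weekly returns


def rolling_windows(returns: np.ndarray, window: int = WINDOW):
    """Yield (end_index, window_returns) for each trailing window of weeks."""
    for end in range(window, returns.shape[0] + 1):
        yield end, returns[end - window:end]


def window_diagrams(window_returns: np.ndarray, maxdim: int = 2):
    """Persistence diagrams H_0..H_maxdim for one window of returns."""
    from ripser import ripser  # Scikit-TDA; imported only when needed
    corr = np.corrcoef(window_returns, rowvar=False)  # one column per asset
    dist = np.clip(1.0 - corr, 0.0, 2.0)
    np.fill_diagonal(dist, 0.0)  # Ripser reads the diagonal as vertex birth times
    return ripser(dist, maxdim=maxdim, distance_matrix=True)["dgms"]
```

For a single date you would call `window_diagrams` on the corresponding window and pass the result to `persim.plot_diagrams(dgms, show=True)`; `persim` is the Scikit-TDA library for plotting and comparing diagrams.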

See you next time!