One of the hardest and most frequent tasks in quantitative finance is to summarize or visualize, in a simple way, the vast amount of data that describes a company.
In this blog, we have covered different Machine Learning techniques that summarize information through dimensionality reduction. These techniques can improve the performance of our models and let us display information in two- or three-dimensional spaces.
Today, we are going to apply these techniques to fundamental information (yeah, you have read correctly! Fundamental Data), in order to determine whether it is worth running analyses that discriminate by sector.
In this case, the dataset is made up of 13 fundamental ratios for about 100 companies belonging to the Financial and Real Estate sectors, all of them constituents of the S&P 500, from 2017 to 2019.
Just in case…
Fundamental information is the data taken from a company's balance sheet or income statement. It is used to analyze a company by looking at its assets, liabilities and cash flows (for example, by measuring profitability, asset turnover, financial leverage…).
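To make the idea of a fundamental ratio concrete, here is a small sketch that computes three textbook ratios (net profit margin, asset turnover and the equity multiplier) for a single company. The figures are hypothetical and these are not necessarily among the 13 ratios used in this post; they only illustrate the kind of input the analysis works with.

```python
# Hypothetical figures for one company, in millions (NOT from the post's dataset).
net_income = 120.0
revenue = 1_000.0
total_assets = 2_000.0
shareholders_equity = 800.0

# Three standard fundamental ratios.
net_profit_margin = net_income / revenue                  # profitability
asset_turnover = revenue / total_assets                   # asset efficiency
equity_multiplier = total_assets / shareholders_equity    # financial leverage

# Their product equals return on equity (the DuPont identity).
roe = net_profit_margin * asset_turnover * equity_multiplier

print(f"Net profit margin: {net_profit_margin:.2%}")
print(f"Asset turnover:    {asset_turnover:.2f}")
print(f"Equity multiplier: {equity_multiplier:.2f}")
print(f"Return on equity:  {roe:.2%}")
```

Each company in a dataset like ours would then be described by a vector of such ratios, one value per ratio.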
But, there is a lot of data! How can I plot it?
To visualize any possible hidden structure, we will use three techniques that have been explained at length in previous posts on this blog: PCA (Principal Component Analysis), MDS (Multidimensional Scaling) and t-SNE (t-distributed Stochastic Neighbor Embedding).
Allow me to quickly sum up the techniques and their main focus:
PCA: a linear dimensionality reduction that uses singular value decomposition to project the data onto orthogonal directions (the eigenvectors of the covariance matrix), sorted by the variance each one explains (the eigenvalues).
MDS: places the data points in a low-dimensional space so that the distances between them match, as closely as possible, their original distances or dissimilarities. The closer two points are in the graph, the more similar they are, and vice versa.
t-SNE: converts similarities between data points into probabilities and minimizes the Kullback-Leibler divergence between the probability distribution in the lower-dimensional embedding and the one in the original space.
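As a rough illustration of the three techniques summarized above, here is a minimal sketch using scikit-learn. It is not the code behind this post: the data is synthetic (two Gaussian clouds standing in for the two sectors), the variable names are made up, and standardizing the ratios beforehand is an assumption on my side. Note that scikit-learn's `MDS` works with Euclidean distances by default.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import MDS, TSNE
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset: ~100 companies x 13 ratios.
rng = np.random.default_rng(0)
n_per_sector, n_ratios = 50, 13
financials = rng.normal(loc=0.0, scale=1.0, size=(n_per_sector, n_ratios))
real_estate = rng.normal(loc=2.0, scale=1.0, size=(n_per_sector, n_ratios))
X = np.vstack([financials, real_estate])
sector = np.array(["Financials"] * n_per_sector + ["Real Estate"] * n_per_sector)

# Assumption: put every ratio on the same scale before reducing dimensions.
X_std = StandardScaler().fit_transform(X)

# PCA: linear projection, components sorted by explained variance.
pca = PCA(n_components=3)
X_pca = pca.fit_transform(X_std)
print("Explained variance ratio:", pca.explained_variance_ratio_)

# MDS: embedding that tries to preserve pairwise (Euclidean) distances.
mds = MDS(n_components=3, random_state=0)
X_mds = mds.fit_transform(X_std)

# t-SNE: embedding that minimizes the Kullback-Leibler divergence between
# the neighbour distributions in the original and the embedded space.
tsne = TSNE(n_components=3, perplexity=30, random_state=0)
X_tsne = tsne.fit_transform(X_std)

for name, emb in [("PCA", X_pca), ("MDS", X_mds), ("t-SNE", X_tsne)]:
    print(name, emb.shape)
```

Each embedding has one row per company and three columns, so it can be drawn as a 3D scatter plot coloured by the `sector` labels.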
It is really interesting to see how, using just the first three components of each technique, we are able to separate companies from the two sectors very effectively.
In fact, looking at figures 1) and 3), we can observe that the Financial sector splits into two groups.
This kind of analysis points to a recognizable quantitative logic linking the fundamental accounts of a company and the sector in which it operates.
So, maybe it is not a bad idea to do some quantitative analysis of fundamental information… and if you do not think so, you should try to prove your hypothesis to make sure that you are on the right path!
See you next time!
Hi, congratulations on your post. However, I’d like to ask: what kind of dissimilarity measure did you use to compute the distances? My intuition is that one measure or another may be more mathematically convenient for the 3D projection and visualization. Cheers!