One common problem when looking at financial data is the enormous number of dimensions we have to deal with. For instance, if we are looking at data from the S&P 500® index, we will have around 500 dimensions to work with! If we have enough computing power, we can process that much data, but that will not always be the case. Sometimes, we need to reduce the dimensionality of the data.
Another problem that arises from having so many dimensions is noise. The data will contain useful information that we can extract, but it will also carry some random variance. Thus, we will have to find a way to keep most of the information while eliminating some of the noise.
There are many algorithms to achieve just that. At QuantDare we have already covered some of them, such as autoencoders, and today we will explain Principal Component Analysis (PCA). Using this technique, we will reduce the number of dimensions, while maintaining the maximum possible variance of the original data.
The algorithm
To begin, let’s assume we have a data matrix \( x \), of dimensions \( m * n \), where \( m \) is the number of observations (in our S&P500 example, the number of days observed), and \( n \) is the number of features or dimensions (such as the 500 components of our index).
The first thing we need to do is to calculate the covariance matrix of \( x \), which we shall call \( M \). It will have dimensions \( n * n \). Once we have \( M \), we compute its eigenvalues and associated eigenvectors (\( \lambda _i \) and \( v _i \), respectively). See the Wikipedia page about these two concepts if you are unfamiliar with them.
The next step is to sort the eigenvectors in decreasing order of their associated eigenvalues and select \( k \) of them. With these vectors, we form an eigenvector matrix (\( W \)), which will of course have dimensions \( k * n \).
The only thing left for us to do is to transform the original data to reduce its dimensionality, obtaining the transformed data (\( y \)). To do so, we just have to multiply:
\( y = x * W’ \)
As we can see, the resulting matrix will have dimensions \( m * k \).
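If it helps to see these steps in code, here is a minimal sketch in Python using NumPy (the helper name `pca_transform` is just for illustration, not part of any library):

```python
import numpy as np

def pca_transform(x, k):
    """Reduce x (m observations x n features) to k dimensions with PCA."""
    # Covariance matrix M of the features (n x n)
    M = np.cov(x, rowvar=False)
    # Eigenvalues and eigenvectors of M (eigh is suited to symmetric matrices)
    eigenvalues, eigenvectors = np.linalg.eigh(M)
    # Sort the eigenvectors by decreasing eigenvalue and keep the first k
    order = np.argsort(eigenvalues)[::-1]
    W = eigenvectors[:, order[:k]].T  # eigenvector matrix, k x n
    # Transform the original data: y = x * W'
    return x @ W.T  # m x k
```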
I can hear you say… Wait! The algorithm is very straightforward, but there is a parameter which we must choose: \( k \), the number of eigenvectors to select. How do we choose a value for this parameter?
Well, it depends on how much information you are willing to sacrifice, as opposed to how much reduction you need. But there is a metric that will help you: the explained variance, expressed as a percentage (\( V_e \)). Simply put, this metric tells you how much of the original variance of \( x \) will be kept in \( y \).
For each eigenvector (\( v_i \)) added, its explained variance increment (\( \Delta V_e(v_i) \)) will be:
\( \Delta V_e(v_i) = \frac {\lambda _ i} {\sum _ j \lambda _ j} \)
Therefore, to obtain the total explained variance, we add up the explained variance increments of the eigenvectors that make up our eigenvector matrix.
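As a quick sketch of this selection step, assuming we already have the eigenvalues, picking the smallest \( k \) that reaches a given explained-variance target could look like this (the helper `choose_k` is hypothetical):

```python
import numpy as np

def choose_k(eigenvalues, target=0.8):
    """Smallest k whose cumulative explained variance reaches the target."""
    # Explained variance increment of each eigenvector: lambda_i / sum_j lambda_j
    increments = np.sort(eigenvalues)[::-1] / np.sum(eigenvalues)
    # Cumulative explained variance after keeping the first k eigenvectors
    cumulative = np.cumsum(increments)
    return int(np.searchsorted(cumulative, target) + 1)
```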
An example
Let’s see how the algorithm works with an example. We define our data as:
\( x = \begin{pmatrix} 4 & 9 & 1 & 0 \\ 4 & 3 & 4 & 0 \\ 2 & 1 & 3 & 7 \\ 1 & 1 & 7 & 4 \\ 3 & 2 & 9 & 8 \end{pmatrix} \)
Our covariance matrix (\( M \)) will be:
\( M = \begin{pmatrix} 1.7 & 3.05 & -1.8 & -2.8 \\ 3.05 & 11.2 & -6.95 & -8.45 \\ -1.8 & -6.95 & 10.2 & 7.45 \\ -2.8 & -8.45 & 7.45 & 14.2 \end{pmatrix} \)
And, once we calculate the eigenvalues and eigenvectors of the covariance matrix, we get (results already ordered by descending eigenvalue):
\( \lambda_1 \approx 28.08 \\
\lambda_4 \approx 4.6 \\
\lambda_3 \approx 3.85 \\
\lambda_2 \approx 0.77 \)
\(
v_1 \approx \begin{bmatrix} -0.17 & -0.96 & -0.24 & -0.02 \end{bmatrix} \\
v_4 \approx \begin{bmatrix} 0.64 & -0.07 & -0.24 & 0.73 \end{bmatrix} \\
v_3 \approx \begin{bmatrix} 0.5 & 0.07 & -0.59 & -0.63 \end{bmatrix} \\
v_2 \approx \begin{bmatrix} -0.56 & 0.27 & -0.73 & 0.28 \end{bmatrix} \)
Great! Now, to select the number of components, let’s calculate how much explained variance each eigenvector contributes, using the eigenvalue formula described above.
\( \Delta V_e(v_1) \approx 0.753 \\
\Delta V_e(v_4) \approx 0.123 \\
\Delta V_e(v_3) \approx 0.103 \\
\Delta V_e(v_2) \approx 0.021 \)
Now, let’s assume that, due to the nature of the problem we are studying, we consider it appropriate to keep at least 80% of the variance. In that case, we will use \( v_1 \) and \( v_4 \), which together explain about 87.6% of the variance (0.753 + 0.123), to form our eigenvector matrix \( W \).
\( W = \begin{pmatrix} -0.17 & -0.96 & -0.24 & -0.02 \\ 0.64 & -0.07 & -0.24 & 0.73 \end{pmatrix} \)
And, transforming our original data, the new, “compressed” data will be:
\( y = \begin{pmatrix} -9.52 & 1.72 \\ -4.48 & 1.42 \\ -2.14 & 5.59 \\ -2.86 & 1.82 \\ -4.7 & 5.47 \end{pmatrix} \)
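If you want to reproduce these numbers, a short NumPy script along these lines should do it (keep in mind that `np.linalg.eigh` may return some eigenvectors with the opposite sign, so the corresponding columns of \( y \) can come out sign-flipped, and small differences are due to rounding):

```python
import numpy as np

x = np.array([[4, 9, 1, 0],
              [4, 3, 4, 0],
              [2, 1, 3, 7],
              [1, 1, 7, 4],
              [3, 2, 9, 8]], dtype=float)

M = np.cov(x, rowvar=False)                  # 4 x 4 covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(M)

order = np.argsort(eigenvalues)[::-1]        # decreasing eigenvalue order
explained = eigenvalues[order] / eigenvalues.sum()
print(explained)                             # roughly [0.753, 0.123, 0.103, 0.021]

W = eigenvectors[:, order[:2]].T             # the two leading eigenvectors
y = x @ W.T                                  # "compressed" 5 x 2 data
print(y)                                     # matches the matrix above up to rounding
                                             # and possible sign flips per column
```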
Conclusions
As we have seen, PCA is a fairly simple algorithm for reducing the dimensionality of a dataset while keeping the maximum variance of the original data. Once you have your “compressed” dataset, it will be easier to work with. We hope you found this post helpful. See you next time!