In my post Linking Impact in Divergence Attribution I explained the need to use linking algorithms in order to aggregate single-period returns. I ended my exposition by setting out the formula for adjusted returns using Andrew Frongello’s algorithms (arguably the ones with best qualities in the industry).

If you found this final expression of Frongello-adjusted attribution factors quite nasty, maybe tiring just to look at, don’t worry! I am here again to give you the best approximation with the most wonderful, short and easy to understand/remember formula.

Let us do some recap. In finance, we often find ourselves in the situation of wanting to attribute the performance of an instrument (a portfolio, a fund class…) to a set of factors. Identifying the financial sources that drive a return, whether they are positive or negative, pull or drag, is key to being able to enhance or avoid them.

## Linking Impact in Performance Attribution

Imagine that we were able to achieve this breakdown of the daily returns of (for example) a given portfolio, and we have an expression of this sort:

$$

r_t = \hat{r}_t + a_t + b_t.

$$

Here, the daily return of our portfolio is expressed as the return of a benchmark plus the returns of two factors (for our purpose we do not need to go into what they might be).

That will do for a single period (usually a day), which is not very useful, right? Normally we would like to break down the portfolio’s total performance over a period of time, such as one year, composed of many single periods. Here comes the problem: **daily returns are not additive**, because they have different bases.

$$

1 + r_t = \frac{P_t}{P_{t-1}} \Rightarrow R_{[0, T]} \neq \sum_{t=1}^T r_t \quad \text{but} \quad 1 + R_{[0, T]} = \prod_{t=1}^T (1 + r_t).

$$
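To make the point concrete, here is a minimal sketch in Python with three made-up daily returns (the numbers are purely illustrative and do not come from any real instrument). It compares the naive sum with the compounded return, which come out at roughly 7.00% and 6.59% respectively:

```python
import math

# Hypothetical daily returns, for illustration only.
daily = [0.10, -0.05, 0.02]

naive_sum = sum(daily)                             # plain sum of daily returns
compounded = math.prod(1 + r for r in daily) - 1   # true return over the period

print(f"sum of daily returns: {naive_sum:.4f}")
print(f"compounded return:    {compounded:.4f}")
```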

As a consequence, daily factors are not additive either. As much as we might be tempted to write \(A_{[0, T]} = \sum_{t=1}^T a_t\), this will not lead us to our goal of having:

$$

R_{[0, T]} = \hat{R}_{[0, T]} + A_{[0, T]} + B_{[0, T]}.

$$

This is where Frongello’s first adjustment comes to our rescue. The benefits of this linking method, also known as the Base-Period adjustment (for obvious reasons), are discussed at length in references [1] to [5], so I will confine myself to describing the algorithm: we are allowed to sum daily returns if we scale them by the cumulative return up to that point.

$$

R_{[0, T]} = \sum_{t=1}^T r_t (1 + R_{[0, t-1]}) = \sum_{t=1}^T (\hat{r}_t + a_t + b_t)(1 + R_{[0, t-1]})

$$

where the total return of each attribution component \(r^{fa} \in \{\hat{r}, a, b\}\) over the period is therefore

$$

R^{fa}_{[0, T]} = \sum_{t=1}^T r^{fa}_t (1 + R_{[0, t-1]}).

$$
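The base-period adjustment can be sketched in a few lines of NumPy. The daily returns below are hypothetical, and the names `bench`, `a`, `b` and `mu` are my own choices for this illustration. The check at the end is that the three adjusted totals add up to the compounded portfolio return:

```python
import numpy as np

# Hypothetical daily returns (illustration only): benchmark and two factors.
bench = np.array([0.010, -0.005, 0.007, 0.002])
a = np.array([0.002, 0.001, -0.003, 0.004])
b = np.array([-0.001, 0.002, 0.001, -0.002])
r = bench + a + b  # daily portfolio returns, r_t = bench_t + a_t + b_t

# mu[t] = 1 + R_[0, t-1]: portfolio growth up to the previous day (1 on day one).
growth = np.cumprod(1 + r)
mu = np.concatenate(([1.0], growth[:-1]))

# First-adjusted total of each component: sum over days of mu_t * r^fa_t.
totals = {
    "benchmark": float(mu @ bench),
    "a": float(mu @ a),
    "b": float(mu @ b),
}
total_return = float(growth[-1] - 1)

print(totals)
print(sum(totals.values()), total_return)  # the two agree up to rounding
```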

## Linking Impact in Divergence Attribution

This is great but has a little drawback: the factor \(\sum_{t=1}^T \hat{r}_t (1 + R_{[0, t-1]})\) is not equal to the observed performance of the benchmark during the period \([0, T]\). If we have the total returns of the portfolio and the benchmark, we will probably want to break down the difference between the two into factors.

In that case we would have:

$$

R_{[0, T]} - \hat{R}_{[0, T]} = \sum_{t=1}^T (\hat{r}_t + a_t + b_t)(1 + R_{[0, t-1]}) - \sum_{t=1}^T \hat{r}_t (1 + \hat{R}_{[0, t-1]}).

$$

Note that **the terms corresponding to the benchmark do not cancel exactly, because they are scaled by different coefficients**. The total return of the benchmark is different when observed as part of the portfolio and when observed alone.

This gives rise to an additional **“linking factor”** in the divergence breakdown. Calling \(\mu_t = 1 + R_{[0, t-1]}\) and \(\hat{\mu}_t = 1 + \hat{R}_{[0, t-1]}\):

$$

R_{[0, T]} - \hat{R}_{[0, T]} = \sum_{t=1}^T\left[ \mu_t a_t + \mu_t b_t + \hat{r}_t (\mu_t - \hat{\mu}_t)\right].

$$
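As a quick numerical check of this decomposition, the following sketch (again with hypothetical daily returns and variable names of my own) computes the two scaled factors and the linking term, and compares their sum with the observed divergence:

```python
import numpy as np

# Hypothetical daily returns (illustration only).
bench = np.array([0.010, -0.005, 0.007, 0.002])
a = np.array([0.002, 0.001, -0.003, 0.004])
b = np.array([-0.001, 0.002, 0.001, -0.002])
r = bench + a + b

# Cumulative growth up to the previous day, for portfolio and benchmark.
mu = np.concatenate(([1.0], np.cumprod(1 + r)[:-1]))
mu_hat = np.concatenate(([1.0], np.cumprod(1 + bench)[:-1]))

divergence = float(np.prod(1 + r) - np.prod(1 + bench))
factor_a = float(np.sum(mu * a))
factor_b = float(np.sum(mu * b))
linking = float(np.sum(bench * (mu - mu_hat)))  # the extra "linking factor"

print(divergence, factor_a + factor_b + linking)  # equal up to rounding
print(linking)                                    # small, but not zero
```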

Just as performance attribution problems call for Frongello’s First Adjustment to link single-period returns, **divergence attribution problems call for Frongello’s Second Adjustment**.

This was the main point of my post Linking Impact in Divergence Attribution, so to avoid repeating myself, I will take as my starting point the conclusion drawn in it. Namely, that this linking impact factor can be distributed among the others in a rather natural way, and that the final expression of these doubly-distorted factors is given by:

$$

\widetilde{r}^{fa}_{t} = \mu_t r^{fa}_{t}+\hat{r}_t\sum_{j=1}^{t-1} \widetilde{r}^{fa}_{j},\\

\widetilde{r}^{fa}_{1}=r_1^{fa}.

$$

The total return of each factor is \(R^{fa}_{[0, T]} = \sum_{t=1}^T \widetilde{r}^{fa}_t\) and the sum of the factors thus defined gives the divergence of the period, without the need to add any additional linking factors.

$$

R_{[0, T]} - \hat{R}_{[0, T]} = \sum_{fa}R^{fa}_{[0, T]} = \sum_{t=1}^T (\widetilde{a}_t + \widetilde{b}_t).

$$

The equation of the daily second-adjusted factor returns \(\widetilde{r}^{fa}_t\) is a recursive expression: the factor on day \(t\) is calculated using the adjusted returns of the days up to \(t-1\). This expression is quite complicated, and it does not have a clear interpretation as the base-period adjusted returns did.

What is more, while the first-adjusted factors could be calculated with a simple matrix multiplication, which in code is an efficient operation, this computation would have to be implemented as a loop, which is quite simple, but very inefficient.
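For reference, the loop version could look like the sketch below. The function name `second_adjusted_loop` and the toy data are hypothetical. It keeps a running sum of the adjusted returns so that each day only needs the previous state:

```python
import math

def second_adjusted_loop(factor, bench, portfolio):
    """Second-adjusted daily factor returns via the recursion, as a plain loop."""
    adjusted = []
    mu = 1.0       # 1 + cumulative portfolio return up to the previous day
    running = 0.0  # sum of the adjusted factor returns up to the previous day
    for f_t, bench_t, r_t in zip(factor, bench, portfolio):
        value = mu * f_t + bench_t * running
        adjusted.append(value)
        running += value
        mu *= 1 + r_t
    return adjusted

# Hypothetical daily returns (illustration only).
bench = [0.010, -0.005, 0.007, 0.002]
a = [0.002, 0.001, -0.003, 0.004]
b = [-0.001, 0.002, 0.001, -0.002]
r = [x + y + z for x, y, z in zip(bench, a, b)]

adj_a = second_adjusted_loop(a, bench, r)
adj_b = second_adjusted_loop(b, bench, r)
divergence = math.prod(1 + x for x in r) - math.prod(1 + x for x in bench)

print(sum(adj_a) + sum(adj_b), divergence)  # equal up to rounding
```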

## Frongello Second Adjustment: Matrix Calculation

Here we will explain how to calculate these **second-adjusted factors as solutions of a linear system of equations**, which in code is a much faster operation.

Get ready to get your hands dirty, because now we are really going to get into the mathematics of the thing. We will have to deal with, if not difficult, at least very long formulas, but I believe the result will be worth it.

We can rewrite the equation of the second-adjusted factors placing all of our unknowns (the \(\widetilde{r}^{fa}_t\)) to the left side as follows:

$$

-\hat{r}_t\sum_{j=1}^{t-1} \widetilde{r}^{fa}_{j} + \widetilde{r}^{fa}_{t} = \mu_t r^{fa}_{t};\\

-\hat{r}_t(\widetilde{r}^{fa}_{1}+\widetilde{r}^{fa}_{2} + \dots + \widetilde{r}^{fa}_{t-1} ) + \widetilde{r}^{fa}_{t} = \mu_t r^{fa}_{t}

$$

This holds for every \( t \in \{ 1, \dots, T\}\). Thus, we can treat the \(\widetilde{r}^{fa}_t\) as \(T\) unknowns in a system of \(T\) linear equations.

$$

\begin{align*}

\left\{

\begin{matrix}

\widetilde{r}_1^{fa} & & &= &\mu_1r_1^{fa}\\

-\widehat{r}_2\widetilde{r}_1^{fa} + &\widetilde{r}_2^{fa} & &= &\mu_2r_2^{fa}\\

-\widehat{r}_3\widetilde{r}_1^{fa} -\widehat{r}_3&\widetilde{r}_2^{fa} + &\widetilde{r}_3^{fa} &= &\mu_3r_3^{fa}\\

\cdots & & &= &\cdots

\end{matrix}

\right .

\end{align*}

$$

In matrix form:

$$

\begin{align}

\begin{pmatrix}

1 & 0 & 0 & \cdots & 0\\

-\widehat{r}_2 & 1 & 0 & \cdots & 0\\

-\widehat{r}_3 & -\widehat{r}_3 & 1 & \cdots & 0\\

\vdots & \vdots & \vdots & \ddots & \vdots\\

-\widehat{r}_T & -\widehat{r}_T & -\widehat{r}_T & \cdots & 1

\end{pmatrix}

\begin{pmatrix}

\widetilde{r}_1^{fa}\\

\widetilde{r}_2^{fa}\\

\vdots\\

\widetilde{r}_T^{fa}\\

\end{pmatrix}

=

\begin{pmatrix}

\mu_1r_1^{fa}\\

\mu_2r_2^{fa}\\

\vdots\\

\mu_Tr_T^{fa}\\

\end{pmatrix}

\end{align}

$$

And just like that, we did it. This matrix system \(A \cdot x = b\) can be solved relatively easily.

We only need to calculate the inverse of the matrix \(A\):

$$

A\cdot x = b \Rightarrow x = A^{-1}\cdot b,

$$

where

$$

A =

\begin{pmatrix}

1 & 0 & 0 & \cdots & 0\\

-\widehat{r}_2 & 1 & 0 & \cdots & 0\\

-\widehat{r}_3 & -\widehat{r}_3 & 1 & \cdots & 0\\

\vdots & \vdots & \vdots & \ddots & \vdots\\

-\widehat{r}_T & -\widehat{r}_T & -\widehat{r}_T & \cdots & 1

\end{pmatrix}.

$$
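In code, the system can be assembled and solved directly. The sketch below is one possible way to do it with NumPy, under my own naming (`second_adjusted_solve`) and with hypothetical data. It builds \(A\) from the benchmark returns and the right-hand side from \(\mu_t r^{fa}_t\):

```python
import numpy as np

def second_adjusted_solve(factor, bench, portfolio):
    """Second-adjusted daily factor returns as the solution of A x = b."""
    factor = np.asarray(factor, dtype=float)
    bench = np.asarray(bench, dtype=float)
    portfolio = np.asarray(portfolio, dtype=float)
    T = len(factor)

    # mu[t] = 1 + R_[0, t-1]: portfolio growth up to the previous day.
    mu = np.concatenate(([1.0], np.cumprod(1 + portfolio)[:-1]))

    # A has ones on the diagonal and -bench[t] everywhere below it in row t.
    A = np.eye(T) - np.tril(np.outer(bench, np.ones(T)), k=-1)
    return np.linalg.solve(A, mu * factor)

# Hypothetical daily returns (illustration only).
bench = np.array([0.010, -0.005, 0.007, 0.002])
a = np.array([0.002, 0.001, -0.003, 0.004])
b = np.array([-0.001, 0.002, 0.001, -0.002])
r = bench + a + b

adj_a = second_adjusted_solve(a, bench, r)
adj_b = second_adjusted_solve(b, bench, r)
divergence = float(np.prod(1 + r) - np.prod(1 + bench))

print(adj_a.sum() + adj_b.sum(), divergence)  # equal up to rounding
```

Here `np.linalg.solve` is a general dense solver. Since \(A\) is triangular, a dedicated triangular solver such as `scipy.linalg.solve_triangular` with `lower=True` would be a natural alternative; I have left it out only to keep the sketch dependent on NumPy alone.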

Matrix \(A\) is a lower triangular matrix with unit diagonal. A standard result in linear algebra assures us that its inverse is also lower triangular, and that the elements of its diagonal are the inverses of the elements of the diagonal of \(A\), i.e. they are 1s. Bearing this in mind, it is easy to calculate \(A^{-1}\); the result is:

$$

A^{-1} = \begin{pmatrix}

1 & 0 & 0 & \cdots & 0\\

\widehat{r}_2 & 1 & 0 & \cdots & 0\\

\widehat{r}_3(1+\widehat{r}_2) & \widehat{r}_3 & 1 & \cdots & 0\\

\vdots & \vdots & \vdots & \ddots & \vdots\\

\widehat{r}_T\prod_{t=2}^{T-1}(1+\widehat{r}_t) & \widehat{r}_T\prod_{t=3}^{T-1}(1+\widehat{r}_t) & \widehat{r}_T\prod_{t=4}^{T-1}(1+\widehat{r}_t) & \cdots & 1

\end{pmatrix}.

$$

It is left to the reader to verify that, indeed, \(A \cdot A^{-1}=\mathbf{I}_T\). Multiplying out \(x = A^{-1}\cdot b\) row by row then gives the solution explicitly:

$$

\left\{

\begin{matrix}

\widetilde{r}_1^{fa} &= \mu_1 r_1^{fa} & &\\

\widetilde{r}_2^{fa} &= \widehat{r}_2\mu_1 r_1^{fa} &+ \mu_2 r_2^{fa} &\\

\widetilde{r}_3^{fa} &= \widehat{r}_3(1+\widehat{r}_2)\mu_1 r_1^{fa} &+ \widehat{r}_3\mu_2 r_2^{fa} &+ \mu_3 r_3^{fa}\\

\cdots &= & \cdots &

\end{matrix}

\right .

$$

The general expression is

$$

\widetilde{r}_t^{fa} = \widehat{r}_t\prod_{i=2}^{t-1}(1+\widehat{r}_i)\mu_1r_1^{fa} + \widehat{r}_t\prod_{i=3}^{t-1}(1+\widehat{r}_i)\mu_2r_2^{fa} + \cdots + \mu_tr_t^{fa}.

$$
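For readers who prefer a numerical check to an algebraic one, the sketch below builds \(A\) and the closed-form \(A^{-1}\) from a hypothetical benchmark series and verifies that their product is the identity. The helper names `system_matrix` and `closed_form_inverse` are mine:

```python
import numpy as np

def system_matrix(bench):
    """Unit lower triangular A with -bench[t] below the diagonal in row t."""
    bench = np.asarray(bench, dtype=float)
    T = len(bench)
    return np.eye(T) - np.tril(np.outer(bench, np.ones(T)), k=-1)

def closed_form_inverse(bench):
    """Closed-form inverse of A, entry by entry."""
    bench = np.asarray(bench, dtype=float)
    T = len(bench)
    inv = np.eye(T)
    for t in range(T):
        for j in range(t):
            # bench[t] times the benchmark growth on the days strictly between j and t
            # (an empty product is 1, which covers the first sub-diagonal).
            inv[t, j] = bench[t] * np.prod(1 + bench[j + 1:t])
    return inv

# Hypothetical daily benchmark returns (illustration only).
bench = [0.010, -0.005, 0.007, 0.002]
A = system_matrix(bench)
A_inv = closed_form_inverse(bench)

print(np.allclose(A @ A_inv, np.eye(len(bench))))
```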

## Approximation

That was quite a complex expression. But what happens if we now add up the daily factor returns to get the total return over the period \([0, T]\)? To our surprise, we obtain something simpler. If we sum \(\widetilde{r}_1^{fa} + \widetilde{r}_2^{fa} + \widetilde{r}_3^{fa} + \dots\) and factor out each \(\mu_t r^{fa}_t\), the coefficients telescope, since \(1 + \sum_{s=t+1}^{T} \widehat{r}_s \prod_{i=t+1}^{s-1} (1 + \widehat{r}_i) = \prod_{i=t+1}^{T} (1 + \widehat{r}_i)\), and we arrive at:

$$

R^{fa}_{[0, T]} = \sum_{t=1}^T \widetilde{r}^{fa}_t = \widetilde{r}_1^{fa} + \widetilde{r}_2^{fa} + \widetilde{r}_3^{fa} + \dots\\

R^{fa}_{[0, T]} = \sum_{t=1}^T \left( \prod_{i=t+1}^T (1 + \widehat{r}_i) \right) \cdot \mu_t \cdot r^{fa}_t.

$$

That is, we have obtained the return of the factor over the period as a sum of its daily returns, where the return of day \(t\) is scaled by two coefficients:

- The product of the daily returns of the **benchmark**, from the next day \(t+1\) to the end of the period: \(\prod_{i=t+1}^T (1 + \widehat{r}_i)\).
- The product of the daily returns of the **portfolio**, from the first day of the period to the previous day \(t-1\): \(\mu_t = 1 + R_{[0, t-1]} = \prod_{j=1}^{t-1} (1 + r_j)\).

$$

R^{fa}_{[0, T]} = \sum_{t=1}^T \left( \prod_{i=t+1}^T (1 + \widehat{r}_i) \cdot \prod_{j=1}^{t-1} (1 + r_j) \right)\cdot r^{fa}_t.

$$
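This exact total lends itself to a fully vectorised computation, with no linear system at all. A minimal sketch with hypothetical data follows; `tail` and `mu` are my own names for the two scaling coefficients, and the division assumes no daily benchmark return equals \(-1\):

```python
import numpy as np

# Hypothetical daily returns (illustration only).
bench = np.array([0.010, -0.005, 0.007, 0.002])
a = np.array([0.002, 0.001, -0.003, 0.004])
b = np.array([-0.001, 0.002, 0.001, -0.002])
r = bench + a + b

# mu[t]: portfolio growth from the first day up to the previous day.
mu = np.concatenate(([1.0], np.cumprod(1 + r)[:-1]))

# tail[t]: benchmark growth from the next day to the end of the period.
bench_growth = np.cumprod(1 + bench)
tail = bench_growth[-1] / bench_growth

total_a = float(np.sum(tail * mu * a))
total_b = float(np.sum(tail * mu * b))
divergence = float(np.prod(1 + r) - np.prod(1 + bench))

print(total_a + total_b, divergence)  # equal up to rounding
```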

Up until now, everything has been exact: we have not made any assumption or approximation. However, to reach an expression even simpler and easier to interpret than this one, we must now assume that, generally, \(r_i, \widehat{r}_i \ll 1\) and that

$$

1 + r_i \simeq 1 + \widehat{r}_i

$$

is a good approximation. If we use it in our previous equation, we obtain:

$$

R^{fa}_{[0, T]} \simeq \sum_{t=1}^T \left( \prod_{j=1}^{t-1} (1 + \widehat{r}_j) \cdot \prod_{i=t+1}^T (1 + \widehat{r}_i) \right)\cdot r^{fa}_t \simeq \sum_{t=1}^T \left( \prod_{j=1}^{T} (1 + \widehat{r}_j) \right)\cdot r^{fa}_t; \\

R^{fa}_{[0, T]} \simeq (1+ \widehat{R}_{[0, T]}) \cdot \sum_{t=1}^T r^{fa}_t .

$$

But also, \(R^{fa}_{[0, T]} \simeq (1+ R_{[0, T]}) \cdot \sum_{t=1}^T r^{fa}_t\), since the assumption we are making is that **the daily returns of the portfolio and the benchmark are reasonably close**, so we could just as well have replaced the benchmark's daily returns with the portfolio's.

## Interpretation

We are making two assumptions. First, that the daily returns are small, which should not be a problem for anyone in finance. And second, that the divergence we are aiming to break down is small compared with the performances of the two objects we compare.

Given that this is true, we have our wonderful equations, just concerning total returns, for the effect of linking algorithms:

$$

R^{fa}_{[0, T]} \simeq (1+ R_{[0, T]}) \cdot \sum_{t=1}^T r^{fa}_t \simeq (1+ \widehat{R}_{[0, T]}) \cdot \sum_{t=1}^T r^{fa}_t.

$$

The conclusion is the same whichever final formula we choose to take; **the contribution of an attribution factor to a divergence is magnified or reduced proportionally to the performance of the benchmark/portfolio**.
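To get a feel for the quality of the approximation, the sketch below compares the exact second-adjusted total of one factor with the two approximate formulas. The four-day series is hypothetical and far too short to be representative, so this is only a sanity check. Note also that the relative error can blow up when the raw sum of the daily factor returns is close to zero:

```python
import numpy as np

# Hypothetical daily returns (illustration only).
bench = np.array([0.010, -0.005, 0.007, 0.002])
a = np.array([0.002, 0.001, -0.003, 0.004])
b = np.array([-0.001, 0.002, 0.001, -0.002])
r = bench + a + b

# Exact total of factor a: each day scaled by portfolio growth before it
# and benchmark growth after it.
mu = np.concatenate(([1.0], np.cumprod(1 + r)[:-1]))
bench_growth = np.cumprod(1 + bench)
tail = bench_growth[-1] / bench_growth
exact = float(np.sum(tail * mu * a))

raw_sum = float(a.sum())                            # plain sum of daily returns
approx_bench = float(np.prod(1 + bench)) * raw_sum  # (1 + R_hat) * raw sum
approx_port = float(np.prod(1 + r)) * raw_sum       # (1 + R) * raw sum

for name, value in [("raw sum", raw_sum),
                    ("benchmark-scaled", approx_bench),
                    ("portfolio-scaled", approx_port)]:
    print(f"{name:>17}: relative error {value / exact - 1:+.4%}")
```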

## Application: Hedged Share Class

Of course, we must test our hypotheses by calculating actual results. Here, the methodology we have developed is applied to a currency hedging problem: **the attribution of the performance divergence between a hedged share class and a reference class** to the corresponding overlay factors.

The explanation of this scenario and the description of the involved factors is itself a complex matter, and can be found, for example, in the post Hedged Share Class: from theory to practice by Ana Porras. Let us assume this knowledge.

If you have read the post, or have some familiarity with currency hedging, you probably know that one of the factors driving the return of a hedged share class apart from that of its currency-risk-free counterpart is the Interest Rate Differential.

For this exercise, I have calculated the divergence attribution of a number of hedged share classes against their reference classes. In particular, I have the contribution of the Interest Rate Differential (IRD) factor for each of them. These contributions have been calculated using Frongello’s first and second adjustments as explained, **so daily returns are additive and no extra linking term is present in the divergence**.

Also, although it is a *forbidden* operation, and does not have a total return interpretation, I can calculate the raw sum of the daily returns of the factor:

$$

\sum_{t=1}^T r^{IRD}_t.

$$

Now I have all the ingredients I need to check if my assumptions were correct and the equations we have arrived at are a good approximation of reality.

In the image, we can see represented the distortion introduced by Frongello adjustments in the calculation of the factor, that is, the relative difference:

$$

\frac{R^{IRD}_{[0, T]} - \sum_{t=1}^T r^{IRD}_t}{\sum_{t=1}^T r^{IRD}_t},

$$

against the total performance over the period of the reference class, \( \widehat{R}_{[0, T]}\). If we were correct in our approximations, it should be satisfied that

$$

\frac{R^{IRD}_{[0, T]} - \sum_{t=1}^T r^{IRD}_t}{\sum_{t=1}^T r^{IRD}_t} \simeq \widehat{R}_{[0, T]}.

$$

We observe that the points in the plot fall quite close to the line \(x = y\), i.e., the relative impact is indeed close to the total reference performance.

Also, as one would expect, **the larger the divergence, the worse the approximation**. This is evidenced by the fact that the points with more extreme divergences, whether positive or negative, are shifted above or below the \(x = y\) line. In the plot, we have coloured the points depending on their divergence, and it is clear that this grouping polarizes them to one or the other side of the line.

We can also plot the same variable against the total performance of the hedged share class, as we know that it also should be satisfied that:

$$

\frac{R^{IRD}_{[0, T]} - \sum_{t=1}^T r^{IRD}_t}{\sum_{t=1}^T r^{IRD}_t} \simeq R_{[0, T]}.

$$

The behaviour is similar, only this time the darker points (large positive divergence) move downwards and vice versa. This makes a lot of sense. Dare you venture why?

## References

1. Enrique Millán, Linking Attribution Factors.
2. Andrew Frongello, *“Linking Single Period Attribution Results”*.
3. Andrew Frongello, *“Attribution Linking: Proofed and Clarified”*.
4. Andrew Frongello, *“Linking Single Period Arithmetic Attribution Results”*.
5. Clara Díaz-Pinés, Linking Impact in Divergence Attribution.
6. Ana Porras, What cannot be hedged.
7. Ana Porras, Hedged Share Class: from theory to practice.
8. Juan Martínez, FX Swap pricing and the mystery of Covered Interest Parity.