**Random forests** are widely used machine learning algorithms. In finance, certain **financial ratios** are used to try to predict whether a company will outperform the market. Can we apply a random forest to financial ratios to build an investment strategy that outperforms a buy and hold strategy?

## Thesis on financial ratios

In previous posts, we have seen how certain financial ratios express information about the market. Now, we will try to combine both approaches: we will extract information about stocks using some ratios, and we will try to outperform the market using this information.

Some financial ratios seem to have predictive power over the market. For instance, the Cyclically Adjusted Price-to-Earnings (CAPE) ratio has historically been accurate at assessing whether a market is overvalued. A high CAPE ratio is correlated with low subsequent returns, as it signals that a market is overvalued. In that scenario, prices should fall relative to earnings to return to historically sound levels.

For individual stocks, we will use different kinds of ratios. The ratios we have chosen to analyze are: Debt to equity, Return on equity, Gross margin, EBITDA to enterprise value, Dividend Yield, ROIC, Debt to Assets, Book to price and Accruals ratio. These form a diverse group that looks at different kinds of characteristics. Some compare market valuations against more objective measures of value. Others measure levels of debt. Still others look at the companies’ financial results.
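
To make a few of these ratios concrete, here is a minimal sketch using common textbook definitions. The `Fundamentals` container, its field names and the sample figures are hypothetical, and data providers often define these ratios slightly differently (for example, total liabilities instead of total debt), so treat it as an illustration rather than the exact calculation behind this post.

```python
from dataclasses import dataclass


@dataclass
class Fundamentals:
    """Hypothetical container for one company's figures (same currency throughout)."""
    total_debt: float
    total_assets: float
    shareholders_equity: float
    net_income: float
    revenue: float
    cost_of_goods_sold: float
    market_cap: float


def compute_ratios(f: Fundamentals) -> dict:
    """Return a handful of the ratios listed above (textbook definitions, an assumption)."""
    return {
        "debt_to_equity": f.total_debt / f.shareholders_equity,
        "debt_to_assets": f.total_debt / f.total_assets,
        "return_on_equity": f.net_income / f.shareholders_equity,
        "gross_margin": (f.revenue - f.cost_of_goods_sold) / f.revenue,
        # Book value of equity relative to market value of equity.
        "book_to_price": f.shareholders_equity / f.market_cap,
    }


# Made-up figures, only to show the shape of the output.
example = Fundamentals(
    total_debt=50.0,
    total_assets=200.0,
    shareholders_equity=100.0,
    net_income=20.0,
    revenue=400.0,
    cost_of_goods_sold=240.0,
    market_cap=250.0,
)
ratios = compute_ratios(example)
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

Each row of the feature matrix fed to the forest would then be one such set of ratios for one stock on one date.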

Our investment universe will be all the constituents of the S&P 500. We will use data from 2005 to 2016 to train the model, and data from 2017 to 2021 to evaluate the results.
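
The split is chronological rather than random, so the model never sees data from the evaluation period. A minimal sketch of that split, assuming each observation is a `(date, payload)` tuple (a hypothetical layout, with exact boundary days chosen for illustration):

```python
from datetime import date

# Period boundaries taken from the text; the exact days are an assumption.
TRAIN_START = date(2005, 1, 1)
TRAIN_END = date(2016, 12, 31)
TEST_START = date(2017, 1, 1)
TEST_END = date(2021, 12, 31)


def split_by_date(rows):
    """Split (date, payload) rows into train and test lists by calendar date."""
    train = [r for r in rows if TRAIN_START <= r[0] <= TRAIN_END]
    test = [r for r in rows if TEST_START <= r[0] <= TEST_END]
    return train, test


# Made-up rows, only to show the behaviour at the boundaries.
rows = [
    (date(2004, 12, 1), "before the sample"),
    (date(2010, 6, 1), "train"),
    (date(2016, 12, 1), "train"),
    (date(2017, 1, 3), "test"),
    (date(2021, 12, 1), "test"),
]
train, test = split_by_date(rows)
print(len(train), "training rows,", len(test), "test rows")
```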

## Algorithm: random forest

Also in previous posts, we have explained what random forests are and how they differ from single decision trees. We will use a forest composed of 100 trees, with Gini impurity as the splitting criterion (to read about the differences between Gini impurity and entropy, read this excellent post). As our target label, we have chosen a very naive one: whether or not the stock’s return 1 month forward was higher than the market’s. We chose this label because, if we could predict it perfectly, our portfolio’s return would be greater than the benchmark’s.
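
To illustrate the two ingredients just described, here is a small sketch of the label and of the Gini impurity a tree tries to reduce at each split. The function names and the sample returns are hypothetical, and this is not the code used to produce the results below.

```python
from collections import Counter


def make_labels(stock_fwd_returns, market_fwd_return):
    """Label 1 when a stock's 1-month forward return beats the market's, else 0."""
    return [1 if r > market_fwd_return else 0 for r in stock_fwd_returns]


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


# Made-up forward returns for four stocks and for the market.
stock_fwd_returns = [0.04, -0.01, 0.02, 0.00]
market_fwd_return = 0.01

labels = make_labels(stock_fwd_returns, market_fwd_return)
print("labels:", labels)
print("impurity of this node:", gini_impurity(labels))
print("impurity of a pure node:", gini_impurity([1, 1, 1]))
```

A split is attractive when the child nodes it produces have a lower weighted impurity than the parent, that is, when each child is dominated by one class.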

Once we have our random forest trained, on the first business day of each month we let it decide which stocks to invest in. Among them, we do an **equally weighted allocation**. For the rest of the month, we let the weights evolve with no rebalancing.
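
The monthly mechanics can be sketched as follows. The tickers, returns and function names are made up for illustration; the sketch only shows how equal starting weights drift when nothing is rebalanced during the month.

```python
def equal_weights(selected):
    """Give every selected ticker the same starting weight."""
    if not selected:
        return {}
    return {ticker: 1.0 / len(selected) for ticker in selected}


def drift_weights(weights, period_returns):
    """Let weights evolve with each stock's return, with no rebalancing.

    Returns the end-of-period weights and the portfolio return for the period.
    """
    if not weights:
        return {}, 0.0
    grown = {t: w * (1.0 + period_returns[t]) for t, w in weights.items()}
    total = sum(grown.values())
    return {t: v / total for t, v in grown.items()}, total - 1.0


# Hypothetical picks for one month and their returns over that month.
selected = ["AAA", "BBB"]
month_returns = {"AAA": 0.10, "BBB": -0.10}

start = equal_weights(selected)
end, portfolio_return = drift_weights(start, month_returns)
print("start weights:", start)
print("end weights:", end)
print("portfolio return:", round(portfolio_return, 6))
```

At the start of the next month the forest picks a new set of stocks and the weights are reset to equal.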

## Results

The results are disappointing. Although not terrible, the portfolio clearly underperforms the market:

If we instead look at the last two years of the train period, the picture is completely different:

We can clearly tell our model is overfitted. Even though forests resist overfitting better than single trees, they are still susceptible to it. And we can see that a random forest on financial ratios does overfit.

## Improvements

- We picked the ratios we found interesting. Instead, we should start with a bigger set of ratios, study them and select the best ones. For instance, we could look at the covariance matrix to eliminate redundant ratios. We could also analyze feature importance to see which ratios have the biggest influence on the outcome. What ratios would you consider adding into the mix?
- We have taken a really naive label for training, trying to predict whether or not a stock will have a higher return than the market. Instead, we could build labels from risk metrics, such as volatility, maximum drawdown or Sharpe ratio. We could even just try to predict which stocks will be in the best decile, instead of comparing against the benchmark. As you can see, there are multiple targets we can choose. Some of them will lead to better returns, others to lower-risk profiles. What other target labels do you consider interesting?
- Random forests have a multitude of parameters which we could use to try to avoid overfitting. These include the maximum depth of the trees or the minimum number of samples required to split a node. We can also consider other algorithms. Are there any other algorithms or techniques you would want to try in this particular case?
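
As a rough illustration of the last point, the sketch below compares a default forest with one whose trees are limited in depth and need more samples before splitting. It assumes scikit-learn and NumPy are installed and uses synthetic noise in place of real ratios, so the numbers it prints say nothing about the strategy itself. The values `max_depth=3` and `min_samples_split=20` are arbitrary choices for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the data: 9 noisy features, a weak signal in the first one.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 9))
y = (X[:, 0] + rng.normal(scale=2.0, size=400) > 0).astype(int)

# Keep the order of the rows, mimicking a chronological split.
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

# 100 trees and Gini impurity, as in the post; everything else left at defaults.
unconstrained = RandomForestClassifier(
    n_estimators=100, criterion="gini", random_state=0
)
# Same forest, but with shallow trees and a higher bar for splitting a node.
constrained = RandomForestClassifier(
    n_estimators=100,
    criterion="gini",
    max_depth=3,
    min_samples_split=20,
    random_state=0,
)

for name, model in [("unconstrained", unconstrained), ("constrained", constrained)]:
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    print(f"{name}: train accuracy {train_acc:.2f}, test accuracy {test_acc:.2f}")
```

A large gap between train and test accuracy is the same symptom we saw in the results above. Constraining the trees usually narrows that gap, although it does not guarantee better out-of-sample performance.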

Maybe next time we will dig deeper into these improvements. So stay tuned if you want to see if it is possible to outperform the market using a simple random forest on financial ratios!