Random forest: many is better than one

xristica

15/02/2017


Random forest is one of the best-known ensemble methods, and for good reason: it is a substantial improvement over simple decision trees.

In this post, I'm going to explain how to build a random forest from simple decision trees, and test whether it actually improves on the original algorithm.

If you first need to know more about simple trees, take a look at my previous post. And if you would rather read in Spanish, you can find a translation of this post here.

As in any other supervised learning method, the starting point is a set of features or attributes and, on the other side, a set of labels or classes that we would like to explain:

[Figure: a set of features or attributes and a set of labels or classes]

What is a random forest?

Random forest is a method that combines a large number of independent trees, trained on random, identically distributed subsets of the data.

 

How to build a random forest

The learning stage consists of creating many independent decision trees from slightly different input data:

  • The initial input data is randomly subsampled with replacement.

This step is what the bagging (bootstrap aggregating) ensemble method consists of. However, random forests usually include a second level of randomness, this time subsampling the features:

  • When optimising each node partition, we will only take into account a random subsample of the attributes.

Once a large number of trees has been built (around 1000, for example), the classification stage works like this:

  • All trees are evaluated independently and their outputs are averaged to compute the forest estimate. The probability that a given input belongs to a given class is interpreted as the proportion of trees that classify that input as a member of that class (see the sketch below).
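
As a concrete illustration, here is a minimal sketch of these two levels of randomness, assuming numpy arrays and scikit-learn's DecisionTreeClassifier as the base learner (build_forest and forest_proba_up are illustrative names, not from the original post):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def build_forest(X, y, n_trees=1000):
    """Bagging with per-split feature subsampling."""
    forest = []
    n = len(X)
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)                    # bootstrap: resample rows with replacement
        tree = DecisionTreeClassifier(max_features="sqrt")  # random attribute subset at each split
        forest.append(tree.fit(X[idx], y[idx]))
    return forest

def forest_proba_up(forest, X):
    # P(class 1) = proportion of trees that classify each input as class 1
    return np.mean([tree.predict(X) for tree in forest], axis=0)
```

In practice, scikit-learn's RandomForestClassifier packages both ideas (bootstrap resampling plus max_features) behind a single estimator, which is what the later snippets use.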

 

What are the advantages of a random forest over a tree?

Stability. Random forests are less prone to overfitting a particular data set than simple trees.

 

Random forest versus Simple tree. Test 1:

We have designed two trading systems. The first system uses a classification tree and the second one uses a random forest, but both are based on the same strategy:

  • Attributes: A set of transformations of the input series.
  • Classes: For each day, the sign of the next price return (a binary response): 1 if the price moves up and 0 otherwise.
  • Learning stage: We will use the beginning of the time series to build the trees (3000 days in the example).
  • Classification stage: We will use the remaining years to test classifier performance. For each day in this period, the tree and the forest will return an estimate, 1 or 0, and its probability.

Our strategy will buy when the probability of class 1 is larger than the probability of class 0, indicating an upward move in the series, and sell otherwise. We will also use the classification probability to size the trade.
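
The post does not publish the exact attribute transformations, so the sketch below is a hypothetical reconstruction: lagged returns stand in for the attributes, a synthetic series stands in for the real price data, and make_dataset is an illustrative helper:

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Synthetic price series as a stand-in for the financial series used in the post
prices = pd.Series(np.cumprod(1 + np.random.default_rng(0).normal(0, 0.01, 4000)))

def make_dataset(prices, n_lags=10):
    rets = prices.pct_change()
    feats = pd.concat({f"lag_{k}": rets.shift(k) for k in range(n_lags)}, axis=1)
    target = rets.shift(-1)                        # next day's return
    data = pd.concat([feats, target.rename("next")], axis=1).dropna()
    X = data.drop(columns="next").values
    y = (data["next"] > 0).astype(int).values      # class 1 if the price moves up
    return X, y

X, y = make_dataset(prices)

train = 3000                                       # learning stage: first 3000 days
tree = DecisionTreeClassifier().fit(X[:train], y[:train])
forest = RandomForestClassifier(n_estimators=1000).fit(X[:train], y[:train])

# Classification stage: P(class 1) drives both direction and size of the trade
p_up = forest.predict_proba(X[train:])[:, 1]
position = 2.0 * (p_up - 0.5)                      # long when P(up) > 0.5, short otherwise
```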

Let's see the results of applying these strategies to several different financial series, labelled “test*” in the charts below:

[Figure: random forest test]

The results, positive or negative, are less extreme for the random forest. The average result of a random forest is not always better than that of a tree, but the risk taken is always lower, which means better drawdown control.

[Figure: random forest test]

The trees that make up the forest were trained on different yet similar datasets: different random subsamples of the original one. This gives the random forest a better capacity to generalise and to perform well in new, unseen situations.

 

Random forest versus Simple tree. Test 2:

Let's do a second test. Imagine that we want to build the previous trees again, but this time, instead of using 3000 historical data points as the training set, we use 3100. We would expect both strategies to behave much as before. The random forest does behave as expected, but this is not true for the classification tree, which is very prone to overfitting.

We trained individual trees and random forests on slightly larger or smaller training sets, from 2500 to 3500 data points, and then measured the variability of the results. The following graphs show the range of the results and their standard deviation:

[Figure: random forest test]

It's clear that the random forest technique is less sensitive to variations in the training set.

[Figure: random forest test]
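
Continuing from the previous sketch (reusing X, y and the two classifiers), a hedged reconstruction of this stability experiment might look like the following; out-of-sample accuracy on a fixed test window stands in for the strategy result used in the post:

```python
from sklearn.metrics import accuracy_score

# Assumed procedure: refit both models on training windows of 2500 to 3500
# days and see how much the out-of-sample result moves as the window changes.
results = {"tree": [], "forest": []}
for train in range(2500, 3501, 100):
    models = {"tree": DecisionTreeClassifier(),
              "forest": RandomForestClassifier(n_estimators=1000)}
    for name, model in models.items():
        model.fit(X[:train], y[:train])
        results[name].append(accuracy_score(y[3500:], model.predict(X[3500:])))

for name, r in results.items():
    print(f"{name}: range={max(r) - min(r):.3f}, std={np.std(r):.3f}")
```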

 

Therefore, it is not true that a random forest will always perform better than a classification tree.

Nevertheless, we can be confident that random forests provide better drawdown control and higher stability. These advantages are important enough to make the extra complexity worth it.
