Can a set of weak systems turn into a single strong system?
When you’re in front of a complex classification problem, as is often the case with financial markets, different approaches may appear while searching for a solution. Although these approaches can estimate the classification, sometimes none of them are better than the rest. In this case, a reasonable choice is to keep them all, and then create a final system by integrating the pieces. This method of diversification is one of the most convenient practices: divide the decision among several systems in order to avoid putting all your eggs in one basket.
Once I have a number of estimates for the one case, how can I combine the decisions of the N sub-systems? As a quick answer, I can take the decision average and use this. But are there different ways of making the most out of my sub-systems? Of course there are!
Think outside the box!
Several classifiers with a common objective are called multiclassifiers. In Machine Learning, multiclassifiers are sets of different classifiers which make estimates and are fused together, obtaining a result that is a combination of them. Lots of terms are used to refer to multiclassifiers: multi-models, multiple classifier systems, combining classifiers, decision committee, etc. They can be divided into two main groups:
- Ensemble methods: Refers to sets of systems that combine to create a new system using the same learning technique. Bagging and Boosting are the most extended ones.
- Hybrid methods: Takes a set of different learners and combines them using new learning techniques. Stacking (or Stacked Generalization) is one of the main hybrid multiclassifiers.
How to build a multiclassifier motivated by Stacking.
Imagine that I would like to estimate the EURUSD’s trends. First of all, I turn my issue into a classification problem, so I split the price data into two types (or classes): up and down movements. Guessing every daily movement is not my intention. I only want to detect the main trends: up for trading Long (class = 1) and down for trading Short (class = 0).
I have done this split a posteriori; by which I mean that all historical data have been used to decide the classes, so it takes into account some future information. Therefore, I’m not able to assure iup or down movement at the current moment. For this reason an estimate for the today’s class is required.
For the purpose of this example I have designed three independent systems. They are three different learners using separate sets of attributes. It does not matter if you use the same learner algorithm or if they share some/all attributes; the key is that they must be different enough in order to guarantee diversification.
Every day they respond with a probability for class 1, E, and class 0, 1-E.
Then, they trade based on those probabilities: If E is above 50%, it means Long entry, more the bigger E is. If E is under 50%, it is Short entry, more the smaller E is.
These are the results of my three systems:
Their results are far from perfect, but their performances are slightly better than a random guess. (E. g., their error rates are smaller than 0.5):
It’ss clear that these three individual systems are unexceptional, but they are all I have for now.
Next, I need to find the best combination of the individual systems.
Can a set of poor players make up a dream team?
The purpose of building a multiclassifier is to obtain better predictive performance than what could be obtained from any single classifier. Let’s see if this is the case.
The method I am going to use in this example is based on the Stacking algorithm. The idea of Stacking is that the output of primary classifiers, called level 0 models, will be used as attributes for another classifier, called meta-model, to approximate the same classification problem. The meta-model is left to figure out the combining mechanism. It will be in charge of connecting the level 0 models’ replies and the real classification.
The rigorous process consists in splitting the training set into disjoint sets. Then train each level 0 learner on the whole data, excluding one set, and apply it over the excluded set. By repeating for each set, an estimate for each data is obtained for each learner. These estimates will be the attributes for training the meta-model or level 1 model. As my data was a time series, I decided to build the estimation for day d just using the set from day 1 to day d-1.
Which model does this work with?
The meta-model can be a classification tree, a random forest, a support vector machine… Any classification learner is valid. For this example I chose to use a nearest neighbours algorithm. It means that the meta-model will estimate the class of the new data finding similar configurations of the level 0 classifications in past data, and then will assign the class of these similar situations.
Let’s see how good my dream team result is…
As you can see above, the meta-model outperformed the three initial models. The result is much better than using a simple average. Maybe it’s still not enough to consider the EURUSD’s classification problem as solved, but it’s clearly a worthy step.
This is just one example of the huge amount of available multiclassifiers. They can help you not only to join your partial solutions into a unique answer by means of a modern and original technique, but to create a real dream team. There’s also an important margin for improvement in the way that the individual pieces are integrated into a single system.
So, next time you need to combine, spend more than a moment working on the possibilities. Avoid the traditional average by force of habit and explore more complex methods. They may surprise you with extra performance.