When you face a complex classification problem, as is often the case with financial markets, several approaches may emerge while you search for a solution. Each of these systems can estimate the classification, and sometimes none of them is clearly better than the rest.
In this case, a reasonable choice is to keep them all and then create a final system integrating the pieces. At least we would have a more diversified solution than if we had chosen only one sub-system. Diversifying is one of the most convenient practices: divide the decision among several systems in order to avoid putting all your eggs in one basket.
Now then, once I have a number of estimates for a single case, what is the final decision? How can I combine the decisions of the N sub-systems? As a quick answer, I can take the average of the decisions and use that. But are there other ways of making the most of my sub-systems? Of course there are! Think outside the box!
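For reference, the simple-average baseline could look like the minimal sketch below. It assumes each sub-system reports a probability for class 1; the function name `average_vote` and the three sample values are made up for illustration.

```python
def average_vote(probabilities):
    """Combine the sub-systems' probabilities for class 1 with a plain mean."""
    if not probabilities:
        raise ValueError("need at least one estimate")
    mean_p = sum(probabilities) / len(probabilities)
    # Class 1 when the averaged probability is above 50%, otherwise class 0.
    decision = 1 if mean_p > 0.5 else 0
    return mean_p, decision


# Hypothetical estimates from three sub-systems for a single case.
mean_p, decision = average_vote([0.62, 0.48, 0.55])
```

Every sub-system gets the same weight here, whatever its track record, which is exactly the limitation that more elaborate combinations try to overcome.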
When several classifiers share a common objective, the set is called a multiclassifier. In Machine Learning, multiclassifiers are sets of different classifiers whose estimates are fused together to obtain a result that is a combination of them. Many terms are used to refer to multiclassifiers: multi-models, multiple classifier systems, combining classifiers, decision committee, etc.
They can be divided into two big groups:
- Ensemble methods: Sets of systems, built with the same learning technique, that are combined to create a new system. Bagging and Boosting are the most widespread ones.
- Hybrid methods: Sets of different learners that are combined using new learning techniques. Stacking (or Stacked Generalization) is one of the main hybrid multiclassifiers.
In this post I want to show you an example of how to build a multiclassifier motivated by Stacking:
Imagine that I would like to estimate the EURUSD’s trends. First of all, I turn my issue into a classification problem, so I split the price data in two types or classes: up and down movements. Guessing every daily movement is not my intention. I only want to detect the main trends: up for trading Long (class = 1) and down for trading Short (class = 0).
I have done this split “a posteriori”, i.e., all historical data have been used to decide the classes, so it takes into account some future information. Therefore, I cannot tell for certain whether the class is up or down at the current moment. For this reason, an estimate for today’s class is required.
For the purpose of this example, I have designed three independent systems. They are three different learners using separate sets of attributes. It does not matter if you use the same learner algorithm or if they share some/all attributes; the key is that they must be different enough in order to guarantee diversification.
Every day they respond with a probability for class 1, E, and class 0, 1-E.
Then, they trade based on those probabilities: if E is above 50%, they enter Long, with a larger position the higher E is. If E is below 50%, they enter Short, with a larger position the lower E is. These are the results of my three systems:
Their results are far from perfect, but their performances are slightly better than a random guess:
In addition, there is a low correlation between the three systems’ errors:
It is clear that these three individual systems are unexceptional, but they are all I have…
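The two checks above, hit rate and correlation between errors, can be sketched as follows. This is only an illustration: the probabilities and classes are invented, not my systems' outputs, and the helper names (`accuracy`, `error_series`, `pearson`) are hypothetical. I take the error of an estimate to be the probability for class 1 minus the true class, which is one possible definition among several.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def error_series(probs, classes):
    """Signed error per day: probability for class 1 minus the true class."""
    return [p - c for p, c in zip(probs, classes)]


def accuracy(probs, classes):
    """Share of days where the side of 50% matches the true class (0.5 counts as class 0)."""
    hits = sum((p > 0.5) == (c == 1) for p, c in zip(probs, classes))
    return hits / len(classes)


# Invented data: true classes and the daily probability E of each system.
classes = [1, 1, 0, 0, 1, 0]
systems = {
    "A": [0.7, 0.4, 0.3, 0.6, 0.8, 0.2],
    "B": [0.6, 0.7, 0.6, 0.4, 0.3, 0.4],
    "C": [0.4, 0.6, 0.4, 0.3, 0.6, 0.7],
}

errors = {name: error_series(p, classes) for name, p in systems.items()}
hit_rates = {name: accuracy(p, classes) for name, p in systems.items()}
error_corr = {
    (a, b): pearson(errors[a], errors[b]) for a, b in combinations(errors, 2)
}
```

Low values in `error_corr` suggest the systems tend to fail on different days, which is what gives a combination something to exploit.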
Next, I need to see what the best combination of the individual systems is. Can a set of poor players make up a dream team? The purpose of building a multiclassifier is to obtain better predictive performance than could be obtained from any single classifier. Let’s see whether that is the case here.
The method I am going to use in this example is based on the Stacking algorithm:
The idea of Stacking is that the outputs of the primary classifiers, called level 0 models, are used as attributes for another classifier, called the meta-model, to approximate the same classification problem. In essence, the meta-model figures out the combining mechanism: it is in charge of connecting the level 0 models’ outputs with the real classification.
The rigorous process consists of splitting the training set into disjoint sets, as in a cross-validation. Then, for each level 0 learner, train it on the whole data excluding one set and apply it to the excluded set. By repeating this for each set, an estimate for every data point is obtained for each learner. These estimates will be the attributes for training the meta-model, or level 1 model. As my data was a time series, I decided to build the estimate for day d using only the set from day 1 to day d-1.
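The time-series variant described above (the estimate for day d uses only days 1 to d-1) can be sketched as an expanding-window loop. Everything named here is an assumption for illustration: `fit_sign_learner` is a toy stand-in for a level 0 learner, not one of my systems, `MIN_TRAIN` is an arbitrary minimum history, and the data is invented. The sketch also assumes the classes of past days are available when training.

```python
MIN_TRAIN = 3  # assumed minimum number of past days before estimating


def fit_sign_learner(xs, ys):
    """Toy level 0 learner: P(class 1) given the sign of a single attribute."""
    counts = {True: [0, 0], False: [0, 0]}  # sign -> [days seen, days in class 1]
    for x, y in zip(xs, ys):
        counts[x > 0][0] += 1
        counts[x > 0][1] += y

    def predict(x):
        seen, ups = counts[x > 0]
        # Laplace smoothing keeps the estimate defined for an unseen sign.
        return (ups + 1) / (seen + 2)

    return predict


def walk_forward_estimates(xs, ys, fit, min_train=MIN_TRAIN):
    """Estimate each day using a model trained only on the days before it."""
    estimates = []
    for d in range(min_train, len(xs)):
        model = fit(xs[:d], ys[:d])  # days 0 .. d-1 only, no look-ahead
        estimates.append(model(xs[d]))
    return estimates


# Invented data: one attribute series per level 0 learner, plus the classes.
ys = [1, 0, 1, 0, 1, 0]
attr_a = [0.5, -0.2, 0.3, -0.4, 0.1, -0.3]
attr_b = [-0.1, 0.2, -0.3, 0.4, 0.2, -0.5]

level0 = [walk_forward_estimates(xs, ys, fit_sign_learner) for xs in (attr_a, attr_b)]
meta_rows = list(zip(*level0))  # one row per day, one column per learner
meta_targets = ys[MIN_TRAIN:]   # classes aligned with the rows
```

The pairs in `meta_rows` and `meta_targets` are what a level 1 model would be trained on. The first `MIN_TRAIN` days produce no estimate, so the meta-model sees a slightly shorter history than the level 0 learners.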
The meta-model can be a classification tree, a random forest, a support vector machine… Any classification learner is valid. For this example, I chose a nearest neighbours algorithm. This means that the meta-model estimates the class of a new data point by finding similar configurations of the level 0 classifications in past data and then assigning the class observed in those similar situations.
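A nearest neighbours meta-model of this kind can be sketched in a few lines. The choices below are assumptions for illustration rather than the exact configuration I used: Euclidean distance, k = 3, a majority vote, and invented level 0 outputs. The function name `knn_meta_predict` is hypothetical.

```python
from math import dist


def knn_meta_predict(past_rows, past_classes, new_row, k=3):
    """Majority class among the k past days closest to new_row.

    Each row holds the level 0 probabilities for class 1 on one day.
    """
    ranked = sorted(
        zip(past_rows, past_classes),
        key=lambda pair: dist(pair[0], new_row),  # Euclidean distance
    )
    nearest = [cls for _, cls in ranked[:k]]
    # Class 1 only on a strict majority; ties fall back to class 0.
    return 1 if 2 * sum(nearest) > len(nearest) else 0


# Invented history: outputs of three level 0 systems and the true class.
past_rows = [
    (0.7, 0.6, 0.4),
    (0.4, 0.7, 0.6),
    (0.3, 0.6, 0.4),
    (0.6, 0.4, 0.3),
    (0.8, 0.3, 0.6),
]
past_classes = [1, 1, 0, 0, 1]

# Today's level 0 outputs; the meta-model looks for similar past days.
today = (0.75, 0.4, 0.55)
meta_class = knn_meta_predict(past_rows, past_classes, today, k=3)
```

Unlike the plain average, this vote depends on which combinations of level 0 outputs went with which class in the past, so systems that disagree can still carry information.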
Let’s see how good my dream team result is…
As you can see in the previous data the meta-model outperformed the three initial models and its result is much better than using a simple average. Maybe it is still not enough to consider the EURUSD’s classification problem as solved, but it is clear that it is a worthy step.
This is just one example of the huge number of available multiclassifiers. They can help you not only to join your partial solutions into a single answer by means of a modern and original technique, but also to create a real dream team.
There is also considerable room for improvement in the way the individual pieces are integrated into a single system. So, next time you need to combine, spend more than a moment exploring the possibilities. Avoid reaching for the traditional average by force of habit, and explore more complex methods, because they may surprise you with extra performance.