
Classification trees in MATLAB

xristica | 20/01/2017

Classification trees are used, as the name suggests, to solve classification problems. Here are some definitions and MATLAB tips to help you dabble in this subject.

The objective of any problem of this nature is to assign an object to one of a number of specified categories or classes. Classification is a useful technique that’s widely applied in many other fields, such as pattern recognition.

A tree is essentially a set of sequential conditions and actions that relate certain factors to a result or decision. It is a supervised classification method, meaning that the data are labelled a priori and knowledge is extracted from those labels.

First of all, the algorithm analyses the data provided in search of patterns, and then uses the result of this analysis to define sequences and conditions in order to create the classification model.

In this post we go over the main theoretical concepts and translate them to MATLAB, whose Statistics Toolbox includes the tools required to work with trees. I hope you find it useful.

 

What are classification trees?

A decision tree is a way of representing knowledge obtained in the inductive learning process. The space is split using a set of conditions, and the resulting structure is the tree.

These conditions are created from a series of characteristics or features: the explanatory variables.


In MATLAB we initialise a matrix a with the features. We define a function, attributes, that computes the characteristics we want the tree to consider.

If we are working with financial series, some examples of features are volatility, the efficiency of the movements, etc.

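A minimal sketch of what such a feature-building function might look like (the window length, the two feature formulas and the name attributes are illustrative assumptions, not the post's original code):

```matlab
% Hypothetical sketch: build a feature matrix "a" from a price series,
% one row per observation, one column per characteristic.
function a = attributes(price)
    w   = 20;                      % look-back window (assumption)
    ret = diff(log(price(:)));     % log-returns of the series
    n   = numel(ret) - w + 1;
    a   = zeros(n, 2);
    for t = 1:n
        window  = ret(t:t+w-1);
        a(t, 1) = std(window);                         % volatility
        a(t, 2) = abs(sum(window)) / sum(abs(window)); % efficiency of the movement
    end
end
```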

Each set of features is assigned to a response or class, which the features attempt to explain. This labelling is done using future information; hence we can categorise historical data, but it is impossible to do so with new data, whose future is still unknown.


According to the class type, we have classification trees with discrete classes and regression trees with continuous classes. We focus on the binary case, as this is what the MATLAB function we are going to use requires. Examples of binary classes are the price moving upwards or downwards, the price moving above a certain value or not, or the market being range-bound or trending.

We label these classes as -1 and 1 from here onwards. We define our classes function as a vector with one response for each observation in x:

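A minimal sketch of one possible labelling rule (the forward-return rule and the horizon w are assumptions for illustration):

```matlab
% Hypothetical sketch: label each observation 1 if the price moves upwards
% over the next w periods and -1 otherwise. Note the use of future
% information: this is only possible on historical data.
function c = classes(price, w)
    price = price(:);
    fwd   = price(1+w:end) ./ price(1:end-w) - 1;  % forward return over w periods
    c     = ones(numel(fwd), 1);
    c(fwd < 0) = -1;                               % downward move -> class -1
end
```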

 

What are classification trees like?

A tree is made up of nodes:

Internal nodes: Each internal node contains a question about a specific feature (node = <attribute, value>) and has two children, one for each possible answer, each leading to a further classification or decision. The questions are of the type: is \(a_i \geq j\)?

End nodes (leaves): The nodes at the bottom of the tree, each assigned to a single class. The complexity of the tree is given by its number of leaves.

[Figures: graphical view of a classification tree and its nodes]
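Such a view can be produced directly in MATLAB. A minimal sketch, assuming the Statistics Toolbox (the toy data are made up; fitctree is available from R2014a, older releases use ClassificationTree.fit):

```matlab
% Minimal sketch: fit a binary classification tree on toy features and
% display it graphically.
a = [0.10 0.8; 0.30 0.2; 0.05 0.9; 0.25 0.1; 0.28 0.3; 0.07 0.7];  % toy features
c = [1; -1; 1; -1; -1; 1];                                          % binary classes
tree = fitctree(a, c, 'PredictorNames', {'volatility', 'efficiency'});
view(tree, 'Mode', 'graph');   % opens the tree viewer with internal nodes and leaves
view(tree);                    % text description of the rule at each node
```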

 

How are classification trees built?

The construction of a tree is the learning stage of the method. It consists of analysing a set of available features and obtaining logical rules adapted to the known classification: the classes assigned to each feature vector \((a_1,\ldots,a_s)\).

The construction process is recursive:

  1. All the possible partitions (attribute, value) are analysed, and from them we take the one giving the best separation.
  2. The optimum separation is applied.
  3. Step 1 is repeated with the child nodes, only for those that are not leaves.


 

All partitions?

Yes, all of them. For each one of the features, the values of the observations are used.

The candidate partitions are defined by using the midpoints between each pair of consecutive observed values as cutoffs.
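In MATLAB this amounts to something like the following sketch (variable names are illustrative; a is the feature matrix from the earlier example):

```matlab
i = 1;                                     % feature (column) under analysis
vals = unique(a(:, i));                    % sorted distinct observed values
cuts = (vals(1:end-1) + vals(2:end)) / 2;  % midpoint between each consecutive pair
```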

 

What is the optimum separation?

The best separation is the one that divides the data into groups in each of which a single class dominates (child node 1 => -1 and child node 2 => 1).

So-called impurity measures are used and, from all possible partitions, we choose the one that minimises the impurity of the two child nodes.

We use the Gini diversity index, although there are many other impurity measures; among them, entropy and information gain stand out.

The impurity is calculated as the product of the probabilities of belonging to one class or the other.

When these probabilities are similar, the separation does not match well the classification defined by the vector of classes: the more similar the probabilities are, the more impurity there is and the greater the Gini index.

The Gini index of a node \(a_{ij}\) = (attribute \(i\), value \(j\)) is calculated by adding together the impurities of its child nodes, weighted by their probabilities:

$$G(a_{ij}) = P(a_i < j)\cdot G(c \mid a_i < j) + P(a_i \ge j)\cdot G(c \mid a_i \ge j)$$

$$G(c \mid a_{ij}) = P(c=1 \mid a_{ij})\cdot P(c \neq 1 \mid a_{ij}) + P(c=-1 \mid a_{ij})\cdot P(c \neq -1 \mid a_{ij}) = 1-\sum_{c_k} P(c_{k} \mid a_{ij})^{2}$$

A pure node has a Gini index of 0.
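As a sketch, the same computation in MATLAB (the function names are made up for illustration):

```matlab
% Gini index of the candidate split a_i >= j, computed as the
% probability-weighted sum of the child nodes' impurities.
function g = giniSplit(x, c, j)
    left  = c(x <  j);                 % classes falling in the left child
    right = c(x >= j);                 % classes falling in the right child
    g = numel(left)/numel(c)  * giniNode(left) + ...
        numel(right)/numel(c) * giniNode(right);
end

function g = giniNode(c)
    % node impurity 1 - sum_k P(c_k)^2 for binary classes -1 / 1
    if isempty(c), g = 0; return; end
    p = mean(c == 1);
    g = 1 - p^2 - (1 - p)^2;
end
```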


In this way, the importance of the features is established. It's possible that the algorithm leaves some of the characteristics we had chosen out of the tree definition; we then interpret such a feature as irrelevant, since the decision is independent of its value.

In the same way, the features appearing in the lower levels of the tree have less importance.

By analysing the tree structure we can therefore infer the relevance of each of the chosen explanatory variables.

 

When is a node a leaf?

If the node contains a single class, then it is pure and the process is complete. However, it's typical to use additional stop criteria, such as when the number of observations within the node is too small; in such cases, refining its classification further isn't deemed relevant.

Furthermore, no further divisions are made if the impurity of the child nodes is no better than that of the parent. Likewise, if we reach a node where every possible separation leads to child nodes with too few values, the partition is not carried out either.
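For reference, a sketch of how such stop criteria can be set through fitctree's name-value options (the threshold values are arbitrary examples, with a and c as in the earlier toy data):

```matlab
% Minimal sketch: typical stop criteria as fitctree options.
tree = fitctree(a, c, ...
    'MinLeafSize',   5,  ... % no leaf may contain fewer than 5 observations
    'MinParentSize', 10, ... % nodes with fewer than 10 observations are not split
    'MaxNumSplits',  20);    % cap on the total number of splits
```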


 

How should we use classification trees?

One of the greatest advantages of classification trees is that they are intuitive and easy to apply.

The application of the tree is known as the classification stage, and consists of assigning the class corresponding to a new set of features, independent from the learning set.

Once created, it's straightforward to find the unknown class of a new value x with characteristics \((a_1,\ldots,a_s)\). One simply has to answer the question posed at each node and follow the path mapped out by the tree until a leaf is reached. The predicted class c is the most common class in the leaf that x falls into.


The percentage of data belonging to the most common class in the leaf reached by x gives the accuracy of the classification.

The result of the classification provided by the tree is thus not merely a decision: it also comes with its own certainty, since the probabilities of all the possible alternatives are available as well.

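A minimal sketch of this classification stage, continuing from the earlier toy tree (predict is the standard method of a fitted ClassificationTree):

```matlab
% Classify a new, unseen observation and inspect the class probabilities:
% the proportions of each class in the leaf the observation falls into.
xNew = [0.12 0.7];                     % new feature vector (volatility, efficiency)
[label, score] = predict(tree, xNew);
fprintf('predicted class: %d\n', label);
disp(score)                            % columns follow tree.ClassNames, here -1 and 1
```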

Read this post in Spanish!
