
# Visualizing Fixed Income ETFs with T-SNE

### j3

#### 07/07/2016

In recent articles we discussed PCA and ISOMAP as techniques for dimensionality reduction. This time we focus on t-SNE, a technique for visualising and understanding multidimensional datasets in a low-dimensional space, where the human eye can easily find patterns.

t-SNE was developed in 2008 by Laurens van der Maaten and Geoffrey Hinton. It comprises two main stages:

1. Stage One: t-SNE constructs a probability distribution over pairs of high-dimensional objects in such a way that similar objects have a high probability of being picked, whilst dissimilar points have an infinitesimal probability of being picked.
2. Stage Two: t-SNE defines a similar probability distribution over the points in the low-dimensional map, and it minimizes the Kullback–Leibler divergence between the two distributions with respect to the locations of the points on the map.
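The two stages can be sketched in a few lines of base R. This is a minimal illustration, not the `tsne` package's implementation, and it makes two simplifications: it uses one shared Gaussian bandwidth `sigma` for every point (real t-SNE tunes a separate bandwidth per point from the perplexity), and it only evaluates the cost for two fixed maps instead of optimising the map by gradient descent. The names `tsne_P`, `tsne_Q` and `kl_divergence` are hypothetical and the data are simulated.

```r
# Stage one: Gaussian affinities between points in the high-dimensional space.
# Simplification: one shared bandwidth `sigma` for all points.
tsne_P <- function(X, sigma = 1) {
  D2 <- as.matrix(dist(X))^2               # squared Euclidean distances
  W <- exp(-D2 / (2 * sigma^2))
  diag(W) <- 0                             # a point never picks itself
  P_cond <- W / rowSums(W)                 # row i holds p(j | i)
  n <- nrow(X)
  (P_cond + t(P_cond)) / (2 * n)           # symmetrised joint p_ij, sums to 1
}

# Stage two: Student-t (one degree of freedom) affinities between map points.
tsne_Q <- function(Y) {
  D2 <- as.matrix(dist(Y))^2
  W <- 1 / (1 + D2)
  diag(W) <- 0
  W / sum(W)                               # joint q_ij, sums to 1
}

# Kullback-Leibler divergence KL(P || Q): the cost t-SNE minimises over the map.
kl_divergence <- function(P, Q, eps = 1e-12) {
  mask <- P > 0
  sum(P[mask] * log(P[mask] / pmax(Q[mask], eps)))
}

set.seed(1)
# Simulated data: two clusters of 10 points each in 2-D
X <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 5), ncol = 2))
P <- tsne_P(X)

good_map <- X                              # a map that keeps both clusters together
bad_map  <- X[c(rbind(1:10, 11:20)), ]     # a map that splits each cluster in two
kl_divergence(P, tsne_Q(good_map))         # lower cost
kl_divergence(P, tsne_Q(bad_map))          # higher cost
```

The map that preserves the neighbourhoods gets the lower cost; stage two of t-SNE moves the map points so as to drive this cost down.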

In this Google talk, Laurens van der Maaten explains how the algorithm works and compares it with PCA and ISOMAP. He gives a clear example in which he groups images of handwritten digits, like the one in the photo on the right:

In the picture below, each colour represents one of the digits from 0 to 9. With PCA and ISOMAP some groups, such as orange (digit 1) or red (digit 0), are clearer than others, but with t-SNE the separation is remarkable. It is important to realise that the algorithm only sees the images of the digits; the colours are added afterwards to validate the result.

## So how can I apply this to finance?

I have 67 ETFs, all of them North American fixed income denominated in dollars, and I want to plot them according to the correlations between them. I calculate the correlations of daily returns over a common five-year period (January 2011 to January 2016), which gives a dataset of 67 observations by 67 features, covering 7 different fixed-income asset types.
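Turning price histories into a square dataset takes three steps: compute daily returns, compute their correlation matrix, and treat each row of that matrix as one asset's feature vector. The sketch below shows those steps in base R on simulated prices for five hypothetical assets `A` to `E` (the data are invented for illustration, not taken from the ETF sample).

```r
set.seed(42)
n_days <- 250
# Simulated returns: A, B and C share factor f1; D and E share factor f2
f1 <- rnorm(n_days, sd = 0.01)
f2 <- rnorm(n_days, sd = 0.01)
rets_true <- cbind(
  A = f1 + rnorm(n_days, sd = 0.003),
  B = f1 + rnorm(n_days, sd = 0.003),
  C = f1 + rnorm(n_days, sd = 0.003),
  D = f2 + rnorm(n_days, sd = 0.003),
  E = f2 + rnorm(n_days, sd = 0.003)
)
prices <- 100 * apply(1 + rets_true, 2, cumprod)   # price paths starting near 100

# Simple (arithmetic) daily returns from consecutive closing prices
returns <- prices[-1, ] / prices[-nrow(prices), ] - 1

correlation <- cor(returns)          # one row and one column per asset
dissimilarity <- 1 - correlation     # 0 for perfectly correlated assets, up to 2
dim(correlation)                     # 5 x 5: n assets give an n x n dataset
round(correlation["A", ], 2)         # A is strongly correlated with B and C only
```

With 67 ETFs the same steps produce the 67 × 67 matrix used below, where each ETF is described by its correlations with all the others.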

```r
library(quantmod)    # getSymbols(), dailyReturn(), Cl()
library(RDRToolbox)  # Isomap() (Bioconductor package)
library(tsne)        # tsne()

tickers <- c("IEF", "SHY", "TLT", "TFI", "AGG", "TIP", "MUB", "HYG", "GBF", "CSJ",
             "TLH", "IEI", "INY", "PZA", "AGZ", "CIU", "GVI", "MBB", "PHB", "BSV",
             "EDV", "IPE", "JNK", "CXA", "LWC", "TLO", "VGIT", "VGSH", "VMBS", "CLY",
             "CMF", "NYF", "SUB", "BAB", "VCIT", "VCSH", "CPI", "US13.PA", "US10.PA", "US57.PA",
             "US1.PA", "US3.PA", "US7.PA", "SMB", "SMMU", "STIP", "TUZ", "CSBGU7.MI", "IDTM.L",
             "XUT3.L", "XUTD.L", "XUIT.L", "ITPS.MI", "IBTS.MI", "HYD", "HYLD", "MUNI", "ITM", "MLN",
             "CORP", "STPZ", "LTPZ", "ZROZ", "UDN", "CRED", "MINT", "SCHP")

# Asset-type label of each ticker, in the same order as `tickers`.
# The published listing breaks off after the first 58 of the 67 labels; complete it
# (the type codes of the missing ones are the last 9 entries of typeId) before plotting,
# otherwise text() recycles the labels and the last 9 points are mislabelled.
type <- c('Govern', 'Govern', 'Govern', 'Govern', 'Aggreg', 'Govern', 'Govern', 'High Yield', 'LongT', 'Aggreg', 'Govern', 'Govern',
          'Govern', 'Govern', 'Aggreg', 'Aggreg', 'Govern', 'Aggreg', 'High Yield', 'Short-Med T', 'Govern', 'Aggreg', 'High Yield', 'Govern',
          'LongT', 'LongT', 'LongT', 'Govern', 'LongT', 'LongT', 'Govern', 'Govern', 'Govern', 'LongT', 'Corp', 'Corp', 'Short-Med T', 'LongT',
          'LongT', 'LongT', 'Govern', 'Govern', 'Govern', 'Short-Med T', 'Short-Med T', 'Short-Med T', 'Short-Med T',
          'Short-Med T', 'LongT', 'Govern', 'Govern', 'Govern', 'Short-Med T', 'Short-Med T', 'Govern', 'High Yield', 'LongT', 'Govern')

# Numeric type code (1 to 7) of each ticker, used to pick the colour
typeId <- c(1,1,1,1,5,1,1,2,4,5,1,1,1,1,5,5,1,5,2,3,1,5,2,1,4,4,4,1,4,4,1,1,1,4,7,7,3,4,4,4,1,1,1,3,3,3,3,3,4,1,1,1,3,3,1,2,4,1,1,7,6,6,1,3,5,3,6)

# getSymbols() assigns one xts object per ticker and returns their names
datas <- getSymbols(tickers, from = "2011-01-01", to = "2016-01-01")
# Daily returns of the closing prices, merged into one xts object with one column per ETF
CloseReturns <- do.call(merge, lapply(datas, function(x) dailyReturn(Cl(get(x)))))
CloseReturns[is.na(CloseReturns)] <- 0   # days with no price for an ETF count as a zero return
correlation <- cor(CloseReturns)         # 67 x 67 correlation matrix

# Colours: one per type code
colors <- rainbow(max(typeId))

# PCA on the 1 - correlation dissimilarities, keeping the first two components
dev.new()
pca_scores <- princomp(1 - correlation)$scores[, 1:2]
plot(pca_scores, type = "n")
text(pca_scores, labels = type, col = colors[typeId])
title("PCA")

# ISOMAP with k = 2 neighbours (plotResiduals = TRUE also draws a residual-variance plot)
dev.new()
iso <- Isomap(1 - correlation, dims = 2, k = 2, plotResiduals = TRUE)
dev.new()
plot(iso$dim2, type = "n")
text(iso$dim2, labels = type, col = colors[typeId])
title("ISOMAP")

# t-SNE with perplexity 7 and 2000 iterations
dev.new()
tsneM <- tsne(correlation, perplexity = 7, max_iter = 2000)
plot(tsneM, type = "n")
text(tsneM, labels = type, col = colors[typeId])
title("TSNE")
```



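The `perplexity = 7` argument in the `tsne()` call sets the size of the neighbourhood each ETF looks at. t-SNE chooses a separate Gaussian bandwidth sigma_i for every point so that the perplexity of its neighbour distribution p(j | i), that is 2 raised to its entropy in bits, matches the target; with 66 possible neighbours per ETF, a value of 7 keeps the neighbourhoods small and local. The sketch below performs that calibration by bisection for a single point. `perplexity_of` and `calibrate_sigma` are hypothetical helpers, the squared distances are simulated, and the `tsne` package's own routine may differ in its details.

```r
# Perplexity of p(. | i) for one point, given its squared distances d2 to the others
perplexity_of <- function(d2, sigma) {
  p <- exp(-(d2 - min(d2)) / (2 * sigma^2))  # shift by min(d2) to avoid 0/0 underflow
  p <- p / sum(p)
  H <- -sum(p[p > 0] * log2(p[p > 0]))       # Shannon entropy in bits
  2^H
}

# Bisection on sigma: perplexity grows with sigma, so shrink the bracket accordingly
calibrate_sigma <- function(d2, target, lo = 1e-3, hi = 1e3, tol = 1e-6, max_iter = 200) {
  for (iter in seq_len(max_iter)) {
    mid <- sqrt(lo * hi)                     # geometric midpoint: sigma spans orders of magnitude
    if (perplexity_of(d2, mid) > target) hi <- mid else lo <- mid
    if (hi / lo < 1 + tol) break
  }
  sqrt(lo * hi)
}

set.seed(7)
d2 <- rexp(66)                               # simulated squared distances to 66 neighbours
sigma_i <- calibrate_sigma(d2, target = 7)
perplexity_of(d2, sigma_i)                   # close to the target of 7
```

A small perplexity gives each point a narrow bandwidth, so only its closest neighbours receive noticeable probability in stage one.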
I use PCA, ISOMAP and t-SNE to reduce the data to two dimensions. Can any of these algorithms form groups in the data without knowing the type tags? I create these three plots:

In this case t-SNE does not perform as well as in the handwritten-digit example. PCA orders the data better with respect to the type tags. This may be because PCA is defined so that the first two principal components capture the largest possible variance, and that is what we are looking for here.
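The PCA property invoked here can be checked numerically: components come out ordered by the variance they capture, so no other two-dimensional linear projection keeps more variance than the first two. The sketch below applies base R's `prcomp` to a 1 - correlation matrix for six hypothetical assets in two simulated groups (the data and the names `G1a` to `G2c` are invented, not the ETF dataset) and shows both the variance ordering and how the first component separates the groups.

```r
set.seed(3)
n_days <- 500
f1 <- rnorm(n_days)
f2 <- rnorm(n_days)
# Six hypothetical assets: three driven by factor f1, three by factor f2
returns <- cbind(sapply(1:3, function(i) f1 + rnorm(n_days, sd = 0.5)),
                 sapply(1:3, function(i) f2 + rnorm(n_days, sd = 0.5)))
colnames(returns) <- c("G1a", "G1b", "G1c", "G2a", "G2b", "G2c")

dissimilarity <- 1 - cor(returns)            # rows are assets, as in the ETF script
pca <- prcomp(dissimilarity)                 # centres the columns, then rotates

var_share <- pca$sdev^2 / sum(pca$sdev^2)    # share of total variance per component
round(var_share, 3)                          # non-increasing: PC1 first, then PC2
round(pca$x[, 1], 2)                         # PC1 scores: the two groups have opposite signs
```

When the group structure is the main source of variation in the correlations, it lands on the first components, which is what makes PCA's plot line up with the type tags.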