QuantDare
artificial intelligence

Visualizing Fixed Income ETFs with T-SNE

j3

07/07/2016


In recent articles we discussed PCA and ISOMAP as techniques for dimensionality reduction. This time we focus on T-SNE (t-distributed stochastic neighbour embedding), a technique for visualising and understanding multidimensional datasets in a low-dimensional space, where the human eye can find patterns easily.

T-SNE was developed in 2008 by Laurens van der Maaten and Geoffrey Hinton. It comprises two main stages:

  1. Stage One: t-SNE constructs a probability distribution over pairs of high-dimensional objects in such a way that similar objects have a high probability of being picked, whilst dissimilar objects have a very small probability of being picked.
  2. Stage Two: t-SNE defines a similar probability distribution over the points in the low-dimensional map, and it minimizes the Kullback–Leibler divergence between the two distributions with respect to the locations of the points on the map.
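
The two stages can also be written down explicitly. What follows is a sketch of the standard formulation from van der Maaten and Hinton's paper, not something specific to the R `tsne` package used later. The notation is introduced here for illustration: $x_i$ are the $N$ points in the original space, $y_i$ their positions on the low-dimensional map, and $\sigma_i$ a bandwidth chosen separately for each point.

```latex
% Stage one: similarities in the high-dimensional space (Gaussian kernel)
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}

% Stage two: similarities on the map (Student t kernel with one degree of freedom)
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}

% Cost minimised with respect to the map positions y_i
C = KL(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

Each $\sigma_i$ is set so that the perplexity $2^{H(P_i)}$ of the conditional distribution $P_i$ matches a value supplied by the user, which can be read loosely as the effective number of neighbours each point takes into account.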

In a Google talk, Laurens van der Maaten explains how the algorithm works and compares it with PCA and ISOMAP. He gives a clear example in which he tries to group handwritten numbers encoded as pixel images, like the ones in the figure below:

[Figure: pixel images of handwritten numbers]

Each colour in the picture below represents one of the numbers from 0 to 9. With PCA and ISOMAP some groups, such as orange (number 1) or red (number 0), are clearer than others, but with T-SNE the separation is striking. It is important to realise that the algorithm only sees the images of the numbers. The colours are added afterwards to validate the result.

[Figure: T-SNE map of the handwritten numbers, coloured by number]

So how can I apply this to finance?

I have 67 fixed income ETFs from North America, denominated in dollars, and I want to plot them according to the correlation between them. I calculate the correlations over a common period of 5 years, which gives a dataset of 67 observations by 67 features covering 7 different fixed income asset types.

[Figure: correlations between the ETFs]

library(quantmod)
library(RDRToolbox)
library(tsne)

tickers<-c("IEF",  "SHY", "TLT", 	"TFI", 	"AGG",	"TIP",	"MUB", 	"HYG",	"GBF",	"CSJ",#10
"TLH",	"IEI", "INY",	"PZA",	"AGZ",	"CIU",	"GVI",	"MBB", "PHB",	"BSV",
"EDV",	"IPE",	"JNK",	"CXA",	"LWC", "TLO", "VGIT", "VGSH",	"VMBS",	"CLY", 
"CMF",	"NYF", "SUB",	"BAB",	"VCIT",	"VCSH",	"CPI",	"US13.PA",	"US10.PA",	"US57.PA",	
"US1.PA",	"US3.PA",	"US7.PA",	"SMB",	"SMMU",	"STIP","TUZ",	"CSBGU7.MI",	"IDTM.L",
"XUT3.L",	"XUTD.L",	"XUIT.L",	"ITPS.MI",	"IBTS.MI",	"HYD",	"HYLD",	"MUNI",	"ITM", "MLN",
"CORP",	"STPZ",	"LTPZ",	"ZROZ",	"UDN",	"CRED",	"MINT",	"SCHP")

type<-c('Govern','Govern','Govern',	'Govern',	'Aggreg',	'Govern',	'Govern',	'High Yield',	'LongT',	'Aggreg',	'Govern',	'Govern',
  'Govern',	'Govern',	'Aggreg',	'Aggreg',	'Govern',	'Aggreg',	'High Yield',	'Short-Med T', 'Govern', 'Aggreg', 'High Yield', 'Govern',
  'LongT',	'LongT', 'LongT',	'Govern',	'LongT', 'LongT', 'Govern', 'Govern',	'Govern',	'LongT',	'Corp',	'Corp',	'Short-Med T',	'LongT',
  'LongT',	'LongT',	'Govern',	'Govern',	'Govern',	'Short-Med T',	'Short-Med T',	'Short-Med T',	'Short-Med T',
  'Short-Med T',	'LongT',	'Govern', 'Govern',	'Govern',	'Short-Med T', 'Short-Med T',	'Govern',	'High Yield',	'LongT',	'Govern',
  'Govern',	'Corp',	'Inf Linked',	'Inf Linked', 'Govern',	'Short-Med T',	'Aggreg', 'Short-Med T',	'Inf Linked')

typeId<-c(1,1,1,1,5,1,1,2,4,5,1,1,1,1,5,5,1,5,2,3,1,5,2,1,4,4,4,1,4,4,1,1,1,4,7,7,3,4,4,4,1,1,1,3,3,3,3,3,4,1,1,1,3,3,1,2,4,1,1,7,6,6,1,3,5,3,6)


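# Download prices for every ticker (getSymbols returns the names of the loaded objects)
datas <- getSymbols(tickers, from="2011-01-01", to = "2016-01-01")
# Daily returns on the close, merged by date into one column per ETF
CloseReturns <- do.call(merge, lapply(datas, function(x) dailyReturn(Cl(get(x)))))
# Dates on which an ETF has no return are treated as zero return
CloseReturns[is.na(CloseReturns)]<-0
# 67 x 67 correlation matrix between the ETFs
correlation<-cor(CloseReturns)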


# Colors
colors = rainbow(length(unique(type)))
names(colors) = unique(type)

# PCA: keep the scores on the first two principal components
dev.new()
pca_etf = princomp(1-correlation)$scores[,1:2]
plot(pca_etf, type="n")
text(pca_etf, labels=type, col=colors[typeId])
title("PCA")

# Isomap: 2-dimensional embedding built from each point's 2 nearest neighbours
iso <- Isomap(1-correlation, dims=2, k=2,  plotResiduals = TRUE)
plot(iso$dim2, type="n")
text(iso$dim2, labels=type, col=colors[typeId])
title("ISOMAP")

# TSNE: 2-dimensional map (the default) with perplexity 7
tsneM = tsne(correlation, perplexity=7, max_iter=2000)
plot(tsneM, type="n")
text(tsneM, labels=type, col=colors[typeId])
title("TSNE")

I use PCA, ISOMAP and T-SNE to reduce the data to 2 dimensions. Can any of these algorithms form groups in the data without knowing the type tags? I create these 3 plots:

[Figures: PCA, ISOMAP and T-SNE plots of the ETFs, labelled by type]

In this case T-SNE doesn’t perform as well as in the handwritten numbers example. PCA orders the data better with respect to the type tags, perhaps because it is defined so that the first two principal components have the largest possible variance, and that’s what we are looking for.
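
The point about variance can be checked on a small synthetic example. The sketch below does not use the ETF data: it simulates two hypothetical groups of assets (the names `A1` to `A5` and `B1` to `B5`, the factors `f1` and `f2`, and the noise level are all assumptions made for illustration) and uses base R `prcomp` instead of the `princomp` call from the script above.

```r
set.seed(42)
n_days <- 500

# Two hypothetical common factors, one per group of assets (illustrative only)
f1 <- rnorm(n_days)
f2 <- rnorm(n_days)

# Five noisy copies of each factor: simulated daily returns, one column per asset
group_a <- sapply(1:5, function(i) f1 + rnorm(n_days, sd = 0.5))
group_b <- sapply(1:5, function(i) f2 + rnorm(n_days, sd = 0.5))
returns <- cbind(group_a, group_b)
colnames(returns) <- c(paste0("A", 1:5), paste0("B", 1:5))

# Same kind of input as in the script above: one minus the correlation matrix
correlation <- cor(returns)
pca <- prcomp(1 - correlation)

# Share of the total variance captured by each principal component
var_share <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_share[1:2], 3))

# Scores on the first principal component, one per asset
pc1 <- pca$x[, 1]
print(round(pc1, 2))
```

If the scores on the first component take one sign for the `A` assets and the opposite sign for the `B` assets, PCA has recovered the two groups without seeing any labels, which is the kind of behaviour observed above for the ETF type tags.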
