QuantDare post list (category: artificial intelligence)

Neural Networks (alarije)
Stochastic portfolio theory, revisited! (P. López)
“Past performance is no guarantee of future results”, but helps a bit (ogonzalez)
K-Means in investment solutions: fact or fiction (T. Fuertes)
What is the difference between Artificial Intelligence and Machine Learning? (ogonzalez)
Random forest: many are better than one (xristica)
Classification trees in MATLAB (xristica)
Applying Genetic Algorithms to define a Trading System (aparra)
Graph theory: connections in the market (T. Fuertes)
Data Cleansing & Data Transformation (psanchezcri)
Learning with kernels: an introductory approach (ogonzalez)
SVM versus a monkey. Make your bets. (P. López)
Clustering: “Two’s company, three’s a crowd” (libesa)
Euro Stoxx Strategy with Machine Learning (fjrodriguez2)
Visualizing Fixed Income ETFs with T-SNE (j3)
Hierarchical clustering, using it to invest (T. Fuertes)
Markov Switching Regimes say… bear or bullish? (mplanaslasa)
“K-Means never fails”, they said… (fjrodriguez2)
What is the difference between Bagging and Boosting? (xristica)
Outliers: Looking For A Needle In A Haystack (T. Fuertes)
Machine Learning: A Brief Breakdown (libesa)
Stock classification with ISOMAP (j3)
Sir Bayes: all but not naïve! (mplanaslasa)
Returns clustering with k-Means algorithm (psanchezcri)
Confusion matrix & MCC statistic (mplanaslasa)
Reproducing the S&P500 by clustering (fuzzyperson)
Random forest vs Simple tree (xristica)
Clasificando el mercado mediante árboles de decisión (xristica)
Árboles de clasificación en Matlab (xristica)
Redes Neuronales II (alarije)
Análisis de Componentes Principales (j3)
Vecinos cercanos en una serie temporal (xristica)
Redes Neuronales (alarije)
Caso Práctico: Multidimensional Scaling (rcobo)

Non-parametric Estimation
T. Fuertes | artificial intelligence | 01/02/2017

How can we predict the future returns of a series? Many series contain enough information in their own past data to predict the next value, but how much of that information is usable, and which data points are best for the prediction?

Is it enough to use only the most recent data points? How much information can we extract from past data?

Once we have answered all these questions, we still have to decide which model best fits our data and then estimate its parameters before we can predict. In other words, we keep adding steps before reaching what we are really interested in: the prediction. Why not drop the parameter-estimation step from this already complex task of predicting values? Moreover, when estimating model parameters we run the risk of overfitting to the historical data.

So here we take a look at an alternative: non-parametric estimation.

The general idea of non-parametric estimation is to use the past information that most closely resembles the present without establishing any concrete prediction model.

Reference block

Let’s suppose that we have a series with an autoregressive structure of order 3; that is, the three observations prior to today contain enough information to predict the current one. We call this set of observations the reference block. We then analyse how similar this block is to every block of 3 consecutive observations in the whole history, under the assumption that the more similar two blocks are, the more similar the observations that follow them will be. A minimal sketch of how these blocks can be built is shown below.
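The following is a small Python sketch (my own illustration, not code from the post) of how the overlapping blocks and their following observations could be built with NumPy; the helper name build_blocks and the toy series are assumptions for the example.

```python
import numpy as np

def build_blocks(series, k):
    """Build all overlapping blocks of k consecutive observations.

    Returns the blocks (one per row) together with the observation
    that follows each block.
    """
    series = np.asarray(series, dtype=float)
    n = len(series)
    blocks = np.array([series[i:i + k] for i in range(n - k)])
    next_vals = series[k:]          # value following each block
    return blocks, next_vals

# Toy example of order 3: the last three observations form the reference block
returns = np.array([0.010, -0.020, 0.005, 0.013, -0.007, 0.002, 0.011])
blocks, next_vals = build_blocks(returns, k=3)
reference_block = returns[-3:]
```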

How do we decide how similar the blocks are?

This is the most important decision. If we require them to be very similar, it will be difficult to find blocks like the reference, and if we are too lenient all the blocks will be considered alike. We need to find a balance.

To establish this balance we use a parameter known as the smoothing window, h. It defines bands around the reference block: if a candidate block falls within these bands it is taken into consideration in the prediction; if not, it is discarded. In the accompanying example, two blocks are similar to the reference block, both contained within the bands created by the smoothing window, so these two blocks are the ones used to predict the next value in the series. A sketch of this selection rule follows.
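To make the band rule concrete, here is a hedged sketch that builds on the block-construction code above (again my own illustration): a candidate block is accepted when every one of its components lies within ±h of the corresponding component of the reference block.

```python
import numpy as np

def select_similar_blocks(blocks, reference_block, h):
    """Keep only the blocks whose components all lie within +/- h of the reference."""
    within_bands = np.abs(blocks - reference_block) <= h
    return within_bands.all(axis=1)

# Blocks that fall inside the bands around the reference block
mask = select_similar_blocks(blocks, reference_block, h=0.01)
similar_blocks = blocks[mask]
candidate_next_values = next_vals[mask]
```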

The choice of this parameter is very important for the estimation, and there are several ways of making it, from cross-validation to empirical methods. A fairly good empirical approximation is

$$h = \sigma_n \cdot n^{-1/(k+4)}$$

where $\sigma_n$ is the standard deviation of the series, $n$ is the number of data points, and $k$ is the block size (the number of lags).
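As a quick illustration of this rule (my own sketch, reusing the toy series from the block-construction example above):

```python
import numpy as np

def smoothing_window(series, k):
    """Rule-of-thumb bandwidth: h = sigma_n * n^(-1/(k+4))."""
    series = np.asarray(series, dtype=float)
    return series.std() * len(series) ** (-1.0 / (k + 4))

h = smoothing_window(returns, k=3)
```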

How do we predict?

Once we have decided how to determine whether a block is similar to the reference, we focus on the value that follows each selected block and compute a weighted average. The more similar a block is to the reference, the more weight is given to the observation that follows it. The weights are determined with a smoothing function, or kernel. There are many types of kernel (uniform, Gaussian, Epanechnikov, Dirichlet, …) and the differences in the resulting predictions are minimal if we change the function. We choose a Gaussian kernel, since it is the most popular.

We need a k-dimensional version of the function, where k is the number of lags established for our series; this is the block size. The more similar the blocks are to the reference, the greater the weight assigned to the data.

In k dimensions, the Gaussian kernel is

$$K(u) = \frac{1}{(2\pi)^{k/2}} \exp\!\left(-\frac{\lVert u \rVert^{2}}{2}\right)$$
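Putting the pieces together, here is a hedged sketch of the kernel-weighted prediction (my own illustration, reusing the helpers defined above): the observation following each candidate block is weighted with a Gaussian kernel applied to that block's distance from the reference block, scaled by h.

```python
import numpy as np

def kernel_prediction(blocks, next_vals, reference_block, h):
    """Predict the next value as a Gaussian-kernel weighted average.

    Blocks closer to the reference block (relative to the bandwidth h)
    receive larger weights.
    """
    scaled_sq_dist = np.sum(((blocks - reference_block) / h) ** 2, axis=1)
    weights = np.exp(-0.5 * scaled_sq_dist)   # Gaussian kernel weights
    if weights.sum() == 0:
        return np.nan                         # no usable block
    return np.sum(weights * next_vals) / weights.sum()

# In the post's procedure only the blocks inside the bands are used,
# e.g. kernel_prediction(blocks[mask], next_vals[mask], reference_block, h)
prediction = kernel_prediction(blocks, next_vals, reference_block, h)
```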

Practical Example

We use the weekly returns of the gold spot series. Working with returns rather than prices gives us an approximately stationary series, with mean and variance roughly constant in time. We use 12 lags; hence, our blocks have 12 dimensions.

Using the aforementioned approximation we obtain a value of h = 0.0176 for our smoothing window.

We also consider whether it is necessary to use the whole sample available up to each moment or whether, on the contrary, there is an optimum number of past blocks on which to base the prediction. The predictions obtained are not very good (around 50% of the return signs are correctly estimated), but we still look for an optimum number of periods using the percentage of correctly estimated return signs and the prediction error (the former should be as large as possible and the latter as small as possible).

[Figure: optimum number of periods]

The best balance is found with 261 blocks, giving 52% correctly predicted signs and an error below 0.0009. The estimates obtained with 261 blocks and with the whole previous sample are very similar, but those based on 261 blocks show a slightly better trade-off.
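As a rough idea of how such a comparison could be run (my own sketch, with synthetic data in place of the gold series and reusing the helpers defined above; the rolling scheme and parameter values are assumptions), for each candidate number of past blocks we compute the percentage of correctly predicted signs and the mean squared error of one-step-ahead forecasts:

```python
import numpy as np

def evaluate_n_blocks(series, k, h, n_blocks_list, n_test=100):
    """Rolling one-step-ahead evaluation for several numbers of past blocks."""
    series = np.asarray(series, dtype=float)
    results = {}
    for n_blocks in n_blocks_list:
        preds, actuals = [], []
        for t in range(len(series) - n_test, len(series)):
            history = series[:t]
            blocks, next_vals = build_blocks(history, k)
            # Keep only the most recent n_blocks past blocks
            blocks, next_vals = blocks[-n_blocks:], next_vals[-n_blocks:]
            reference_block = history[-k:]
            preds.append(kernel_prediction(blocks, next_vals, reference_block, h))
            actuals.append(series[t])
        preds, actuals = np.array(preds), np.array(actuals)
        hit_ratio = np.mean(np.sign(preds) == np.sign(actuals))
        mse = np.mean((preds - actuals) ** 2)
        results[n_blocks] = (hit_ratio, mse)
    return results

# Synthetic stand-in for the weekly gold returns used in the post
rng = np.random.default_rng(0)
weekly_returns = rng.normal(0.0, 0.02, size=1000)
h = smoothing_window(weekly_returns, k=12)
scores = evaluate_n_blocks(weekly_returns, k=12, h=h, n_blocks_list=[52, 261, 522])
```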

[Figure: Prediction]

Read me in Spanish!
