How can we predict future returns of a series? Many series contain enough information in their own past data to predict the next value, but **how much of that information is usable,** and which data points are best for the prediction?

Is it enough to use only the most recent data points? How much information can we extract from past data?

Once we have answered these questions, we should think about **which model best fits our data**, and then estimate its parameters in order to predict. In other words, we keep adding steps before reaching what we are really interested in: the prediction. Why not remove parameter estimation from this already complex task? Moreover, in estimating model parameters we run the risk of **overfitting** to the data history.

So here we take a look at an alternative: **non-parametric estimation**.

The general idea of non-parametric estimation is to use the past information that most closely resembles the present without establishing any concrete prediction model.

Let’s suppose that we have a series with autoregression of order 3; that is, the three observations preceding the current one contain enough information to predict it. We call this set of observations the **reference block**. Now we analyse how similar this block is to every block of 3 consecutive observations in the whole history, under the assumption that the more similar two blocks are, the more similar the observations that follow them.

**How do we decide how similar the blocks are?**

This is the most important decision. If we require the blocks to be very similar, it will be **difficult to find blocks like the reference**; if we are too lenient, all the blocks will be considered alike. We need to **find a balance**.

To establish a balance we use a parameter known as the **smoothing window, h**. This provides bands around the reference block. If a block being compared falls within these bands, it will be taken into consideration in the prediction; if not, it will be discarded. If, for example, two blocks fall within the bands created by the smoothing window, those two blocks will be used in the prediction of the next value in the series.
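To make the band test concrete, here is a minimal sketch in Python; the series and the values of `k` and `h` are made up purely for illustration:

```python
import numpy as np

# Hypothetical series and parameters, chosen only for illustration.
rng = np.random.default_rng(0)
series = rng.normal(0.0, 0.01, size=200)

k = 3      # block size (the autoregression order)
h = 0.02   # smoothing window: half-width of the bands around the reference

reference = series[-k:]               # the k most recent observations
candidates = []                       # start indices of similar past blocks
for start in range(len(series) - k):  # the reference block itself is excluded
    block = series[start:start + k]
    # A block is "similar" when every point lies inside the +/- h band.
    if np.all(np.abs(block - reference) <= h):
        candidates.append(start)

print(len(candidates), "past blocks fall inside the bands")
```

The blocks indexed by `candidates` are the ones whose following observations would enter the prediction.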

This parameter choice is very important for the estimation, and there are several methods to make it, such as cross-validation or empirical rules. A good rule-of-thumb approximation is:

h = σ_n · n^(-1/(k+4))

where σ_n is the standard deviation of the series, n is the number of data points, and k is the block size (the number of lags).
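This rule of thumb is straightforward to compute. A small sketch, using a hypothetical synthetic series in place of real data:

```python
import numpy as np

def smoothing_window(series, k):
    """Rule-of-thumb bandwidth: h = sigma_n * n**(-1/(k+4))."""
    n = len(series)
    return np.std(series) * n ** (-1.0 / (k + 4))

# Illustrative check on synthetic returns (not the article's data).
rng = np.random.default_rng(1)
returns = rng.normal(0.0, 0.02, size=500)
h = smoothing_window(returns, k=12)
print(round(h, 4))
```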

**How do we predict?**

Once we have decided when a block counts as similar to the reference, we look at the value that follows each of the similar blocks and calculate a weighted average: the more similar a block is to the reference, the more weight its following observation receives. The weights are determined with a **smoothing function or kernel**. There are many such functions (Uniform, Gaussian, Epanechnikov, Dirichlet, …), and changing the kernel makes only a minimal difference to the predictions. We choose the Gaussian kernel, since it is the most popular:

K(u) = (2π)^(-1/2) · exp(-u²/2)

We need a k-dimensional version of the function, where k is the number of lags established for our series; this is the block size. The more similar the blocks are to the reference, the greater the weight assigned to the data.
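The weighting scheme can be sketched as follows. This is an illustrative implementation of the kernel-weighted average, not the article's exact code; the function name `kernel_predict` is ours, and the normalisation constant of the kernel is dropped since it cancels in the weighted average:

```python
import numpy as np

def kernel_predict(series, k, h):
    """Predict the next value with a k-dimensional Gaussian kernel.

    Each past block of length k is compared to the reference block (the
    last k points); the observation following each block is averaged,
    weighted by the kernel evaluated at the scaled block distance.
    """
    x = np.asarray(series, dtype=float)
    reference = x[-k:]
    weights, targets = [], []
    for start in range(len(x) - k):      # blocks whose successor exists
        u = (x[start:start + k] - reference) / h
        weights.append(np.exp(-0.5 * np.dot(u, u)))  # Gaussian, up to a constant
        targets.append(x[start + k])                 # value following the block
    weights = np.array(weights)
    return np.dot(weights, targets) / weights.sum()
```

Because the weights are positive and normalised, the prediction is always a convex combination of observations that followed past blocks.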

**Practical Example**

We use the weekly returns of the gold spot series. Using returns ensures that our data is stationary, with constant mean and variance over time. We use 12 lags; hence, our blocks have 12 dimensions.
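As a reminder of the preprocessing step, simple returns can be computed from a price series like this; the price values below are hypothetical, not the actual gold series:

```python
import numpy as np

# Hypothetical weekly gold spot prices, for illustration only.
prices = np.array([1800.0, 1812.5, 1790.0, 1805.3, 1820.1])

# Simple weekly returns: r_t = p_t / p_{t-1} - 1.
# Log returns, np.diff(np.log(prices)), are a common alternative.
returns = prices[1:] / prices[:-1] - 1.0
print(returns)
```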

Using the aforementioned approximation, we obtain a value of h = 0.0176 for our smoothing window.

We also consider whether to use the whole sample available at each moment or whether, on the contrary, there exists an optimum number of past blocks on which to base the prediction. The predictions are not very good (around 50% of the return signs are correctly estimated), so we search for an **optimum number of periods** using two criteria: the percentage of correctly estimated return signs, which should be as large as possible, and the prediction error, which should be as small as possible.
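The search for an optimum history length can be sketched as a walk-forward evaluation. Everything here is illustrative: the function names are ours, the series is synthetic, and only h = 0.0176 and k = 12 come from the text above:

```python
import numpy as np

def kernel_predict(x, k, h):
    """Gaussian-kernel weighted average of the values following past blocks
    (a sketch of the non-parametric predictor described above)."""
    reference = x[-k:]
    weights, targets = [], []
    for s in range(len(x) - k):
        u = (x[s:s + k] - reference) / h
        weights.append(np.exp(-0.5 * np.dot(u, u)))
        targets.append(x[s + k])
    weights = np.array(weights)
    return np.dot(weights, targets) / weights.sum()

def evaluate(series, k, h, history):
    """Walk-forward: predict each point from the preceding `history`
    observations, then report sign accuracy and mean squared error."""
    x = np.asarray(series, dtype=float)
    hits, sq_errors = [], []
    for t in range(history, len(x)):
        pred = kernel_predict(x[t - history:t], k, h)
        hits.append(np.sign(pred) == np.sign(x[t]))
        sq_errors.append((pred - x[t]) ** 2)
    return np.mean(hits), np.mean(sq_errors)

# Illustrative run on synthetic returns (the numbers printed here are
# not the article's results).
rng = np.random.default_rng(2)
returns = rng.normal(0.0, 0.02, size=400)
acc, mse = evaluate(returns, k=12, h=0.0176, history=120)
print(round(acc, 3), round(mse, 6))
```

Repeating `evaluate` over a grid of `history` values and comparing sign accuracy against error is one way to locate the balance the article describes.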

The best balance is found with 261 blocks: 52% of the return signs are correctly predicted, with an error below 0.0009. The estimates obtained with 261 blocks and with the whole previous sample are very similar, but the 261-block version performs slightly better.