In financial time series it is very common to predict single points, such as expected future prices or returns. But is there a way to add more information to our forecasts?
In today’s post we will make probabilistic forecasts for time series data using recurrent neural networks with PyTorch.
Introduction
Even though point forecasting gives us information about tomorrow’s expected value (such as the return of an asset), it does not tell us anything about the uncertainty of that prediction. Probabilistic forecasting, on the other hand, in the form of quantiles or intervals, can provide more information about unexpected fluctuations and lead to better decision making.
For this reason, probabilistic forecasting has been shown to work well for problems such as weather forecasting [5], inventory optimization [7] and economic applications [8].
It is more scientific and honest to be allowed occasionally to say ‘I feel very doubtful about the weather for tomorrow’ . . . and it must be . . . useful to the public if one is allowed occasionally to say ‘It is practically certain that the weather will be so-and-so tomorrow’
Cooke, 1906
We now know that probabilistic forecasting dates back more than 200 years and was first applied in the meteorological field, but it was not until 1906 that the Australian astronomer Cooke first explicitly tackled the problem for weather forecasts [5].
Quantile probabilistic forecasting
Although there are many ways of making probabilistic forecasts, in today’s post we will build them from quantile predictions. To do so, we will train our recurrent neural networks with the pinball or quantile loss function.
It all started so normally…
In statistics, robustness describes a statistic that is resilient or resistant to the errors produced by deviations from the assumptions of a hypothetical model.
For example: many times, in order to keep things simple, we assume that certain variables follow a normal distribution. But this reduction of complexity comes at a cost and can lead to poor estimator performance. This is the case for the least squares estimator, which is extremely sensitive to outliers and therefore to many non-Gaussian distributions.
It was Koenker and Bassett [6] who, in 1978, searching for a more robust alternative to the least squares estimator for the linear model, first described a new class of statistics which they named “regression quantiles”. These regression quantiles make use of the loss function that we will be using in this post.
A few gross errors occurring with low probability can cause serious deviations from normality: to dismiss the possibility of these occurrences almost invariably requires a leap of Gaussian faith into the realm of pure speculation
Roger Koenker and Gilbert Bassett, 1978
Quantile loss function
As described above, the quantile or pinball loss function was first used in quantile regression. It can be expressed as follows.
Given an error:
$$
\xi = y - \hat{y}
$$
And a quantile \( \tau \), the pinball loss function would be:
$$
p_{\tau}( \xi ) = \begin{cases}
\tau \xi , & \mbox{if } \xi \geq 0 \\
(\tau - 1) \xi , & \mbox{if } \xi < 0 \\
\end{cases}
$$
As you can see above, the function penalizes more heavily when the predicted upper/lower quantiles \( \hat{y} \) fall below/above the real value \( y \). You can get better intuition from the following plots, where the loss is shown for \( y=0 \) and the quantiles 0.1, 0.5 and 0.9.


So, as you can see, predicting certain quantiles can give us a robust approximation of the uncertainty and interval values of the future target.
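To make this asymmetry concrete, here is a minimal sketch of the (non-smooth) pinball loss in PyTorch; the function name and example values are ours, purely for illustration:

```python
import torch

def pinball_loss(y, y_hat, tau):
    # xi = y - y_hat; under-prediction (xi >= 0) costs tau * xi,
    # over-prediction (xi < 0) costs (tau - 1) * xi
    xi = y - y_hat
    return torch.where(xi >= 0, tau * xi, (tau - 1) * xi).mean()

# For the 0.9 quantile, predicting below the target is 9x more costly
# than predicting above it:
y = torch.tensor([0.0])
print(pinball_loss(y, torch.tensor([-1.0]), tau=0.9))  # tensor(0.9000)
print(pinball_loss(y, torch.tensor([1.0]), tau=0.9))   # tensor(0.1000)
```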
Smooth Pinball loss function
Even though the function above is the one used in quantile regression, it is non-differentiable at \( \xi=0 \). In order to train our RNN we will use a differentiable approximation of the quantile loss: the smooth pinball loss function described in [3].
$$
S_{\tau, \alpha}(\xi) = \tau \xi + \alpha \log\left(1 + e^{-\frac{\xi}{\alpha}}\right)
$$
Again, the following plot can help us better visualize what the loss function is doing. Note: in this example we have set \( \alpha=0.001\) .

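In code, this approximation is straightforward; a minimal PyTorch sketch (not the exact code from the repository) could look as follows, using `softplus(x) = log(1 + exp(x))` to compute the second term in a numerically stable way:

```python
import torch
import torch.nn.functional as F

def smooth_pinball_loss(y, y_hat, tau, alpha=0.001):
    # S_{tau, alpha}(xi) = tau * xi + alpha * log(1 + exp(-xi / alpha))
    xi = y - y_hat
    return (tau * xi + alpha * F.softplus(-xi / alpha)).mean()
```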
From quantile predictions to cumulative distributions
One of the main advantages of quantile predictions is that we can derive the distribution at every single point, as shown in the graphs below. The graph on the right shows the cumulative distribution obtained from the quantiles at two specific points of y, marked by the two vertical lines in the graph on the left.

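As a sketch of how one could turn the predicted quantiles into an empirical cumulative distribution at a single point (the quantile values below are made up for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

taus = [0.025, 0.05, 0.1, 0.15, 0.85, 0.9, 0.95, 0.975]
q_hat = np.array([-2.1, -1.7, -1.2, -0.9, 0.8, 1.1, 1.6, 2.0])  # made-up predictions

# Each predicted quantile is a point on the CDF: F(q_hat[i]) is roughly taus[i]
plt.plot(q_hat, taus, marker="o")
plt.xlabel("y")
plt.ylabel("cumulative probability")
plt.show()
```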
Experiment
Having given a brief introduction to the smooth pinball loss function, it’s time to see some results! The code used in this experiment can be found in this GitHub repository.
In our example we will be using the following data:
- The hourly returns of electricity prices in Spain. We chose this series because of its seasonal component.
- The daily returns of Apple share prices. In this case we will also use the VIX as an additional input to the model.

Furthermore, similar to what Alejandro Pérez Sanjuán did in his post, we will be using two models:
- Vanilla LSTM: A simple LSTM model.
- Attention LSTM: An LSTM with an attention mechanism added.
We will not describe these models in detail since they are not the main subject of this post; if needed, see [2] and [1] for more information about LSTMs and the attention mechanism, respectively.
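Still, for orientation, a vanilla LSTM that outputs one value per quantile could be sketched as below; the layer sizes are illustrative assumptions, not the hyperparameters used in the repository:

```python
import torch.nn as nn

class QuantileLSTM(nn.Module):
    # Illustrative sketch: hidden size and layout are assumptions.
    def __init__(self, n_features, n_quantiles=8, hidden_size=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, n_quantiles)  # one output per quantile

    def forward(self, x):
        # x: (batch, window, n_features), e.g. window = 20
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # all quantiles from the last hidden state
```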
The input to the two models will be the last 20 observations of each series; in the case of the AAPL share prices, we will also include the last 20 observations of the VIX series as inputs.
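A sketch of how these sliding windows might be assembled (the names and shapes are our own assumptions):

```python
import numpy as np

def make_windows(series, window=20):
    # series: (T, n_features) array; for AAPL, the VIX is just another column
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:, 0]  # next-step value of the target column
    return X, y             # X: (T - window, window, n_features)
```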
How will we measure uncertainty?
For the purpose of this experiment we will try to predict the following 8 quantiles: 0.025, 0.05, 0.1, 0.15, 0.85, 0.9, 0.95 and 0.975. Pairing each one with its counterpart quantile gives us 4 intervals: 95%, 90%, 80% and 70% (for example, 0.025 paired with 0.975 forms one interval covering 95% of the data).
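Pairing the quantiles into intervals is then straightforward (a small sketch):

```python
quantiles = [0.025, 0.05, 0.1, 0.15, 0.85, 0.9, 0.95, 0.975]

# Pair each lower quantile with its upper counterpart:
intervals = list(zip(quantiles[:4], reversed(quantiles[4:])))
# [(0.025, 0.975), (0.05, 0.95), (0.1, 0.9), (0.15, 0.85)]
# i.e. the 95%, 90%, 80% and 70% predictive intervals
```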
Interval score
Furthermore, we will use the interval score to compare the results of both models. The interval score, as described in [3], is expressed as follows:
$$
IS = \frac{2}{NM} \sum_{t=1}^{N} \sum_{i=1}^{M/2} \left[ \left(u_t^{\beta_i} - l_t^{\beta_i}\right) + \frac{2}{\beta_i}\left(l_t^{\beta_i} - y_t\right) c(l_t^{\beta_i}) + \frac{2}{\beta_i}\left(y_t - u_t^{\beta_i}\right) c(u_t^{\beta_i}) \right]
$$
Where:
$$
c(l) = \begin{cases}
1, & \mbox{if } y < l \\
0, & \mbox{if } y \geq l \\
\end{cases}
$$
And
$$
c(u) = \begin{cases}
1, & \mbox{if } y > u \\
0, & \mbox{if } y \leq u \\
\end{cases}
$$
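A NumPy sketch of this score, assuming the bounds are stored as (N, M/2) arrays (the function and variable names are ours):

```python
import numpy as np

def interval_score(y, lower, upper, betas):
    # y: (N,) observations; lower/upper: (N, M/2) interval bounds
    # betas: (M/2,) array, e.g. [0.05, 0.1, 0.2, 0.3] for the 95/90/80/70% intervals
    y = y[:, None]
    width = upper - lower
    below = (2 / betas) * (lower - y) * (y < lower)  # c(l) term
    above = (2 / betas) * (y - upper) * (y > upper)  # c(u) term
    # The 2/(NM) double sum is just the mean over the N * M/2 terms
    return np.mean(width + below + above)
```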
Results of the experiments
Electricity prices
As you can see from the results obtained for the electricity prices, the vanilla LSTM seems to work better than the LSTM with the attention mechanism. Furthermore, both models have captured the seasonal component of the series, and the observed returns lie inside our predictive intervals most of the time, with intervals that are narrower or wider depending on the hour of the day.


Apple
The first thing we notice is that it seems harder to capture any pattern in this series, so most of the time the predictive intervals keep roughly the same values. Again, the vanilla LSTM seems to perform better than the LSTM with the attention mechanism.
Finally, we can see how the vanilla LSTM adjusts the intervals during some big movements in mid-2019; in those cases it would probably (depending on your risk tolerance) be better to hedge your Apple positions or stay out of the market.


Conclusions
In this post we have used a variant of the pinball loss function to predict the uncertainty in the movements of electricity prices and Apple share prices.
The models seem to have correctly captured the seasonality of the electricity prices. For the stock prices, on the other hand, the models had a hard time detecting any clear pattern, although they do produce wider intervals during high-volatility periods.
Thanks for reading and see you in the following post!
References
- [1] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L. and Polosukhin, I. – Attention Is All You Need.
- [2] Olah, C. – Understanding LSTM Networks.
- [3] Hatalis, K., Lamadrid, A. J., Scheinberg, K. and Kishore, S. – Smooth Pinball Neural Network for Probabilistic Forecasting of Wind Power.
- [4] Koenker, R. and Hallock, K. F. – Quantile Regression.
- [5] Murphy, A. H. – The Early History of Probability Forecasts: Some Extensions and Clarifications.
- [6] Koenker, R. and Bassett, G. – Regression Quantiles.
- [7] Assimakopoulos, V. – Product Sales Probabilistic Forecasting: An Empirical Evaluation Using the M5 Competition Data.
- [8] Vahey, S. P. – Moving Towards Probability Forecasting.