Improving data diversity. Synthetic Financial Time Series Generator

mplanaslasa

09/05/2018

When dealing with data, we (almost) always want bigger and better sets. But if there isn’t enough historical data available to test a given algorithm or methodology, what can we do? Our answer has been to create it. How? By developing our own Synthetic Financial Time Series Generator.

Let’s start with a metaphor to make our purpose clearer: imagine you are researching a disease as serious as cancer, and you are testing a specific treatment on patients. If you could choose, would you travel to the past, when many advances in medicine hadn’t yet been made? Or would you rather go to the future and learn more about different patients’ reactions or possible side effects of your treatment? In other words, wouldn’t you prefer to enrich the diversity of your data by adding the many different scenarios your medicine can lead to? That’s what we’d like to share in this post: a method to enrich our data (fortunately not about illnesses but about market prices) in order to improve our ‘treatments’ (in this case, investment strategies).

We’ll introduce this with an example: here we have two time series. Which one would you bet is real, and which one is synthetic?

Time series: one is real and the other synthetic

The answer is: the purple one is real.

But if the turquoise one is not real, how was it created? Don’t think about classical models such as Geometric Brownian Motion, ARIMA or ARCH models. This example is just one of the huge number of series that the new tool developed in our company can create. It is called the Synthetic Financial Time Series Generator (SFTSG from now on).

Financial data is short. Why not make it longer?

As developers of quantitative investment strategies, the main problem we have to fight is the lack of data diversity, because the history of financial data is relatively short. To address this, we started a research project in collaboration with Universidad Autónoma de Madrid. The result was the SFTSG. It allows us to generate unseen scenarios that reproduce realistic inter-asset relationships. We can therefore produce endless series, fight overfitting and, consequently, design more robust algorithms. This is a way to design strategies that are prepared for a wider variety of market situations.

This way we improve the data our algorithms need in order to be tested properly, just as the cancer researcher would if they could enlarge the known list of reactions to their remedy. The task is challenging, because financial markets behave in quite complex ways. In fact, creating synthetic time series isn’t only about capturing the individual characteristics of each series or instrument, but also about reproducing the relationships between many elements in a large financial ecosystem.

What makes the SFTSG better than other tools?

We’ll briefly show the advantages of the SFTSG compared with its possible ‘peers’: classical and ‘state-of-the-art’ models created to simulate time series.

Kurtosis

Most models that simulate financial time series take the normality of returns as gospel. But anyone who is used to working with this kind of data knows that this is not true. In fact, the kurtosis of real returns tends to be above 3:

Graphic with Gaussian distribution: Real Series

But if we simulate time series under a Geometric Brownian Motion (GBM) model, the returns show kurtosis values near 3, as the normal distribution implies.

GBM Simulation

Fortunately, the series generated by the SFTSG are more realistic in this respect.

SFTSG Simulation

So GBM clearly fails to reproduce the excess kurtosis of stock returns, because every return in the series is drawn from the same Gaussian distribution with fixed parameters. The SFTSG, in contrast, is able to reproduce this important feature.
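The GBM side of this comparison is easy to check yourself. The following is a minimal sketch, not the code behind the figures: the drift, volatility, sample size and helper names are assumptions chosen for illustration. It simulates GBM log returns and computes their sample kurtosis, then does the same for a heavy-tailed Student-t sample. The Student-t sample is only a stand-in for "fat tails"; it is not the SFTSG and not real market data.

```python
import numpy as np


def simulate_gbm_log_returns(n_steps, mu=0.05, sigma=0.2, dt=1 / 252, seed=0):
    """Log returns of a GBM: i.i.d. Gaussian draws with fixed parameters.

    mu, sigma and dt are illustrative assumptions, not values from the post.
    """
    rng = np.random.default_rng(seed)
    drift = (mu - 0.5 * sigma**2) * dt
    shocks = sigma * np.sqrt(dt) * rng.standard_normal(n_steps)
    return drift + shocks


def sample_kurtosis(x):
    """Non-excess (Pearson) kurtosis, which equals 3 for a Gaussian."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return np.mean(centered**4) / np.mean(centered**2) ** 2


n_samples = 100_000
gbm_returns = simulate_gbm_log_returns(n_samples)

# Heavy-tailed stand-in (Student-t, 5 degrees of freedom). This is NOT the
# SFTSG; it only shows what kurtosis above 3 looks like in a sample.
heavy_tailed = np.random.default_rng(1).standard_t(5, size=n_samples)

print(f"GBM log returns kurtosis:     {sample_kurtosis(gbm_returns):.2f}")
print(f"Heavy-tailed sample kurtosis: {sample_kurtosis(heavy_tailed):.2f}")
```

Whatever drift and volatility you pick, the GBM figure stays close to 3, because rescaling or shifting a Gaussian does not change its kurtosis.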

Volatility clustering

To measure this characteristic we can compare our SFTSG with BEKK models, which are a fair choice because of their novelty. They are a multivariate version of the GARCH model, but they fall short when compared with the SFTSG results.

Note that a GARCH model is well suited to reproducing volatility clustering, but it has the limitation of being univariate. We could then assume that a BEKK model will also capture volatility clusters, and in a ‘multivariate’ way besides. But it doesn’t.

Let’s check it. In the following example, the BEKK model implemented in the MFE Toolbox (Sheppard, 2013) has been used with parameters p = 1 and q = 1 to simulate asset returns from multivariate data. Three virtual scenarios have been simulated, each with a different number of stocks: 2, 5 and 10.

The corresponding sample autocorrelation of absolute returns is plotted for these three virtual scenarios:

Multivariate model: BEKK

As you can see, when the number of assets in the model increases, volatility clustering is reproduced less well, as reflected by a lower autocorrelation of absolute returns. In contrast, the SFTSG reproduces volatility clustering closely, also outperforming the GBM model mentioned in the previous point, as shown below:

Real vs Simulated vs GBM
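The diagnostic used throughout this section, the sample autocorrelation of absolute returns, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code behind the plots. The function names and the GARCH(1,1) parameters (omega, alpha, beta) are hypothetical choices. The univariate GARCH(1,1) series stands in for "a series with volatility clustering", and the i.i.d. Gaussian series plays the role of GBM-style returns. Neither is the SFTSG or the BEKK model.

```python
import numpy as np


def sample_autocorrelation(x, max_lag):
    """Sample autocorrelation of x at lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.dot(centered, centered)
    return np.array(
        [np.dot(centered[:-lag], centered[lag:]) / denom
         for lag in range(1, max_lag + 1)]
    )


def simulate_garch11(n_steps, omega=1e-6, alpha=0.1, beta=0.85, seed=0):
    """Univariate GARCH(1,1) returns with Gaussian innovations.

    The parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_steps)
    returns = np.empty(n_steps)
    variance = omega / (1.0 - alpha - beta)  # start at unconditional variance
    for t in range(n_steps):
        returns[t] = np.sqrt(variance) * z[t]
        variance = omega + alpha * returns[t] ** 2 + beta * variance
    return returns


n_steps = 20_000
iid_returns = 0.01 * np.random.default_rng(1).standard_normal(n_steps)
garch_returns = simulate_garch11(n_steps)

acf_iid = sample_autocorrelation(np.abs(iid_returns), max_lag=20)
acf_garch = sample_autocorrelation(np.abs(garch_returns), max_lag=20)

print("i.i.d. Gaussian, lags 1-5:", np.round(acf_iid[:5], 3))
print("GARCH(1,1),      lags 1-5:", np.round(acf_garch[:5], 3))
```

For the i.i.d. series the autocorrelation of absolute returns hovers around zero at every lag, while for the clustered series it is positive and decays slowly. That slow decay is the signature to look for in the plots above.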

Computational cost & freedom from constraints

Last but not least, computational cost matters when simulating hundreds of time series. As we mentioned, BEKK is one of the best simulation models currently available, but it has a much higher computational cost than the SFTSG. In addition, simulating just 10 time series with BEKK not only takes longer than with the SFTSG, it also requires exactly 10 real time series to estimate the model. The SFTSG, by contrast, doesn’t need any specific number of assets to be trained. Users can pick as many real time series as they want, and generate as many synthetic time series as they desire.
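One plausible contributor to that cost is the number of parameters a BEKK model has to estimate. The sketch below is a back-of-the-envelope count, assuming the full (unrestricted) BEKK(p, q) parameterisation: a lower-triangular intercept matrix plus p + q square coefficient matrices. The post does not say which BEKK variant was fitted (scalar and diagonal variants have far fewer parameters), so treat both the variant and the function name as assumptions.

```python
def full_bekk_parameter_count(n_assets, p=1, q=1):
    """Free parameters of a full BEKK(p, q) model (assumed variant).

    Intercept: lower-triangular n x n matrix -> n * (n + 1) / 2 entries.
    Dynamics:  (p + q) unrestricted n x n coefficient matrices.
    """
    intercept = n_assets * (n_assets + 1) // 2
    dynamics = (p + q) * n_assets**2
    return intercept + dynamics


# The three scenarios used in the volatility clustering experiment.
for n_assets in (2, 5, 10):
    count = full_bekk_parameter_count(n_assets)
    print(f"{n_assets:>2} assets -> {count} parameters")
# 2 assets -> 11, 5 assets -> 65, 10 assets -> 255
```

Under this assumption the count grows quadratically with the number of assets, which is consistent with estimation becoming slower as more series are added.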

To learn more about the advantages of this tool over current simulation models, we invite you to read a recent paper in which we explain in detail all aspects of the SFTSG.

Thanks for reading!

Darío

Hi,

Congratulations on your good work.

Have you shared any code about it?

Regards,