The scarcity of historical financial data is a major hindrance in algorithmic trading model development. In an ever-changing economic environment, countless models are proposed and evaluated, most of them seeking to extract information from the market by measuring a set of reasonable variables. Through backtesting, the overwhelming majority of these models are found not to perform and are routinely discarded; some, however, do appear to work well. How many of those apparent successes are just a product of overfitting? Given the scarcity of available historical data, this is a well-founded concern.
We already discussed the need for generating synthetic financial data in a previous post. At the time, we developed a feature-extraction and feature-reproduction algorithm to carry out the generation. We were content with the results, as our model outperformed several state-of-the-art models such as BEKK and GARCH, but we found it difficult to improve further.
Finding how to change our procedure (based on extracting parameters for a semi-stochastic generation) turned out to be far too cumbersome. Some features of real series that our model didn't reflect perfectly could be improved by a given change; that change, however, would break another feature that was previously well reproduced, forcing us to discard it. For other features, we could not come up with any change that would replicate them. Finally, there are probably features of financial series that we cannot even grasp or describe mathematically, given the complexity of the process generating financial series and our limited analytical capabilities.
Machine Learning, here to save the day!
All hope of improving our methodology, however, is not lost. With recent advances in Generative Adversarial Networks, GANs for short, we saw a novel methodology that could address all our previous worries, since GANs fully automate the feature-selection and feature-reproduction tasks. If this is the first time you're hearing about GANs, I suggest you visit the following post to get acquainted with them. Once you feel you grasp the basics, check out these awesome applications as well as some trippy training videos!
We took an improved version of the original vanilla GAN, the Wasserstein GAN with Gradient Penalty (WGAN-GP; you can read the motivation behind it here), originally designed to generate pictures, and modified its entrails (the discriminator and generator networks) to generate synthetic financial universes instead.
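To make the adaptation concrete, here is a minimal sketch of what swapping image-shaped networks for series-shaped ones looks like. All sizes and the MLP architecture below are illustrative assumptions, not our actual networks: the generator maps a latent noise vector to a 1000-step return series, and the critic maps a series to a single unbounded Wasserstein score (note the absence of a sigmoid on the critic output).

```python
import numpy as np

# Illustrative sketch only: the shapes and layer stack are assumptions,
# not the exact architecture we used.
rng = np.random.default_rng(0)

LATENT_DIM = 100   # assumed latent noise size
SERIES_LEN = 1000  # one synthetic return series per sample
HIDDEN = 256       # assumed hidden width

def init_mlp(sizes, rng):
    """He-style initialisation for a simple fully connected stack."""
    return [(rng.normal(0, np.sqrt(2 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Leaky-ReLU MLP forward pass; the last layer is linear."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.where(x > 0, x, 0.2 * x)  # leaky ReLU
    return x

# Generator: latent noise -> fake return series
gen_params = init_mlp([LATENT_DIM, HIDDEN, HIDDEN, SERIES_LEN], rng)
# Critic: return series -> Wasserstein score (no sigmoid, unlike a vanilla GAN)
critic_params = init_mlp([SERIES_LEN, HIDDEN, HIDDEN, 1], rng)

z = rng.normal(size=(8, LATENT_DIM))          # a batch of noise vectors
fake_series = forward(gen_params, z)          # (8, 1000) synthetic returns
scores = forward(critic_params, fake_series)  # (8, 1) critic scores
print(fake_series.shape, scores.shape)
```

In the real model these forward passes live inside a training loop with the gradient-penalty term on the critic; the point here is only that the input and output shapes change from images to 1-D series.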
We trained our WGAN-GP on a dataset of particular interest to us: the Chicago Board Options Exchange Volatility Index, more commonly known as the VIX. The VIX is considered an approximation of the market's expectation of index volatility over a 30-day period. Many papers have been written about ways to model its dynamics, yet the VIX remains a somewhat untamable beast: it presents huge and sudden jumps, with seemingly no "warning" behaviour. Given the flexibility of GANs, we think trying to model the VIX is a great first challenge, so we set ourselves to the task.
First, we take the VIX price series and compute the daily returns. From the daily returns, we take segments of 1000 days, rolling forward 100 days at a time, so that each segment shares 900 days with the previous one. In this way, we get a set of different behaviours of the VIX over time, and we can ask our GAN model to learn the underlying structure of this behaviour.
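The preparation step above can be sketched in a few lines. The window length (1000) and stride (100) match the text; the price array below is a synthetic stand-in for the real VIX series, and `make_segments` is a hypothetical helper name.

```python
import numpy as np

def make_segments(prices, window=1000, stride=100):
    """Compute daily returns, then slice overlapping training segments."""
    prices = np.asarray(prices, dtype=float)
    returns = prices[1:] / prices[:-1] - 1.0      # simple daily returns
    starts = range(0, len(returns) - window + 1, stride)
    return np.stack([returns[s:s + window] for s in starts])

# Toy example: a random-walk price series standing in for the VIX
rng = np.random.default_rng(42)
prices = 15 * np.exp(np.cumsum(rng.normal(0, 0.05, size=3001)))
segments = make_segments(prices)

print(segments.shape)  # (21, 1000): 21 overlapping 1000-day segments
# Consecutive segments overlap by window - stride = 900 days
assert np.allclose(segments[0, 100:], segments[1, :900])
```

Each row of `segments` is then one training sample for the GAN.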
We trained our GAN for 100,000 iterations on a computer with a GeForce GTX 1070 Ti, using the GPU-enabled version of TensorFlow.
After training, we ask our model to produce realizations of the VIX and get outputs like the following:
All of these look like believable realizations of the VIX, as they share many properties with the original: sudden jumps and drops as well as "curve-shaped" downtrends. Visualizing some statistics of the synthetic series and comparing them to those of the original would be a good exercise to verify the robustness of our methodology, but we will leave that for another post.
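As a taste of what such a check might look like, here is a hedged sketch that compares a few summary statistics of a real and a synthetic return series. The arrays below are placeholders (a heavy-tailed sample standing in for VIX returns, a Gaussian sample standing in for GAN output); in practice they would hold the actual series.

```python
import numpy as np

def summary_stats(x):
    """Mean, standard deviation, and excess kurtosis of a return series."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()
    kurt = np.mean(((x - mu) / sigma) ** 4) - 3.0  # excess kurtosis
    return {"mean": mu, "std": sigma, "excess_kurtosis": kurt}

# Placeholder data: heavy-tailed "real" returns vs. Gaussian "synthetic" ones
rng = np.random.default_rng(0)
real = rng.standard_t(df=3, size=1000) * 0.05
fake = rng.normal(0, 0.05, size=1000)

for name, series in [("real", real), ("synthetic", fake)]:
    stats = summary_stats(series)
    print(name, {k: round(v, 4) for k, v in stats.items()})
```

Fat tails (high excess kurtosis) are exactly the kind of property a good VIX generator should reproduce, so a large gap between the two rows would flag a problem.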
Using the enlarged dataset produced by our GAN, we could backtest our strategies based on the index more thoroughly, reducing the risk of overfitting and gaining insight into possible model improvements.
GANs seem to be a great methodology for capturing the dynamics of financial assets and forecasting future movements. Off the top of my head, I can think of applying them to Derivatives Pricing, Portfolio Hedging and Risk Management, so I bet they'll become a crucial tool in the future.
Stay tuned for more developments and projects using GANs!