# Variational autoencoder as a method of data augmentation

### T. Fuertes

#### 03/06/2020

In this blog we’ve talked about autoencoders several times, both for outlier detection and for dimensionality reduction. Now we present another variation of them, the variational autoencoder, which makes data augmentation possible. If you have ever faced Machine Learning problems, you will have dealt with the lack of data to train models. This method gives you an interesting way of generating new data to fit models better.

Elsewhere in this blog you can find an effective method for generating new financial data with GANs. Now, however, we delve into autoencoders to see how they can be used as generative models.

## Variational autoencoder

As a quick reminder, an autoencoder is composed of two connected networks: an encoder and a decoder. The procedure starts with the encoder compressing the original data into a short code, ignoring the noise. Then the decoder decompresses that code to generate data as close as possible to the original input.

However, when building a generative model, the key point isn’t to replicate the inputs but to randomly generate variations of them from a continuous space. The hidden layer of a plain autoencoder may not be continuous, which can make interpolation difficult. This is where the Variational Autoencoder (VAE) helps: its latent space (the counterpart of the hidden layer) is continuous by design.

A VAE achieves this by making the encoder output two vectors, a vector of means and a vector of variances, that describe a probability distribution over the latent space rather than a single point. A sample drawn from that distribution is the encoding that is passed to the decoder. Because the encodings for a given input are drawn at random around its mean, with a spread set by its variance, the decoder learns that all nearby points in the latent space correspond to similar inputs.
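The sampling step can be sketched in a few lines of NumPy. This is only an illustration under our own assumptions: the encoder outputs below are made-up arrays, not the output of a trained network, and we parameterize the spread with the log-variance (a common choice in VAE implementations, not something stated above). The function name `sample_latent` is hypothetical.

```python
import numpy as np

def sample_latent(mean, log_var, rng):
    """Draw an encoding z = mean + sigma * eps, with eps ~ N(0, I).

    Writing the sample this way keeps the randomness in eps, so mean and
    log_var remain ordinary inputs of a deterministic expression.
    """
    eps = rng.standard_normal(mean.shape)
    sigma = np.exp(0.5 * log_var)  # log-variance -> standard deviation
    return mean + sigma * eps

rng = np.random.default_rng(0)

# Made-up encoder outputs: a batch of 4 inputs and a 2-dimensional latent space.
mean = np.zeros((4, 2))
log_var = np.full((4, 2), -2.0)

z = sample_latent(mean, log_var, rng)  # one sampled encoding per input
```

Calling `sample_latent` twice with the same `mean` and `log_var` returns two different encodings, which is exactly what exposes the decoder to a neighbourhood of points instead of a single code.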

We control how far these encoding distributions drift from a target distribution, typically a standard normal, by using the Kullback-Leibler (KL) divergence. Minimizing the KL divergence means optimizing the distribution parameters (mean and variance) so that each encoding distribution closely resembles the target.
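When the target is a standard normal and the encoding distribution is a Gaussian with diagonal covariance (both are assumptions on our side, although they are the usual setup), the KL divergence has a closed form. The sketch below computes it with NumPy; the function name and the log-variance parameterization are illustrative choices.

```python
import numpy as np

def kl_to_standard_normal(mean, log_var):
    """KL( N(mean, diag(exp(log_var))) || N(0, I) ) for each row.

    Closed form: 0.5 * sum(var + mean^2 - 1 - log(var)) over latent dimensions.
    """
    var = np.exp(log_var)
    return 0.5 * np.sum(var + mean**2 - 1.0 - log_var, axis=-1)

# An encoding distribution that already matches the target has zero divergence.
kl_match = kl_to_standard_normal(np.zeros((1, 2)), np.zeros((1, 2)))

# Shifting the mean away from zero increases the divergence.
kl_shifted = kl_to_standard_normal(np.ones((1, 2)), np.zeros((1, 2)))
```

The divergence is zero only when the mean is zero and the variance is one, so the penalty grows as an encoding moves away from the origin or becomes too narrow or too wide.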

Optimizing the two terms together (the reconstruction loss, which measures how well the decoder recovers the input, and the KL divergence loss) produces a latent space in which nearby encodings decode to similar data.
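As a rough sketch of how the two terms are combined, the function below adds a squared-error reconstruction term to the closed-form KL term against a standard normal. The squared error, the `kl_weight` parameter and the function name `vae_loss` are our assumptions for illustration; other reconstruction losses and weightings are equally valid.

```python
import numpy as np

def vae_loss(x, x_hat, mean, log_var, kl_weight=1.0):
    """Batch-averaged VAE objective: reconstruction error plus weighted KL term."""
    # Reconstruction: squared error summed over features, averaged over the batch.
    reconstruction = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    # KL divergence to N(0, I), summed over latent dimensions, averaged over the batch.
    kl = np.mean(0.5 * np.sum(np.exp(log_var) + mean**2 - 1.0 - log_var, axis=-1))
    return reconstruction + kl_weight * kl

# Made-up batch: 3 inputs with 5 features each and a 2-dimensional latent space.
x = np.arange(15, dtype=float).reshape(3, 5)
mean = np.zeros((3, 2))
log_var = np.zeros((3, 2))

perfect = vae_loss(x, x, mean, log_var)      # perfect reconstruction, target-matching encodings
off_by_one = vae_loss(x, x + 1.0, mean, log_var)
```

Raising `kl_weight` pulls the encodings closer to the target distribution at the cost of reconstruction quality, and lowering it does the opposite.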

VAEs are highly powerful generative tools because they work with remarkably diverse types of data: sequential or non-sequential, continuous or discrete, even labeled or completely unlabeled.

## Practical case

The trial in this post is based on an example of how to generate new images from an initial set of images. Here, however, we apply the same process to generate new financial series. Our main goal is usually to develop algorithms that model financial series in order to derive investment rules. Nevertheless, sometimes there isn’t as much historical data as we would like, so it is useful to augment what we have.

Let’s take 600 fund price series from 2013 to now, more than 7 years. The returns of those series are the inputs for the encoder layer. Then, applying the process described above, we generate 5 new return series. We’ve noticed that the method doesn’t preserve the original mean after decoding the hidden layer, so we force the new series to have a mean equal to zero.
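The two data-handling steps around the VAE, turning prices into returns and forcing the generated series to have zero mean, can be sketched as follows. Several things here are assumptions: we use simple (not logarithmic) returns, rows are dates and columns are series, and the `generated` array is random stand-in data rather than real decoder output. The helper names `to_returns` and `recenter` are hypothetical.

```python
import numpy as np

def to_returns(prices):
    """Simple returns along the time axis (rows are dates, columns are series)."""
    return prices[1:] / prices[:-1] - 1.0

def recenter(series):
    """Subtract each column's own mean so that every series averages zero."""
    return series - series.mean(axis=0, keepdims=True)

# Toy price table: 3 dates, 2 funds.
prices = np.array([[100.0, 50.0],
                   [110.0, 55.0],
                   [ 99.0, 55.0]])
returns = to_returns(prices)  # this is what would be fed to the encoder

# Stand-in for 5 decoded return series of 250 days with a spurious non-zero mean.
rng = np.random.default_rng(0)
generated = 0.002 + 0.01 * rng.standard_normal((250, 5))
centered = recenter(generated)
```

Recentering only shifts each series by a constant, so it leaves the volatility and the correlations between the generated series unchanged.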