As a continuation of our last post on Time Series Signatures, and of our running series of posts on GANs and synthetic data, we now present the Signature Conditional Wasserstein GAN (SigCWGAN), a GAN architecture introduced in [1] that is specifically designed to generate time series of arbitrary length and dimension.
1. Properties of the Signature
The SigCWGAN exploits the properties of signatures to capture the characteristics of the training time series. As we described in our earlier post, the signature of a time series is a mathematical object that, much like the Taylor expansion of a function, provides a faithful description of the path.
The signature has properties that make it especially attractive as the basis of a GAN metric. Let \( \Omega_{0}^{1}(J, \mathbb{R}^{d}) \) denote the space of time-augmented paths, i.e. \( \Omega_{0}^{1}(J, \mathbb{R}^{d}) =\{t \mapsto (t,x_t) \mid x\in C_0^1(J,\mathbb{R}^d) \}\). Then:
- Universality: non-linear continuous functions of the (un-parameterized) data streams are universally approximated by linear functionals on the signature space; that is, any continuous function of a path can be approximated arbitrarily well by a linear combination of its signature coordinates. See [2] for more details.
- Uniqueness: the signature of a path determines the path up to time parameterization. More specifically, when the path space is restricted to \(\Omega_{0}^{1}(J, \mathbb{R}^{d})\), the signature map is bijective.
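To make the signature concrete, below is a minimal numpy sketch that computes the truncated signature of a piecewise-linear (e.g. time-augmented) path by combining the segment-wise tensor exponentials via Chen's identity. The function names are ours; in practice one would use a dedicated library such as `iisignature` or `signatory`.

```python
import numpy as np

def segment_signature(delta, depth):
    """Signature of one linear segment with increment `delta`:
    level k is delta^{(tensor) k} / k!, stored as a flat array."""
    levels = [np.ones(1)]                      # level 0 is the scalar 1
    for k in range(1, depth + 1):
        levels.append(np.kron(levels[-1], delta) / k)
    return levels

def chen_product(S, T, depth):
    """Chen's identity: the signature of a concatenated path is the
    tensor-algebra product of the signatures of its pieces."""
    return [sum(np.kron(S[i], T[k - i]) for i in range(k + 1))
            for k in range(depth + 1)]

def truncated_signature(path, depth):
    """Truncated signature S_M (levels 1..depth, flattened and concatenated)
    of a piecewise-linear path given as an array of shape (T, d)."""
    d = path.shape[1]
    sig = [np.ones(1)] + [np.zeros(d ** k) for k in range(1, depth + 1)]
    for a, b in zip(path[:-1], path[1:]):
        sig = chen_product(sig, segment_signature(b - a, depth), depth)
    return np.concatenate(sig[1:])             # drop the constant level-0 term

# Example: time-augmented scalar path t -> (t, x_t), truncated at depth M = 3.
t = np.linspace(0.0, 1.0, 10)
x = np.sin(2 * np.pi * t)
print(truncated_signature(np.column_stack([t, x]), depth=3).shape)  # (2+4+8,) = (14,)
```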
2. The Signature Wasserstein-1 metric (Sig-\( W_1 \))
The signature and the Wasserstein-1 (\( W_1 \)) metric can be used together to define a metric on the path space \( \mathcal{X}=\Omega_{0}^{1}\left(J, \mathbb{R}^{d}\right) \). Let \( \mu \) and \( \nu \) be two measures on \( \mathcal{X} \) with compact support \( K \). The Kantorovich-Rubinstein dual representation of the Wasserstein-1 metric is given by
$$W_{1}(\mu, \nu)=\sup \left\{\int f(x)\, d(\mu-\nu)(x) \;\middle|\; \text{continuous } f: \mathcal{X} \rightarrow \mathbb{R},\ \operatorname{Lip}(f) \leq 1\right\}$$
where \( \operatorname{Lip}(f) \) denotes the Lipschitz constant of \( f \). Given this definition and the universality of the signature map, it is natural to embed the paths into the signature space and measure the distance between \( \mu \) and \( \nu \) by
$$\text{Sig-}W_{1}(\mu, \nu):= \sup_{|L| \leq 1,\ L \text{ a linear functional}} L\left(\mathbb{E}_{\mu}[S(X)]-\mathbb{E}_{\nu}[S(X)]\right)$$
where \( \mathbb{E}_{\mu} \) and \( \mathbb{E}_{\nu} \) are the expectations taken under \( \mu \) and \( \nu \) respectively. We can approximate \( \text{Sig-}W_{1}(\mu, \nu) \) by using the truncated signature up to a finite degree \( M \), i.e.
$$\text{Sig-}W_{1}^{(M)}(\mu, \nu):=\sup_{|L| \leq 1,\ L \text{ a linear functional}} L\left(\mathbb{E}_{\mu}\left[S_{M}(X)\right]-\mathbb{E}_{\nu}\left[S_{M}(X)\right]\right)$$
When the norm of \( L \) is chosen to be the \( l_2 \) norm of its linear coefficients, this optimization problem admits the analytic solution
$$\text{Sig-}W_{1}^{(M)}(\mu, \nu)=\left|\mathbb{E}_{\mu}[S_{M}(X)]-\mathbb{E}_{\nu}[S_{M}(X)]\right|$$
where \( |\cdot| \) denotes the \( l_2 \) norm.
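Given samples from the two measures, the truncated metric is therefore just the \( l_2 \) distance between the empirical expected signatures. Here is a small sketch, reusing the `truncated_signature` helper introduced in the previous snippet (names are our own):

```python
import numpy as np

def expected_signature(paths, depth):
    """Monte Carlo estimate of E[S_M(X)] from sample paths of shape (N, T, d)."""
    return np.mean([truncated_signature(p, depth) for p in paths], axis=0)

def sig_w1(real_paths, fake_paths, depth):
    """Sig-W1^(M): l2 norm of the difference of expected truncated signatures."""
    return np.linalg.norm(expected_signature(real_paths, depth)
                          - expected_signature(fake_paths, depth))
```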
3. The Signature-based Conditional WGAN (SigCWGAN)
In order to set up the SigCWGAN, we assume, inspired by classical autoregressive time series models, that an \( \mathbb{R}^d \)-valued time series \( (X_t)_{t=1}^T\) satisfies \( X_{t+1}=f(X_{t-p+1: t})+\varepsilon_{t+1}\), where \( \mathbb{E}[\varepsilon_{t+1} \mid \mathcal{F}_{t}]=0 \), \( \mathcal{F}_{t} \) is the information available up to time \( t \), and \( f: \mathbb{R}^{p\times d}\rightarrow \mathbb{R}^d\) is a continuous but unknown function. The objective of the Signature-based Conditional Generator for time series (SigCWGAN) is to capture the joint distribution of the future time series \( x_{future}= X_{t+1:t+q}\) given the past time series \( x_{past}= X_{t-p+1:t}\).
To do this, we use a conditional autoregressive generator \(G^\theta : \mathbb{R}^{p\times d} \times \mathcal{Z}\rightarrow \mathbb{R}^d\) that takes the past path \(x_{past}= X_{t-p+1:t}\) and a noise vector \(Z_{t+1}\) and produces a random variable in \(\mathbb{R}^d\) whose conditional distribution is as close as possible to \(\mathbb{P}\left(X_{t+1} \mid X_{t-p+1:t}=x_{past}\right)\). By iteratively generating values via \(\hat{X}^{(t)}_{t+1} = G^\theta(X_{t-p+1:t},Z_{t+1})\), feeding each new sample back into the conditioning window, we obtain the q-step forecast \(\hat{X}^{(t)}_{t+1:t+q}\). Generating the time series in this way allows us to produce series of arbitrary length while accounting for the temporal dependency of the series' values, as the sketch below illustrates.
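The following is a minimal PyTorch sketch of this autoregressive rollout, assuming a generator with the interface described above (past window plus a noise vector in, one next-step sample out); the tensor shapes and the `rollout` name are our own choices:

```python
import torch

def rollout(generator, x_past, q, noise_dim):
    """Generate q future steps autoregressively from a past window.

    generator: callable mapping a (batch, p, d) window and (batch, noise_dim)
               noise to a (batch, d) sample of the next step (assumed interface).
    x_past:    tensor of shape (batch, p, d) holding X_{t-p+1:t}.
    """
    window = x_past.clone()
    future = []
    for _ in range(q):
        z = torch.randn(window.shape[0], noise_dim)
        x_next = generator(window, z)                                # (batch, d)
        future.append(x_next)
        # slide the conditioning window forward, feeding the sample back in
        window = torch.cat([window[:, 1:], x_next.unsqueeze(1)], dim=1)
    return torch.stack(future, dim=1)                                # (batch, q, d)
```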
Now for the discriminator: we want the generated values to match the expected signature of the real paths. We therefore define the loss as the sum over time \(t\) of the \(l_2\) norm of the difference between the conditional expected signature of the true future path and that of the future path produced by the generator, both conditioned on the same past path, i.e.
$$L(\theta)=\sum_{t}\left|\mathbb{E}_{\mu}\left[S_{M}(X_{t+1: t+q}) \mid X_{t-p+1: t}\right]-\mathbb{E}_{\nu}\left[S_{M}(\hat{X}_{t+1: t+q}^{(t)}) \mid X_{t-p+1: t}\right]\right|$$
where \(\mu\) and \(\nu\) denote the conditional distributions induced by the real data and by the synthetic generator respectively, and \(\hat{X}_{t+1: t+q}^{(t)}\) is the q-step forecast produced by the generator \(G^\theta\).
Thanks to the universality property of signatures, these conditional expectations are continuous, approximately linear functions of the signature of the past path, so they can be estimated via linear regression on signatures of the real data, and directly from samples on the generated data. All the quantities above can therefore be computed without the need for a discriminator network!
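As a rough illustration of this discriminator-free loss, here is a numpy sketch: the real conditional expected signature is fitted by a least-squares regression of future-path signatures on past-path signatures, and the generator-side expectation is estimated by averaging the signatures of a batch of generated futures for the same past. It reuses the `truncated_signature` helper from the earlier sketch; names and interfaces are our own, and this is illustrative only, since training \(G^\theta\) requires signatures that are differentiable with respect to \(\theta\), as handled in the official PyTorch implementation.

```python
import numpy as np

def fit_real_conditional_signature(past_windows, future_paths, depth):
    """Least-squares regression of S_M(future) on S_M(past) over the real data;
    its prediction estimates E_mu[S_M(X_{t+1:t+q}) | X_{t-p+1:t}]."""
    X = np.stack([truncated_signature(p, depth) for p in past_windows])
    Y = np.stack([truncated_signature(f, depth) for f in future_paths])
    X1 = np.c_[np.ones(len(X)), X]                       # add an intercept column
    coeffs, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    return coeffs

def sig_cwgan_loss(coeffs, past_windows, generated_futures, depth):
    """Sum over conditioning windows of the l2 error between the regressed real
    conditional expected signature and the mean signature of generated futures."""
    loss = 0.0
    for past, fakes in zip(past_windows, generated_futures):
        pred_real = np.r_[1.0, truncated_signature(past, depth)] @ coeffs
        mean_fake = np.mean([truncated_signature(f, depth) for f in fakes], axis=0)
        loss += np.linalg.norm(pred_real - mean_fake)
    return loss
```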
By matching conditional expected signatures, the SigCWGAN is able to generate time series of arbitrary length and dimension, which is a significant step forward in time-series generation. More details on the SigCWGAN training procedure, as well as the official PyTorch implementation, can be found in [1].
References:
[1] Hao Ni, Lukasz Szpruch, Magnus Wiese, Shujian Liao, Baoren Xiao. Conditional Sig-Wasserstein GANs for Time Series Generation. arXiv:2006.05421, 2020.
[2] Daniel Levin, Terry Lyons, Hao Ni. Learning from the past, predicting the statistics for the future, learning an evolving system. arXiv:1309.0260, September 2013.