The signature of a time series is a universal description of a stream of data, derived from the theory of controlled differential equations. In recent years, this technique has been successfully applied to a wide array of machine learning tasks dealing with sequential data, such as Chinese character recognition or extracting information from the signature of a financial data stream.

In fact, quantitative finance is one of the most natural applications of the signature method, because of its ability to describe interactions within complex oscillatory systems, a common feature of financial time series.

## 1. Iterated Integrals

For a path \(X : [a,b] \rightarrow \mathbb{R}^d\) we denote the coordinate paths by \((X_t^1, \ldots, X_t^d)\), where each \(X^i : [a,b] \rightarrow \mathbb{R}\) is a real-valued path. For any single index \(i \in \{1, \ldots, d\}\), let us define the quantity

$$

S(X)_{a, t}^{i}=\int_{a<s<t} d X_{s}^{i}=X_{t}^{i}-X_{a}^{i}

$$

which is the increment of the \(i\)-th coordinate of the path at time \(t \in [a,b]\). Now, for any pair \(i, j \in \{1, \ldots, d\}\), let us define the *double-iterated* integral

$$

S(X)_{a, t}^{i, j}=\int_{a<s<t} S(X)_{a, s}^{i} d X_{s}^{j}=\int_{a<r<s<t} d X_{r}^{i} d X_{s}^{j}

$$

Likewise, for any triple \(i, j, k \in \{1, \ldots, d\}\) we define the *triple-iterated* integral

$$

S(X)_{a, t}^{i, j, k}=\int_{a<s<t} S(X)_{a, s}^{i, j} d X_{s}^{k}=\int_{a<q<r<s<t} d X_{q}^{i} d X_{r}^{j} d X_{s}^{k}

$$

We can continue recursively: for any integer \(k \geq 1\) and any collection of indices \(i_1, \ldots, i_k \in \{1, \ldots, d\}\), we define

$$

S(X)_{a, t}^{i_{1}, \ldots, i_{k}}=\int_{a<s<t} S(X)_{a, s}^{i_{1}, \ldots, i_{k-1}} d X_{s}^{i_{k}}

$$

The real number \(S(X)^{i_1, \ldots, i_k}_{a,b}\) is called the \(k\)-fold iterated integral of \(X\) along the indices \(i_1, \ldots, i_k\).
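As a quick sanity check, consider a one-dimensional path (\(d = 1\)): the only multi-index of length \(k\) is \((1, \ldots, 1)\), and by symmetrizing the simplex of integration one finds

$$
S(X)_{a, b}^{\overbrace{1, \ldots, 1}^{k}}=\int_{a<t_{1}<\cdots<t_{k}<b} d X_{t_{1}} \cdots d X_{t_{k}}=\frac{\left(X_{b}-X_{a}\right)^{k}}{k !}
$$

so in one dimension every level of the signature is determined by the total increment alone; the interesting information appears only when \(d \geq 2\).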

## 2. Definition (Signature)

The *signature* of a path \(X : [a,b] \rightarrow \mathbb{R}^d\), denoted by \(S(X)_{a,b}\), is the collection (infinite series) of all the iterated integrals of \(X\). Formally, \(S(X)_{a,b}\) is the sequence of real numbers

$$

S(X)_{a, b}=\left(1, S(X)_{a, b}^{1}, \ldots, S(X)_{a, b}^{d}, S(X)_{a, b}^{1,1}, S(X)_{a, b}^{1,2}, \ldots\right)

$$

where the “zeroth” term, by convention, is equal to 1, and the superscripts run over the set of all multi-indices

$$

W=\left\{\left(i_{1}, \ldots, i_{k}\right) \mid k \geq 1,\; i_{1}, \ldots, i_{k} \in\{1, \ldots, d\}\right\}

$$

The set \(W\) above is also frequently called the set of words on the alphabet \(A = \{1, \ldots, d\}\) consisting of \(d\) letters.

#### Example

Consider an alphabet **consisting of three letters** only: \(\{1, 2, 3\}\). There are infinitely many words that can be composed from this alphabet, namely:

$$

\{1, 2, 3\} \rightarrow \left(1, 2, 3, 11, 12, 13, 21, 22, 23, 31, 32, 33, 111, 112, 113, 121, \ldots\right)$$

Each collection of terms of a signature \(S(X)_{a, t}^{i_{1}, \ldots, i_{k}}\) whose multi-index has length \(k\) is referred to as a *level*. Note that the \(k\)-th level of a signature has \(d^k\) elements.
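To make the level structure concrete, here is a minimal pure-Python sketch (function names are my own) that computes the truncated signature of a piecewise-linear path. It relies on two standard facts: a straight-line segment with increment \(\Delta\) has \(k\)-th level \(\Delta^{\otimes k}/k!\), and signatures of concatenated paths combine via Chen's identity.

```python
import math
from itertools import product

def segment_signature(delta, depth):
    """Signature of a straight-line segment with increment `delta`:
    the entry for the word (i1, ..., ik) is delta[i1]*...*delta[ik] / k!."""
    d = len(delta)
    sig = {(): 1.0}
    for k in range(1, depth + 1):
        for word in product(range(d), repeat=k):
            val = 1.0
            for i in word:
                val *= delta[i]
            sig[word] = val / math.factorial(k)
    return sig

def chen_product(s1, s2, depth, d):
    """Chen's identity: the signature of a concatenation of two paths is
    the truncated tensor product of their signatures."""
    out = {}
    for k in range(depth + 1):
        for word in product(range(d), repeat=k):
            out[word] = sum(s1.get(word[:m], 0.0) * s2.get(word[m:], 0.0)
                            for m in range(k + 1))
    return out

def signature(path, depth):
    """Truncated signature of the piecewise-linear path through `path`.
    Words use 0-based coordinate indices, so the word (0, 1) corresponds
    to the superscript (1, 2) in the text."""
    d = len(path[0])
    sig = {(): 1.0}  # signature of a constant path: levels >= 1 vanish
    for p, q in zip(path, path[1:]):
        delta = [qi - pi for pi, qi in zip(p, q)]
        sig = chen_product(sig, segment_signature(delta, depth), depth, d)
    return sig

# Two-segment path in R^2: move right one unit, then up one unit.
sig = signature([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)], depth=2)
```

For this path the first level recovers the increments, while the second level records the order of the moves: the entry for the word \((1,2)\) is 1 but for \((2,1)\) it is 0, since every increment of \(X^1\) occurs before every increment of \(X^2\).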

## 3. Picard iterations: motivation for the signature

The signature is a concept that arises naturally in the **classical theory of ordinary differential equations** (ODEs), derived from the analysis of Picard’s approximation method.

Consider a path \(X:[a, b] \rightarrow \mathbb{R}^{d}\). Let \(\mathbf{L}\left(\mathbb{R}^{d}, \mathbb{R}^{e}\right)\) denote the vector space of linear maps from \(\mathbb{R}^d\) to \(\mathbb{R}^e\). Equivalently, \(\mathbf{L}\left(\mathbb{R}^{d}, \mathbb{R}^{e}\right)\) can be regarded as the vector space of \(e \times d\) real matrices. For a path \(Z:[a,b] \rightarrow \mathbf{L}\left(\mathbb{R}^{d},\mathbb{R}^{e}\right)\), note that we can define the integral

$$ \int_a^b Z_tdX_t$$

as an element of \(\mathbb{R}^e\) in exactly the same way as the usual path integral. For a function \(V : \mathbb{R}^{e} \rightarrow \mathbf{L}\left(\mathbb{R}^{d}, \mathbb{R}^{e}\right)\) and a path \(Y : [a, b] \rightarrow \mathbb{R}^{e}\), we say that \(Y\) solves the controlled differential equation

$$d Y_{t}=V\left(Y_{t}\right) d X_{t}, \quad Y_{a}=y \in \mathbb{R}^{e}$$

precisely when for all times \(t \in [a,b]\)

$$Y_{t}=y+\int_{a}^{t} V\left(Y_{s}\right) d X_{s}$$

The map \(V\) in the above expression is often called a collection of *driving vector fields*, the path \(X\) is called the *control* or the *driver*, and \(Y\) is called the *solution* or the *response*.

A standard procedure to obtain a solution to the controlled differential equation above is through **Picard iterations**. For an arbitrary path \( Y : [a, b] \rightarrow \mathbb{R}^{e}\) , define a new path \( F(Y ) : [a, b] \rightarrow \mathbb{R}^{e}\) by

$$F(Y)_{t}=y+\int_{a}^{t} V\left(Y_{s}\right) d X_{s}$$

Observe that \(Y\) is a solution to the controlled differential equation if and only if \(Y\) is a fixed point of \(F\). Consider the sequence of paths \(Y_t^n = F(Y^{n-1})_t\) started from an arbitrary path \(Y^0\) (often taken to be the constant path \(Y_t^0 = y\)).

Under suitable assumptions, one can show that \(F\) possesses a unique fixed point \(Y\) and that \(Y^n\) converges to \(Y\) as \(n\rightarrow\infty\).
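As an illustrative numerical sketch (the grid size, the scalar field \(V(y) = ay\), and the driver \(X_t = t\) are my own choices, not from the text), the iteration can be carried out on a grid; the fixed point here is \(Y_t = y\,e^{at}\):

```python
import math

# Picard iteration for dY = V(Y) dX with the scalar linear field
# V(y) = a*y and driver X_t = t on [0, 1].  The fixed point of F is
# Y_t = y0 * exp(a*t); each application of F adds one Taylor term.
N = 1000                    # grid steps on [0, 1] (illustrative)
h = 1.0 / N
a, y0 = 0.5, 1.0

Y = [y0] * (N + 1)          # Y^0: the constant initial path
for _ in range(12):         # Y^n = F(Y^{n-1}), applied repeatedly
    F = [y0]
    for n in range(N):
        # left-point rule for y0 + integral of V(Y_s) dX_s, dX_s = ds
        F.append(F[-1] + a * Y[n] * h)
    Y = F
```

After a handful of iterations, `Y[-1]` is close to \(y_0 e^{a}\), up to the discretization error of the left-point rule.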

Consider now the case when \(V: \mathbb{R}^{e} \rightarrow \mathbf{L}\left(\mathbb{R}^{d}, \mathbb{R}^{e}\right)\) is a linear map. Note that we may equivalently treat \(V\) as a linear map \(\mathbb{R}^{d} \rightarrow \mathbf{L}\left(\mathbb{R}^{e}, \mathbb{R}^{e}\right)\), where \(\mathbf{L}\left(\mathbb{R}^{e}, \mathbb{R}^{e}\right)\) is the space of all \(e \times e\) real matrices. Let us start the Picard iterations with the initial constant path \(Y_t^0 = y\) for all \(t \in [a, b]\). Denoting by \(I_e\) the identity operator (or matrix) in \(\mathbf{L}\left(\mathbb{R}^{e}, \mathbb{R}^{e}\right)\), it follows that the iterates of \(F\) can be expressed as follows:

$$

Y_{t}^{0}=y

$$

$$

Y_{t}^{1}=y+\int_{a}^{t} V\left(Y_{s}^{0}\right) d X_{s}=\left(\int_{a}^{t} d V\left(X_{s}\right)+I_{e}\right)(y) $$

$$

Y_{t}^{2}=y+\int_{a}^{t} V\left(Y_{s}^{1}\right) d X_{s}=\left(\int_{a}^{t} \int_{a}^{s} d V\left(X_{u}\right) d V\left(X_{s}\right)+\int_{a}^{t} d V\left(X_{s}\right)+I_{e}\right)(y) $$

$$

\vdots $$

$$

Y_{t}^{n}=y+\int_{a}^{t} V\left(Y_{s}^{n-1}\right) d X_{s}=\left(\sum_{k=1}^{n} \int_{a<t_{1}<\ldots<t_{k}<t} d V\left(X_{t_{1}}\right) \ldots d V\left(X_{t_{k}}\right)+I_{e}\right)(y)$$

Because \(\mathbf{L}\left(\mathbb{R}^{e}, \mathbb{R}^{e}\right)\) is an algebra of matrices, each quantity

$$ \int_{a<t_{1}<\ldots<t_{k}<t} d V\left(X_{t_{1}}\right) \ldots d V\left(X_{t_{k}}\right)$$

can naturally be defined as an element of \(\mathbf{L}\left(\mathbb{R}^{e}, \mathbb{R}^{e}\right)\) which, one can check, is completely determined (in a linear way) by the \(k\)-th level of the signature \(S(X)_{a,t}\) of \(X\) at time \(t \in [a, b]\). The conclusion is that the solution \(Y_t\) is completely determined by the signature \(S(X)_{a,t}\). In this sense, the signature plays the role of a Taylor series approximation for a time series.

In upcoming posts, we will expand on the applications of signature features to ML problems, especially a new kind of GAN that generates time-series data using properties of the signature. Stay tuned!

## References

- Ilya Chevyrev and Andrey Kormilitzin. A primer on the signature method in machine learning. arXiv preprint arXiv:1603.03788, 2016.