Deep Reinforcement Learning applications in finance are still largely unexplored. Nonetheless, recent developments in other fields have pushed researchers towards exciting new horizons.
I believe there is huge potential for Reinforcement Learning in finance. As investment guru Ray Dalio, founder of Bridgewater Associates, argues in his book Principles: Life and Work, investing is an iterative process. You make your bets, fail (sometimes painfully), learn something new and try again. Through this struggle, you improve your own decision making by constant trial and error. This principle, obvious to us since Darwin shed light on the way nature works, also holds for investment decision-making. Luckily, this intuitive idea has its own chapter in the Artificial Intelligence encyclopedia: it is called Reinforcement Learning.
How Reinforcement Learning works
Simply put, Reinforcement Learning (RL) is a framework where an agent is trained to behave well in an environment by performing actions and observing the results. It differs from supervised Machine Learning approaches in the way learning happens: rather than fitting a fixed, labelled dataset, the agent learns online by actively interacting with its environment.
During this iterative process, the agent performs an action on the environment and observes the immediate result; this feedback is used to improve the next action, and the process starts again.
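The loop described above can be sketched in a few lines. This is only a toy illustration: the CoinFlipEnv class, its reset/step methods and the random policy are hypothetical names invented for this example, and no learning takes place yet.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: the agent guesses the next coin flip."""

    def __init__(self, steps=10, seed=0):
        self.steps = steps
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # this toy environment has a single dummy state

    def step(self, action):
        outcome = self.rng.randint(0, 1)
        reward = 1.0 if action == outcome else -1.0
        self.t += 1
        done = self.t >= self.steps
        return 0, reward, done


def run_episode(env, policy):
    """Run one episode and return the rewards the agent observed."""
    state = env.reset()
    rewards = []
    done = False
    while not done:
        action = policy(state)                  # the agent acts
        state, reward, done = env.step(action)  # the environment responds
        rewards.append(reward)                  # feedback a learner would use
    return rewards


rewards = run_episode(CoinFlipEnv(steps=10), policy=lambda s: random.choice([0, 1]))
print(len(rewards), sum(rewards))
```

A learning agent would replace the random policy with one that is adjusted after every observed reward.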
More specifically, to tell the agent which action to take under which circumstances, we learn a policy, which is represented by a so-called Quality Table (Q-Table) holding a value for every possible state-action pair. Each time the agent selects an action a_t, observes a reward r_t, and enters a new state s_{t+1}, the corresponding entry of the Q-Table is updated using the observed reward and the discounted value of the best action available in the new state.
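For reference, the standard tabular Q-learning update can be written as follows. The learning rate α is a symbol assumed here (it is not named in the text); γ is the discount factor.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```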
The discount factor γ determines the importance of future rewards. If it is 0, our agent will only learn to consider current rewards, while a γ of 1 will make it strive for a long-term high reward.
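To make the role of the discount factor concrete, here is a minimal sketch of one tabular Q-learning update. The function name q_update, the learning rate alpha, and the state and action labels are assumptions made for this illustration.

```python
from collections import defaultdict


def q_update(q_table, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value."""
    best_next = max(q_table[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (target - q_table[(state, action)])
    return q_table[(state, action)]


actions = ["buy", "hold", "sell"]


def demo(gamma):
    q = defaultdict(float)       # unseen state-action pairs start at 0.0
    q[("s1", "sell")] = 10.0     # a valuable action awaits in the next state
    return q_update(q, "s0", "buy", reward=1.0, next_state="s1",
                    actions=actions, alpha=1.0, gamma=gamma)


myopic = demo(gamma=0.0)       # only the immediate reward counts: 1.0
farsighted = demo(gamma=1.0)   # the future value is fully included: 11.0
print(myopic, farsighted)
```

With γ = 0 the updated value equals the immediate reward alone; with γ = 1 the value of the best follow-up action is added in full.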
More advanced implementations of RL include Google DeepMind's Deep Reinforcement Learning. The technique uses deep neural networks to approximate, given a state, the Q-value of each action. This allows the model to map a state to the best possible action without needing to store all possible combinations.
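The idea of replacing the table with a function can be sketched with a tiny, untrained network written in plain Python. Everything here (the function names, the layer sizes, the random weights) is an assumption for illustration; a real model would be trained and built with a deep learning library.

```python
import random


def make_q_network(state_dim, hidden_dim, n_actions, seed=0):
    """Create random weights for a one-hidden-layer network (untrained)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(state_dim)]
          for _ in range(hidden_dim)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
          for _ in range(n_actions)]
    return w1, w2


def q_values(network, state):
    """Forward pass: map a state vector to one Q-value per action."""
    w1, w2 = network
    hidden = [max(0.0, sum(w * x for w, x in zip(row, state)))  # ReLU
              for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def greedy_action(network, state):
    """Pick the index of the action with the highest estimated Q-value."""
    qs = q_values(network, state)
    return max(range(len(qs)), key=qs.__getitem__)


network = make_q_network(state_dim=4, hidden_dim=8, n_actions=3)
state = [0.1, -0.2, 0.05, 0.3]   # any continuous state vector works
qs = q_values(network, state)
action = greedy_action(network, state)
print(qs, action)
```

Because the network computes Q-values from the state vector, it can handle states it has never seen, which a lookup table cannot.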
Let’s get down to business
Beyond playing Atari games, it seems reasonable for such a framework to have meaningful applications in finance and trading, for several reasons:
- The state space describing a financial environment may be very large or even continuous.
- Actions may have long-term consequences, which supervised learning techniques cannot measure directly.
- A trader's actions affect current market conditions (though this effect is usually negligible).
Recently OpenAI, a non-profit AI research company, released OpenAI Gym, a toolkit for developing and comparing reinforcement learning algorithms. It supports teaching agents everything from walking to playing games like Pong or pinball. Though its applications in finance are still rare, some people have tried to build models based on this framework. One example is Q-Trader, a deep reinforcement learning model developed by Edward Lu that aims to achieve short-term profits from stock trading. The implementation of this Q-learning trader is shown below:
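The model implements a very interesting concept called experience replay. This technique, popularized by DeepMind's Deep Q-Network (DQN) Atari agents, improves training stability by storing the agent’s past experiences and replaying random samples of them, which breaks the correlation between consecutive observations.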
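Experience replay can be sketched as a fixed-size memory from which random mini-batches are drawn. This is a generic illustration, not Q-Trader's actual code: the ReplayBuffer class and its method names are assumptions.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size memory of past transitions (hypothetical helper class)."""

    def __init__(self, capacity=1000, seed=None):
        self.memory = deque(maxlen=capacity)  # oldest entries drop out first
        self.rng = random.Random(seed)

    def remember(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Return a random mini-batch without replacement."""
        batch_size = min(batch_size, len(self.memory))
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


buffer = ReplayBuffer(capacity=5, seed=0)
for t in range(8):
    buffer.remember(state=t, action=0, reward=0.0, next_state=t + 1, done=False)

batch = buffer.sample(3)   # three distinct transitions from the last five
print(len(buffer), batch)
```

During training, the agent would call remember after every step and fit its Q-network on a sampled batch rather than on the most recent transition alone.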
We have conducted a test with Tesla Inc. stock. The training period runs from 2012-11-25 to 2017-11-25, over which we ran 200 episodes with a price window of 10 days:
> python train.py TSLA_train 10 200
After the (long) training run, we tested the agent on the date range from 2017-11-26 to 2018-11-26:
> python evaluate.py TSLA_test model_ep200
The model executes 16 trades (8 buys/8 sells) with a total profit of -$0.36, that is, a small net loss.
You can try the model on other stocks, play with different window lengths and numbers of episodes, or check out more examples here. Even though this example turned out not to be profitable, this DRL framework is a great starting point for developing more powerful models. As Ray Dalio says: in order to succeed you should learn to struggle well.
Thanks for reading!