Deep Reinforcement Trading

Applications of Deep Reinforcement Learning in finance are still largely unexplored. Nonetheless, recent developments in other fields have pushed researchers towards exciting new horizons.

I believe there is huge potential for Reinforcement Learning in finance. As investment guru Ray Dalio, founder of Bridgewater, argues in his book Principles: Life and Work, investing is an iterative process. You make your bets, fail (sometimes painfully), learn something new and try again. Through this struggle, you improve your own decision-making by constant trial and error. This principle, obvious to us since Darwin shed light on the way nature works, also holds for investment decision-making. Luckily, this intuitive idea has its own chapter in the Artificial Intelligence encyclopedia, and it is called Reinforcement Learning.

How Reinforcement Learning works

Simply put, Reinforcement Learning (RL) is a framework in which an agent is trained to behave properly in an environment by performing actions and adapting to the results. It differs from other Machine Learning approaches, such as supervised learning, in the way learning happens: it is an interactive process, as the agent's actions actively change its environment.

Reinforcement Learning diagram

During this iterative process, the agent performs actions on the environment and observes the immediate result; this feedback is used to improve the next action, and the process starts again.
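The act, observe, improve loop can be sketched in a few lines of Python. This is only an illustration under invented assumptions: `TwoButtonEnv`, `AveragingAgent`, the payoff probabilities and the exploration rate `epsilon` are hypothetical names and values, not part of any trading model.

```python
import random


class TwoButtonEnv:
    """Hypothetical toy environment: button 1 pays off more often than button 0."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def step(self, action):
        # Reward is 1 with probability 0.8 for action 1 and 0.2 for action 0.
        p_win = 0.8 if action == 1 else 0.2
        return 1.0 if self.rng.random() < p_win else 0.0


class AveragingAgent:
    """Keeps a running average reward per action and mostly picks the best one."""

    def __init__(self, n_actions=2, epsilon=0.1, seed=1):
        self.values = [0.0] * n_actions
        self.counts = [0] * n_actions
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def act(self):
        if self.rng.random() < self.epsilon:  # explore occasionally
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def learn(self, action, reward):
        # Incremental mean: move the estimate towards the observed reward.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


env, agent = TwoButtonEnv(), AveragingAgent()
for _ in range(2000):
    action = agent.act()          # the agent acts on the environment
    reward = env.step(action)     # the environment returns feedback
    agent.learn(action, reward)   # the feedback shapes the next decision
```

After enough iterations, `agent.values` should rank the better-paying action higher, which is the "learning from feedback" part of the loop.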

More specifically, to tell the agent which action to take under which circumstances, we learn a policy, which is represented by a so-called Quality Table (Q-Table) covering all possible states and actions. Each time the agent selects an action a_t, observes a reward r_t, and enters a new state s_{t+1}, the Q-Table is updated in the following way:

Quality Table

The discount factor γ determines the importance of future rewards. If it is 0, our agent will only learn to consider immediate rewards, while a γ of 1 will make it strive for a high long-term reward.
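The figure with the update rule is not reproduced in this text version. As a minimal sketch, assuming the standard tabular Q-learning update Q(s_t, a_t) ← Q(s_t, a_t) + α [r_t + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t)], the rule and the effect of γ might look as follows. The learning rate α (`alpha`), the state labels and the action names are assumptions introduced for the example.

```python
from collections import defaultdict


def q_update(q_table, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Apply one standard tabular Q-learning update in place; return the new value."""
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])
    return q_table[(state, action)]


actions = ["buy", "hold", "sell"]       # hypothetical action set
q_table = defaultdict(float)            # unseen (state, action) pairs start at 0
q_table[("s1", "sell")] = 2.0           # pretend selling in s1 is already valued

# gamma = 0: only the immediate reward matters.
myopic = q_update(q_table, "s0", "buy", 1.0, "s1", actions, alpha=1.0, gamma=0.0)

# gamma = 1: the best value reachable from the next state counts in full.
q_table[("s0", "buy")] = 0.0
farsighted = q_update(q_table, "s0", "buy", 1.0, "s1", actions, alpha=1.0, gamma=1.0)
```

With `alpha=1.0` the new value equals the target itself, so `myopic` is just the reward, while `farsighted` adds the best value available in the next state.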

More advanced implementations of RL include Google DeepMind’s Deep Reinforcement Learning. The technique uses deep neural networks to approximate, given a state, the Q-value of each action. This allows the model to map a state to the best possible action without needing to store all possible combinations:

Deep Reinforcement Learning diagram
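To make the idea concrete, here is a rough sketch of a network that maps a state vector to one Q-value per action. It only shows the forward pass of an untrained network in plain Python; the layer sizes, the `tanh` activation and all function names are assumptions chosen for illustration, and training is omitted.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, activation=None):
    """Compute activation(W x + b) for one layer."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


def q_values(state, hidden_layer, output_layer):
    """Map a state vector to one estimated Q-value per action."""
    hidden = dense(state, hidden_layer, activation=math.tanh)
    return dense(hidden, output_layer)


rng = random.Random(0)
n_features, n_hidden, n_actions = 4, 8, 3   # hypothetical sizes
hidden_layer = init_layer(n_features, n_hidden, rng)
output_layer = init_layer(n_hidden, n_actions, rng)

state = [0.1, -0.2, 0.05, 0.3]              # any real-valued state, no table lookup
q = q_values(state, hidden_layer, output_layer)
best_action = max(range(n_actions), key=lambda a: q[a])
```

The point of the design is that the number of parameters depends on the layer sizes, not on the number of distinct states, so continuous inputs pose no storage problem.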

Let’s get down to business

Beyond just playing Atari games, it seems reasonable for such a framework to have meaningful applications in finance and trading, for the following reasons:

The quantitative description of the environment in finance (its state space) may be very large or even continuous.
Actions may have long-term consequences that supervised learning techniques cannot measure directly.
Your trading actions affect current market conditions (though this effect is usually negligible).

Recently OpenAI, a non-profit AI research company, released OpenAI Gym, a toolkit for developing and comparing reinforcement learning algorithms. It supports teaching agents all sorts of activities, from walking to playing games like Pong or pinball. Though its applications in finance are still rare, some people have tried to build models based on this framework. One example is Q-Trader, a deep reinforcement learning model developed by Edward Lu. The implementation of this Q-learning trader, which aims at short-term stock trading profits, is shown below:

The model implements a very interesting concept called experience replay. This technique, popularized by DeepMind’s Atari-playing Deep Q-Network, improves model stability by storing the agent’s past experiences and randomly replaying them during training.
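A minimal sketch of an experience replay memory, assuming experiences are stored as (state, action, reward, next_state, done) tuples, could look like this. The class name `ReplayBuffer`, its methods and the capacity are hypothetical and are not taken from Q-Trader.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size memory of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity=1000, seed=None):
        self.memory = deque(maxlen=capacity)  # oldest experiences are dropped first
        self.rng = random.Random(seed)

    def store(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Return a random mini-batch, breaking the time ordering of experiences."""
        batch_size = min(batch_size, len(self.memory))
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


buffer = ReplayBuffer(capacity=5, seed=0)
for t in range(8):                       # store 8 dummy transitions
    buffer.store(t, 0, 0.0, t + 1, False)
batch = buffer.sample(3)                 # 3 random transitions from the last 5
```

Sampling at random rather than replaying in order reduces the correlation between consecutive training examples, which is where the stability gain is usually attributed.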

We ran a test on Tesla Inc. stock. The training period runs from 2012-11-25 to 2017-11-25, over which we ran 200 episodes with a price window of 10 days:

> python train.py TSLA_train 10 200
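The post does not spell out how a price window becomes a state for the agent. One plausible encoding, shown here purely as an assumption, is to take the day-to-day price changes inside the window and squash each of them into the interval (0, 1). The function names and the sample prices are hypothetical.

```python
import math


def sigmoid(x):
    """Squash a price change into (0, 1) without overflowing for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def window_state(prices, t, window):
    """Return window - 1 squashed day-to-day price changes ending at day t.

    Days before the start of the series are padded with the first price.
    """
    start = t - window + 1
    if start >= 0:
        block = prices[start:t + 1]
    else:
        block = [prices[0]] * (-start) + prices[:t + 1]
    return [sigmoid(block[i + 1] - block[i]) for i in range(window - 1)]


prices = [10.0, 10.5, 10.2, 10.2, 11.0]    # hypothetical closing prices
state = window_state(prices, t=4, window=3)
early = window_state(prices, t=0, window=3)
```

An unchanged price maps to 0.5, a rise to a value above 0.5 and a fall to a value below it, so the state carries the direction and rough size of recent moves rather than raw price levels.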

After the (long) training period, we tested the agent over the date range from 2017-11-26 to 2018-11-26:

> python evaluate.py TSLA_test model_ep200

Tesla Inc. stock

The model executes 16 trades (8 buys/8 sells) with a total profit of -$0.36.
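A total like this can be computed by pairing buys with sells. The sketch below assumes that every trade involves a single share and that each sell closes the oldest open buy; both are assumptions for illustration, and the prices are invented rather than taken from the TSLA test.

```python
from collections import deque


def total_profit(trades):
    """Sum sell-minus-buy profit, matching each sell with the oldest open buy."""
    inventory = deque()
    profit = 0.0
    for side, price in trades:
        if side == "buy":
            inventory.append(price)
        elif side == "sell" and inventory:   # ignore sells with nothing to sell
            profit += price - inventory.popleft()
    return profit


# Hypothetical prices, not the TSLA results reported above.
trades = [("buy", 100.0), ("buy", 102.0), ("sell", 101.5), ("sell", 100.0)]
result = total_profit(trades)
```

Here the first pair gains 1.5 and the second loses 2.0, so the total is negative even though half of the round trips were profitable.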

You can try out the model on other stocks and play with different window lengths and numbers of episodes, or check out more examples here. Even though this example was not profitable, this DRL framework is a great starting point for developing more powerful models. As Ray Dalio says: in order to succeed, you should learn to struggle well.

Thanks for reading!

4 responses to “Deep Reinforcement Trading”

Juan says:

11/01/2019 at 6:45 pm

Hello, interesting article. If I understood correctly, reinforcement learning is an artificial intelligence technique that decides what to do in a given situation, depending on past experiences of the same kind? Congratulations, nice article!
- lcampos says:
  
  21/01/2019 at 10:29 am
  
  Hi Juan,
  Yes. If you want a more general understanding of the topic, you can take a look at our post https://quantdare.com/machine-learning-a-brief-breakdown/
  Thanks a lot for your comment!
Paris Pitman says:

29/11/2018 at 12:13 pm

Nice introduction to Deep RL in finance!
In my humble opinion, there are four reasons why these techniques will fail in finance (trust me, I’ve made a huge effort to make them work before miserably giving up):
1) It takes millions of episodes from thousands of games to teach an agent to play arcade video games or chess. In your case, you only have a bit more than 1000 data points. Only in intraday trading applications could you gather an amount of data comparable to such examples.
2) As you point out in the post, the agent is assumed to interact with the environment, modifying it. Unless the trading orders are big enough to move prices, we cannot learn by modifying the environment to turn it in our favor, so reinforcement learning loses much of its meaning.
3) These techniques perform well in very high signal-to-noise scenarios. Again, in the video-game examples, the input to the model is the content of the screen, free of noise. In financial markets we have quite the opposite: most of the observable magnitude is noise.
4) The discounted reward is a very powerful mechanism in Q-trading. It assumes that the reward comes in “jumps”, such as reaching milestones. In your particular application, however, the long-term return is just the accumulation of daily returns, so there’s no need to perform a discount.

Congrats on the nice job, anyway!
- lcampos says:
  
  29/11/2018 at 3:41 pm
  
  Thanks for your comment! Your intuition is great and identifies the main difficulties ML practitioners encounter when implementing Deep Reinforcement models in production. Let me add some comments:
  1) It is true that for a retail trader, the impact of the agent on prices is generally not an issue. However, for large investment firms implementing ML models, this effect might not be negligible.
  2) As you point out, increasing the sampling frequency is crucial to improving the robustness of the model. Recent techniques used in HFT include volume-clock sampling, where you dynamically change the sampling frequency of your model (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2034858)
  
  The first step in overcoming obstacles is to identify weak spots. Thank you so much and don’t give up!

Luis Campos

28/11/2018
