Classifying market regimes matters in finance because it ultimately comes down to correctly predicting how prices are going to move. But prediction isn’t the only crucial thing; knowing how to describe what has already happened is also very important.
In this QuantDare post we look at the different ways of classifying markets. We concentrate on how they differ and suggest possible data science techniques for each.
Future Prediction vs Past Description
First, we must determine the nature of our problem. The classification of markets can be broken down into two problems:
- Future prediction
- Past description
When faced with prediction in finance, you can only use information available up to, and not including, the period you wish to predict. In live trading this is guaranteed, because the future has not happened yet. In backtesting it is critical, because future information can leak into the model (look-ahead bias) in ways that are hard to avoid and easy to overlook.
With past description, you can use all the data available to define historical markets. This difference may seem obvious but it is surprising how often the two are confused.
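To make the prediction constraint concrete, here is a minimal sketch of a feature that only looks backwards. The function name `trailing_mean` and the toy returns are our own illustration, not part of any particular backtesting library.

```python
def trailing_mean(returns, window):
    """Mean of the `window` returns strictly before each day.

    The value for day t never includes returns[t] itself, so it could
    have been computed before day t was known.
    """
    out = []
    for t in range(len(returns)):
        if t < window:
            out.append(None)  # not enough history yet
        else:
            past = returns[t - window:t]  # slice stops before day t
            out.append(sum(past) / window)
    return out


# Toy daily returns, purely illustrative.
returns = [0.01, -0.02, 0.03, 0.00, 0.01]
features = trailing_mean(returns, window=2)
```

A quick sanity check for look-ahead bias is to change the return of day t and confirm that the feature for day t does not move.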
The final aim is usually to accurately predict market movements. But before predicting something, you have to be very clear about the possible outcomes. Only then can you choose an appropriate Machine Learning algorithm.
We also have to consider that the way we describe the past affects how easy the future prediction will be. This makes defining past market regimes even more important.
Known vs Unknown Regime Definitions
Defining historical markets isn’t trivial. There are lots of decisions to make. First, we’re going to split past description into two cases:
- Well-defined regimes
- Undefined regimes
What do we mean? In classification our classes have names but they may not tell us anything about the regime itself. Compare these two cases:
Both depict classifications of the S&P 500 Price Index. The first has two “known” classes: up trends and down trends. Each class is clearly defined and we know how to act in each case.
The second has 6 “unknown” classes: Regimes 0 to 5. The days assigned to the same market regime have similar characteristics but it isn’t clear what each one represents or the differences between them. Without further investigation, we wouldn’t immediately know how to act.
Let’s delve further into these two cases.
Although well-defined regimes have clear definitions, you still have to choose the number of classes you want to divide the series into. For example:
- 2 classes: Up vs Down
- 3 classes: Up, Down, Lateral
- Multiple: Discrete scale from Up to Down
And the choices don’t stop there. With 3 classes based on daily returns, there are several ways to define up, down and lateral markets. One possibility is to use positive, negative and zero returns as the classes. Another is to use thresholds on the daily return r: r > 1%, r < -1% and -1% ≤ r ≤ 1% (or asymmetric limits).
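The threshold version of the 3-class definition can be sketched in a few lines. The function name `label_return` and its default limits of ±1% are illustrative assumptions taken from the example above.

```python
def label_return(r, lower=-0.01, upper=0.01):
    """Classify one daily return as 'up', 'down' or 'lateral'.

    Returns inside [lower, upper] (limits included) count as lateral.
    """
    if r > upper:
        return "up"
    if r < lower:
        return "down"
    return "lateral"


# Symmetric limits of +/-1%.
labels = [label_return(r) for r in [0.02, -0.015, 0.004, 0.0]]

# Asymmetric limits: a smaller move is enough to count as 'up'.
asymmetric = label_return(0.007, lower=-0.01, upper=0.005)
```

Setting both limits to zero recovers the first option, where only returns of exactly zero are lateral.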
In any case, defining every daily movement isn’t that useful. Market regimes are more long-term, persistent states that can be utilised for making investment or trading decisions. Today’s market regime does not depend solely on what happens today but also on the days preceding and succeeding it.
We could use the direction and magnitude of centred averages over various days to define the market. However, measuring a fixed number of days before and after may be too simplistic, because the regime also depends on the magnitude and sequence of the movements. One idea is to include the price evolution.
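As a rough sketch of the centred-average idea, the code below averages the returns in a window around each day and reads the direction off the sign. The names `centred_mean` and `direction`, and the choice of a plain unweighted mean, are our own assumptions. Because the window includes later days, this is only valid for past description, never for prediction.

```python
def centred_mean(returns, half_window):
    """Average return over `half_window` days either side of each day."""
    n = len(returns)
    out = []
    for t in range(n):
        if t < half_window or t + half_window >= n:
            out.append(None)  # window would run off the series
        else:
            window = returns[t - half_window:t + half_window + 1]
            out.append(sum(window) / len(window))
    return out


def direction(value):
    """Map a centred average to a regime label (None stays None)."""
    if value is None:
        return None
    if value > 0:
        return "up"
    if value < 0:
        return "down"
    return "lateral"


# Toy daily returns, purely illustrative.
returns = [0.01, 0.02, -0.01, -0.03, -0.02]
smoothed = centred_mean(returns, half_window=1)
labels = [direction(v) for v in smoothed]
```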
We use an algorithm that separates the series according to the magnitude of its movements in one direction or another. We could choose smaller or larger limits to regulate the sensitivity and obtain more or fewer trend changes:
An issue here is choosing an appropriate magnitude. The desired regimes depend on circumstances and point of view. A clear up and then down trend to one person may be a lateral market for another.
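The post does not spell out the separation algorithm, so the following is only one possible sketch of the idea: a trend is considered reversed once the price has moved a chosen fraction against it, measured from the trend's extreme. The function `label_trends`, its reversal rule and the toy prices are all our own assumptions.

```python
def label_trends(prices, threshold):
    """Label each day +1 (up trend), -1 (down trend) or 0 (no trend seen).

    A trend reverses once the price has moved `threshold` (a fraction,
    e.g. 0.05 for 5%) against it, measured from the trend's extreme.
    The reversal is dated back to that extreme, so the labels use
    future data and are only suitable for past description.
    """
    n = len(prices)
    labels = [0] * n
    trend = 0      # 0 until the first move of size `threshold` is seen
    start = 0      # first day of the segment currently being built
    hi = lo = 0    # indices of the running high and low
    for t in range(1, n):
        if prices[t] > prices[hi]:
            hi = t
        if prices[t] < prices[lo]:
            lo = t
        rose = prices[t] >= prices[lo] * (1 + threshold)
        fell = prices[t] <= prices[hi] * (1 - threshold)
        if trend == 0:
            trend = 1 if rose else -1 if fell else 0
        elif trend == 1 and fell:
            # The up trend ended at the high; a down trend starts after it.
            labels[start:hi + 1] = [1] * (hi + 1 - start)
            trend, start, lo = -1, hi + 1, t
        elif trend == -1 and rose:
            # The down trend ended at the low; an up trend starts after it.
            labels[start:lo + 1] = [-1] * (lo + 1 - start)
            trend, start, hi = 1, lo + 1, t
    labels[start:] = [trend] * (n - start)  # last, still open, segment
    return labels


# Toy price path: a rise, a dip of about 7%, then a recovery.
prices = [100, 102, 106, 111, 108, 104, 103, 106, 112]
sensitive = label_trends(prices, 0.05)  # sees up, down, up
coarse = label_trends(prices, 0.10)     # sees a single up trend
```

With a 5% limit the dip counts as a down trend; with a 10% limit it is absorbed into one long up trend, which is exactly the point-of-view problem described above.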
What about a discrete scale of values to define the market regime? We could combine separations and create multiple classes with labels between -1 and 1. A class value of -1 represents an extreme down trend (dark red) and +1 an extreme up trend (dark green). The closer the value is to zero, the less clear the direction of the price series (lighter tones, towards yellow).
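One simple way to combine separations, assumed here purely for illustration, is to average the +1/-1 labels obtained with several sensitivities. The function `combine_labels` and the three toy labellings are hypothetical; other ways of combining them are possible.

```python
def combine_labels(label_sets):
    """Average several +1/-1 labellings day by day.

    The result lies between -1 (all separations agree on a down trend)
    and +1 (all agree on an up trend). With k labellings there are at
    most k + 1 distinct class values.
    """
    n_sets = len(label_sets)
    return [sum(day) / n_sets for day in zip(*label_sets)]


# Toy labellings from three hypothetical sensitivities, same dates.
fine = [1, 1, -1, -1, 1, 1]
medium = [1, 1, 1, -1, -1, 1]
coarse = [1, 1, 1, 1, 1, 1]
scores = combine_labels([fine, medium, coarse])
```

Days where the separations disagree end up near zero, matching the lighter tones described above.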
The actual class values could look something like this:
Let’s go back to the second case of “unknown” regime definitions. For example, to describe the Euro Stoxx 50 Price Index evolution we could assign each day to a regime as follows:
How is this done? We start by calculating a variety of characteristics over the price series. We apply PCA to reduce dimensions, removing repetition and keeping the most relevant information. Then we apply k-means clustering to the principal components of the characteristics.
In this case, we assign 6 clusters (regimes). Choosing the optimal number of clusters is a problem in itself, with a variety of techniques available. Plotting the first against the second component, the resulting 6 clusters seem coherent:
Individual days are grouped by their proximity in the characteristic space. Hence, days in the same regime have similar behaviour. However, depending on the characteristics used, this may not imply that the direction of the price series is similar.
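The PCA and k-means steps can be sketched with NumPy alone. This is a minimal illustration under several assumptions of our own: the features are standardised first, no feature is constant, the cluster centres are initialised with a deterministic farthest-first rule, and the data is synthetic. In practice, scikit-learn's `PCA` and `KMeans` classes are the more usual choice.

```python
import numpy as np


def pca_scores(features, n_components):
    """Project standardised features onto their first principal components."""
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[:n_components].T


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; returns one cluster index per row of `points`."""
    # Farthest-first initialisation: start from the first point, then
    # repeatedly add the point farthest from the centres chosen so far.
    centres = [points[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centres], axis=0)
        centres.append(points[d.argmax()])
    centres = np.array(centres)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centres = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels


# Synthetic stand-in for daily characteristics: 100 days, 5 features,
# drawn from two well-separated groups (not real market data).
rng = np.random.default_rng(42)
features = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 5)),
    rng.normal(10.0, 1.0, size=(50, 5)),
])
components = pca_scores(features, n_components=2)
regimes = kmeans(components, k=2)
```

Each entry of `regimes` is the regime assigned to that day. The cluster numbers themselves carry no meaning, which is why we call these regimes "unknown".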
To check this we have accumulated the returns of the different market regimes (ignoring their dates and joining different time periods together). We are interested to see if each regime has an identity.
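This check can be sketched as follows. We assume here that "accumulated" means compounding the daily returns; a plain running sum would be an equally valid reading. The function name and the toy inputs are illustrative.

```python
def cumulative_returns_by_regime(returns, regimes):
    """Compound each regime's daily returns in date order.

    Days from different time periods are simply joined together, so
    each curve shows how the regime behaves while it is active.
    """
    curves = {}
    for r, regime in zip(returns, regimes):
        curve = curves.setdefault(regime, [])
        previous = curve[-1] if curve else 0.0
        curve.append((1 + previous) * (1 + r) - 1)
    return curves


# Toy example: two regimes that alternate day by day.
returns = [0.10, -0.05, 0.10, -0.05]
regimes = [0, 1, 0, 1]
curves = cumulative_returns_by_regime(returns, regimes)
```

A regime with a clear identity should give a curve that moves steadily in one direction.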
There is some indication of similar movements within regimes but it is not completely clear. Regimes 1, 3 and 4 appear upward moving. Regimes 0 and 2 are slightly more downward moving. Regime 5 has an upward bias towards the end but it is more lateral than the rest.
Aside from which type of past description of the price series is more useful, which do you think is easier to predict? We could try assigning new days to regimes with a clear definition (e.g. up versus down movements) or to market regimes with no concrete meaning. For now, I’ll let you decide…