Clustering Forex Market




The Forex Market is the global marketplace where currencies are bought and sold. It is the largest and most liquid financial market in the world, with trillions of dollars traded daily.

A currency pair is an asset composed of two currencies traded on the financial market. Its price represents the relative value of one currency against the other. For example, in the case of the EUR/USD pair, a price of 1.02 means that 1 euro is equivalent to 1.02 US dollars. This relative price varies over time, depending on the supply and demand for the two currencies involved in the cross.


In the following case study we will use a dataset of currency indexes. These series indicate the evolution over time of a given currency, eliminating the reference to another particular currency. Specifically, we will use returns calculated on a weekly frequency, from Wednesday to Wednesday, and corresponding to the period from 7/April/2010 to 21/September/2022. This time interval results in 651 weeks. 
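As an illustration of the Wednesday-to-Wednesday sampling, here is a minimal sketch in Python with pandas. It assumes daily index levels are available and that simple (not logarithmic) returns are used; the source states neither. The function name `weekly_returns`, the column names and the data are hypothetical and synthetic.

```python
import numpy as np
import pandas as pd

def weekly_returns(daily_levels: pd.DataFrame) -> pd.DataFrame:
    """Wednesday-to-Wednesday simple returns from daily index levels."""
    # "W-WED" builds weekly bins ending on Wednesday; last() keeps the
    # latest available level inside each bin.
    weekly = daily_levels.resample("W-WED").last()
    # Simple return: level this Wednesday / level last Wednesday - 1.
    returns = weekly / weekly.shift(1) - 1
    # The first week has no previous Wednesday, so it is dropped.
    return returns.dropna(how="all")

# Synthetic daily levels for two hypothetical indexes (business days only).
rng = np.random.default_rng(0)
days = pd.bdate_range("2010-04-07", "2022-09-21")
shocks = rng.normal(0, 0.003, size=(len(days), 2))
levels = pd.DataFrame(
    100 * np.exp(np.cumsum(shocks, axis=0)),
    index=days,
    columns=["AAA", "BBB"],
)

returns = weekly_returns(levels)
print(returns.head())
```

Every row of `returns` is dated on a Wednesday, and the first observation is lost because a return needs two consecutive levels.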

The 35 currencies included are listed in the following table with their ISO 4217 code: 

Code  Currency
AED   United Arab Emirates dirham
AUD   Australian dollar
BRL   Brazilian real
CAD   Canadian dollar
CHF   Swiss franc
CLP   Chilean peso
CNY   Chinese yuan
COP   Colombian peso
CZK   Czech koruna
DKK   Danish krone
EUR   Euro
GBP   Pound sterling
HKD   Hong Kong dollar
HUF   Hungarian forint
IDR   Indonesian rupiah
ILS   Israeli new shekel
INR   Indian rupee
JPY   Japanese yen
KRW   Korean won
MXN   Mexican peso
MYR   Malaysian ringgit
NOK   Norwegian krone
NZD   New Zealand dollar
PHP   Philippine peso
PLN   Polish zloty
RON   Romanian leu
RUB   Russian ruble
SAR   Saudi riyal
SEK   Swedish krona
SGD   Singapore dollar
THB   Thai baht
TRY   Turkish lira
TWD   New Taiwan dollar
USD   US dollar
ZAR   South African rand


In this post we will perform a currency clustering analysis. Organizing data into homogeneous groups is one of the most fundamental ways of understanding and learning. Clustering analysis is the formal study of methods and algorithms for grouping objects based on the similarity of their characteristics. It belongs to unsupervised Machine Learning, meaning that it does not rely on previously known labels to be predicted. It aims to find structure in the data and is therefore exploratory in nature.

Returning to the set of currencies, the goal is to understand which ones show a similar evolution over time. Many countries have close relationships that result in currencies with very similar evolutions. In particular cases, the value of one currency is also artificially linked to the value of another due to economic interests. In this study we want to determine which groups of currencies exist.


First, we briefly describe the data. We build a boxplot to check the main properties of the return series. The plot only covers the range between the 5th and 95th percentiles of the data, as there are significant outliers that, if included, would hinder the visualization. The currencies are sorted by standard deviation, from smallest to largest.

Returns of currencies

In addition to the presence of outliers, whose existence is due to exceptional situations occurring at some point in time, not to errors, the most remarkable thing is the difference in variability that we find from one index to another. As an example, we show the opposite poles of the available sample:

Example of RUB and SGD

The Singapore dollar and the Russian ruble are the currencies with the lowest and highest dispersion of returns, respectively. In the graph above, the movements of the orange line, corresponding to SGD, are negligible compared to the strong movements experienced by the RUB.

On the other hand, it is also worth noting that the median of these weekly returns is zero or practically zero for the vast majority of the series. In other words, the returns are positive in about 50% of the cases and negative in the other 50%.

We confirm the existence of outliers (the mean and the median differ markedly), the centering of the series around zero (the mean and median are not exactly 0, but their magnitude is irrelevant compared to the standard deviation) and the large variability of standard deviations between currencies already seen in the previous graph:

Mean, median and standard deviation of the currencies
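The descriptive statistics discussed above can be sketched as follows. This is only an illustration on synthetic data: the function name `describe_returns` and the column names `LOWVOL` and `HIGHVOL` are hypothetical, and the volatilities are made up rather than taken from the dataset.

```python
import numpy as np
import pandas as pd

def describe_returns(returns: pd.DataFrame) -> pd.DataFrame:
    """Per-series mean, median, standard deviation and 5%/95% percentiles."""
    stats = pd.DataFrame({
        "mean": returns.mean(),
        "median": returns.median(),
        "std": returns.std(),
        "p05": returns.quantile(0.05),
        "p95": returns.quantile(0.95),
    })
    # Sort from the least to the most volatile series, as in the boxplot.
    return stats.sort_values("std")

# Synthetic weekly returns for two hypothetical series.
rng = np.random.default_rng(1)
returns = pd.DataFrame({
    "HIGHVOL": rng.normal(0, 0.02, 650),
    "LOWVOL": rng.normal(0, 0.003, 650),
})

summary = describe_returns(returns)
print(summary)
```

Comparing the `mean` and `median` columns with the `std` column is a quick way to check that the series are centered on zero relative to their dispersion.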


We need to define a similarity criterion between currencies, which will give rise to the distance measure used to cluster the series. In our case, we assess the correlation of weekly returns: correlations close to 1 are interpreted as minimum distance, while correlations close to -1 are interpreted as maximum distance. The distance is therefore a decreasing linear function of the correlation. We use the following calculation, which also limits the distances to the [0, 1] interval:

\( dist = \frac{1 - corr(X, Y)}{2} \)
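To make the formula concrete: a correlation of 1 gives a distance of 0, a correlation of 0 gives 0.5, and a correlation of -1 gives 1. The sketch below applies it to a whole table of returns. It assumes Pearson correlation (the default of pandas `DataFrame.corr`), which the source does not specify; the function name `correlation_distance` and the toy series are hypothetical.

```python
import numpy as np
import pandas as pd

def correlation_distance(returns: pd.DataFrame) -> pd.DataFrame:
    """Pairwise distance (1 - corr) / 2 between the columns of `returns`."""
    corr = returns.corr().to_numpy()
    # Clip tiny floating-point overshoots so values stay inside [0, 1].
    dist = np.clip((1.0 - corr) / 2.0, 0.0, 1.0)
    # A series is at distance exactly 0 from itself.
    np.fill_diagonal(dist, 0.0)
    return pd.DataFrame(dist, index=returns.columns, columns=returns.columns)

# Toy example: SAME moves with X, OPPOSITE moves against it.
rng = np.random.default_rng(2)
x = rng.normal(size=200)
returns = pd.DataFrame({"X": x, "SAME": 2 * x, "OPPOSITE": -x})

dist = correlation_distance(returns)
print(dist.round(3))
```

In this toy case `X` and `SAME` end up at distance 0 and `X` and `OPPOSITE` at distance 1, the two extremes of the interval.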

We present the result of this distance metric calculated for our dataset:

Distance between currencies

We can identify some strong relationships (distances close to zero) between European currencies, and also between the USD and some Asian currencies. Since we have no prior idea of the appropriate number of clusters, we opt for the hierarchical clustering algorithm.

In addition to the distance metric, a linkage method is required. We will use complete linkage, i.e. the distance between cluster i and cluster j is the maximum distance over all currency pairs formed by one currency from each cluster. Thus, two clusters cannot be merged without taking into account all the correlations between the currencies they contain.
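A minimal sketch of hierarchical clustering with complete linkage (the maximum-distance criterion described above), using SciPy. The source does not say which library was used, so SciPy is an assumption; the four labels and the distance matrix are hypothetical toy values.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

# Toy symmetric distance matrix for four hypothetical currencies.
labels = ["A", "B", "C", "D"]
dist = np.array([
    [0.00, 0.05, 0.60, 0.70],
    [0.05, 0.00, 0.65, 0.75],
    [0.60, 0.65, 0.00, 0.10],
    [0.70, 0.75, 0.10, 0.00],
])

# linkage() expects the condensed (upper-triangle) form of the matrix.
condensed = squareform(dist)

# Complete linkage: the distance between two clusters is the largest
# pairwise distance between their members.
Z = linkage(condensed, method="complete")

# no_plot=True returns the tree layout without requiring matplotlib.
tree = dendrogram(Z, labels=labels, no_plot=True)
print(Z)
print(tree["ivl"])
```

Each row of `Z` records one merge and the distance at which it happens. Here the last merge joins {A, B} with {C, D} at 0.75, the largest of the four cross-cluster distances.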

Given these parameters, this is the currency dendrogram:


Finally, we interpret the results. Cutting the dendrogram at an intermediate distance level of 0.5, which corresponds to zero correlation, we find 5 distinct clusters:

1. The first cluster corresponds to European currencies (almost all of them). The currencies with the highest similarity are EUR and DKK. The Danish krone is pegged to the EUR, so its evolution is necessarily very similar. Within this European cluster we observe two sub-clusters: the first consists of SEK and NOK (Sweden and Norway) and the second contains the rest.

2. A second cluster includes CHF and JPY. Both the Swiss franc and the yen are recognized as safe-haven currencies. In times of crisis both currencies attract capital due to their historical stability, so their value tends to increase in these situations. Although there are some similarities between them, the resemblance is not too great.

3. The next group is made up of a large number of currencies, most of them from emerging markets, although there are also more established ones, such as CAD or AUD. The common denominator in this group could be the relationship with commodity prices, or with some particular type of commodity, such as metals (gold, silver, copper, etc.). All the currencies with the highest standard deviation of returns belong to this group. Except for the AUD-NZD relationship (Australia and New Zealand), the relationships within the group are not particularly strong. Within this cluster we find some differentiated sub-groups: the Latin American currencies together with South Africa (ZAR); the one formed by Canada, Australia and New Zealand (CAD, AUD and NZD), also slightly linked with South Korea (KRW); and finally, although without a particularly strong relationship between them, Turkey and Russia (TRY, RUB).

4. Cluster number 4 returned by the algorithm is solely composed of the pound sterling, GBP. This is because its evolution has not been sufficiently correlated with any of the other 34 currencies in the dataset. 

5. In the last cluster, in which the main currency is the USD, we also find a long list of Asian currencies, almost all of those in the original set. AED, SAR, HKD and CNY are artificially linked to the USD, which is why their evolution is practically identical. Within this cluster, the most distant currencies are that of Israel (ILS) and the pair formed by India and Indonesia (INR, IDR).
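Cutting the tree at a fixed distance, as done above with the 0.5 level, can be sketched with SciPy's `fcluster`. Again, the library choice is an assumption and the five labels and distances are hypothetical toy values, chosen so that one element stays alone in the same way GBP does.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Toy distance matrix: A-B and C-D are close pairs, E is far from everything.
labels = ["A", "B", "C", "D", "E"]
dist = np.array([
    [0.00, 0.05, 0.60, 0.70, 0.55],
    [0.05, 0.00, 0.65, 0.75, 0.60],
    [0.60, 0.65, 0.00, 0.10, 0.58],
    [0.70, 0.75, 0.10, 0.00, 0.62],
    [0.55, 0.60, 0.58, 0.62, 0.00],
])

Z = linkage(squareform(dist), method="complete")

# Keep only the merges that happen at a distance of 0.5 or less.
cluster_ids = fcluster(Z, t=0.5, criterion="distance")

# Group the labels by the cluster id assigned to each of them.
clusters = {}
for name, cid in zip(labels, cluster_ids):
    clusters.setdefault(int(cid), []).append(name)
print(clusters)
```

With complete linkage, every pair inside a resulting cluster is at a distance of at most 0.5, i.e. has a non-negative correlation under the distance defined earlier. `E` forms a singleton because all of its distances exceed the threshold.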


Machine Learning techniques enable the analysis of large data sets in a short time, avoiding repetitive manual tasks that are often infeasible, and providing efficiency and speed. The study of similarities between currencies presented above illustrates their practicality. The result of the hierarchical clustering is a map of the relationship structure of the Forex Market.

The conclusions drawn could be useful for the definition of a currency portfolio. When defining a basket of currencies in which to invest, it is important to take into account the existing relationships between potential assets, in order to ensure diversification. This information can be a good starting point to decide the number of currencies to include (number of clusters) and guide the selection of particular currencies to invest in (one representative currency per cluster).
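One possible way to pick "one representative currency per cluster" is to take the medoid, i.e. the member with the smallest average distance to the other members of its cluster. This criterion is an assumption made here for illustration; the post does not specify how a representative should be chosen. The function name `cluster_medoids`, the labels and the distances are hypothetical.

```python
import numpy as np

def cluster_medoids(dist, labels, cluster_ids):
    """Return {cluster id: label of the member closest on average to its cluster}."""
    dist = np.asarray(dist)
    cluster_ids = np.asarray(cluster_ids)
    representatives = {}
    for cid in np.unique(cluster_ids):
        members = np.flatnonzero(cluster_ids == cid)
        # Distances restricted to the members of this cluster.
        sub = dist[np.ix_(members, members)]
        # Medoid: smallest mean distance to the cluster (first one on ties).
        best = members[sub.mean(axis=1).argmin()]
        representatives[int(cid)] = labels[best]
    return representatives

# Toy example: B sits between A and C; D and E form a second cluster.
labels = ["A", "B", "C", "D", "E"]
cluster_ids = [1, 1, 1, 2, 2]
dist = np.array([
    [0.0, 0.1, 0.4, 0.8, 0.9],
    [0.1, 0.0, 0.1, 0.8, 0.9],
    [0.4, 0.1, 0.0, 0.8, 0.9],
    [0.8, 0.8, 0.8, 0.0, 0.2],
    [0.9, 0.9, 0.9, 0.2, 0.0],
])

representatives = cluster_medoids(dist, labels, cluster_ids)
print(representatives)
```

Other criteria (liquidity, transaction costs, volatility) may matter more in practice; the medoid only captures which currency best summarizes the co-movement of its group.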
