In this post, we’ll take a brief look at biclustering algorithms. They reveal easily interpretable patterns in our data and give us more information about the links between observations and features than simpler clustering algorithms usually do.

We’ve already reviewed a number of unsupervised clustering algorithms that group subsets of observations that are similar to each other and differ in some aspect from the rest. We either apply different models to each group based on these differences or simply use this information to better understand the structure of our data.

For example, we can apply the *k-means* algorithm to the observations in the matrix on the left and color each of them by its label. Alternatively, we might want to cluster our features, identifying sets of variables that share a lot of information or show the same behavior across all the observations.

This corresponds to finding a partition of the columns (features) into *k* clusters, where columns in the same cluster are similar to each other and can easily be told apart from the rest.
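These two separate steps can be sketched with scikit-learn's `KMeans`: cluster the rows of the data matrix, then cluster its columns by running the same algorithm on the transpose. The matrix and cluster count below are toy placeholders, not the post's actual data:

```python
import numpy as np
from sklearn.cluster import KMeans

# toy data matrix: 100 observations x 20 features (stand-in data)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))

k = 3  # illustrative choice of cluster count
# cluster the observations (rows)
row_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
# cluster the features (columns) by transposing the matrix
col_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X.T)
```

Note that the two runs are completely independent: neither partition uses any information from the other, which is exactly the limitation biclustering addresses.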


Once we carry out both of these two steps, we obtain disjoint sets of observations that *look alike* and also sets of features that share information. But are these partitions informative enough? Can we improve these partitions by taking into account the link between observations and features?

*Biclustering* algorithms –also called *block clustering*, *co-clustering*, *two-way clustering* or *two-mode clustering*– cluster observations and features simultaneously. In general, if we have the observations \(A=\{a_1, a_2, \ldots, a_m\}\) and features \(B=\{b_1, b_2, \ldots, b_n\}\), the aim of these algorithms is to select a partition of \(A\) and a partition of \(B\) that **reveal patterns in our dataset matrix**.

As we see in the figure above, a convenient permutation of the rows and columns of the synthetically generated data matrix makes it look like a checkerboard where each delimited submatrix contains similar values. The types of patterns we might want to look for are varied: from unusually high values –darker tones–, to low-variance submatrices –homogeneously colored areas– or observations following similar motifs.
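A checkerboard structure like this can be generated and recovered with scikit-learn, along the lines of its documentation example. The shapes, cluster counts and noise level below are illustrative choices, not the exact parameters behind the figure:

```python
import numpy as np
from sklearn.datasets import make_checkerboard
from sklearn.cluster import SpectralBiclustering

# synthetic matrix with a hidden 4x3 checkerboard of biclusters,
# returned with its rows and columns already shuffled
data, rows, cols = make_checkerboard(
    shape=(300, 300), n_clusters=(4, 3), noise=10, shuffle=True, random_state=0
)

# fit the spectral biclustering model to the shuffled matrix
model = SpectralBiclustering(n_clusters=(4, 3), method="log", random_state=0)
model.fit(data)

# reordering rows and columns by their bicluster labels
# restores the checkerboard appearance
fit_data = data[np.argsort(model.row_labels_)]
fit_data = fit_data[:, np.argsort(model.column_labels_)]
```

Plotting `fit_data` (e.g. with `matplotlib.pyplot.matshow`) shows the recovered block structure.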

In the field of genetics, these algorithms are used to identify patterns in gene expression data matrices, where each value shows the expression or inhibition of a specific gene –observation– under different conditions –features–. This way, experts try to extract knowledge about the relationships between genes and their functions in certain biological processes.

Similarly, we might be interested in the relationships between currency pairs and in the periods when those relationships were strongest. We gathered 62 cross-rate spots –features– and computed their monthly returns –observations– from January 2010 to December 2017. As expected, we get a noisy matrix of values around 0, with no clear structure.

Using the row and column labels produced by the spectral biclustering algorithm from *sklearn*, we rearrange the rows and columns of the matrix. Now we can see the *hidden patterns* in the original matrix, which relate directly to the features we’re handling.
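A sketch of how that rearrangement might look in code. The returns matrix here is simulated noise standing in for the actual FX data, and both the cluster counts and the shift to nonnegative values are our assumptions for illustration, not details from the post:

```python
import numpy as np
from sklearn.cluster import SpectralBiclustering

# stand-in for the real data: 96 months (Jan 2010 - Dec 2017)
# x 62 cross rates of simulated monthly returns around 0
rng = np.random.default_rng(0)
returns = rng.normal(scale=0.02, size=(96, 62))

# the spectral methods work on a nonnegative matrix,
# so shift the returns before fitting (an assumed preprocessing step)
shifted = returns - returns.min() + 1e-3

model = SpectralBiclustering(n_clusters=(4, 4), method="log", random_state=0)
model.fit(shifted)

# permute months and pairs so members of each bicluster sit together
rearranged = returns[np.argsort(model.row_labels_)]
rearranged = rearranged[:, np.argsort(model.column_labels_)]
```

With real returns instead of noise, the rearranged matrix is where the block patterns become visible.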

In the next graph, we highlight the series and months corresponding to the golden boxes of the previous matrix. Unsurprisingly, the algorithm has clustered cross rates between the US Dollar and some European countries’ currencies. We could now focus not only on the relationships we detected between pairs, but also on when these were most noticeable.

This is just one example of how biclustering can help us arrange a 2D matrix so that its observations and features tell us something about the underlying structure of our data. Of course, not every dataset comes as a neat collection of rows and columns; extending these techniques to tensors or higher-dimensional data structures could be an interesting direction. If you feel like trying biclustering on your own data, let us know whether you find it useful.