
# A new way to train neural networks: the Forward-Forward algorithm

### Alejandro Pérez

#### 27/07/2023

Learn about the new Forward-Forward algorithm to train neural networks without backpropagation.

Modern deep learning models and tools owe most of their success to the backpropagation algorithm. Although much research has been done around this technique, some researchers have worked on new methods to train neural networks. The Forward-Forward algorithm (Hinton, G. (2022)) is one of the latest. In this post, we are going to explain how it works.

## 1. Computational Graphs

We’ve already talked in this blog about how modern deep learning frameworks work. To summarize: neural networks are built by tracking all the operations that take place in a directed graph, where each node represents a particular operation.

A multiply operation, $$Mul(A, B) = A \cdot B$$, looks like this:

And a more complex function, $$Sum(Mul(A, B), C) = Mul(A, B) + C$$, can be represented as follows:

Recording the operations in a graph makes computing derivatives easier: we only need to implement the derivative of each individual function and then apply the chain rule to obtain backpropagation (Olah (2015)).
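As a minimal sketch of this idea, the snippet below builds the graph for $$Sum(Mul(A, B), C)$$ and applies the chain rule over it. The `Node` class and the `mul_op`, `sum_op` and `backward` helpers are hypothetical names chosen for illustration; real frameworks are considerably more elaborate.

```python
class Node:
    """A value in the graph that remembers how it was computed."""

    def __init__(self, value, parents=(), local_grads=()):
        self.value = value
        self.parents = parents          # nodes this one was computed from
        self.local_grads = local_grads  # d(self)/d(parent), one per parent
        self.grad = 0.0                 # d(output)/d(self), filled by backward()


def mul_op(a, b):
    # d(a*b)/da = b and d(a*b)/db = a
    return Node(a.value * b.value, (a, b), (b.value, a.value))


def sum_op(a, b):
    # d(a+b)/da = 1 and d(a+b)/db = 1
    return Node(a.value + b.value, (a, b), (1.0, 1.0))


def backward(node, upstream=1.0):
    """Chain rule: multiply local derivatives along every path from the output.

    This naive recursion revisits shared nodes once per path, which is
    fine for a toy graph but not how a real framework would do it.
    """
    node.grad += upstream
    for parent, local in zip(node.parents, node.local_grads):
        backward(parent, upstream * local)


A, B, C = Node(2.0), Node(3.0), Node(4.0)
out = sum_op(mul_op(A, B), C)   # Sum(Mul(A, B), C) = A * B + C
backward(out)
print(out.value, A.grad, B.grad, C.grad)
```

Each operation only knows its own local derivative; the gradient with respect to `A`, `B` and `C` comes from multiplying those local derivatives along the path back from the output.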

## 2. Backpropagation

The backpropagation algorithm propagates the error (loss) backwards through the recorded operations. By the chain rule, multiplying the local derivatives in reverse order gives the gradient of the loss with respect to each parameter. We then update the parameters using gradient descent or one of its variants.
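To make the update concrete, here is a small sketch with a single parameter. The model `pred = w * x`, the squared-error loss, the data point and the learning rate are all assumptions picked for illustration.

```python
x, y = 2.0, 6.0        # one training example; the ideal weight is y / x = 3
w = 0.0                # parameter to learn
learning_rate = 0.05

for step in range(50):
    pred = w * x                       # forward pass
    loss = (pred - y) ** 2
    dloss_dpred = 2.0 * (pred - y)     # derivative of the last operation
    dpred_dw = x                       # derivative of the first operation
    grad_w = dloss_dpred * dpred_dw    # chain rule, multiplied in reverse order
    w -= learning_rate * grad_w        # gradient descent update

print(w)  # ends up very close to 3.0
```

Note that the forward pass has to run first in every iteration, because the derivatives depend on the values it produced.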

You have probably noticed that backpropagation needs a forward pass through the model in order to compute the corresponding derivatives. This can be problematic, since every forward pass has to record the functions that were used. Inserting a black box (an operation whose derivative is unknown) in the forward pass makes the model unusable under the backpropagation framework.

Furthermore, the brain doesn’t seem to perform two steps to optimize itself. As Hinton, G. (2022) states: “There is no convincing evidence the cortex explicitly propagates error derivatives or stores neural activities for use in a subsequent backward pass”. Backpropagation therefore does not seem to be a good computational model of how the cortex learns.

## 3. Forward-Forward

The Forward-Forward (FF) algorithm (Hinton, G. (2022)) is a new method to train neural networks. Instead of a forward (inference) pass followed by a backward (optimization) pass, it performs two forward passes: one with “positive data” and one with “negative data”.

What do “positive” and “negative” data mean in this context? Positive data is data that is correctly labeled; negative data is the opposite, data paired with a wrong label.

The positive pass adjusts the weights to increase a given measure of goodness in every hidden layer, while the negative pass adjusts them to decrease that goodness in every hidden layer.

Learning consists of pushing the goodness well above a certain threshold for positive data and well below it for negative data.

Any measure of goodness can be used, but in the paper Hinton focuses on the sum of squared neural activities.
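The sketch below shows what this goodness and a per-layer objective could look like for one layer. The function names are hypothetical, and the softplus-based loss is an assumption: it is the negative log-likelihood you get when the probability of an input being positive is modelled as the logistic function of goodness minus threshold.

```python
import math


def goodness(activations):
    """Sum of squared activities of one hidden layer."""
    return sum(a * a for a in activations)


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))


def softplus(z):
    # log(1 + exp(z)), written to avoid overflow for large |z|
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def prob_positive(activations, threshold):
    """Probability that the input is positive, given one layer's activities."""
    return sigmoid(goodness(activations) - threshold)


def layer_loss(pos_activations, neg_activations, threshold):
    """Small when positive goodness is above and negative goodness is below the threshold."""
    return (softplus(threshold - goodness(pos_activations))
            + softplus(goodness(neg_activations) - threshold))


# Illustrative activities for one layer with two units
print(layer_loss([2.0, 2.0], [0.1, 0.1], threshold=2.0))  # well separated: small loss
print(layer_loss([0.1, 0.1], [2.0, 2.0], threshold=2.0))  # wrong way round: large loss
```

Because the loss depends only on the activities of a single layer, each layer can be optimized on its own, without passing derivatives back through the rest of the network.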

## 4. Implementation code

Code for both PyTorch (Pezeshki (2023)) and Keras (Mukherjee (2023)) can be found in the References section, thanks to Mohammad Pezeshki and Suvaditya Mukherjee, respectively.

A couple of points are worth adding:

• The function overlay_y_on_x (it appears in both versions) embeds the label y into the features x, since there is no loss computed from the predictions against the real labels at the end of the graph. For positive data the embedded label is the true label, while for negative data it is a wrong label.
• The input to each layer is normalized before its goodness and loss are computed, for both positive and negative data.
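The two points above can be sketched for a single example as follows. This is not the code from either referenced implementation: `overlay_label` and `normalize` are hypothetical names, and writing the label into the first `num_classes` features using the largest feature value is just one possible encoding.

```python
import math


def overlay_label(x, label, num_classes=10):
    """Return a copy of x whose first num_classes entries one-hot encode the label."""
    peak = max(x)                 # value used to mark the chosen class
    out = list(x)
    for i in range(num_classes):
        out[i] = 0.0              # clear the slots reserved for the label
    out[label] = peak
    return out


def normalize(h, eps=1e-4):
    """Scale a vector to (roughly) unit length, keeping only its direction."""
    norm = math.sqrt(sum(v * v for v in h))
    return [v / (norm + eps) for v in h]


x = [0.5, 0.2, 0.9, 0.1, 0.7]     # toy feature vector
true_label, wrong_label = 1, 2

x_pos = overlay_label(x, true_label, num_classes=3)    # positive data
x_neg = overlay_label(x, wrong_label, num_classes=3)   # negative data
print(x_pos)
print(x_neg)
print(normalize(x_pos))
```

Normalizing the input removes its length, so a layer cannot separate positive from negative data just by reading off how large the previous layer's activities were.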

## 5. Conclusions

In this post we’ve talked about a new way of training neural networks: the Forward-Forward algorithm. It consists of two forward passes, one with positive data and one with negative data, that try to maximize and minimize, respectively, a given goodness measure.

Much work remains to be done in this line of research, but in the short term it is very unlikely that this algorithm will replace backpropagation in real-world applications, since it is slow and does not scale well.