Gradient Descent is one of the most important algorithms in Machine Learning. It is an iterative method to find the minimum of a given function. That is the reason why today we will go through the intuition behind it and cover a practical application.
Concepts to keep in mind
Let’s start. For any kind of model that we want to create we will have different parameters to fit. Different sets of parameters will create different errors for our model. So, how can we measure that error? The answer is through a cost function which quantifies the model error for each selected parameter set.
As you might have guessed at this point, we are talking about optimisation, but before continuing we should compare three different kinds of cost functions:
Convex functions: they look like a bowl, and the most interesting fact for us is that the minimum point of the functions corresponds to the global minimum.
Concave functions: they look like a hill, and in this case, the maximum point of the functions ensures us that we are in the global maximum. So… if we multiply that function by -1, will we have a convex one? The answer is yes! So, again, the problem is to find the global minimum (to summarise, the inverse of a concave function is a cost function to minimise).
Non-convex functions: there are plenty of cost functions that are not convex (or concave) functions. Being at a minimum in these functions does not ensure us to be in a global minimum. We can have different points that make it more difficult to achieve the minimums of the function as local maximums, inflection points… or as you are imagining at this point: any point with a 0 slope.
Given their variety, cost functions in Machine Learning models need an efficient solution to reach the minimums for non-convex cost functions. But, as we know, Machine Learning models usually have more than 2 dimensions, so we will work with hyperplanes or surfaces.
This is what Gradient Descent is about: Optimisation!
To understand the intuition behind the algorithm, let’s do a survival experiment. Imagine that you are close to a volcano crater that is about to erupt. You have little time to save your life but on top of that, suppose you are blind and you don’t have any kind of tool to know where you are. Your survival sense allows you to think fast and you realise that you can evaluate with your feet the slope of the hill (given that you start outside the crater). With that sense, you will always take the opposite direction to the highest slope until you reach the lowest point possible to save yourself from the lava.
Translating this into mathematical equations, we should evaluate our position through partial derivatives of the surface (the error in our model). Once we have calculated this vector, we can know the gradient (direction in which the slope increases) so we should take the -1*Gradient
In the code below, we implement a self-developed Batch Gradient Descent to solve a simple regression problem, and we test the results comparing them to Linear Regression and SGDescent algorithms of the Sklearn package:
As the results show us, we can appreciate how we can get a nice approximation to the analytic solution of the linear regression. Moreover, we have the flexibility of being able to solve more complex problems through the use of partial derivatives to minimise the error of our model or our cost functions.
Are you using this method to optimise your investment strategies? Let’s us know in the comments! See you next time!