Machine Learning

Gradient Descent

Descend to the lowest point of the loss function so that the parameters of a model are optimized to best fit the data.

How It Works

  1. Calculate the gradient of the loss function, i.e. take the derivative with respect to each of the parameters.

  2. Select random initial values for the parameters.

  3. Plug the parameter values into the derivatives, i.e. the gradient.

  4. Calculate the step size: step size = derivative × learning rate.

  5. Calculate the new parameters: new parameter = old parameter − step size.

  6. Repeat from step 3 until the step size is very small (close to zero) or the maximum number of steps has been reached.
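The steps above can be sketched as a simple linear regression fit. The data, learning rate, starting values, and stopping threshold below are all made-up values for illustration:

```python
# Gradient descent sketch: fit y = intercept + slope * x by minimizing
# the sum of squared residuals. All numbers here are illustrative assumptions.

def gradient_descent(xs, ys, learning_rate=0.01, max_steps=10000, tol=1e-6):
    intercept, slope = 0.0, 1.0  # start from (arbitrary) initial values
    for _ in range(max_steps):   # repeat until steps are tiny or limit is hit
        # gradient of the sum of squared residuals w.r.t. each parameter
        d_intercept = sum(-2 * (y - (intercept + slope * x))
                          for x, y in zip(xs, ys))
        d_slope = sum(-2 * x * (y - (intercept + slope * x))
                      for x, y in zip(xs, ys))
        # step size = derivative * learning rate
        step_i = d_intercept * learning_rate
        step_s = d_slope * learning_rate
        # new parameter = old parameter - step size
        intercept -= step_i
        slope -= step_s
        if max(abs(step_i), abs(step_s)) < tol:  # close to zero: stop
            break
    return intercept, slope
```

For example, `gradient_descent([0.5, 2.3, 2.9], [1.4, 1.9, 3.2])` converges to roughly `intercept ≈ 0.95, slope ≈ 0.64`, matching the least-squares solution for that toy data.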

Beware

When there are millions of data points, every step must evaluate the gradient over all of them, so gradient descent can take a long time. This is where stochastic gradient descent comes in: it estimates the gradient from a random subset of the data (often a single point) at each step.
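A stochastic variant can be sketched by using the gradient of one randomly chosen point per step instead of the whole dataset. Again, the data, learning rate, step count, and seed are made-up values:

```python
import random

# Stochastic gradient descent sketch for y = intercept + slope * x:
# each update uses ONE randomly chosen point, so the cost per step is
# constant regardless of dataset size. Settings here are illustrative.

def sgd(xs, ys, learning_rate=0.01, steps=5000, seed=0):
    rng = random.Random(seed)
    intercept, slope = 0.0, 1.0
    for _ in range(steps):
        i = rng.randrange(len(xs))  # pick a single random data point
        residual = ys[i] - (intercept + slope * xs[i])
        # gradient of that one point's squared residual
        intercept -= learning_rate * (-2 * residual)
        slope -= learning_rate * (-2 * xs[i] * residual)
    return intercept, slope
```

Because each step sees only one point, the parameters bounce around near the minimum rather than settling exactly on it; in practice the learning rate is often decayed over time to tighten the final estimate.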

References

StatQuest: Gradient Descent, Step-by-Step