Definition
An optimization algorithm that iteratively adjusts parameters to minimize a cost function by stepping in the direction of steepest descent, i.e. along the negative gradient.
Detailed Explanation
Gradient descent works by computing the gradient of the cost function, that is, its partial derivative with respect to each parameter, and then updating the parameters in the opposite direction: θ ← θ − η∇J(θ), where η is the learning rate that controls the step size. Too large a learning rate can overshoot the minimum, while too small a rate makes convergence slow. Variants differ in how much data each update uses: batch gradient descent uses the full dataset, stochastic gradient descent (SGD) uses a single example, and mini-batch gradient descent uses a small subset, trading gradient accuracy against per-update cost. The process repeats until the updates become negligibly small (convergence) or a maximum number of iterations is reached.
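To make the update rule concrete, here is a minimal Python sketch of batch gradient descent; the function and parameter names (gradient_descent, grad_fn, tol) are illustrative rather than taken from any particular library, and the quadratic example cost is chosen only for its simple closed-form gradient.

```python
import numpy as np

def gradient_descent(grad_fn, theta0, learning_rate=0.1, max_iters=1000, tol=1e-8):
    """Minimize a cost function given a function grad_fn that returns its gradient.

    Stops when the update step becomes smaller than tol or after max_iters.
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iters):
        step = learning_rate * grad_fn(theta)  # scaled gradient: theta <- theta - eta * grad J(theta)
        theta = theta - step                   # move opposite the gradient
        if np.linalg.norm(step) < tol:         # convergence: updates are negligibly small
            break
    return theta

# Example: minimize J(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta_min = gradient_descent(lambda t: 2 * (t - 3), theta0=[0.0])
print(theta_min)  # converges toward [3.0]
```

Swapping the full gradient for a gradient estimated from one example (SGD) or a small subset (mini-batch) changes only what grad_fn computes; the update rule itself stays the same.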
Use Cases
Training neural networks and other machine learning models with differentiable cost functions, such as linear and logistic regression