Definition
A variant of gradient descent that updates model parameters at each iteration using the gradient computed on a single randomly chosen training example, rather than on the full training set.
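In standard notation, one iteration applies the update below; the symbols here follow common convention rather than anything specific to this entry (θ are the parameters, η the learning rate, and (x_i, y_i) one sampled example):

```latex
\theta_{t+1} = \theta_t - \eta \, \nabla_\theta \, L(\theta_t;\, x_i, y_i)
```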
Detailed Explanation
SGD approximates the true gradient with the gradient of a single sample, so each update is much cheaper to compute but noisier than a full-batch gradient descent step. In practice it often reaches a good solution in less wall-clock time, and the noise in the updates can help the optimizer escape shallow local minima. Extensions such as momentum and adaptive learning rates are commonly layered on top of the basic update to improve convergence; a minimal sketch of the basic loop follows.
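A minimal sketch of the plain SGD loop in NumPy, using a least-squares linear model purely for illustration; the function name, data, and hyperparameters are hypothetical, not part of this entry:

```python
import numpy as np

def sgd(X, y, lr=0.01, epochs=10, seed=0):
    """Plain SGD on a mean-squared-error linear model (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])                    # parameters to learn
    for _ in range(epochs):
        for i in rng.permutation(len(X)):       # visit examples in random order
            grad = (X[i] @ w - y[i]) * X[i]     # gradient of 0.5 * (x_i . w - y_i)^2
            w -= lr * grad                      # single-example parameter update
    return w

# Hypothetical usage on synthetic data
X = np.random.randn(100, 3)
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w
w_hat = sgd(X, y, lr=0.05, epochs=50)
```

Momentum and adaptive learning rates modify how `grad` is folded into the update (e.g., by accumulating a running velocity), not the per-example sampling itself.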
Use Cases
1. Large-scale neural network training
2. Online learning systems
3. Real-time model updates
4. Deep learning optimization