Definition
A compromise between batch and stochastic gradient descent that updates parameters using a small random subset of training data.
Detailed Explanation
Mini-batch gradient descent combines the low per-update cost of stochastic gradient descent (SGD) with the stability of batch gradient descent. At each iteration it computes the gradient on a small batch of examples (typically 32–256), giving a lower-variance gradient estimate than single-example SGD while keeping each update computationally cheap and well suited to vectorized hardware.
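To make the update rule concrete, below is a minimal NumPy sketch of mini-batch gradient descent applied to linear regression with a mean-squared-error loss. The function name, batch size, learning rate, and synthetic data are illustrative assumptions, not part of the definition above.

```python
import numpy as np

def minibatch_gd(X, y, lr=0.01, batch_size=32, epochs=100, seed=0):
    """Fit weights w minimizing mean squared error using mini-batch updates.
    All parameter names and defaults here are illustrative choices."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        perm = rng.permutation(n)               # reshuffle the data each epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            Xb, yb = X[idx], y[idx]             # one small random batch
            grad = 2.0 / len(idx) * Xb.T @ (Xb @ w - yb)  # MSE gradient on the batch
            w -= lr * grad                      # parameter update
    return w

if __name__ == "__main__":
    # Synthetic example: recover known weights from noisy observations.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 3))
    true_w = np.array([2.0, -1.0, 0.5])
    y = X @ true_w + 0.1 * rng.normal(size=1000)
    print(minibatch_gd(X, y, lr=0.05, batch_size=32, epochs=50))
```

The inner loop is the key point: each parameter update uses only `batch_size` examples, so updates happen far more often than full-batch gradient descent while each gradient is less noisy than a single-example SGD step.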
Use Cases
1. Deep learning training
2. Large dataset processing
3. Distributed learning systems
4. Neural network optimization
