Definition
A model compression technique in which a smaller model learns to mimic the behavior of a larger, more complex model, producing a more efficient model while largely preserving the larger model's performance.
Detailed Explanation
Knowledge distillation involves training a smaller student model to replicate the behavior of a larger teacher model. The student learns from both the ground-truth labels and the soft probability distributions output by the teacher, typically softened with a temperature parameter. This process transfers not only the correct answers but also the relationships between classes that the teacher has learned, which often helps the student generalize better than training on hard labels alone.
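As a minimal sketch of how this combined objective can look in practice, the PyTorch snippet below mixes a hard-label cross-entropy term with a soft-target KL-divergence term. The temperature T, the weighting alpha, and the function name are illustrative assumptions, not values or APIs prescribed by any particular implementation.

```python
# A minimal sketch of a knowledge-distillation loss (assumed hyperparameters).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Combine a hard-label loss with a soft-target loss from the teacher."""
    # Standard cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    # KL divergence between temperature-softened teacher and student
    # distributions; scaling by T**2 keeps gradient magnitudes comparable.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    # alpha balances learning from labels vs. learning from the teacher.
    return alpha * hard_loss + (1.0 - alpha) * soft_loss
```

In a training loop, the teacher's logits would be computed in inference mode (no gradients) and passed to this loss alongside the student's logits for the same batch.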
Use Cases
- Model deployment on mobile devices
- Edge computing applications
- Real-time AI systems
- Model efficiency optimization