Definition
Techniques that reduce the size and computational requirements of neural networks while maintaining performance, enabling deployment on resource-constrained devices.
Detailed Explanation
Model compression encompasses a family of techniques, including pruning, quantization, knowledge distillation, and low-rank approximation. These methods shrink a model by removing redundant parameters, reducing numerical precision, or finding more efficient representations of the model's knowledge. The goal is to maintain model performance while reducing memory usage and computational requirements; a sketch of two of these techniques follows below.
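As a minimal sketch of two of these techniques, the snippet below applies magnitude-based pruning and dynamic int8 quantization using PyTorch's built-in utilities (torch.nn.utils.prune and torch.quantization). The network architecture and the 50% sparsity level are illustrative assumptions, not prescribed values.

```python
# Minimal sketch: magnitude pruning + dynamic int8 quantization in PyTorch.
# The network and the 50% sparsity level are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A small stand-in for a trained model.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Pruning: zero out the 50% of weights with the smallest magnitude in each
# Linear layer, removing the most redundant parameters.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # bake the zeros into the weight tensor

# Quantization: store Linear weights as 8-bit integers instead of 32-bit
# floats, cutting weight memory roughly 4x at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# The compressed model still accepts the same inputs.
x = torch.randn(1, 784)
print(quantized(x).shape)  # torch.Size([1, 10])
```

The other two techniques follow a different pattern: knowledge distillation trains a smaller student network to match the original model's outputs, while low-rank approximation factorizes large weight matrices into products of thinner ones.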
Use Cases
Mobile applications, edge devices, IoT deployments, and real-time systems.