Definition
Techniques (quantization, pruning, compilation) to make trained AI models run faster and more efficiently.
Detailed Explanation
Optimization techniques applied after training to make AI models run faster and more efficiently at inference time on target hardware. Quantization reduces the numerical precision of weights and activations (e.g., float32 to int8), shrinking memory footprint and speeding up arithmetic; pruning removes redundant weights or structures from the network; model compilation fuses operations and generates hardware-specific kernels (e.g., via TensorRT, ONNX Runtime, or TVM).
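As a concrete illustration of one of these techniques, here is a minimal sketch of symmetric per-tensor post-training quantization, assuming NumPy is available. The function names (`quantize_int8`, `dequantize`) are illustrative, not from any particular library:

```python
import numpy as np

def quantize_int8(weights):
    # Symmetric post-training quantization: map float32 weights to int8
    # using a single per-tensor scale factor.
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original weights for comparison.
    return q.astype(np.float32) * scale

weights = np.array([0.5, -1.2, 0.03, 2.4], dtype=np.float32)
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# int8 storage is 4x smaller than float32; the restored values differ
# from the originals by at most half the scale factor.
```

Real deployments typically use a framework's built-in quantization tooling rather than hand-rolled code, but the core idea, trading a small amount of precision for smaller, faster models, is the same.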
Use Cases
Deploying AI models on edge devices with limited resources, reducing latency for real-time AI applications, lowering cloud inference costs, improving energy efficiency.