Overview
Phi Silica is Microsoft’s on-device small language model (~3.3B parameters), built to run locally on the NPUs of Copilot+ PCs. It ships pre-tuned and 4-bit quantized, responds quickly (≈230 ms to first token, up to ~20 tokens/sec), and currently offers a 2K-token context window, with 4K coming. Developers can access it through the Windows App SDK’s Phi Silica APIs.
Description
Under the hood, Phi Silica targets NPU efficiency: 4-bit weight quantization, low idle memory, and NPU-based context processing. Microsoft reports ~230 ms time-to-first-token for short prompts, throughput up to ~20 tokens/sec, a 2K context window (with 4K “coming shortly”), and significantly reduced power draw on the NPU versus CPU.
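To make the 4-bit weight quantization mentioned above concrete, here is a minimal Python sketch of symmetric 4-bit quantization in general: floats are mapped to signed integers in [-8, 7] with a shared scale factor, shrinking storage roughly 8x versus float32 at the cost of bounded rounding error. This is an illustrative simplification, not Microsoft’s actual scheme; production quantizers typically use per-group scales and calibration.

```python
def quantize_4bit(weights):
    # Symmetric 4-bit quantization: map floats to signed ints in [-8, 7].
    # One scale per tensor here for simplicity (real schemes use per-group scales).
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights from the 4-bit integers.
    return [v * scale for v in q]

weights = [0.42, -1.30, 0.07, 0.91, -0.55]
q, scale = quantize_4bit(weights)
restored = dequantize(q, scale)
# Rounding guarantees each restored weight is within half a
# quantization step (scale / 2) of the original.
assert all(abs(w - r) <= scale / 2 + 1e-9 for w, r in zip(weights, restored))
```

The same trade-off drives the reported NPU efficiency: smaller weights mean less memory traffic per token, which helps both throughput and power draw.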
Model scale is in the “small but capable” range, about 3.3B parameters, optimized for Windows distribution and real-time, on-device interactivity. Media coverage and Microsoft materials position it as the small language model (SLM) foundation for Copilot+ features on PCs.
As of April 2025, Microsoft has demonstrated vision-based multimodal extensions for Phi Silica (image+text), broadening local use cases like document and UI understanding entirely on device.
