Overview
Phi Silica is Microsoft’s on-device small language model (~3.3B parameters), built to run locally on the NPUs of Copilot+ PCs. It ships pre-tuned and 4-bit quantized, responds quickly (≈230 ms to first token, up to ~20 tokens/sec), and currently offers a 2K-token context window, with 4K coming. Developers can access it through the Windows App SDK’s Phi Silica APIs.
Description
Under the hood, Phi Silica targets NPU efficiency: 4-bit weight quantization, low idle memory, and NPU-based context processing. Microsoft reports ~230 ms time-to-first-token for short prompts, throughput up to ~20 tokens/sec, a 2K context window (with 4K “coming shortly”), and significantly reduced power draw on the NPU versus CPU.
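To make the 4-bit weight quantization mentioned above concrete, here is a minimal Python sketch of symmetric 4-bit quantization in general: floats are mapped to signed integers in [-8, 7] with a shared scale factor, shrinking storage roughly 8x versus float32 at the cost of bounded rounding error. This is an illustrative simplification, not Microsoft’s actual scheme; production quantizers typically use per-group scales and calibration.

```python
def quantize_4bit(weights):
    # Symmetric 4-bit quantization: map floats to signed ints in [-8, 7].
    # One scale per tensor here for simplicity (real schemes use per-group scales).
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights from the 4-bit integers.
    return [v * scale for v in q]

weights = [0.42, -1.30, 0.07, 0.91, -0.55]
q, scale = quantize_4bit(weights)
restored = dequantize(q, scale)
# Rounding guarantees each restored weight is within half a
# quantization step (scale / 2) of the original.
assert all(abs(w - r) <= scale / 2 + 1e-9 for w, r in zip(weights, restored))
```

The same trade-off drives the reported NPU efficiency: smaller weights mean less memory traffic per token, which helps both throughput and power draw.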
Model scale is in the “small but capable” range, about 3.3B parameters, optimized for Windows distribution and real-time, on-device interactivity. Media coverage and Microsoft materials position it as the small language model (SLM) foundation for Copilot+ features on PCs.
As of April 2025, Microsoft has demonstrated vision-based multimodal extensions for Phi Silica (image+text), broadening local use cases like document and UI understanding entirely on device.
