ERNIE 4.5 VL 28B A3B Thinking

Overview

A multimodal MoE model that “looks, reads, and reasons” across images, video, and text. It adds tool use and a Thinking with Images mode, supports long context, and activates about 3B parameters per token for flagship-level VLM quality at practical latency.

Description

ERNIE-4.5-VL-28B-A3B-Thinking upgrades Baidu’s ERNIE VLM with stronger visual reasoning, grounding, and STEM problem solving. It trains on large visual-language reasoning corpora, then stabilizes MoE with GSPO and IcePop strategies, plus dynamic difficulty sampling for efficient learning. The model can zoom into images, call tools like image search for long-tail facts, and handle multi-image, chart, and video understanding while keeping answers grounded. It offers 128k context, Apache 2.0 licensing, and production paths via transformers, vLLM, and FastDeploy, with fine-tuning recipes in ERNIEKit. Inference typically runs with about 3B active parameters per token, which helps balance quality and latency in enterprise multimodal assistants.

About Baidu

Baidu is a Chinese multinational technology company specializing in internet-related services, products, and artificial intelligence.

Industry: Internet

Company Size: 10001+

Location: Beijing, CN

Website: https://baidu.com

View Company Profile

Related Models

Last updated: November 13, 2025

Overview

Description

About Baidu

Related Models

DeepSeek V3.1 Terminus

GPT-Neo

LLaMA 2 Chat

Help

People also viewed

Create AI Tools

Mini Tool

Vibe code an AI Tool