ERNIE 4.5 VL 28B A3B Thinking

By Baidu
New Text Gen 7
Released: November 13, 2025

Overview

A multimodal mixture-of-experts (MoE) model that “looks, reads, and reasons” across images, video, and text. It adds tool use and a Thinking with Images mode, supports a 128k-token context, and activates about 3B of its 28B parameters per token, aiming for flagship-level VLM quality at practical latency.

Description

ERNIE-4.5-VL-28B-A3B-Thinking upgrades Baidu’s ERNIE VLM with stronger visual reasoning, grounding, and STEM problem solving. It is trained on large visual-language reasoning corpora, and its MoE training is then stabilized with the GSPO and IcePop strategies, combined with dynamic difficulty sampling for more efficient learning. The model can zoom into images, call tools such as image search for long-tail facts, and handle multi-image, chart, and video understanding while keeping answers grounded. It offers a 128k-token context window and Apache 2.0 licensing, can be deployed through transformers, vLLM, and FastDeploy, and has fine-tuning recipes in ERNIEKit. Inference typically runs with about 3B active parameters per token, so per-token compute is closer to that of a small dense model than to the full 28B, which helps balance quality and latency in enterprise multimodal assistants.
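To make the "about 3B active parameters per token" figure concrete, the sketch below computes the active share of the model and shows a generic top-k expert router of the kind sparse MoE layers commonly use. It is an illustration only: the 28B total is read from the model name, and the number of experts, the value of k, and the gate scores are hypothetical, not Baidu's actual router configuration.

```python
import math


def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of parameters used per token in a sparse MoE model."""
    return active_params_b / total_params_b


def route_top_k(gate_logits: list[float], k: int) -> dict[int, float]:
    """Pick the k highest-scoring experts and softmax-normalize their weights.

    Generic top-k gating sketch; expert count and k are hypothetical here.
    """
    # Indices of the k largest gate scores (ties keep the lower index first).
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax over the selected experts only, shifted for numerical stability.
    peak = max(gate_logits[i] for i in top)
    exps = {i: math.exp(gate_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


if __name__ == "__main__":
    # 3B active out of 28B total, taken from the "28B-A3B" model name.
    print(f"{active_fraction(28, 3):.1%}")  # prints 10.7%
    # Five hypothetical experts; only two are evaluated for this token.
    print(route_top_k([0.1, 2.0, -1.0, 2.0, 0.5], k=2))
```

Only the selected experts run for a given token, which is why per-token compute tracks the active parameter count rather than the total.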

About Baidu

Baidu is a Chinese multinational technology company specializing in internet-related services, products, and artificial intelligence.

Industry: Internet
Company Size: 10001+
Location: Beijing, CN

Last updated: November 13, 2025