Baidu
Models
-
PP-OCRv6 is PaddlePaddle/Baidu’s lightweight universal OCR system for multilingual text detection and recognition across edge, mobile, desktop, and server deployments.
New | Multimodal | Released 12d ago
-
ERNIE-5.1 is Baidu’s new preview flagship language model, built for stronger general text capability, better cost efficiency, and improved creative performance. It is positioned as the top-ranked Chinese model on the LMArena Text leaderboard and as a high-efficiency upgrade over ERNIE-5.0 with much lower training cost at its scale.
New | Text | Released 1mo ago
-
Qianfan-OCR is a 4B end-to-end document intelligence vision-language model that performs direct image-to-Markdown conversion and supports prompt-driven document tasks like table extraction, chart understanding, document QA, and key information extraction.
Multimodal | Released 3mo ago
-
Production-ready OCR and document AI toolkit that turns images and PDFs into structured data, with multilingual OCR, layout analysis, and VLM-based document parsing.
Text | Released 4mo ago
-
Image | Released 7mo ago
-
A multimodal MoE model that “looks, reads, and reasons” across images, video, and text. It adds tool use and a “Thinking with Images” mode, supports long context, and activates about 3B parameters per token for flagship-level VLM quality at practical latency.
Text | Released 7mo ago
-
ERNIE 5 is Baidu’s next-gen general model for reasoning, coding, and multimodal understanding. It supports long context, tool and function calling, reliable JSON, streaming, and enterprise guardrails, making it a strong default for RAG, agents, and document or chart analysis.
Text | Released 7mo ago
-
PaddleOCR-VL is a vision-language model built around PaddleOCR that reads documents, forms, tables, charts, and screenshots. It combines strong OCR with reasoning over layout and content, then answers in text or structured JSON for multimodal RAG and automation.
Multimodal | Released 8mo ago
-
Qianfan-VL-3B is Baidu’s lightweight VLM for cost-sensitive, real-time multimodal apps. It processes images plus text and returns grounded answers with basic OCR and layout understanding, long context, tool/function calling, and JSON outputs. It is optimized for speed and efficiency.
Text | Released 9mo ago
-
Qianfan-VL-8B is Baidu’s mid-size vision-language model. It reads images (docs, charts, screenshots, photos) alongside text and returns grounded answers with solid OCR, layout understanding, multi-image reasoning, long context, tool/function calling, and reliable JSON outputs. It is balanced for quality and latency.
Text | Released 9mo ago
-
Qianfan-VL-70B is Baidu’s large vision-language model on the Qianfan platform. It ingests images (docs, charts, screenshots, photos) with text and produces grounded answers, featuring strong OCR and layout understanding, long context, tool/function calling, streaming, and reliable JSON outputs for multimodal RAG and enterprise apps.
Text | Released 9mo ago
-
ERNIE 4.5 Turbo is Baidu’s high-throughput, cost-optimized variant of ERNIE 4.5. It delivers strong reasoning and coding with long-context options, tool/function calling, JSON outputs, and streaming, and is available for production use via ERNIE Bot and the Qianfan API.
Text | Released 9mo ago
