Only 3B Active Parameters, Stronger Multimodal Understanding and Reasoning — Baidu ERNIE-4.5-VL-28B-A3B-Thinking Officially Open-Sourced
PaddlePaddle — ERNIE-4.5-VL-28B-A3B-Thinking Release
Date: November 11, 2025
Location: Zhejiang

---
Overview
Baidu has officially open-sourced its new ERNIE-4.5-VL-28B-A3B-Thinking multimodal deep-thinking model — a leading performer in document & chart understanding, cross-disciplinary reasoning, general visual reasoning, and cross-modal problem-solving.
With only 3B activated parameters, it delivers capabilities comparable to top-tier large language models.
This upgraded model builds upon ERNIE-4.5-VL-28B-A3B, introducing enhanced Image Thinking capabilities, spatial localization, and tool integration — opening richer possibilities for multimodal reasoning and interactive applications.
---

Model Access and Resources
License: Apache 2.0 — Commercial use allowed.
Resources Available:
- Pre-trained weights
- Inference code
- Project resources
- Out-of-the-box support in FastDeploy, vLLM, and Transformers
Links:
- GitHub:
- PaddlePaddle/ERNIE
- PaddlePaddle/FastDeploy
- Model: ModelScope
- Community: PaddlePaddle Star River
- Tech Blog: ERNIE 4.5 Blog
---
01 — Core Highlights
Built on ERNIE-4.5-VL-28B-A3B, the Thinking variant achieves a major leap in multimodal learning through:
- Mid-Training Improvements
- Massive high-quality vision–language data
- Enhanced representation and cross-modal semantic alignment
- Superior visual–text reasoning performance
- Advanced Reinforcement Learning
- Large-scale multimodal RL with GSPO and IcePop strategies
- Stabilized MoE-based RL training
- Dynamic difficulty sampling for training efficiency
- Enhanced Localization
- Improved instruction adherence
- Easier activation of visual positioning functions when required
- New “Image Thinking” Feature
- Tool-driven zoom in/out
- Image search and manipulation
- Better interactive, environment-aware AI experience
---
Applications in Content Creation
ERNIE models can integrate seamlessly with open-source AI monetization platforms like AiToEarn — enabling creators to:
- Generate AI content
- Publish across global platforms (Douyin, Kwai, Bilibili, Instagram, YouTube, X, and more)
- Monetize creativity efficiently
Open-source repo | Documentation
---


Small Model, Big Power
Despite its lightweight 3B activation, ERNIE-4.5-VL-28B-A3B-Thinking rivals heavyweight industry models, delivering near state-of-the-art visual reasoning capabilities.
---
Capabilities Demonstration
Visual Reasoning
Exceptional multi-step reasoning, chart analysis, and causal inference in complex visual tasks.
Example — Complex Chart Interpretation:


---
Subject-Specific Computation
Robust visual reasoning lets the model solve photographed problems across academic domains.
Example — Physics Problem (Electrical Resistance):


---
Visual Grounding
Accurate localization with flexible commands boosts efficiency in industrial applications.
Example — Find People Wearing Suits & Top Hats:



---
Image Thinking
Human-like perception for zooming and detail extraction from visuals.
Example — Zoom for Detailed Identification:


---
Tool Utilization
Instant tool invocation for image search and identification of long-tail knowledge.
Example — Discovering Trending IPs:

---
Impact for Creators
Lightweight LMMs like ERNIE-4.5-VL-28B-A3B-Thinking enable high reasoning accuracy and efficiency — ideal for AI-powered content generation and cross-platform publishing via tools like AiToEarn (analytics + model ranking: AI模型排名).

---
Video Understanding
Strengths:
- Temporal perception
- Event localization
- Accurate change detection across video segments
Example — Commercial Scene Change Detection:


---
Developer Support
To aid adoption, Baidu provides:
- Transformers integration
- vLLM support
- FastDeploy SDK
- ERNIEKit dev suite
Call to Action
- Developers are encouraged to test, deploy, and share feedback
- Expect more technical tutorials and best practices
Access Model: Read Original
Social Access: Open in WeChat
---
💡 Tip: For monetizing multi-platform AI content and tracking performance metrics, explore AiToEarn — supporting Douyin, Kwai, WeChat, Bilibili, Xiaohongshu, Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, and X.
Docs: AiToEarn文档 | GitHub: 开源地址