Only 3B Active Parameters, Stronger Multimodal Understanding and Reasoning — Baidu ERNIE-4.5-VL-28B-A3B-Thinking Officially Open-Sourced

Only 3B Active Parameters, Stronger Multimodal Understanding and Reasoning — Baidu ERNIE-4.5-VL-28B-A3B-Thinking Officially Open-Sourced

PaddlePaddle — ERNIE-4.5-VL-28B-A3B-Thinking Release

Date: November 11, 2025

Location: Zhejiang

image

---

Overview

Baidu has officially open-sourced its new ERNIE-4.5-VL-28B-A3B-Thinking multimodal deep-thinking model — a leading performer in document & chart understanding, cross-disciplinary reasoning, general visual reasoning, and cross-modal problem-solving.

With only 3B activated parameters, it delivers capabilities comparable to top-tier large language models.

This upgraded model builds upon ERNIE-4.5-VL-28B-A3B, introducing enhanced Image Thinking capabilities, spatial localization, and tool integration — opening richer possibilities for multimodal reasoning and interactive applications.

---

image

Model Access and Resources

License: Apache 2.0 — Commercial use allowed.

Resources Available:

  • Pre-trained weights
  • Inference code
  • Project resources
  • Out-of-the-box support in FastDeploy, vLLM, and Transformers

Links:

---

01 — Core Highlights

Built on ERNIE-4.5-VL-28B-A3B, the Thinking variant achieves a major leap in multimodal learning through:

  • Mid-Training Improvements
  • Massive high-quality vision–language data
  • Enhanced representation and cross-modal semantic alignment
  • Superior visual–text reasoning performance
  • Advanced Reinforcement Learning
  • Large-scale multimodal RL with GSPO and IcePop strategies
  • Stabilized MoE-based RL training
  • Dynamic difficulty sampling for training efficiency
  • Enhanced Localization
  • Improved instruction adherence
  • Easier activation of visual positioning functions when required
  • New “Image Thinking” Feature
  • Tool-driven zoom in/out
  • Image search and manipulation
  • Better interactive, environment-aware AI experience

---

Applications in Content Creation

ERNIE models can integrate seamlessly with open-source AI monetization platforms like AiToEarn — enabling creators to:

  • Generate AI content
  • Publish across global platforms (Douyin, Kwai, Bilibili, Instagram, YouTube, X, and more)
  • Monetize creativity efficiently

Open-source repo | Documentation

---

image
image

Small Model, Big Power

Despite its lightweight 3B activation, ERNIE-4.5-VL-28B-A3B-Thinking rivals heavyweight industry models, delivering near state-of-the-art visual reasoning capabilities.

---

Capabilities Demonstration

Visual Reasoning

Exceptional multi-step reasoning, chart analysis, and causal inference in complex visual tasks.

Example — Complex Chart Interpretation:

image
image

---

Subject-Specific Computation

Robust visual reasoning lets the model solve photographed problems across academic domains.

Example — Physics Problem (Electrical Resistance):

image
image

---

Visual Grounding

Accurate localization with flexible commands boosts efficiency in industrial applications.

Example — Find People Wearing Suits & Top Hats:

image
image
image

---

Image Thinking

Human-like perception for zooming and detail extraction from visuals.

Example — Zoom for Detailed Identification:

image
image

---

Tool Utilization

Instant tool invocation for image search and identification of long-tail knowledge.

Example — Discovering Trending IPs:

image

---

Impact for Creators

Lightweight LMMs like ERNIE-4.5-VL-28B-A3B-Thinking enable high reasoning accuracy and efficiency — ideal for AI-powered content generation and cross-platform publishing via tools like AiToEarn (analytics + model ranking: AI模型排名).

image

---

Video Understanding

Strengths:

  • Temporal perception
  • Event localization
  • Accurate change detection across video segments

Example — Commercial Scene Change Detection:

image
image

---

Developer Support

To aid adoption, Baidu provides:

  • Transformers integration
  • vLLM support
  • FastDeploy SDK
  • ERNIEKit dev suite

Call to Action

  • Developers are encouraged to test, deploy, and share feedback
  • Expect more technical tutorials and best practices

Access Model: Read Original

Social Access: Open in WeChat

---

💡 Tip: For monetizing multi-platform AI content and tracking performance metrics, explore AiToEarn — supporting Douyin, Kwai, WeChat, Bilibili, Xiaohongshu, Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, and X.

Docs: AiToEarn文档 | GitHub: 开源地址

Read more

Translate the following blog post title into English, concise and natural. Return plain text only without quotes. 哈佛大学 R 编程课程介绍

Harvard CS50: Introduction to Programming with R Harvard University offers exceptional beginner-friendly computer science courses. We’re excited to announce the release of Harvard CS50’s Introduction to Programming in R, a powerful language widely used for statistical computing, data science, and graphics. This course was developed by Carter Zenke.