World’s Top OCR Model Is Only 0.9B! Baidu Wenxin Derivative Sweeps 4 SOTAs

Honghao Wang

17 Oct 2025 — 4 min read

PaddleOCR-VL: Baidu’s Lightweight Multimodal OCR Model Takes Global #1

Baidu has delivered a major surprise in the global AI multimodal race with the release of PaddleOCR-VL — a lightweight, self-developed document parsing model that has immediately set new industry benchmarks.

With just 0.9B parameters, PaddleOCR-VL scored 92.6 on the authoritative OmniDocBench V1.5 leaderboard, claiming #1 globally in overall performance while setting SOTA in all four key capabilities:

Text Recognition
Formula Recognition
Table Understanding
Reading Order

And it’s open source from day one.

Within just 16 hours of release, it shot to the #1 trending position worldwide on Hugging Face.

---

What Makes PaddleOCR-VL Stand Out

Key Achievements:

First place in all four evaluated dimensions on OmniDocBench V1.5
Highly efficient — runs comfortably on a personal computer
Exceptional adaptability — handles complex PDFs and images with messy formatting
Industry-ready — easy integration into existing workflows

Real-world capabilities:

Understands logical document structure
Recognizes mathematical expressions, tables, and multi-column layouts
Maintains accuracy with distorted, noisy, or partially obscured samples

---

The Four Core Capabilities & SOTA Scores

1. Text Recognition — Score: 96.5

Supports 109 languages, including Chinese, English, French, Arabic, and more
Handles handwriting, vertical text, and artistic fonts
Overcomes traditional OCR limitations of “print-only” recognition

Example: even a crumpled, folded receipt photographed in poor lighting is handled perfectly.

---

2. Formula Recognition — Score: 91.4

Achieved CDM score: 0.9453
Precise LaTeX generation for complex formulas in papers, textbooks, and tests
Outperformed community favorites like MinerU and MonkeyOCR-pro-3B
The only model scoring above 90 in formula recognition

---

3. Table Understanding — Score: 89.8

Accurately parses nested tables and merged cells
Converts unstructured image data into analyzable structured formats
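For context, the table score is TEDS (Tree-Edit-Distance-based Similarity), the metric introduced with the PubTabNet table-recognition dataset. Each table is treated as an HTML tree, and the score is one minus the tree edit distance normalized by the size of the larger tree. The 89.8 above corresponds to an average TEDS of about 0.898, expressed as a percentage.

```latex
\mathrm{TEDS}(T_a, T_b) = 1 - \frac{\mathrm{EditDist}(T_a, T_b)}{\max\left(|T_a|,\, |T_b|\right)}
```

Here $T_a$ and $T_b$ are the predicted and ground-truth table trees, and $|T|$ is the number of nodes in $T$. Because merged cells and nested tables change the tree's shape, errors on them are penalized directly.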

---

4. Reading Order — Error Rate: 0.043 (Lowest)

Reads like a human by reconstructing logical reading flow
Predicts order for titles, body text, images, and captions
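Reading-order quality, like the text error of 0.035 listed under Performance Metrics, is reported as a normalized edit distance, where 0 means a perfect match. The sketch below assumes the common definition: the Levenshtein distance between the predicted and ground-truth sequences divided by the longer sequence's length. OmniDocBench's exact tokenization may differ, and the function and block names are illustrative.

```python
from typing import Sequence

def levenshtein(a: Sequence, b: Sequence) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute) between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute (or keep) x
        prev = curr
    return prev[-1]

def normalized_edit_distance(pred: Sequence, truth: Sequence) -> float:
    """Edit distance scaled to [0, 1] by the longer sequence's length."""
    longest = max(len(pred), len(truth))
    return 0.0 if longest == 0 else levenshtein(pred, truth) / longest

# Hypothetical page: the model swaps a figure and its caption.
truth = ["title", "para1", "figure", "caption", "para2"]
predicted = ["title", "para1", "caption", "figure", "para2"]
print(normalized_edit_distance(predicted, truth))  # 0.4
```

The same function works on character strings, which is how a text edit distance such as 0.035 would be computed.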

---

More Than Just OCR — Broad Competence

Beyond its four core strengths, PaddleOCR-VL excels in:

Chart extraction
Multi-column layout reconstruction
Multi-page structured analysis

---

Under the Hood — Why It’s So Effective

Architectural Innovation

Two-stage design:
PP-DocLayoutV2: Layout analysis, semantic region detection, reading order prediction
PaddleOCR-VL-0.9B: Fine-grained recognition of text, formulas, tables, charts
Reduces hallucinations and misalignment common in multimodal models
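The division of labor between the two stages can be sketched in a few lines of Python. This is a hypothetical illustration, not PaddleOCR's API: `Region`, `parse_page`, `detect_layout`, and `recognize` are made-up names, with the last two standing in for PP-DocLayoutV2 and PaddleOCR-VL-0.9B respectively.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Region:
    kind: str                        # e.g. "title", "text", "formula", "table"
    bbox: Tuple[int, int, int, int]  # (x0, y0, x1, y1) in page pixels
    order: int                       # reading-order index predicted by stage 1

def parse_page(page: object,
               detect_layout: Callable[[object], List[Region]],
               recognize: Callable[[object, Region], str]) -> str:
    """Stage 1 (layout) finds regions and their reading order;
    stage 2 (recognition) transcribes each region on its own.
    Results are joined in the predicted reading order."""
    regions = detect_layout(page)
    ordered = sorted(regions, key=lambda r: r.order)
    return "\n\n".join(recognize(page, r) for r in ordered)

# Hypothetical stand-ins for the two models, for demonstration only.
def fake_layout(page: object) -> List[Region]:
    return [Region("table", (0, 400, 600, 700), 2),
            Region("title", (0, 0, 600, 60), 0),
            Region("text", (0, 80, 600, 380), 1)]

def fake_recognizer(page: object, region: Region) -> str:
    return f"<{region.kind}>"

print(parse_page(None, fake_layout, fake_recognizer))
```

With these stubs the output comes back as title, text, table, even though the layout stub listed the regions out of order. Since the recognizer only ever sees one region at a time, one plausible reading of the "fewer hallucinations" claim is that a recognition error stays confined to its own region instead of disturbing the structure of the whole page.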

Training Strategy

Over 30 million samples covering text, tables, formulas, charts
Data sourced from public datasets, synthetic generation, web sampling, and proprietary Baidu sets
Difficult-case mining improves robustness

Performance Metrics

1881 tokens/s on a single A100 GPU
Text edit distance: 0.035
Formula CDM: 91.43
Table TEDS: 89.76
Reading order error: 0.043
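These metrics also reproduce the headline 92.6. Assuming OmniDocBench V1.5 computes its overall score as the mean of text accuracy (one minus the text edit distance, as a percentage), formula CDM, and table TEDS, with reading order reported separately, the arithmetic is:

```python
def omnidocbench_overall(text_edit: float, formula_cdm: float, table_teds: float) -> float:
    """Mean of text accuracy, formula CDM, and table TEDS, all on a 0-100 scale."""
    text_score = (1.0 - text_edit) * 100.0  # edit distance 0.035 -> 96.5
    return (text_score + formula_cdm + table_teds) / 3.0

overall = omnidocbench_overall(text_edit=0.035, formula_cdm=91.43, table_teds=89.76)
print(f"{overall:.2f}")  # 92.56
```

The 96.5 text score quoted earlier is the same 1 - 0.035 conversion.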

---

Real-World Testing: Wang Xingxing’s MS Thesis

Tested on Unitree Robotics founder Wang Xingxing’s thesis — a challenging mix of:

Inline and standalone formulas
Charts and illustrations
Complex layouts

PaddleOCR-VL handled:

Page logic reconstruction
Diagram extraction
Mixed content pages

---

Industry Impact

Key insights:

Breaks the myth: “Big models are always better” — shows small, specialized models can lead in accuracy and efficiency
Ideal for document-heavy industries: finance, education, government, research
Integrates well into RAG systems for high-quality, controllable knowledge inputs

OCR has shifted from being a “text recognition tool” to a strategic entry point for AI to understand real-world unstructured data. Precision in OCR directly impacts the reliability of downstream AI tasks in search, Q&A, and analytics.
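As a rough illustration of the RAG point, parsed pages can be split into retrievable chunks along the document's own structure. The sketch below assumes the parser's output has been exported as Markdown with `#`-style headings; `chunk_markdown` and its `max_chars` limit are hypothetical choices, not part of PaddleOCR.

```python
import re
from typing import List, Tuple

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")

def chunk_markdown(md: str, max_chars: int = 800) -> List[Tuple[str, str]]:
    """Split Markdown into (section_title, text) chunks: break at every heading,
    then split long sections on paragraph boundaries to respect max_chars."""
    chunks: List[Tuple[str, str]] = []
    title, buf = "", []

    def flush() -> None:
        # Reads the current title/buf from the enclosing scope.
        paragraphs = [p.strip() for p in "\n".join(buf).split("\n\n") if p.strip()]
        current = ""
        for p in paragraphs:
            if current and len(current) + len(p) + 2 > max_chars:
                chunks.append((title, current))
                current = p
            else:
                current = f"{current}\n\n{p}" if current else p
        if current:
            chunks.append((title, current))

    for line in md.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            title, buf = match.group(2).strip(), []
        else:
            buf.append(line)
    flush()
    return chunks

md = "# Intro\nHello world.\n\n## Results\nScore is 92.6.\n\nTable below."
for section, text in chunk_markdown(md):
    print(section, "|", text.replace("\n\n", " / "))
```

Splitting at headings keeps each chunk tied to the section it came from, so a retriever can cite where an answer originated. This is the kind of controllable input the bullet above describes, and it depends on the parser having recovered headings and reading order correctly.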

---

Open Source & Resources

GitHub: https://github.com/PaddlePaddle/PaddleOCR
Technical Report: https://arxiv.org/pdf/2510.14528
Demo: https://aistudio.baidu.com/application/detail/98365

---

PaddleOCR-VL in AI Content Ecosystem

Platforms like AiToEarn enable creators to integrate PaddleOCR-VL into AI-powered workflows:

Multi-platform content publishing (Douyin, Kwai, WeChat, Bilibili, Xiaohongshu, Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, X)
Analytics and monetization
Open-source toolkits: AiToEarn GitHub
Model ranking insights: AI model rankings

By combining high-precision OCR parsing with publishing and analytics frameworks, content creators can streamline workflows from document analysis to cross-platform monetization.

---

Bottom Line

PaddleOCR-VL stands out as:

Lightweight yet powerful
Industry adaptable
Global benchmark setter

It is not only advancing the OCR field but also showing how targeted design and smart engineering can challenge — and outperform — much larger multimodal models.