Hunyuan OCR Model Open-Sourced with Only 1B Parameters, Achieving Multiple SOTA Capabilities

# Tencent HunyuanOCR: Achieving Multiple Industry-Leading SOTA Results in OCR Benchmarks

**Date:** 2025-11-25 20:24 Zhejiang  

![image](https://blog.aitoearn.ai/content/images/2025/11/img_001-593.jpg)

Tencent Hunyuan has officially released its **self-developed OCR model**, achieving multiple **State-of-the-Art (SOTA)** results in leading OCR application benchmarks.  

---

## Introduction to HunyuanOCR

On **November 25**, Tencent Hunyuan launched the open-source **HunyuanOCR** model:

- **1B parameters**: small yet powerful  
- **Native multimodal architecture**: built from the ground up for integrated vision and language processing  
- Achieved **SOTA** results across multiple OCR benchmarks

![image](https://blog.aitoearn.ai/content/images/2025/11/img_002-556.jpg)

### Key Advantages

- **Compact size**: easy to deploy across devices  
- **End-to-end architecture**: every task completes in a **single forward pass**  
- **More efficient than cascaded solutions**: lower complexity and cost  
- **Native multimodal design**: consistent, optimized results

---

## Architecture

The **HunyuanOCR expert model** consists of:

1. **Native-resolution visual encoder**
2. **Adaptive visual adapter**
3. **Lightweight Hunyuan language model**

Unlike traditional cascaded OCR systems, HunyuanOCR:

- Uses **fully end-to-end** training and inference
- Leverages **large-scale, high-quality, application-oriented data**
- Employs **online reinforcement learning**
- Provides robust and stable **end-to-end reasoning**

![image](https://blog.aitoearn.ai/content/images/2025/11/img_003-529.jpg)

---

## Benchmark Results

**Core Capabilities:**

- **OmniDocBench** (complex document parsing): score of **94.1**, ahead of Google's Gemini 3 Pro  
- **Text detection & recognition**: evaluated on a self-built benchmark covering 9 major scenarios  
- **OCRBench leaderboard** (models under 3B parameters): score of **860** with only 1B parameters  
- **Low-resource language translation**: supports 14 languages; **ICDAR2025 winner** in the small-model track

**Coverage of 9 Major Scenarios:**
- Documents
- Art text
- Street scenes
- Handwriting
- Advertising
- Receipts
- Screenshots
- Games
- Videos

---

## Language Translation

HunyuanOCR supports **bidirectional translation** between **Chinese/English** and **14 high-frequency languages**:

German, Spanish, Turkish, Italian, Russian, French, Portuguese, Arabic, Thai, Vietnamese, Indonesian, Malay, Japanese, Korean.

![image](https://blog.aitoearn.ai/content/images/2025/11/img_004-502.jpg)
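To make the coverage concrete, the short sketch below enumerates the directed translation pairs implied by the description above. It assumes "bidirectional" means every one of the 14 languages can be translated to and from both Chinese and English; the Chinese/English pair itself is not counted because the text does not mention it. The names `HUBS`, `OTHERS`, and `directed_pairs` are illustrative only and are not part of HunyuanOCR.

```python
from itertools import product

# The two hub languages and the 14 languages listed above.
HUBS = ["Chinese", "English"]
OTHERS = [
    "German", "Spanish", "Turkish", "Italian", "Russian", "French",
    "Portuguese", "Arabic", "Thai", "Vietnamese", "Indonesian",
    "Malay", "Japanese", "Korean",
]


def directed_pairs(hubs, others):
    """Return (source, target) pairs in both directions for each hub/other combination."""
    pairs = []
    for hub, other in product(hubs, others):
        pairs.append((hub, other))   # e.g. Chinese -> German
        pairs.append((other, hub))   # e.g. German -> Chinese
    return pairs


pairs = directed_pairs(HUBS, OTHERS)
print(len(pairs))  # 2 hubs x 14 languages x 2 directions = 56
```

Under this reading, the model covers 56 directed language pairs.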

---

## Application Scenarios

### 1. Complex Multilingual Document Parsing
Digitizes scanned or photographed documents. The output:
- Follows the document's **reading order**
- Represents formulas in **LaTeX**
- Represents complex tables in **HTML**

![image](https://blog.aitoearn.ai/content/images/2025/11/img_006-412.jpg)
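Because complex tables come back as HTML, downstream code usually needs to turn that markup into rows. The following is a minimal sketch using only Python's standard library; the `sample` string is a made-up stand-in for model output, not an actual HunyuanOCR response, and `TableRows` and `html_table_to_rows` are hypothetical helper names.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # row currently being filled
        self._cell = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left open by malformed markup


def html_table_to_rows(html):
    """Return a list of rows, each a list of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical output for a small scanned table.
sample = (
    "<table><tr><th>Item</th><th>Price</th></tr>"
    "<tr><td>Tea</td><td>3.50</td></tr></table>"
)
rows = html_table_to_rows(sample)
print(rows)  # [['Item', 'Price'], ['Tea', '3.50']]
```

This sketch ignores `rowspan` and `colspan`, which complex tables often use; handling merged cells would need extra bookkeeping.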

### 2. Receipt Field Extraction
Extracts key fields (name, address, organization) from documents like IDs or receipts into **standard JSON** format.

![image](https://blog.aitoearn.ai/content/images/2025/11/img_007-382.jpg)
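Structured JSON output is convenient because it can be validated before it enters a business system. The sketch below shows one way to do that, assuming the model returns a single JSON object, possibly surrounded by extra text. The field names come from the example fields above; the `raw` string, `REQUIRED_FIELDS`, and `parse_fields` are illustrative assumptions rather than HunyuanOCR's actual output schema or API.

```python
import json

# Field names taken from the description above; a real schema may differ.
REQUIRED_FIELDS = ("name", "address", "organization")


def parse_fields(raw):
    """Extract the JSON object from model output and list missing or empty fields."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(raw[start:end + 1])
    missing = [key for key in REQUIRED_FIELDS if not data.get(key)]
    return data, missing


# Hypothetical model output with some leading text and one empty field.
raw = 'Result: {"name": "Li Hua", "address": "", "organization": "Example Co."}'
data, missing = parse_fields(raw)
print(data["name"])  # Li Hua
print(missing)       # ['address']
```

Flagging empty fields rather than failing outright lets a caller route incomplete extractions to manual review.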

### 3. Video Subtitle Extraction
Automatically detects and extracts **single-language or bilingual subtitles**.

![image](https://blog.aitoearn.ai/content/images/2025/11/img_008-356.jpg)

### 4. Photo Translation
Supports on-the-fly multilingual translation from images.

![image](https://blog.aitoearn.ai/content/images/2025/11/img_009-327.jpg)

---

## Download & Try

- **Web:** [https://hunyuan.tencent.com/vision/zh?tabIndex=0](https://hunyuan.tencent.com/vision/zh?tabIndex=0)  
- **Mobile:** [https://hunyuan.tencent.com/open_source_mobile?tab=vision&tabIndex=0](https://hunyuan.tencent.com/open_source_mobile?tab=vision&tabIndex=0)  

**Open Source Repositories:**
- GitHub: [https://github.com/Tencent-Hunyuan/HunyuanOCR](https://github.com/Tencent-Hunyuan/HunyuanOCR)  
- ModelScope: [https://modelscope.cn/models/Tencent-Hunyuan/HunyuanOCR](https://modelscope.cn/models/Tencent-Hunyuan/HunyuanOCR)  


