Tongyi DeepResearch
Understand Tongyi DeepResearch Thoroughly Through 3 Core Questions

Honghao Wang

16 Oct 2025 — 4 min read
![image](https://blog.aitoearn.ai/content/images/2025/10/img_001-205.jpg)

# Tongyi DeepResearch: A Detailed Breakdown

## 1. Introduction

Alibaba's **Tongyi Lab** has released an **Agent** project named **Tongyi DeepResearch** — without a press conference or grand announcements. Yet, on the day it appeared on GitHub, it **shot to the top of the daily trending list**.  

While the release fueled widespread curiosity, the documentation contained intimidating terms — *post-training*, *tool calling*, *reinforcement learning* — that left some readers puzzled.

Since I’ve been exploring similar research directions, I’ll act as a guide to unpack **three core questions**:

1. **What does DeepResearch include, and how can you use it?**
2. **How was the DeepResearch model trained?**
3. **Which design aspects are worth referencing in your own work?**

---

### Intended Audience & Reading Guide

- **AI Application Developers** → Focus on Chapter 2 (modules & architecture).
- **AI Researchers** → Chapter 3 (data construction, training strategies, fine details).
- **Tech Managers / Architects** → Chapter 4 (consensus and debates on design).

---

**Key Links**  
- **Project GitHub:** [https://github.com/Alibaba-NLP/DeepResearch](https://github.com/Alibaba-NLP/DeepResearch)  
- **Open-source model:** [Tongyi-DeepResearch-30B-A3B](https://ModelScope.cn/models/iic/Tongyi-DeepResearch-30B-A3B/)

> *Disclaimer:* Content may contain inaccuracies due to limited knowledge. Corrections welcome. Differences between the [technical report (2)] and [ArXiv paper (12)] will be noted without resolving.

---

## 2. What’s in DeepResearch and How to Use It?

Released on **2025‑09‑16**, Tongyi DeepResearch is an **open-source Web Agent model** achieving **SOTA performance**:

- **Humanity’s Last Exam (HLE):** 32.9  
- **BrowseComp:** 45.3  
- **xbench‑DeepSearch:** 75.0

These scores **surpass proprietary models** like *OpenAI’s Deep Research*.

![image](https://blog.aitoearn.ai/content/images/2025/10/img_002-196.jpg)

---

### Three Key Attributes

**1. Open-source**  
Against a backdrop of strong closed-source offerings (GPT‑5, Claude, Grok), quality open-source projects remain crucial for transparent AI development, especially in Agent research.

**2. High-performance**  
Every released model self-claims “high performance”, but survivorship bias lurks behind the scenes. DeepResearch’s benchmark results are genuinely noteworthy.

**3. Web Agent**  
A Web Agent can proactively request and process information from the internet. Related concepts include *search-augmented LLMs*, *Deep Search* tools (Perplexity AI), and *AI search features*.  

Categorization depends on both:
- Product scope (feature sets)
- Core technical ability (web interaction)

![image](https://blog.aitoearn.ai/content/images/2025/10/img_003-182.jpg)

---

### Project Components

- **[Model]** DeepResearch‑30B‑A3B  
  - **30B MoE architecture**, 3B active parameters per run → deployable even on a MacBook Pro M2.
- **[Inference Code]**  
  - Built with `Qwen3MoeForCausalLM` → same deployment as Qwen3 models via `Vllm` or `ollama`.
- **[Evaluation Code]**  
  - ByteDance’s `Sandbox_fusion` for reproducing experimental results. Includes an **AgentUse** mode: *ReAct*.
- **[Agent Inference]**  
  - No direct Agent code; compatible with Qwen‑Agent. IterResearch mode not included in public repo.

![image](https://blog.aitoearn.ai/content/images/2025/10/img_004-173.jpg)

---

#### Recommended Usage Combos
| Interest Area            | Components to Use |
|--------------------------|-------------------|
| **Model research**       | 1 + 2 + 3         |
| **Agent research**       | 3                 |
| **App experience/dev**   | 4                 |
| **Cosmic curiosity**     | Email us with a 🐶 |

---

## 3. How Was the DeepResearch Model Produced?

Like many reasoning models, development follows **three stages**:

1. **Incremental training** + data synthesis (Two CPT phases)
2. **Supervised Fine-Tuning (SFT)** + data synthesis (*cold start*)
3. **Reinforcement Learning (RL)**

![image](https://blog.aitoearn.ai/content/images/2025/10/img_005-162.jpg)

---

### Why These Three Stages?

**Consensus trends:**
- High-quality data creation remains **worth every effort**.
- Incremental training is broadly accepted — though implementations vary.
- Cold start + RL is increasingly seen as optimal for Agents.

---

### Stage 1: Incremental Training & Data Synthesis

**Core Question:** *What exactly needs enhancement in Agent CPT?*  
Possibilities include:
- **Knowledge enhancement**
- **Reasoning enhancement**

CPT data here falls into:
- **General high-quality datasets** (web crawls, knowledge graphs, private sets)
- **Synthetic trajectory data** → key for learning planning, reasoning, decision-making.  

**Challenges:**  
- **Low acceptance rate** for usable samples  
- **Slow expansion**, especially during tool calls, even with multithreading.

![image](https://blog.aitoearn.ai/content/images/2025/10/img_006-148.jpg)

---

#### IterResearch Paradigm
Two core elements:
1. **Core report** for reasoning
2. **Workspace** for tool results

**Advantages over ReAct:**
- Avoids rapid context consumption.
- Reduces *context pollution* in long-horizon usage.

> Public repo only includes *ReAct* implementations.

---

### Stage 2: Supervised Fine-Tuning (Cold Start)

Cold start serves to bootstrap RL effectiveness.  
**WebFrontier** synthesis approach:
1. **Seed data** from cleaned web pages/docs
2. **Iterative complexity upgrade** using Agents + toolset
3. **Quality control** — filter too-easy or unsolvable items

Binding training data tightly to toolsets ensures relevance but risks **scenario overfitting**.

---

### Stage 3: Reinforcement Learning

Algorithm: **GSPO** (*Group Sequence Policy Optimization*)  
Adjustments:
- Split trajectories into individual rounds → more training samples (**G × T**).
- **Group-level advantage normalization** for balanced reasoning across process stages.
- **Minimal-loss downsampling** for consistent batch sizes.
- **Dual-environment strategy** (simulation + real).

---

## 4. Which Designs Should You Reference?

### Consensus:
- High-quality data creation → critical.
- Closed toolsets aid stable pipeline development.
- Multi-stage training (CPT + SFT + RL) effective.

### Non-Consensus:
- Whether to use specialized models vs general LLMs for Agents.
- Necessity of CPT; viable skips focus on SFT + RL.
- Transferability of web-specific data methods across non-web Agents.

---

### Example Considerations:
- **Specialized models** excel at tool invocation long-range reasoning.
- **General models** offer flexibility; performance gap depends on domain.
- **Trajectory diversity** is expensive but necessary for robust Agents.
- Toolset **extensibility** should be factored into data synthesis plans.

---

## Final Thoughts

**Tongyi DeepResearch** is more than a model — it’s a well-documented case study in Agent training optimization. It provides practical insights for tackling debated aspects in the field.

Reading it feels like opening the fridge after a hot day to find a cold beer and marinated beef: **absolute satisfaction**.  

---

## References

For full reference list, see:
- [GitHub Repo](https://github.com/Alibaba-NLP/DeepResearch)
- [ModelScope Page](https://ModelScope.cn/models/iic/Tongyi-DeepResearch-30B-A3B)
- [ArXiv Papers](https://arxiv.org/pdf/2509.13309)

Additional related works: *WebSailor*, *WebExplorer*, *GSPO* methodology, and toolset data (*WebShaper*).

---
Understand Tongyi DeepResearch Thoroughly Through 3 Core Questions

Honghao Wang

Read more

People Stop Buying Porsches, Decade-Long CEO Steps Down

The Cutest New Land Cruiser FJ Launch — Could This Be Equation Leopard’s Long-Lost Brother in Japan?

Translate the following blog post title into English, concise and natural. Return plain text only without quotes. ChatGPT Atlas 发布，AI 浏览器大乱斗...

Express Update | OpenAI’s Japanese Rival Sakana in Talks for Funding at $2.5 Billion Valuation