Understand Tongyi DeepResearch Thoroughly Through 3 Core Questions

![image](https://blog.aitoearn.ai/content/images/2025/10/img_001-205.jpg)

# Tongyi DeepResearch: A Detailed Breakdown

## 1. Introduction

Alibaba's **Tongyi Lab** has released an **Agent** project named **Tongyi DeepResearch** — without a press conference or grand announcements. Yet, on the day it appeared on GitHub, it **shot to the top of the daily trending list**.  

While the release fueled widespread curiosity, the documentation contained intimidating terms — *post-training*, *tool calling*, *reinforcement learning* — that left some readers puzzled.

Since I’ve been exploring similar research directions, I’ll act as a guide to unpack **three core questions**:

1. **What does DeepResearch include, and how can you use it?**
2. **How was the DeepResearch model trained?**
3. **Which design aspects are worth referencing in your own work?**

---

### Intended Audience & Reading Guide

- **AI Application Developers** → Focus on Chapter 2 (modules & architecture).
- **AI Researchers** → Chapter 3 (data construction, training strategies, fine details).
- **Tech Managers / Architects** → Chapter 4 (consensus and debates on design).

---

**Key Links**  
- **Project GitHub:** [https://github.com/Alibaba-NLP/DeepResearch](https://github.com/Alibaba-NLP/DeepResearch)  
- **Open-source model:** [Tongyi-DeepResearch-30B-A3B](https://ModelScope.cn/models/iic/Tongyi-DeepResearch-30B-A3B/)

> *Disclaimer:* This article may contain inaccuracies due to the limits of my knowledge; corrections are welcome. Where the [technical report (2)] and the [arXiv paper (12)] differ, I note the discrepancy without resolving it.

---

## 2. What’s in DeepResearch and How to Use It?

Released on **2025‑09‑16**, Tongyi DeepResearch is an **open-source Web Agent model** achieving **SOTA performance**:

- **Humanity’s Last Exam (HLE):** 32.9  
- **BrowseComp:** 45.3  
- **xbench‑DeepSearch:** 75.0

These scores **surpass proprietary models** like *OpenAI’s Deep Research*.

![image](https://blog.aitoearn.ai/content/images/2025/10/img_002-196.jpg)

---

### Three Key Attributes

**1. Open-source**  
Against a backdrop of strong closed-source offerings (GPT‑5, Claude, Grok), quality open-source projects remain crucial for transparent AI development, especially in Agent research.

**2. High-performance**  
Every model release claims "high performance", and survivorship bias lurks behind such claims. DeepResearch's benchmark results, however, hold up as genuinely noteworthy.

**3. Web Agent**  
A Web Agent can proactively request and process information from the internet. Related concepts include *search-augmented LLMs*, *Deep Search* tools (Perplexity AI), and *AI search features*.  

Categorization depends on both:
- Product scope (feature sets)
- Core technical ability (web interaction)
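
To make "web interaction" concrete, here is a minimal sketch of a single decide-then-act step. The `llm` and `web_search` callables are hypothetical placeholders, not DeepResearch's actual API:

```python
# Minimal sketch of one web-agent step: the model either answers directly
# or issues a search query and answers from the retrieved snippets.
# `llm` and `web_search` are hypothetical callables, not DeepResearch's API.
def web_agent_step(llm, web_search, question: str) -> str:
    decision = llm(
        f"Question: {question}\n"
        "Reply with ANSWER: <text> or SEARCH: <query>."
    )
    if decision.startswith("SEARCH:"):
        query = decision.removeprefix("SEARCH:").strip()
        results = web_search(query)  # fetch snippets from the open web
        decision = llm(
            f"Question: {question}\nSearch results: {results}\n"
            "Reply with ANSWER: <text>."
        )
    return decision.removeprefix("ANSWER:").strip()
```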

![image](https://blog.aitoearn.ai/content/images/2025/10/img_003-182.jpg)

---

### Project Components

- **[Model]** DeepResearch‑30B‑A3B  
  - **30B MoE architecture** with ~3B parameters active per token → deployable even on a MacBook Pro M2.
- **[Inference Code]**  
  - Built on `Qwen3MoeForCausalLM` → deploys exactly like Qwen3 models via `vLLM` or `ollama` (a serving sketch follows this list).
- **[Evaluation Code]**  
  - Uses ByteDance's `SandboxFusion` to reproduce the reported results; ships with a *ReAct* agent mode.
- **[Agent Inference]**  
  - No standalone Agent application code; compatible with Qwen‑Agent. The IterResearch mode is not included in the public repo.
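
Since the checkpoint is a standard Qwen3-MoE causal LM, serving it should look the same as serving any Qwen3 model. A minimal offline-inference sketch with vLLM, assuming the model ID follows the repo's naming (the exact published path may differ):

```python
# Offline inference with vLLM; the model ID is assumed from the repo's
# naming convention and may differ from the actual published path.
from vllm import LLM, SamplingParams

llm = LLM(model="Alibaba-NLP/Tongyi-DeepResearch-30B-A3B",
          trust_remote_code=True)
params = SamplingParams(temperature=0.6, max_tokens=1024)
outputs = llm.generate(["Who proposed the ReAct prompting paradigm?"], params)
print(outputs[0].outputs[0].text)
```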

![image](https://blog.aitoearn.ai/content/images/2025/10/img_004-173.jpg)

---

#### Recommended Usage Combos
| Interest Area            | Components to Use |
|--------------------------|-------------------|
| **Model research**       | 1 + 2 + 3         |
| **Agent research**       | 3                 |
| **App experience/dev**   | 4                 |
| **Cosmic curiosity**     | Email us with a 🐶 |

*(Numbers refer to the four components listed above: 1 = Model, 2 = Inference Code, 3 = Evaluation Code, 4 = Agent Inference.)*

---

## 3. How Was the DeepResearch Model Produced?

Like many reasoning models, DeepResearch was developed in **three stages**:

1. **Incremental training**, i.e., continual pre-training (**CPT**), + data synthesis (two CPT phases)
2. **Supervised Fine-Tuning (SFT)** + data synthesis (the *cold start*)
3. **Reinforcement Learning (RL)**

![image](https://blog.aitoearn.ai/content/images/2025/10/img_005-162.jpg)

---

### Why These Three Stages?

**Consensus trends:**
- High-quality data creation remains **worth every effort**.
- Incremental training is broadly accepted — though implementations vary.
- Cold start + RL is increasingly seen as optimal for Agents.

---

### Stage 1: Incremental Training & Data Synthesis

**Core Question:** *What exactly needs enhancement in Agent CPT?*  
Possibilities include:
- **Knowledge enhancement**
- **Reasoning enhancement**

CPT data here falls into two buckets:
- **General high-quality datasets** (web crawls, knowledge graphs, private sets)
- **Synthetic trajectory data** → key for learning planning, reasoning, and decision-making (an illustrative record follows below).

**Challenges:**  
- **Low acceptance rate:** only a small fraction of synthesized samples survive filtering.
- **Slow scaling:** generation is bottlenecked by tool calls, even with multithreading.
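
No schema for these trajectories is published, but an accepted sample plausibly bundles a question, the tool-use steps, and a verified answer. A purely illustrative record, with hypothetical field names:

```python
# Illustrative shape of one synthetic agent trajectory for CPT.
# All field names are hypothetical; no official schema is published.
trajectory = {
    "question": "In which year did the author of <book X> win <award Y>?",
    "steps": [
        {"thought": "First identify the author.",
         "tool": "search", "args": {"query": "author of <book X>"},
         "observation": "...top search snippets..."},
        {"thought": "Now find the award year on the author's bio page.",
         "tool": "visit", "args": {"url": "https://example.org/bio"},
         "observation": "...page text..."},
    ],
    "answer": "1998",
    "accepted": True,  # the low acceptance rate makes this flag costly to earn
}
```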

![image](https://blog.aitoearn.ai/content/images/2025/10/img_006-148.jpg)

---

#### IterResearch Paradigm
Two core elements:
1. **Core report** for reasoning
2. **Workspace** for tool results

**Advantages over ReAct:**
- Avoids rapidly exhausting the context window.
- Reduces *context pollution* over long-horizon tasks.

> Public repo only includes *ReAct* implementations.
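
The exact IterResearch implementation is not public, but the description above suggests a loop that rebuilds a compact prompt each round from the evolving report plus only the latest tool output, rather than appending every observation ReAct-style. A minimal sketch under that assumption, using a hypothetical line-based protocol:

```python
# Sketch of the IterResearch idea (assumed, not the official code):
# each round the prompt contains only the distilled report and the most
# recent tool results, so context does not grow with the number of rounds.
def iter_research(llm, tools, question: str, max_rounds: int = 8) -> str:
    report, workspace = "", ""
    for _ in range(max_rounds):
        reply = llm(
            f"Question: {question}\nReport so far: {report}\n"
            f"Workspace: {workspace}\n"
            "Return 'REPORT: ...' plus either 'TOOL: name | args' or 'FINAL: ...'."
        )
        for line in reply.splitlines():
            if line.startswith("REPORT:"):
                report = line.removeprefix("REPORT:").strip()  # replaces, never appends
            elif line.startswith("FINAL:"):
                return line.removeprefix("FINAL:").strip()
            elif line.startswith("TOOL:"):
                name, _, args = line.removeprefix("TOOL:").partition("|")
                workspace = tools[name.strip()](args.strip())  # old results dropped
    return report  # fall back to the report if no final answer emerged
```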

---

### Stage 2: Supervised Fine-Tuning (Cold Start)

Cold start serves to bootstrap RL effectiveness.  
**WebFrontier** synthesis approach:
1. **Seed data** from cleaned web pages/docs
2. **Iterative complexity upgrade** using Agents + toolset
3. **Quality control** — filter too-easy or unsolvable items

Binding training data tightly to toolsets ensures relevance but risks **scenario overfitting**.
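
The filtering criteria are not spelled out here, but "too easy or unsolvable" suggests a two-sided check: discard questions a plain LLM already answers without tools, and discard questions even a tool-equipped agent cannot solve. A hedged sketch with hypothetical model handles:

```python
# Sketch of a "too easy / unsolvable" filter (assumed criteria, not the
# official WebFrontier pipeline). `plain_llm` and `tool_agent` are
# hypothetical callables returning answer strings.
def quality_filter(items, plain_llm, tool_agent):
    kept = []
    for question, gold_answer in items:
        if plain_llm(question).strip() == gold_answer:
            continue  # too easy: solvable without any tools
        if tool_agent(question).strip() != gold_answer:
            continue  # unsolvable even with the full toolset
        kept.append((question, gold_answer))
    return kept
```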

---

### Stage 3: Reinforcement Learning

Algorithm: **GSPO** (*Group Sequence Policy Optimization*)  
Adjustments:
- Split trajectories into individual rounds → more training samples (**G × T**).
- **Group-level advantage normalization** for balanced reasoning across process stages.
- **Minimal-loss downsampling** for consistent batch sizes.
- **Dual-environment strategy** (simulation + real).
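
To illustrate the group-level normalization above: rewards for the G rollouts of one prompt are normalized against each other, and once trajectories are split into rounds, each of the G × T rounds inherits its rollout's advantage. This is a sketch of the general technique, not the exact GSPO objective:

```python
import numpy as np

# Group-level advantage normalization, sketched: normalize the G rollout
# rewards within one prompt's group, then broadcast each rollout's
# advantage to its T rounds (G x T training samples). This illustrates
# the general technique, not the exact GSPO loss.
def group_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """rewards: shape (G,), one scalar reward per rollout in the group."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

rewards = np.array([1.0, 0.0, 1.0, 0.0])  # e.g. pass/fail per rollout (G=4)
adv = group_advantages(rewards)           # per-rollout advantage
T = 5                                     # rounds per trajectory
samples = np.repeat(adv, T)               # one advantage per round
print(samples.shape)                      # (20,) == G * T
```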

---

## 4. Which Designs Should You Reference?

### Consensus:
- High-quality data creation → critical.
- Closed toolsets aid stable pipeline development.
- Multi-stage training (CPT + SFT + RL) effective.

### Non-Consensus:
- Whether to use specialized models vs general LLMs for Agents.
- Whether CPT is necessary; some pipelines skip it and rely on SFT + RL alone.
- Transferability of web-specific data methods across non-web Agents.

---

### Example Considerations:
- **Specialized models** excel at tool invocation and long-range reasoning.
- **General models** offer flexibility; performance gap depends on domain.
- **Trajectory diversity** is expensive but necessary for robust Agents.
- Toolset **extensibility** should be factored into data synthesis plans.

---

## Final Thoughts

**Tongyi DeepResearch** is more than a model — it’s a well-documented case study in Agent training optimization. It provides practical insights for tackling debated aspects in the field.

Reading it feels like opening the fridge after a hot day to find a cold beer and marinated beef: **absolute satisfaction**.  

---

## References

For the full reference list, see:
- [GitHub repo](https://github.com/Alibaba-NLP/DeepResearch)
- [ModelScope page](https://ModelScope.cn/models/iic/Tongyi-DeepResearch-30B-A3B)
- [arXiv paper](https://arxiv.org/pdf/2509.13309)

Additional related works: *WebSailor*, *WebExplorer*, the *GSPO* methodology, and toolset data (*WebShaper*).

---


By Honghao Wang