China’s Only Winner: Alibaba Qwen Wins Best Paper at Top Global AI Conference

November 28 News: Tongyi Qianwen Wins NeurIPS 2025 Best Paper Award

At NeurIPS 2025, one of the world’s most prestigious AI conferences, Alibaba’s Tongyi Qianwen (Qwen) team stood out among more than 20,000 submissions to win a Best Paper Award, making it the only winning team from China.

This groundbreaking paper is the first in the industry to reveal how the attention gating mechanism impacts the performance and training of large models—a breakthrough seen by experts as a major step toward overcoming current bottlenecks in large model training.

---

Significance of NeurIPS

NeurIPS is among the most influential AI conferences globally, where landmark innovations such as the Transformer and AlexNet were first presented.

This year’s stats:

  • 20,000+ submissions from leading institutions: Google, Microsoft, OpenAI, Alibaba, MIT, and more
  • Acceptance rate: ~25%
  • Best Paper Awards: only 4 papers selected (a selection rate below 0.02%)

---

Background: Transformer & Attention Limitations

In 2017, Google’s NeurIPS paper introduced the Transformer architecture and self-attention, enabling AI to focus on key information—fundamental to modern large model research.
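
For reference, the self-attention operation from that paper is the standard scaled dot-product formulation, with queries $Q$, keys $K$, values $V$, and key dimension $d_k$:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

The softmax weights determine how strongly each token attends to every other token, which is what lets the model focus on key information.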

Despite significant advancements, current attention mechanisms have drawbacks:

  • Over-focusing on certain details while ignoring other critical information
  • Performance degradation in broader tasks
  • Instability during training

---

What Is Gating in Attention?

The gating mechanism, often described as an AI model’s “intelligent valve”, does two things (a minimal sketch follows below):

  • Filters out irrelevant information
  • Enhances overall performance
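
To make the “intelligent valve” idea concrete, here is a minimal sketch (an illustration of the general idea, not the team’s actual implementation; the class name is hypothetical): a learned sigmoid gate produces a value between 0 and 1 for each feature and scales that feature accordingly, so the model can suppress signals it deems irrelevant.

```python
import torch
import torch.nn as nn

class SigmoidGate(nn.Module):
    """Minimal illustrative gate: a learned 'valve' in (0, 1) per feature.

    This is a generic sketch of output gating, not the exact design
    from the Qwen paper.
    """
    def __init__(self, dim: int):
        super().__init__()
        # The gate values are computed from the input itself.
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = torch.sigmoid(self.proj(x))  # each value lies in (0, 1)
        return gate * x                     # scale (filter) each feature

# Usage: gate a batch of token representations.
x = torch.randn(2, 16, 64)   # (batch, sequence, hidden)
gated = SigmoidGate(64)(x)
print(gated.shape)           # torch.Size([2, 16, 64])
```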

Previous Attempts

  • Applied in models like AlphaFold2 and the Forgetting Transformer
  • Lacked a clear theoretical explanation and large-scale experimental evidence

---

Tongyi Qianwen’s Contribution

image

The Tongyi Qianwen team:

  • Ran dozens of experiments on:
    • a 1.7B-parameter dense model
    • a 15B-parameter Mixture-of-Experts (MoE) model
  • Trained on up to 3.5 trillion tokens in a single experiment

Key discovery:

  • Gating the output of each attention head yields the largest performance improvement (see the sketch below)
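
Below is a hedged sketch of what that placement could look like inside a standard multi-head attention block (illustrative only: the class name is hypothetical and the paper’s exact formulation may differ). The sigmoid gate is applied to each head’s attention output before the final output projection:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedHeadAttention(nn.Module):
    """Multi-head self-attention with a sigmoid gate on the head outputs.

    Illustrative sketch of the placement described above, not the paper's
    exact architecture.
    """
    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.gate = nn.Linear(dim, dim)  # per-channel gate; channels group into heads
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, tokens, head_dim) for per-head attention.
        q, k, v = (z.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
                   for z in (q, k, v))
        attn = F.scaled_dot_product_attention(q, k, v)   # standard attention
        attn = attn.transpose(1, 2).reshape(b, t, d)     # concatenate heads
        gate = torch.sigmoid(self.gate(x))               # learned "valve" in (0, 1)
        return self.out(gate * attn)                     # gate head outputs, then project

# Usage
x = torch.randn(2, 16, 64)
y = GatedHeadAttention(dim=64, num_heads=8)(x)
print(y.shape)  # torch.Size([2, 16, 64])
```

In a sketch like this, the gate adds only one extra linear projection per attention layer, which is in line with the small parameter overhead reported in the results below.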

Results:

  • Only 1% increase in parameter count
  • Perplexity reduction: >0.2
  • MMLU benchmark improvement: +2 points
  • Even more effective at larger scales

---

Real-World Impact

  • Implemented in the Qwen3-Next model, boosting performance and robustness
  • The technical solution, experimental models, and production-level models have been open-sourced
  • NeurIPS reviewers: “This work will see broad application and greatly enhance understanding of attention in LLMs.”

---

Team Statement

> “A deep understanding of gated attention mechanisms not only guides large language model architecture design but also lays the foundation for building more stable, efficient, and controllable large models.”

---

Tongyi Qianwen’s Open-Source Achievements

  • 300+ models open-sourced across all modalities and sizes
  • 700M+ downloads globally
  • 180,000+ derivative models created
  • Ranked #1 worldwide among open-source model families

---

Platforms like AiToEarn are critical for open-sourcing and cross-platform dissemination of AI research.

AiToEarn features:

  • Global AI content monetization
  • Multi-platform publishing: Douyin, Kwai, WeChat, Bilibili, Rednote (Xiaohongshu), Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, X (Twitter)
  • Integrated AI generation tools
  • Analytics and AI model rankings

Potential Role:

AiToEarn could help turn cutting-edge AI research like Tongyi Qianwen’s work into widely shared, monetizable knowledge.

