reinforcement learning

Lilian Weng’s “Elegant” Approach to Policy Distillation: How It Redefines Cost and Efficiency | New Paper Analysis

On-policy distillation

A Leap of Imagination AI Future Compass: a paper-interpretation column that breaks down highlights from top conferences and journals with frontline perspectives and accessible language. Breaking the "Impossible Triangle": For years, post-training of models has been trapped in an impossible triangle. Researchers want models to have strong capabilities, low

By Honghao Wang

Today’s Open Source (2025-10-27): PRIME-RL Breakthrough: Multi-Stage RL and Coevolutionary System Achieve IPhO Gold-Level Physics Reasoning

open-source AI

Open-Source AI Model Series & Frameworks Overview: This document highlights several cutting-edge open-source AI projects, frameworks, and tools across physics reasoning, multimodal intelligence, reinforcement learning, inference acceleration, and agent-based deep research. 🏆 Base Models: ① P1 Project: Physics Reasoning at Olympiad Level. Key highlights: First open-source model series from PRIME-RL. Designed

By Honghao Wang

Thinking Machines' New Study Goes Viral: Combining RL + Fine-Tuning for More Cost-Effective Small Model Training

On-policy distillation

Thinking Machines’ Breakthrough: On-Policy Distillation for Efficient LLM Training. Thinking Machines’ latest research is generating intense discussion in the AI community. After it was personally reposted by Mira Murati, founder of Thinking Machines and former OpenAI CTO, many prominent figures praised its research value. According to Murati’s summary, the team has

By Honghao Wang

New Paradigm for Large Model Reasoning: The ExGRPO Framework, from Blind Practice to Smart Review

ExGRPO

2025-10-24 00:01, Jilin. Beyond Traditional On-Policy RLVR Methods. Large Model Intelligence | Sharing. Source: Quantum Bits. A joint research team from Shanghai Artificial Intelligence Laboratory, University of Macau, Nanjing University, and The Chinese University of Hong Kong has introduced ExGRPO, a novel experience management and learning framework. Goal: Scientifically

By Honghao Wang