Reinforcement Learning

Nature Reveals the Technical Details of Google's IMO Gold-Medal Model: A Core Team of Just 10 Generated 80 Million Math Problems for AI Training in a Year

AlphaProof

Google DeepMind Unveils AlphaProof, the IMO Gold Medal-Winning AI

Google DeepMind's latest breakthrough in mathematical reasoning, AlphaProof, has been fully disclosed, including both its architecture and training methods. Continuing DeepMind's naming tradition, AlphaProof builds upon earlier successes like AlphaZero and now joins the ranks of Nature-published research. …
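
AlphaProof reportedly works inside the Lean proof assistant, turning natural-language problems into machine-checkable statements. Purely as an illustration of what such a formal statement looks like (this toy theorem is not taken from AlphaProof's training data), in Lean 4:

```lean
-- Toy example of a machine-checkable statement: commutativity of
-- natural-number addition, closed with a core library lemma.
theorem toy_add_comm (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```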

By Honghao Wang
Tencent Youtu Introduces Training-Free GRPO: Reinforcement Learning for DeepSeek-V3.2 for Just $8

Reinforcement Learning

Reinforcement Learning for Ultra-Large Models at a Fraction of the Cost

Richard Sutton, known as the “Father of Reinforcement Learning” and a Turing Award winner, predicts that the next generation of intelligent agents will achieve superhuman capabilities primarily by learning from experience, rather than relying solely on supervised learning with …
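
For context, GRPO (Group Relative Policy Optimization) replaces a learned value function with a group baseline: several answers are sampled per prompt, and each answer's reward is normalized against the group's statistics. A minimal sketch of that advantage computation with made-up rewards, assuming nothing about Tencent Youtu's training-free variant:

```python
import statistics

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: center each sampled answer's reward on the
    group mean and scale by the group standard deviation, so no separate
    value network is needed."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical rewards for four answers sampled from the same prompt.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # ≈ [1.0, -1.0, -1.0, 1.0]
```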

By Honghao Wang
Running Highly Scalable Reinforcement Learning for Large Language Models on GKE

Reinforcement Learning

Reinforcement Learning for LLMs: Scalable Infrastructure on Google Cloud

As Large Language Models (LLMs) advance, Reinforcement Learning (RL) is becoming essential for aligning these models with human preferences and complex task goals. Yet enterprises face significant hurdles when implementing RL at scale:

* Memory contention from hosting multiple large models simultaneously …
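
One reason multiple large models end up resident at once: RLHF-style objectives score the policy's outputs with a reward model while penalizing drift from a frozen reference model, so at least three sets of weights compete for accelerator memory. A minimal sketch of that shaped reward (hypothetical tensors; not Google Cloud's actual stack):

```python
import torch

def kl_shaped_reward(policy_logprobs: torch.Tensor,
                     ref_logprobs: torch.Tensor,
                     reward_score: torch.Tensor,
                     beta: float = 0.1) -> torch.Tensor:
    """RLHF-style shaped reward: the reward model's sequence score minus a
    KL penalty against a frozen reference policy. Computing it requires the
    policy, the reference, and the reward model to be served side by side,
    which is the memory-contention problem noted above."""
    per_token_kl = policy_logprobs - ref_logprobs   # log-ratio per token
    return reward_score - beta * per_token_kl.sum(dim=-1)
```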

By Honghao Wang
New LLM Reinforcement Learning Framework: UCSD Multi-Agent Training Boosts LLM Tool-Use Capability by 5.8×

LLM

Reinforcement Learning Framework for Large Language Model Agents: the First Implementation of Universal Multi-Agent Group Reinforcement

Background: Numerous studies show that multi-agent workflows with large language models (LLMs) often outperform single-agent systems, even without targeted training. Yet most current LLM agent training frameworks are limited to single-agent training, leaving universal …

By Honghao Wang
Taming Masked Diffusion Language Models with More Consistent Trajectories and Fewer Decoding Steps for Major Gains in Inference Performance and Efficiency

Diffusion LLMs

# Advancing Diffusion-Based Large Language Models (LLMs)

Diffusion-based LLMs have progressed rapidly.

- **February 2025**: *Mercury* (Inception Labs) became the first *commercial-scale* diffusion LLM.
- **Renmin University** launched *LLaDA*, the first **open-source 8B-parameter diffusion LLM**, followed by *Gemini Diffusion* in May.

These innovations signal that diffusion LLMs could rival autoregressive …
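
The title refers to how masked diffusion LLMs decode: generation starts from a fully masked sequence, and each step predicts all masked positions in parallel, committing the most confident ones, so fewer and more consistent steps translate directly into cheaper inference. A toy sketch of confidence-based parallel unmasking, assuming a generic `model(tokens) -> logits` interface rather than the paper's actual algorithm:

```python
import torch

MASK_ID = 0  # assumed id of the [MASK] token

@torch.no_grad()
def masked_diffusion_decode(model, length: int, steps: int) -> torch.Tensor:
    """Toy decoder: each pass predicts every masked position at once and
    commits the highest-confidence fraction, finishing in `steps` passes
    instead of `length` token-by-token autoregressive ones."""
    tokens = torch.full((length,), MASK_ID, dtype=torch.long)
    for step in range(steps):
        masked = tokens == MASK_ID
        if not masked.any():
            break
        logits = model(tokens)                 # (length, vocab_size)
        conf, pred = logits.softmax(dim=-1).max(dim=-1)
        conf[~masked] = -1.0                   # only rank still-masked slots
        # Commit enough positions to finish within the remaining passes.
        k = max(1, int(masked.sum()) // (steps - step))
        idx = conf.topk(k).indices
        tokens[idx] = pred[idx]
    return tokens
```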

By Honghao Wang