reinforcement learning

Tencent Youtu Introduces Training-Free GRPO: Reinforcement Learning for DeepSeek-V3.2 for Just $8

**Reinforcement Learning for Ultra-Large Models, at a Fraction of the Cost**

Richard Sutton, known as the “Father of Reinforcement Learning” and a Turing Award winner, predicts that the next generation of intelligent agents will achieve superhuman capabilities primarily by learning from experience, rather than relying solely on supervised learning with…
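The excerpt stops short of the method, but the group-relative scoring at the heart of GRPO, which the training-free variant builds on, is easy to show. A minimal sketch, assuming one scalar reward per sampled rollout (the function names are mine, not the paper's):

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantage: z-score each rollout's reward within its group.

    Standard GRPO uses these advantages to weight gradient updates; per the
    article's title, the training-free variant keeps the model frozen and
    uses the comparisons to refine what goes into the prompt instead.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    if sigma == 0.0:          # all rollouts tied: no learning signal
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]

# Four sampled rollouts for one prompt, scored by some verifier.
print(group_relative_advantages([0.0, 1.0, 1.0, 0.5]))
```

Because nothing in this loop touches model weights, it can run against an API-served model, which is presumably how the headline $8 figure becomes possible.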

By Honghao Wang

reinforcement learning

Running Highly Scalable Reinforcement Learning for Large Language Models on GKE

**Reinforcement Learning for LLMs: Scalable Infrastructure on Google Cloud**

As Large Language Models (LLMs) advance, Reinforcement Learning (RL) is becoming essential for aligning these models with human preferences and complex task goals. Yet enterprises face significant hurdles when implementing RL at scale:

* Memory contention from hosting multiple large models simultaneously…
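To make the memory-contention hurdle concrete: an RL fine-tuning job typically keeps a trainable policy, a frozen reference model, and a reward model resident at once. A back-of-the-envelope sketch (the model size, precision, and optimizer-state multiplier are illustrative assumptions, not Google's numbers):

```python
def weights_gib(n_params_billion: float, bytes_per_param: float = 2.0) -> float:
    """GiB of accelerator memory for weights alone (bf16 by default)."""
    return n_params_billion * 1e9 * bytes_per_param / 2**30

# Rough multiplier: a trainable model also carries gradients, an fp32
# master copy, and Adam moments (~16 bytes/param vs 2 for frozen bf16).
resident = {
    "policy (trainable)": weights_gib(8, bytes_per_param=16.0),
    "reference (frozen)": weights_gib(8),
    "reward model":       weights_gib(8),
}
for name, gib in resident.items():
    print(f"{name:>20}: {gib:6.1f} GiB")
print(f"{'total':>20}: {sum(resident.values()):6.1f} GiB "
      "(before KV cache and activations)")
```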

By Honghao Wang

LLM

New LLM Reinforcement Learning Framework: UCSD Multi-Agent Training Boosts LLM Tool-Use Capability by 5.8×

**Reinforcement Learning Framework for Large Language Model Agents: First Implementation of Universal Multi-Agent Group Reinforcement**

Background: Numerous studies show that multi-agent workflows with large language models (LLMs) often outperform single-agent systems, even without targeted training. Yet most current LLM agent training frameworks are limited to single-agent training, leaving universal…
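The excerpt cuts off before the framework itself, so here is a purely illustrative sketch of what "multi-agent group reinforcement" could look like: score joint rollouts against a group baseline and broadcast the advantage to every role in the team. The roles and the credit-assignment rule are my assumptions, not UCSD's design:

```python
from statistics import mean

# One joint rollout: a trace per agent role plus a shared task reward.
rollouts = [
    {"planner": "plan-A", "tool_caller": "calls-A", "reward": 1.0},
    {"planner": "plan-B", "tool_caller": "calls-B", "reward": 0.0},
    {"planner": "plan-C", "tool_caller": "calls-C", "reward": 1.0},
]

baseline = mean(r["reward"] for r in rollouts)

# Broadcast the group-relative advantage to every role, so each agent
# is reinforced exactly when the team beats the group baseline.
for r in rollouts:
    advantage = r["reward"] - baseline
    for role in ("planner", "tool_caller"):
        print(f"{role:>12}: {advantage:+.2f} for {r[role]}")
```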

By Honghao Wang

Diffusion LLMs

Taming Masked Diffusion Language Models with More Consistent Trajectories and Fewer Decoding Steps for Major Gains in Inference Performance and Efficiency

# Advancing Diffusion-Based Large Language Models (LLMs)

Diffusion-based LLMs have progressed rapidly.

- **February 2025**: *Mercury* (Inception Labs) became the first *commercial-scale* diffusion LLM.
- **Renmin University** launched *LLaDA*, the first **open-source 8B-parameter diffusion LLM**, followed by Google DeepMind's *Gemini Diffusion* in May.

These innovations signal that **diffusion LLMs could rival autoregressive**…
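For context on what the article optimizes at inference time: masked diffusion decoding starts from an all-mask sequence and commits the most confident tokens over a fixed number of denoising steps, so fewer steps means more tokens committed per step. A toy sketch of that loop (the random "denoiser" is a stand-in, not any real model's API):

```python
import random

MASK = "<mask>"
VOCAB = ["the", "cat", "sat", "on", "a", "mat"]

def denoise(seq):
    """Stand-in denoiser: a (token, confidence) guess per masked slot."""
    return {i: (random.choice(VOCAB), random.random())
            for i, tok in enumerate(seq) if tok == MASK}

def decode(length=8, steps=4):
    seq = [MASK] * length
    per_step = length // steps   # fewer steps -> more commits per step
    for _ in range(steps):
        guesses = denoise(seq)
        # Keep only the highest-confidence guesses; the rest stay masked.
        best = sorted(guesses.items(), key=lambda kv: kv[1][1], reverse=True)
        for i, (tok, _conf) in best[:per_step]:
            seq[i] = tok
    return " ".join(seq)

print(decode())
```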

By Honghao Wang

LLM optimization

Today’s Open Source (2025-11-3): Kuaishou and Nanjing University Lab Co-Develop HiPO for Hybrid Strategy Optimization in LLM Dynamic Inference, Dual-Mode Switching Balances Accuracy and Efficiency

🏆 Foundational Models

① Project: HiPO

HiPO-8B is a novel reinforcement learning framework based on Hybrid Policy Optimization, enabling dynamic reasoning capabilities in large language models (LLMs).

Key Highlights:

* Developed by the KwaiKAT team at Kuaishou in collaboration with the NJU-LINK Laboratory (Nanjing University) and the ARiSE Laboratory.
* Features “think-on” and “think-off” mode switching to…
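The excerpt ends mid-list, but mode switching of this kind is usually trained by rewarding correct answers while charging for reasoning tokens, so the policy learns when thinking pays for itself. A sketch under that assumption (the coefficient and reward shape are illustrative, not HiPO's published objective):

```python
def hybrid_reward(correct: bool, mode: str, think_tokens: int,
                  penalty: float = 1e-4) -> float:
    """Reward correctness, but charge 'think-on' rollouts per reasoning
    token, so the policy prefers 'think-off' whenever it suffices."""
    r = 1.0 if correct else 0.0
    if mode == "think-on":
        r -= penalty * think_tokens
    return r

# A correct answer without thinking beats the same answer reached
# through 2,000 tokens of chain of thought.
print(hybrid_reward(True, "think-off", 0))      # 1.0
print(hybrid_reward(True, "think-on", 2000))    # 0.8
print(hybrid_reward(False, "think-off", 0))     # 0.0
```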

By Honghao Wang

LLM reasoning

HKUST Proposes New Algorithm to Revolutionize LLM Reasoning: Random Policy Evaluation Emerges as a Breakthrough in Mathematical Reasoning

2025-10-31 · Beijing

“Simplify, Don’t Complicate”: The Real Key to Advancing Performance

Authors & Affiliations

* He Haoran: PhD student at The Hong Kong University of Science and Technology (HKUST), specializing in reinforcement learning and foundation models.
* Ye Yuxiao: First-year PhD student at HKUST (co-first author).
* Pan Ling: Assistant Professor, Department…
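The excerpt stops at the author list, but the idea named in the headline, evaluating a uniformly random policy and then acting greedily on those values, can be shown on a toy chain task. The environment and Monte Carlo estimator below are illustrative assumptions, not the paper's construction:

```python
import random

class Chain:
    """Toy task: reach state 3; 'right' moves forward, 'left' back."""
    def actions(self, s): return ["left", "right"]
    def step(self, s, a):
        s2 = s + 1 if a == "right" else max(0, s - 1)
        return s2, (1.0 if s2 == 3 else 0.0), s2 == 3

def random_return(env, s, depth=20):
    """Return of one rollout under the uniform-random policy."""
    total = 0.0
    for _ in range(depth):
        s, r, done = env.step(s, random.choice(env.actions(s)))
        total += r
        if done:
            break
    return total

def q_random(env, s, a, n=200):
    """Monte Carlo estimate of Q(s, a) for the uniform-random policy."""
    rets = []
    for _ in range(n):
        s2, r, done = env.step(s, a)
        rets.append(r if done else r + random_return(env, s2))
    return sum(rets) / n

env = Chain()
# Greedy w.r.t. the random policy's values: no policy iteration and no
# learned critic, yet the ranking already prefers the good action.
print(max(env.actions(0), key=lambda a: q_random(env, 0, a)))  # expected: 'right'
```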

By Honghao Wang