reinforcement learning

Tencent Youtu Introduces Training-Free GRPO: Reinforcement Learning for DeepSeek-V3.2 for Just $8

**Reinforcement Learning for Ultra-Large Models, at a Fraction of the Cost**

Richard Sutton, known as the “Father of Reinforcement Learning” and a Turing Award winner, predicts that the next generation of intelligent agents will achieve superhuman capabilities primarily by learning from experience, rather than relying solely on supervised learning with…
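The excerpt stops short of the method, but the group-relative scoring at the heart of GRPO, which the training-free variant builds on, is easy to show. A minimal sketch, assuming one scalar reward per sampled rollout (the function names are mine, not the paper's):

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantage: z-score each rollout's reward within its group.

    Standard GRPO uses these advantages to weight gradient updates; per the
    article's title, the training-free variant keeps the model frozen and
    uses the comparisons to refine what goes into the prompt instead.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    if sigma == 0.0:          # all rollouts tied: no learning signal
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]

# Four sampled rollouts for one prompt, scored by some verifier.
print(group_relative_advantages([0.0, 1.0, 1.0, 0.5]))
```

Because nothing in this loop touches model weights, it can run against an API-served model, which is presumably how the headline $8 figure becomes possible.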

By Honghao Wang

reinforcement learning

Running Highly Scalable Reinforcement Learning for Large Language Models on GKE

**Reinforcement Learning for LLMs: Scalable Infrastructure on Google Cloud**

As Large Language Models (LLMs) advance, Reinforcement Learning (RL) is becoming essential for aligning these models with human preferences and complex task goals. Yet enterprises face significant hurdles when implementing RL at scale:

* Memory contention from hosting multiple large models simultaneously…
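To make the memory-contention hurdle concrete: an RL fine-tuning job typically keeps a trainable policy, a frozen reference model, and a reward model resident at once. A back-of-the-envelope sketch (the model size, precision, and optimizer-state multiplier are illustrative assumptions, not Google's numbers):

```python
def weights_gib(n_params_billion: float, bytes_per_param: float = 2.0) -> float:
    """GiB of accelerator memory for weights alone (bf16 by default)."""
    return n_params_billion * 1e9 * bytes_per_param / 2**30

# Rough multiplier: a trainable model also carries gradients, an fp32
# master copy, and Adam moments (~16 bytes/param vs 2 for frozen bf16).
resident = {
    "policy (trainable)": weights_gib(8, bytes_per_param=16.0),
    "reference (frozen)": weights_gib(8),
    "reward model":       weights_gib(8),
}
for name, gib in resident.items():
    print(f"{name:>20}: {gib:6.1f} GiB")
print(f"{'total':>20}: {sum(resident.values()):6.1f} GiB "
      "(before KV cache and activations)")
```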

By Honghao Wang

LLM

New LLM Reinforcement Learning Framework: UCSD Multi-Agent Training Boosts LLM Tool-Use Capability by 5.8×

**Reinforcement Learning Framework for Large Language Model Agents: First Implementation of Universal Multi-Agent Group Reinforcement**

Background: Numerous studies show that multi-agent workflows with large language models (LLMs) often outperform single-agent systems, even without targeted training. Yet most current LLM agent training frameworks are limited to single-agent training, leaving universal…
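The excerpt cuts off before the framework itself, so here is a purely illustrative sketch of what "multi-agent group reinforcement" could look like: score joint rollouts against a group baseline and broadcast the advantage to every role in the team. The roles and the credit-assignment rule are my assumptions, not UCSD's design:

```python
from statistics import mean

# One joint rollout: a trace per agent role plus a shared task reward.
rollouts = [
    {"planner": "plan-A", "tool_caller": "calls-A", "reward": 1.0},
    {"planner": "plan-B", "tool_caller": "calls-B", "reward": 0.0},
    {"planner": "plan-C", "tool_caller": "calls-C", "reward": 1.0},
]

baseline = mean(r["reward"] for r in rollouts)

# Broadcast the group-relative advantage to every role, so each agent
# is reinforced exactly when the team beats the group baseline.
for r in rollouts:
    advantage = r["reward"] - baseline
    for role in ("planner", "tool_caller"):
        print(f"{role:>12}: {advantage:+.2f} for {r[role]}")
```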

By Honghao Wang

Diffusion LLMs

Taming Masked Diffusion Language Models with More Consistent Trajectories and Fewer Decoding Steps for Major Gains in Inference Performance and Efficiency

# Advancing Diffusion-Based Large Language Models (LLMs)

Diffusion-based LLMs have progressed rapidly.

- **February 2025**: *Mercury* (Inception Labs) became the first *commercial-scale* diffusion LLM.
- **Renmin University** launched *LLaDA*, the first **open-source 8B-parameter diffusion LLM**, followed by Google DeepMind's *Gemini Diffusion* in May.

These innovations signal that **diffusion LLMs could rival autoregressive**…
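For context on what the article optimizes at inference time: masked diffusion decoding starts from an all-mask sequence and commits the most confident tokens over a fixed number of denoising steps, so fewer steps means more tokens committed per step. A toy sketch of that loop (the random "denoiser" is a stand-in, not any real model's API):

```python
import random

MASK = "<mask>"
VOCAB = ["the", "cat", "sat", "on", "a", "mat"]

def denoise(seq):
    """Stand-in denoiser: a (token, confidence) guess per masked slot."""
    return {i: (random.choice(VOCAB), random.random())
            for i, tok in enumerate(seq) if tok == MASK}

def decode(length=8, steps=4):
    seq = [MASK] * length
    per_step = length // steps   # fewer steps -> more commits per step
    for _ in range(steps):
        guesses = denoise(seq)
        # Keep only the highest-confidence guesses; the rest stay masked.
        best = sorted(guesses.items(), key=lambda kv: kv[1][1], reverse=True)
        for i, (tok, _conf) in best[:per_step]:
            seq[i] = tok
    return " ".join(seq)

print(decode())
```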

By Honghao Wang

LLM optimization

Today’s Open Source (2025-11-3): Kuaishou and Nanjing University Lab Co-Develop HiPO for Hybrid Strategy Optimization in LLM Dynamic Inference, Dual-Mode Switching Balances Accuracy and Efficiency

🏆 Foundational Models

① Project: HiPO

HiPO-8B is a novel reinforcement learning framework based on Hybrid Policy Optimization, enabling dynamic reasoning capabilities in large language models (LLMs).

Key Highlights:

* Developed by the KwaiKAT team at Kuaishou in collaboration with the NJU-LINK Laboratory (Nanjing University) and the ARiSE Laboratory.
* Features “think-on” and “think-off” mode switching to…
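The excerpt ends mid-list, but mode switching of this kind is usually trained by rewarding correct answers while charging for reasoning tokens, so the policy learns when thinking pays for itself. A sketch under that assumption (the coefficient and reward shape are illustrative, not HiPO's published objective):

```python
def hybrid_reward(correct: bool, mode: str, think_tokens: int,
                  penalty: float = 1e-4) -> float:
    """Reward correctness, but charge 'think-on' rollouts per reasoning
    token, so the policy prefers 'think-off' whenever it suffices."""
    r = 1.0 if correct else 0.0
    if mode == "think-on":
        r -= penalty * think_tokens
    return r

# A correct answer without thinking beats the same answer reached
# through 2,000 tokens of chain of thought.
print(hybrid_reward(True, "think-off", 0))      # 1.0
print(hybrid_reward(True, "think-on", 2000))    # 0.8
print(hybrid_reward(False, "think-off", 0))     # 0.0
```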

By Honghao Wang

LLM reasoning

HKUST Proposes New Algorithm to Revolutionize LLM Reasoning: Random Policy Evaluation Emerges as a Breakthrough in Mathematical Reasoning

2025-10-31 · Beijing

“Simplify, Don’t Complicate”: The Real Key to Advancing Performance

Authors & Affiliations

* He Haoran: PhD student at The Hong Kong University of Science and Technology (HKUST), specializing in reinforcement learning and foundation models.
* Ye Yuxiao: First-year PhD student at HKUST (co-first author).
* Pan Ling: Assistant Professor, Department…
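The excerpt stops at the author list, but the idea named in the headline, evaluating a uniformly random policy and then acting greedily on those values, can be shown on a toy chain task. The environment and Monte Carlo estimator below are illustrative assumptions, not the paper's construction:

```python
import random

class Chain:
    """Toy task: reach state 3; 'right' moves forward, 'left' back."""
    def actions(self, s): return ["left", "right"]
    def step(self, s, a):
        s2 = s + 1 if a == "right" else max(0, s - 1)
        return s2, (1.0 if s2 == 3 else 0.0), s2 == 3

def random_return(env, s, depth=20):
    """Return of one rollout under the uniform-random policy."""
    total = 0.0
    for _ in range(depth):
        s, r, done = env.step(s, random.choice(env.actions(s)))
        total += r
        if done:
            break
    return total

def q_random(env, s, a, n=200):
    """Monte Carlo estimate of Q(s, a) for the uniform-random policy."""
    rets = []
    for _ in range(n):
        s2, r, done = env.step(s, a)
        rets.append(r if done else r + random_return(env, s2))
    return sum(rets) / n

env = Chain()
# Greedy w.r.t. the random policy's values: no policy iteration and no
# learned critic, yet the ranking already prefers the good action.
print(max(env.actions(0), key=lambda a: q_random(env, 0, a)))  # expected: 'right'
```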

By Honghao Wang