reinforcement learning - aitoearn (Page 3)

AgentFlow

AI Online Reinforcement Learning “Learn While Doing”: Stanford Team Boosts 7B Model to Surpass GPT-4o

AgentFlow: A New Framework for Adaptive, Multi‑Agent Reasoning

Overview

Stanford and collaborators have introduced AgentFlow, a paradigm leveraging online reinforcement learning to help agentic systems "achieve more with less", in some cases surpassing models like GPT‑4o.

Core Concept: AgentFlow continuously enhances agents’ reasoning capabilities when tackling

ExGRPO

New Paradigm for Large Model Reasoning: ExGRPO Framework — From Blind Practice to Smart Review

Large Models in Reinforcement Learning Finally Understand Which Experiences Are Most Valuable!

A research team from Shanghai Artificial Intelligence Laboratory, University of Macau, Nanjing University, and The Chinese University of Hong Kong has proposed a groundbreaking experience management and learning framework: ExGRPO. By identifying, storing, filtering, and learning truly valuable

reinforcement learning

RewardMap: Solving Sparse Rewards in Fine-Grained Visual Reasoning via Multi-Stage Reinforcement Learning

# RewardMap: Tackling Sparse Rewards in Fine-Grained Visual Reasoning

## Research Collaboration

This work is led by the **ENCODE Lab at Westlake University** in collaboration with:

- **Tongji University**
- **Zhejiang University**
- **National University of Singapore**

The team has strong expertise in **large model reinforcement learning** and **multimodal reasoning**.

---

## Background

GPT-5

GPT-5 ≈ o3.1! OpenAI Reveals Thinking Mechanism: RL + Pretraining as the True Path to AGI

GPT‑5 as “o3.1”: Insights from Jerry Tworek

OpenAI’s Vice President of Research, Jerry Tworek, shared a fascinating perspective in his first podcast interview:

> In a sense, GPT‑5 can be regarded as o3.1.

As one of the key creators behind the o1 model, Jerry views

AI agents

AK Latest Podcast Reflection: Forgetting as a Trait of Wisdom to Prevent Rigid Thinking

Andrej Karpathy on the “Decade of Agents”

I just finished listening to Andrej Karpathy’s latest podcast, and it’s packed with provocative insights and bold predictions about AI’s future.

Key Takeaways

* We’re not in the “First Year of Agents”; we’re entering the “Decade of Agents.”
* Current

AGI

Andrej Karpathy’s Latest Long-Form Interview: AGI Needs 10 More Years, RL Is Flawed, and AGI Won’t Trigger an Economic Boom

Andrej Karpathy’s latest 10,000-word interview is here: a full two-hour deep dive. For anyone interested in AI, it’s a must-watch. Consider it a weekend mental massage, and here’s a summary to share. In his in-depth conversation with Dwarkesh Patel, Karpathy outlines his core views on

AI

Karpathy: Reinforcement Learning Is Terrible, but Everything Else Is Worse

Karpathy’s Latest In‑Depth Interview

As Tesla’s former Director of AI and a founding member of OpenAI, Andrej Karpathy spent nearly two and a half hours answering thought‑provoking questions, including:

* Why reinforcement learning performs poorly (but alternatives perform even worse)
* Why artificial general intelligence will sustain about

AI agents

AI Agents in the New Era: Andrej Karpathy’s Decade Vision and the Education Revolution

![image](https://blog.aitoearn.ai/content/images/2025/10/img_001-281.jpg)

## Andrej Karpathy: *We’re Not Making Animals, We’re Summoning Ghosts*

---

## Overview

This document summarizes a deep-dive conversation between **Andrej Karpathy**, co‑founder of OpenAI and former head of Tesla Autopilot, and **Dwarkesh Patel**. It covers Karpathy’

Boston Dynamics

Boston Dynamics Robot Dog Gogo Is Back — “Five Legs” Working in Sync

# Robot Dog Moves Heavy Tires with Coordinated "Five-Limb" Action

![image](https://blog.aitoearn.ai/content/images/2025/10/img_001-8.gif)

The **Boston Dynamics AI Institute** has unveiled an innovative method, **Combining Sampling and Learning for Dynamic Whole-Body Manipulation**, enabling the robot dog **Spot** to lift a **15

Xiaomi AI

Xiaomi’s Latest Breakthrough in Large Models! Luo Fuli Appears

# Xiaomi & Peking University Unveil Breakthrough in MoE Reinforcement Learning

Xiaomi’s latest advance in **large model research** has just been revealed. Recently, the **Xiaomi AI team** and **Peking University** jointly published a paper focusing on **Mixture of Experts (MoE)** and **reinforcement learning**.

![image](https://blog.aitoearn.ai/content/images/

AI document parsing

AI Algorithm Open Source | Logics-Parsing: End-to-End Structured Processing for Complex PDF Documents

Logics-Parsing: Advanced Document Parsing for Complex Layouts

In both work and study, extracting usable content from images or PDFs is often frustrating, especially when tools struggle with:

* Converting messy handwritten content into clean notes
* Importing tables from references into presentation slides
* Editing papers with specialized formats (e.g., chemistry)

Even

LLMs

Will LLMs Be Another “Bitter Lesson”? Reinforcement Learning Pioneer Sutton Warns of a Trillion-Dollar AI Bubble

📌 Sutton’s Latest Interview: “Have LLMs Learned the Bitter Lesson?”

Richard Sutton, famed for his “Bitter Lesson” concept and previous claim that “LLMs are a dead end”, expands his critique in a high-profile panel discussion.

Discussion Participants:

* Richard Sutton
* Sendhil Mullainathan, MacArthur Fellow, MIT Professor
* Niamh Gavin, Applied AI Scientist,