reinforcement learning

AI Online Reinforcement Learning “Learn While Doing”: Stanford Team Boosts 7B Model to Surpass GPT-4o

AgentFlow

AgentFlow: A New Framework for Adaptive, Multi-Agent Reasoning. Overview: Stanford and collaborators have introduced AgentFlow, a paradigm that uses online reinforcement learning to help agentic systems "achieve more with less", in some cases surpassing models like GPT-4o. Core concept: AgentFlow continuously enhances agents' reasoning capabilities when tackling ...

By Honghao Wang

New Paradigm for Large Model Reasoning: ExGRPO Framework — From Blind Practice to Smart Review

ExGRPO

Large Models in Reinforcement Learning Finally Understand Which Experiences Are Most Valuable! A research team from Shanghai Artificial Intelligence Laboratory, University of Macau, Nanjing University, and The Chinese University of Hong Kong has proposed a groundbreaking experience management and learning framework — ExGRPO. By identifying, storing, filtering, and learning truly valuable

By Honghao Wang

RewardMap: Solving Sparse Rewards in Fine-Grained Visual Reasoning via Multi-Stage Reinforcement Learning

reinforcement learning

RewardMap: Tackling Sparse Rewards in Fine-Grained Visual Reasoning. Research collaboration: This work is led by the ENCODE Lab at Westlake University in collaboration with Tongji University, Zhejiang University, and National University of Singapore. The team has strong expertise in large model reinforcement learning and multimodal reasoning. Background ...

By Honghao Wang

AI Algorithm Open Source | Logics-Parsing: End-to-End Structured Processing for Complex PDF Documents

AI document parsing

Logics-Parsing: Advanced Document Parsing for Complex Layouts. In both work and study, extracting usable content from images or PDFs is often frustrating, especially when tools struggle with converting messy handwritten content into clean notes, importing tables from references into presentation slides, or editing papers with specialized formats (e.g., chemistry). Even ...

By Honghao Wang

Will LLMs Be Another “Painful Lesson”? Reinforcement Learning Pioneer Sutton Warns of a Trillion-Dollar AI Bubble

LLMs

📌 Sutton’s Latest Interview: “Have LLMs Learned the Bitter Lesson?” Richard Sutton, known for his “Bitter Lesson” concept and his earlier claim that “LLMs are a dead end”, expands his critique in a high-profile panel discussion. Discussion participants: Richard Sutton; Sendhil Mullainathan (MacArthur Fellow, MIT Professor); Niamh Gavin, Applied AI Scientist, ...

By Honghao Wang