AEPO
AEPO: Entropy-Balanced Policy Optimization for More Stable Exploration and Deeper Reasoning
AEPO: Balancing Exploration and Stability in Agentic RL
In the rapidly evolving field of agentic reinforcement learning (RL), balancing exploration and training stability has become a central challenge in multi-turn agent training. Mainstream entropy-driven RL approaches encourage models to explore uncertain reasoning paths, but excessive reliance on entropy can lead