reinforcement learning - aitoearn (Page 2)

Autonomous Driving

L4 Roadmap Unveiled: Li Auto’s Autonomous Driving Team Introduces New Paradigm at Global AI Summit

A Technological Breakthrough in AI and Autonomous Driving --- > AI is entering its “second half” — will advanced assisted driving lead the next wave of evolution? --- The Shift from Human Data to Experiential Learning Recent debate around AI large models hitting bottlenecks has intensified. Rich Sutton, regarded as the

Composer-1

Composer: Building Fast, Cutting-Edge Models with Reinforcement Learning

Composer: Building a Fast Frontier Model with RL Read the full blog post (Hacker News discussion) Cursor has announced Cursor 2.0, featuring: * A refreshed UI focused on agentic coding * Ability to run parallel agents * A new Cursor-exclusive model: Composer 1 --- First Impressions There’s currently no public API

AI model

SWE-1.5: Cognition AI Launches New Fast Agent Model

Introducing SWE‑1.5: A New Fast Agent Model Read the full announcement (via) --- Overview On the same day that Cursor released Composer‑1, Windsurf announced SWE‑1.5 — its latest frontier-size coding model boasting: * Hundreds of billions of parameters * Near state-of-the-art (SOTA) coding performance * Extreme speed: up to

Cursor AI

Cursor launches its first coding LLM: 250 tokens/sec code generation with reinforcement learning + MoE architecture

Cursor 2.0 Launch: First Native Coding Model Cursor has officially released Cursor 2.0, marking the debut of its first in-house large language model — Composer. Unlike previous versions powered by GPT or Claude, Composer is fully developed and trained internally. --- Why Composer is a Big Deal According to

On-policy distillation

Weng Li’s “Elegant” Approach to On-Policy Distillation: How It Redefines Cost and Efficiency | New Paper Analysis

A Leap of Imagination AI Future Compass — a paper-interpretation column breaking down top conference and journal highlights with frontline perspectives and accessible language. --- Breaking the "Impossible Triangle" For years, post-training of models has been trapped in an impossible triangle: Researchers want models to have strong capabilities, low

open-source AI

Today’s Open Source (2025-10-27): PRIME-RL Breakthrough — Multi-Stage RL and Coevolutionary System Achieve IPhO Gold-Level Physics Reasoning

Open-Source AI Model Series & Frameworks Overview This document highlights several cutting-edge open-source AI projects, frameworks, and tools across physics reasoning, multimodal intelligence, reinforcement learning, inference acceleration, and agent-based deep research. --- 🏆 Base Models ① P1 Project — Physics Reasoning at Olympiad Level Key Highlights: * First open-source model series from PRIME-RL. * Designed

On-policy distillation

Thinking Machines' New Study Goes Viral: Combining RL + Fine-Tuning for More Cost-Effective Small Model Training

Thinking Machines’ Breakthrough: On-Policy Distillation for Efficient LLM Training Thinking Machines’ latest research is generating intense discussion in the AI community. After Mira Murati, founder of Thinking Machines and former OpenAI CTO, personally reposted it, many prominent figures praised its research value: According to Murati’s summary, the team has

image captioning

3B Image Captioning Powerhouse Launches, Performance on Par with Qwen2.5-VL-72B

CapRL: Breakthrough in Dense Image Captioning via Reinforcement Learning Date: 2025-10-28 · Location: Sichuan The model, dataset, and QA construction code from the paper have been fully open-sourced.

Robotics

Making Robots “Think and Act Accurately”: VLA-R1 Brings “Reasoning + Action” into the Real World

2025-10-25 12:24 Beijing Letting the model both explain its reasoning process clearly and execute actions accurately --- Introduction In robotics and intelligent agents, a core challenge is bridging the gap between understanding instructions and performing precise actions. For example: * “Put the yellow bowl into the white empty basket.” * “Take

LoRA

RL is the new Fine-Tuning

LoRA, RL, and the Future of AI Model Optimization Interview with Kyle Corbitt, Founder of OpenPipe (Acquired by CoreWeave) In September 2025, Thinking Machines published its long-form article LoRA Without Regret, presenting a set of SFT (Supervised Fine-Tuning) and RL (Reinforcement Learning) experiments. Their conclusion: under certain conditions, LoRA can

open-source AI

Open-Source Model Wins First Physics Olympiad Gold: Shanghai AI Lab's 235B Model Beats GPT-5 and Grok-4

🏅 Open-source AI Model Wins Gold at International Physics Olympiad Historic Achievement The P1-235B-A22B model from Shanghai AI Lab has achieved a 21.2/30 score at the International Physics Olympiad (IPhO) — surpassing the gold medal threshold and making history as the first open-source model to win gold. In the HiPhO

ExGRPO

New Paradigm for Large Model Reasoning: The ExGRPO Framework, From Blind Practice to Smart Review

2025-10-24 00:01 Jilin Beyond Traditional On-Policy RLVR Methods --- Large Model Intelligence｜Sharing Source: Quantum Bits A joint research team from Shanghai Artificial Intelligence Laboratory, University of Macau, Nanjing University, and The Chinese University of Hong Kong has introduced a novel experience management and learning framework: ExGRPO. Goal: Scientifically