Making Robots “Think and Act Accurately”: VLA-R1 Brings “Reasoning + Action” into the Real World

2025-10-25 12:24 Beijing

Letting the model both explain its reasoning process clearly and execute actions accurately


---

Introduction

In robotics and intelligent agents, a core challenge is bridging the gap between understanding instructions and performing precise actions.

For example:

  • “Put the yellow bowl into the white empty basket.”
  • “Take the milk out of the microwave and place it on the dining table.”

This requires:

  • Environment understanding
  • Instruction parsing
  • Path planning & affordance reasoning
  • Grounding reasoning into exact physical actions

Current Vision-Language-Action (VLA) models often output direct action sequences without explicit reasoning about affordance–trajectory relationships, which increases error rates in complex scenarios.

Goal of VLA-R1:

Add a structured reasoning step, then use reinforcement learning to ensure accurate execution — making the robot explain first and act second.

---

VLA-R1: Overview


Summary:

VLA-R1 is a reason-first, execute-second model that integrates:

  • Chain-of-Thought (CoT) supervision
  • Reinforcement learning with verifiable rewards (RLVR), built on GRPO

It optimizes both reasoning quality and execution accuracy, following the structure:

...
...

---

Key Innovations

1. Two-Stage Training (SFT + RL)

  • Stage 1 — SFT with CoT supervision: Teacher-guided fine-tuning
  • Stage 2 — RL with verifiable rewards (GRPO): Stable refinement from “can think” to “can do”
      • Uses group-wise normalized advantages
      • Enforces a KL constraint against the reference policy (see the sketch below)
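
The post only names these ingredients, so the snippet below is a minimal PyTorch sketch of how a GRPO-style update with group-wise normalized advantages and a KL penalty is typically implemented; the clipping and KL coefficients are illustrative assumptions, not values reported for VLA-R1.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-wise normalized advantages: for each prompt, a group of responses
    is sampled and their verifiable rewards are normalized within that group."""
    mean = rewards.mean(dim=-1, keepdim=True)   # (num_prompts, 1)
    std = rewards.std(dim=-1, keepdim=True)     # (num_prompts, 1)
    return (rewards - mean) / (std + eps)       # (num_prompts, group_size)

def grpo_loss(logp_new: torch.Tensor,      # log-probs under the current policy
              logp_old: torch.Tensor,      # log-probs under the sampling policy
              logp_ref: torch.Tensor,      # log-probs under the frozen SFT reference
              advantages: torch.Tensor,
              clip_eps: float = 0.2,       # illustrative value
              kl_coeff: float = 0.04) -> torch.Tensor:  # illustrative value
    """Clipped policy-gradient objective plus a KL penalty toward the reference
    policy, following the standard GRPO recipe."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()
    # k3 estimator of KL(current || reference), common in GRPO implementations
    kl = torch.exp(logp_ref - logp_new) - (logp_ref - logp_new) - 1.0
    return policy_loss + kl_coeff * kl.mean()

# Example: one prompt, a group of four rollouts scored by the verifiable rewards.
rewards = torch.tensor([[0.9, 0.2, 0.5, 0.7]])
print(grpo_advantages(rewards))  # zero-mean, roughly unit-variance within the group
```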

---

2. Three Verifiable Rewards (RLVR)

Ensures the model:

  • Sees correctly
  • Moves correctly
  • Formats correctly

Reward Types:

  • Spatial Alignment (GIoU): keeps gradients informative even when predicted and ground-truth boxes don’t overlap.
  • Trajectory Consistency (ALHF Fréchet distance): considers position, tangent angles, and segment-length ratios.
  • Output Format Reward: enforces `<think>` / `<answer>` structuring of the output (a sketch of all three follows below).
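
To make the three checks concrete, here is a minimal sketch of plausible reward functions. The GIoU and format checks follow their standard definitions; the trajectory term uses a plain discrete Fréchet distance as a simplified stand-in for the paper's angle- and length-aware (ALHF) variant, and the `<think>`/`<answer>` tag names and all constants are assumptions for illustration.

```python
import re
import numpy as np

def giou_reward(pred: np.ndarray, gt: np.ndarray) -> float:
    """Generalized IoU for boxes in (x1, y1, x2, y2) format. Unlike plain IoU,
    it stays informative (goes negative) when the boxes do not overlap."""
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = max(area_p + area_g - inter, 1e-9)
    # Smallest axis-aligned box enclosing both boxes
    c_area = max((max(pred[2], gt[2]) - min(pred[0], gt[0])) *
                 (max(pred[3], gt[3]) - min(pred[1], gt[1])), 1e-9)
    return inter / union - (c_area - union) / c_area

def frechet_reward(pred: np.ndarray, gt: np.ndarray) -> float:
    """Simplified trajectory reward: negative discrete Fréchet distance between
    two (N, 2) waypoint polylines. (The paper's ALHF variant additionally weighs
    tangent angles and segment-length ratios.)"""
    n, m = len(pred), len(gt)
    memo = np.full((n, m), -1.0)
    def dp(i, j):
        if memo[i, j] >= 0:
            return memo[i, j]
        dist = float(np.linalg.norm(pred[i] - gt[j]))
        if i == 0 and j == 0:
            memo[i, j] = dist
        elif i == 0:
            memo[i, j] = max(dp(0, j - 1), dist)
        elif j == 0:
            memo[i, j] = max(dp(i - 1, 0), dist)
        else:
            memo[i, j] = max(min(dp(i - 1, j), dp(i - 1, j - 1), dp(i, j - 1)), dist)
        return memo[i, j]
    return -dp(n - 1, m - 1)

def format_reward(text: str) -> float:
    """1.0 if the response is wrapped as <think>...</think><answer>...</answer>."""
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return 1.0 if re.fullmatch(pattern, text.strip(), flags=re.DOTALL) else 0.0
```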

---

3. VLA-CoT Data Engine & Dataset

  • Generated CoT data using Qwen2.5-VL-72B
  • 13K samples aligned with visual/action data
  • Structured four-step reasoning prompt (illustrated in the sketch below)
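
The exact prompt is not reproduced in this post, so the template below is only a hypothetical illustration of what a structured four-step reasoning prompt for Qwen2.5-VL-72B might look like; the step names and wording are assumptions, not the VLA-CoT data engine's actual prompt.

```python
# Hypothetical four-step CoT prompt (illustrative only; the actual data-engine
# prompt used with Qwen2.5-VL-72B is not reproduced in this post).
COT_PROMPT = """You are a robot manipulation assistant. Given the image and the
instruction "{instruction}", reason in four steps before answering:
1. Scene understanding: list the relevant objects and their spatial relations.
2. Instruction parsing: identify the target object and the goal location.
3. Affordance reasoning: give the 2D box where the target should be grasped.
4. Trajectory planning: outline the waypoints from the grasp point to the goal.
Wrap the reasoning in <think>...</think> and the final box plus waypoints in
<answer>...</answer>."""

def build_cot_query(instruction: str) -> str:
    """Fill the template for one scene-instruction pair."""
    return COT_PROMPT.format(instruction=instruction)

print(build_cot_query("Put the yellow bowl into the white empty basket."))
```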

---

Experimental Overview

Evaluation across:

  • In-Domain
  • Out-of-Domain
  • Simulation platforms
  • Real robots

Ablations: CoT vs. RL vs. both.

---

Benchmarks

In-Domain

Dataset: VLA-CoT-13K — Affordance Detection + Trajectory Generation.

Objects: bowls, cups, spoons, pens, boxes, baskets.

Results:

  • Affordance IoU: 36.51 (+17.78% over baseline)
  • Avg trajectory error: 91.74 (a 17.25% reduction over baseline)

---

Out-of-Domain

Datasets: UMD (affordance labels) + VAIT (scene-instruction pairs)

Results:

  • Affordance IoU: 33.96 (UMD)
  • Trajectory error: 93.90 (VAIT)

---

Real Robot Experiments — 4 Tabletop Scenarios

Scenarios:

  • S1: Colored bowl pickup (similar colors challenge)
  • S2: Fruit pickup (same-category differentiation)
  • S3: Complex kitchen with occlusion
  • S4: Mixed clutter with multiple containers

Results:

  • Affordance SR: 62.5%
  • Trajectory SR: 75%

---

Simulation (Piper / UR5)

Tested across both platforms with diverse objects and instructions.

Results:

  • Affordance SR: 60% (Piper) / 50% (UR5)
  • Trajectory SR: 80% (Piper) / 60% (UR5)

---

Ablation Study

Configurations:

  • Direct trajectory output
  • CoT only
  • CoT + RL

Findings:

  • CoT boosts IoU from 23.74 → 28.37
  • CoT+RL boosts IoU to 36.51 with lower trajectory error

---

Demos

Reasoning Process Showcase

Real-Robot Platform

Simulation Platform

---

Application Prospects

Household Picking & Storage

  • Handles clutter, similar colors, and uneven lighting
  • Resolves ambiguities before acting (e.g., spoon → bowl, pen → white box)

Warehouse Picking & Light Assembly

  • Explains why a particular container or path was chosen
  • Generates smooth, safe trajectories with MES/PLC integration

Education & Evaluation

  • The `<think>` + `<answer>` format supports grading and teaching (see the parsing sketch below)
  • Standard metrics enable comparison across training methods
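
As a rough illustration of why this format is machine-gradable, here is a minimal parsing sketch; the tag names match the format reward above, while the function and returned fields are hypothetical.

```python
import re

def parse_structured_output(text: str) -> dict:
    """Split a reason-first response into reasoning and answer parts so each can
    be graded separately (e.g., CoT quality vs. affordance/trajectory accuracy)."""
    think = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, flags=re.DOTALL)
    return {
        "reasoning": think.group(1).strip() if think else None,
        "answer": answer.group(1).strip() if answer else None,
        "well_formed": bool(think and answer),
    }

sample = "<think>The yellow bowl sits left of the basket ...</think><answer>box=[120, 80, 210, 160]</answer>"
print(parse_structured_output(sample)["well_formed"])  # True
```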

---

Platforms like AiToEarn let creators monetize AI-driven content.

Features:

  • Generate → Publish → Analyze → Monetize
  • Multi-platform delivery: Douyin, Kwai, WeChat, Bilibili, Xiaohongshu, Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, X
  • Open-source model ranking & analytics

---

© THE END

For repost authorization, please contact this account.
