
Exclusive Interview with DeepMind’s Tan Jie: Robots, World Models, and Google

Honghao Wang

28 Nov 2025 — 4 min read

Interview: Zhang Xiaojun × Tan Jie

Google DeepMind Robotics – Foundation Models, Reinforcement Learning, and the Future of General-Purpose Robots

Guest: Tan Jie, Senior Research Scientist & Technical Lead, Google DeepMind Robotics. He works on applying foundation models and deep reinforcement learning to robotics.

---

Overview

China’s robotics sector is often perceived as stronger in hardware, while the U.S. leads in developing the “brains” of robots. Tan Jie shares Google DeepMind’s perspective — including insights from their recent paper "Gemini Robotics 1.5 Brings AI Agents into the Physical World" — and discusses:

The parallels between computer graphics and robotics
Sim-to-real transfer and reinforcement learning breakthroughs
Large language models as the “brain” for robots
The evolving landscape of embodied intelligence and robotics foundation models
Data scarcity, cross-embodiment transfer, and motion transfer innovations
The competitive culture in Silicon Valley post-ChatGPT

---

1. From Computer Graphics to Robotics

Early Career Path

Undergraduate: Shanghai Jiao Tong University
PhD: Computer graphics, with a focus on physics-based character animation; interned at Pixar working on animation.
Founded a startup in Shanghai (similar to Kujiale/CoolHome).
Joined Lytro in Silicon Valley, worked on light field cameras.
Moved to Google Brain, which later merged with DeepMind, forming the Google DeepMind Robotics team.

Perspective Shift

> "Robotics is graphics in the real world; graphics is robotics in simulation."

Initially motivated to apply simulation-based techniques from graphics to real-world robotics.

---

2. Robotics Before Deep Reinforcement Learning

Dominated by rule-based traditional control methods like MPC (Model Predictive Control).
High barrier to entry: required PhD-level math.
Graphics simulations could show agile motions that real robots failed to achieve (DARPA Robotics Challenge robots vs. simulated robots doing flips).

Goal: Bring simulation capabilities to real-world robots for agile locomotion and manipulation.

---

3. Breakthrough: Sim-to-Real + Reinforcement Learning

First Google paper: "Sim-to-Real: Learning Agile Locomotion for Quadruped Robots"
Introduced deep RL methods (PPO) inspired by AlphaGo successes.
Pioneered RL in quadruped locomotion, influencing Unitree, Boston Dynamics, and others.
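The summary names PPO as the deep RL method used in this line of work. As a rough illustration only, the textbook PPO clipped surrogate objective can be sketched in a few lines of Python. This is not the paper's training code; the function name, the list-based inputs, and the default `clip_eps` value are assumptions made for the example.

```python
import math

def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized) over a batch.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. advantages: advantage estimates.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # probability ratio pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

The clipping keeps each policy update close to the policy that collected the data, which is one reason PPO is a common choice when rollouts come from a simulator.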

Two paradigm shifts in robotics over the past 10 years:

Reinforcement Learning – solving gait and locomotion
Large Language Models (LLMs) – bringing language comprehension and common sense to robots

---

4. LLMs + RL = Brain + Cerebellum

LLMs: “Brain” – reasoning, planning, language understanding
RL: “Cerebellum” – execution, control, balance, precise movement
Both components are essential for advanced robotics.
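One way to picture the brain/cerebellum split is a two-level loop: a slow planner emits subgoals, and a fast controller tracks each one. The sketch below is a toy under stated assumptions. `slow_planner` is a lookup table standing in for an LLM call, the robot state is a single number, and every name is hypothetical rather than taken from any Google DeepMind system.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Subgoal:
    target: float  # desired 1-D position, a stand-in for a real robot pose

def slow_planner(instruction: str) -> List[Subgoal]:
    """Hypothetical 'brain': turn an instruction into ordered subgoals.
    A real system would query a language or vision-language model here."""
    plans = {"fetch": [Subgoal(1.0), Subgoal(0.0)]}
    return plans.get(instruction, [])

def fast_controller(position: float, goal: Subgoal, gain: float = 0.5) -> float:
    """Hypothetical 'cerebellum': one proportional-control step toward the goal."""
    return position + gain * (goal.target - position)

def run(instruction, position=0.0, tol=1e-3, max_ticks=100):
    """Plan once, then run the fast loop until each subgoal is reached."""
    trace = []
    for goal in slow_planner(instruction):
        for _ in range(max_ticks):
            if abs(goal.target - position) < tol:
                break
            position = fast_controller(position, goal)
        trace.append(position)  # position reached for this subgoal
    return trace
```

Calling `run("fetch")` plans once and then spends many cheap control ticks per subgoal, which mirrors the division of labor described above.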

---

5. Robotics Foundation Models — Independent Discipline?

Current Status:

Most current work extends LLMs or multimodal models to output robot actions.
The field still lacks standalone, robotics-specific pretraining paradigms.
It may become an independent discipline in the future, with its own world models and data formats.

---

6. Data as the Primary Bottleneck

Why Data Is Scarce in Robotics

The real world is unstructured, so the diversity of experience a robot needs is huge
No large-scale open datasets comparable to those available for language modeling
Human teleoperation data is expensive

Data Pyramid in Robotics (from base to top):

Massive low-quality internet-scale data
Egocentric human video data (YouTube, wearable cameras)
Simulation data (physics engines, synthetic environments)
Robot-specific high-fidelity data (teleoperation, task-specific collections)
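In practice, a pyramid like this is often consumed as a weighted training mixture: abundant sources are sampled often, scarce high-fidelity robot data less often. The sketch below shows that idea only. The source names and the weights are invented for illustration and do not come from the interview.

```python
import random
from collections import Counter

# Hypothetical mixture weights, ordered from pyramid base to top.
MIXTURE = {
    "web_data": 0.50,
    "human_video": 0.25,
    "simulation": 0.20,
    "robot_teleop": 0.05,
}

def sample_sources(n, mixture=MIXTURE, seed=0):
    """Draw n training examples' source labels according to the mixture weights."""
    rng = random.Random(seed)  # seeded for reproducibility
    names = list(mixture)
    weights = [mixture[name] for name in names]
    return Counter(rng.choices(names, weights=weights, k=n))
```

A real pipeline would tune these weights empirically; the point here is only that the tiers can be mixed in one sampling step.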

---

7. Gemini Robotics 1.5 – Key Innovations

1. Adding “Thinking” into VLA Models

Allows multi-step reasoning before executing actions
Improves human-robot transparency and safety

Process Example: Sorting clothes by color

Identify item color
Plan which pile it belongs to
Execute placement and repeat
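The three steps above can be sketched as a think-then-act loop in which the robot writes down its reasoning before it moves. This is a toy illustration, not the Gemini Robotics 1.5 interface: `think`, `sort_laundry`, the two pile names, and the item format are all assumptions.

```python
def think(item):
    """Hypothetical reasoning step: return a readable thought and a decision."""
    color = item["color"]
    pile = "whites" if color == "white" else "colors"
    thought = f"The {item['name']} is {color}, so it goes in the {pile} pile."
    return thought, pile

def sort_laundry(items):
    """Reason about each item first, log the thought, then 'place' the item."""
    piles = {"whites": [], "colors": []}
    log = []
    for item in items:
        thought, pile = think(item)       # identify the color and plan the pile
        log.append(thought)               # the thought can be inspected before acting
        piles[pile].append(item["name"])  # stand-in for the physical placement
    return piles, log
```

Because each thought is recorded before the action, a human can inspect why the robot chose a pile, which is the transparency benefit noted above.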

---

2. Cross-Embodiment Transfer via Motion Transfer

Enables using data collected on Robot A to train Robot B
Tested across:
ALOHA: table-top dual-arm robot
Bi-arm Franka: industrial arms
Apptronik Apollo: humanoid robot
Result: Tasks requiring unseen workspace configurations transferred successfully between embodiments.
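The summary does not describe how Motion Transfer works internally, so the following is not that method. It is a deliberately simple sketch of one generic ingredient of cross-embodiment training: expressing each robot's actions in a shared, normalized space so that data from Robot A can be replayed at Robot B's scale. The `Embodiment` class and the `max_delta_m` values are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Embodiment:
    name: str
    max_delta_m: float  # hypothetical per-step end-effector reach, in meters

def to_shared(delta_m, robot):
    """Map a robot-specific displacement to a shared [-1, 1] action range."""
    return [max(-1.0, min(1.0, d / robot.max_delta_m)) for d in delta_m]

def from_shared(action, robot):
    """Map a shared action back to a displacement for a specific robot."""
    return [a * robot.max_delta_m for a in action]

def retarget(delta_m, source, target):
    """Re-express a motion recorded on `source` at the scale of `target`."""
    return from_shared(to_shared(delta_m, source), target)
```

Simple rescaling like this only works when the robots are kinematically similar. That limitation is consistent with the embodiment gap discussed later: the larger the physical difference, the less a naive mapping helps.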

---

8. Technical Challenges and Solutions

Reasoning speed: Robots have a much tighter inference budget (roughly 0.5–1 s per decision) than chat LLMs, which can think for 20 s or more
Overfitting to thinking traces: Diverse annotations are needed so that reasoning generalizes to new tasks
Reward function design in RL: Simple for locomotion, extremely hard for varied manipulation tasks
Embodiment gap: Larger physical differences reduce transfer effectiveness
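The reasoning-speed constraint can be pictured as an "anytime" loop: keep refining a decision until the time budget runs out, then act on the best answer so far. The sketch below assumes a hypothetical list of refinement steps and an injectable clock; it is not how any specific robot stack is implemented.

```python
import time

def decide_within_budget(refine_steps, budget_s=0.5, clock=time.monotonic):
    """Apply refinement steps until the budget is spent; return the best result.

    refine_steps: callables that take the current best answer (or None)
    and return an improved one. clock: injectable for testing.
    """
    deadline = clock() + budget_s
    best = None
    for step in refine_steps:
        if clock() >= deadline:  # out of time: stop thinking and act
            break
        best = step(best)
    return best
```

With `budget_s=0.5` the loop matches the lower end of the 0.5–1 s range quoted above; note that it checks the clock only between steps, so a single slow step can still overrun the budget.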

---

9. Simulation vs. Real Data

Real Data:

+ Avoids sim-to-real gap

− Limited scalability

Simulation Data:

+ Scalable, cheaper in long run

− Initial performance gap, hallucinations in generative video simulation

New Direction:

Generative video-based simulation (Veo, Sora 2, Genie) may replace traditional physics simulation
Prompt-based scene generation scales faster than manually modeling environments

---

10. Future Directions

Scaling data via simulation, human video, and model-generated datasets
Bridging world models (Vision-Language-Vision) with action outputs
Incorporating additional modalities (tactile sensing critical for dexterous hands)
Moving robotics from the gripper era to the dexterous-hand era

---

11. Development Timelines

Predictions:

2–3 years: Robotics “GPT moment” with useful general-purpose models
5–10 years: Widespread deployment in industries and eventually homes
Specialized systems will be outperformed once true generalist models mature

---

12. Notes on Silicon Valley Culture Shift

Post-ChatGPT: extreme competitiveness (“996” style now common)
Large-scale coordinated teams replacing small, independent research
Balancing top-down direction with bottom-up innovation
“Big effort” is necessary but needs smart innovation for breakthroughs

---

13. Talent & Leadership

AI talent costs soaring due to supply-demand imbalance
Mission alignment more important than money for top-tier hires
Significant Chinese representation (50–60%) in Google Robotics
Prediction: More Chinese leaders in Silicon Valley AI and robotics in coming years

---

14. Key Takeaways from Tan Jie’s Journey

Focus: Solve AGI in the physical world
Preferred Form Factor: Humanoid robots
Preferred Architecture: End-to-end unified models
Critical Bet: Scalable synthetic data
Collaboration between hardware-rich China and AI-rich U.S. could accelerate progress globally

---

Quick Recommendations

Books:
Start With Why
The 7 Habits of Highly Effective People
Key Papers:
Sim-to-Real: Learning Agile Locomotion for Quadruped Robots
RT-1, RT-2, RT-X series
Gemini Robotics 1.5

---

Closing Perspective

> “When a true generalist robot arrives, specialists will struggle to survive. Whether in robotics or AI content creation, scalable multi-modal data, strong foundation models, and the right collaborations will be key to reaching that point.”
