Exclusive Interview with DeepMind’s Tan Jie: Robots, World Models, and Google

Interview: Zhang Xiaojun × Tan Jie

Google DeepMind Robotics – Foundation Models, Reinforcement Learning, and the Future of General-Purpose Robots

Guest: Tan Jie, Senior Research Scientist & Technical Lead, Google DeepMind Robotics. He works on applying foundation models and deep reinforcement learning to robotics.

---

Overview

China’s robotics sector is often perceived as stronger in hardware, while the U.S. leads in developing the “brains” of robots. Tan Jie shares Google DeepMind’s perspective, including insights from their recent paper "Gemini Robotics 1.5 Brings AI Agents into the Physical World", and discusses:

  • The parallels between computer graphics and robotics
  • Sim-to-real transfer and reinforcement learning breakthroughs
  • Large language models as the “brain” for robots
  • The evolving landscape of embodied intelligence and robotics foundation models
  • Data scarcity, cross-embodiment transfer, and motion transfer innovations
  • The competitive culture in Silicon Valley post-ChatGPT

---

1. From Computer Graphics to Robotics

Early Career Path

  • Undergraduate: Shanghai Jiao Tong University
  • PhD: Focused on computer graphics and physics-based character animation, with an animation internship at Pixar.
  • Founded a startup in Shanghai (similar to Kujiale/CoolHome).
  • Joined Lytro in Silicon Valley, worked on light field cameras.
  • Moved to Google Brain, which later merged with DeepMind; his group is now the Google DeepMind Robotics team.

Perspective Shift

> "Robotics is graphics in the real world; graphics is robotics in simulation."

  • Initially motivated to apply simulation-based techniques from graphics to real-world robotics.

---

2. Robotics Before Deep Reinforcement Learning

  • Dominated by traditional, hand-engineered control methods such as MPC (Model Predictive Control).
  • High barrier to entry: required PhD-level math.
  • Graphics simulations already showed agile motions that real robots could not achieve (e.g., simulated characters doing flips vs. the robots in the DARPA Robotics Challenge).

Goal: Bring simulation capabilities to real-world robots for agile locomotion and manipulation.

---

3. Breakthrough: Sim-to-Real + Reinforcement Learning

  • First Google paper: "Sim-to-Real: Learning Agile Locomotion for Quadruped Robots"
  • Introduced deep RL methods (PPO) inspired by AlphaGo successes.
  • Pioneered RL in quadruped locomotion, influencing Unitree, Boston Dynamics, and others.
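
As background for readers unfamiliar with PPO, the clipped surrogate objective from the original PPO paper (Schulman et al., 2017) is shown below. It is not taken from the interview: \(\hat{A}_t\) denotes an advantage estimate, \(\epsilon\) a clipping hyperparameter, and \(\pi_{\theta_{\text{old}}}\) the policy that collected the data.

```latex
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)},
\qquad
L^{\text{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[
  \min\left( r_t(\theta)\,\hat{A}_t,\;
  \operatorname{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t \right)
\right]
```

The clipping keeps each policy update close to the data-collecting policy, which is one reason PPO is a common choice for training locomotion policies in simulation.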

Paradigm Shifts in Robotics Over 10 Years:

  • Reinforcement Learning – solving gait and locomotion
  • Large Language Models (LLMs) – bringing language comprehension and common sense to robots

---

4. LLMs + RL = Brain + Cerebellum

  • LLMs: “Brain” – reasoning, planning, language understanding
  • RL: “Cerebellum” – execution, control, balance, precise movement
  • Both components are essential for advanced robotics.
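
The division of labor above can be sketched as a two-rate loop: a slow planner that is called once per instruction and a fast controller that is called on every control tick. This is a toy illustration only. `TwoLevelController`, `toy_plan`, and `toy_act` are hypothetical names, and the string-splitting "planner" merely stands in for an LLM while the "controller" stands in for a learned RL policy.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TwoLevelController:
    """Slow 'brain' planner plus fast 'cerebellum' controller (illustrative only)."""
    plan: Callable[[str], List[str]]   # instruction -> ordered subgoals
    act: Callable[[str, int], str]     # (subgoal, tick) -> low-level command
    ticks_per_subgoal: int = 3         # control ticks executed per subgoal

    def run(self, instruction: str) -> List[str]:
        commands: List[str] = []
        # The planner runs once per instruction: slow and deliberate.
        for subgoal in self.plan(instruction):
            # The controller runs every tick: fast and reactive.
            for tick in range(self.ticks_per_subgoal):
                commands.append(self.act(subgoal, tick))
        return commands


def toy_plan(instruction: str) -> List[str]:
    # Stand-in for an LLM planner: split the instruction on " then ".
    return [step.strip() for step in instruction.split(" then ")]


def toy_act(subgoal: str, tick: int) -> str:
    # Stand-in for an RL policy: emit one labelled command per tick.
    return f"{subgoal}#{tick}"


controller = TwoLevelController(plan=toy_plan, act=toy_act, ticks_per_subgoal=2)
commands = controller.run("walk to table then pick up cup")
print(commands)
```

In a real system the two levels would likely run at very different rates, and the end-to-end models discussed later in the interview blur this boundary.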

---

5. Robotics Foundation Models — Independent Discipline?

Current Status:

  • Most work extends LLMs/multimodal models to output robot actions.
  • The field still lacks standalone, robotics-specific pretraining paradigms.
  • It may become an independent discipline in the future, with its own world models and data formats.

---

6. Data as the Primary Bottleneck

Why Data Is Scarce in Robotics

  • The real world is unstructured, so robots need a huge diversity of experiences
  • No large-scale open datasets like in language modeling
  • Human teleoperation data is expensive

Data Pyramid in Robotics:

  • Massive low-quality internet-scale data
  • Egocentric human video data (YouTube, wearable cameras)
  • Simulation data (physics engines, synthetic environments)
  • Robot-specific high-fidelity data (teleoperation, task-specific collections)

---

7. Gemini Robotics 1.5 – Key Innovations

1. Adding “Thinking” into VLA Models

  • Allows multi-step reasoning before executing actions
  • Improves human-robot transparency and safety

Process Example: Sorting clothes by color

  • Identify item color
  • Plan which pile it belongs to
  • Execute placement and repeat
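
The three steps above can be mimicked with a toy "think, then act" loop. This is a sketch under stated assumptions, not how Gemini Robotics 1.5 works internally: `think` and `sort_clothes` are hypothetical names, the color labels are given rather than perceived, and "execution" is just appending to a list.

```python
from typing import Dict, List, Tuple


def think(item_color: str, piles: Dict[str, List[str]]) -> Tuple[str, str]:
    """Return (reasoning trace, target pile); a stand-in for the model's thinking step."""
    target = item_color if item_color in piles else "other"
    trace = f"The item is {item_color}, so it belongs in the '{target}' pile."
    return trace, target


def sort_clothes(items: List[Tuple[str, str]], piles: Dict[str, List[str]]) -> List[str]:
    traces: List[str] = []
    for name, color in items:
        trace, target = think(color, piles)  # steps 1-2: identify color, plan the pile
        traces.append(trace)                 # the trace can be shown to a human before acting
        piles[target].append(name)           # step 3: "execute" the placement
    return traces


piles: Dict[str, List[str]] = {"white": [], "dark": [], "other": []}
items = [("shirt", "white"), ("socks", "dark"), ("scarf", "red")]
traces = sort_clothes(items, piles)
for line in traces:
    print(line)
```

Keeping the trace separate from the action is what makes the behavior inspectable: a person can read the stated plan before the robot moves.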

---

2. Cross-Embodiment Transfer via Motion Transfer

  • Enables using data collected on Robot A to train Robot B
  • Tested across:
      • ALOHA: table-top dual-arm robot
      • Bi-arm Franka: industrial arms
      • Apptronik: humanoid robot
  • Result: Tasks requiring unseen workspace configurations transferred successfully between embodiments.

---

8. Technical Challenges and Solutions

  • Reasoning speed: Robots have tighter inference budgets (0.5–1s per decision) vs. LLMs (can think for 20s+)
  • Overfitting thinking traces: Need diverse annotations to generalize reasoning to new tasks
  • Reward function design in RL: Simple for locomotion, extremely hard for varied manipulation tasks
  • Embodiment gap: Larger physical differences reduce transfer effectiveness
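
To illustrate why reward design is comparatively simple for locomotion, here is a minimal sketch of a walking reward: forward progress minus an energy penalty. The term structure, the default weight, and the name `locomotion_reward` are assumptions chosen for illustration and are not taken from the interview.

```python
from typing import Sequence


def locomotion_reward(
    x_prev: float,
    x_now: float,
    torques: Sequence[float],
    joint_velocities: Sequence[float],
    dt: float = 0.01,
    energy_weight: float = 0.005,
) -> float:
    """Reward forward displacement and penalize mechanical energy use (assumed form)."""
    forward_progress = x_now - x_prev
    # Approximate energy spent this step as |torque * velocity| * dt, summed over joints.
    energy = dt * sum(abs(t * v) for t, v in zip(torques, joint_velocities))
    return forward_progress - energy_weight * energy


reward = locomotion_reward(
    x_prev=0.0,
    x_now=0.02,
    torques=[2.0, -1.0],
    joint_velocities=[1.0, 3.0],
    dt=0.01,
    energy_weight=0.1,
)
print(reward)
```

A two-term scalar like this can be enough to produce a gait. For a manipulation task such as folding a shirt there is no comparably simple quantity to measure, which is the difficulty the bullet above points to.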

---

9. Simulation vs. Real Data

Real Data:

+ Avoids sim-to-real gap

− Limited scalability

Simulation Data:

+ Scalable, cheaper in long run

− Initial performance gap, hallucinations in generative video simulation

New Direction:

  • Generative video-based simulation (Veo, Sora 2, Genie) may replace traditional physics simulation
  • Prompt-based scene generation scales faster than manually modeling environments

---

10. Future Directions

  • Scaling data via simulation, human video, and model-generated datasets
  • Bridging world models (Vision-Language-Vision) with action outputs
  • Incorporating additional modalities (tactile sensing critical for dexterous hands)
  • Moving from gripper era to dexterous-hand era robotics

---

11. Development Timelines

Predictions:

  • 2–3 years: Robotics “GPT moment” with useful general-purpose models
  • 5–10 years: Widespread deployment in industries and eventually homes
  • Specialized robots will be outperformed once true generalists mature

---

12. Notes on Silicon Valley Culture Shift

  • Post-ChatGPT: extreme competitiveness (“996” style now common)
  • Large-scale coordinated teams replacing small, independent research
  • Balancing top-down direction with bottom-up innovation
  • “Big effort” is necessary but needs smart innovation for breakthroughs

---

13. Talent & Leadership

  • AI talent costs soaring due to supply-demand imbalance
  • Mission alignment more important than money for top-tier hires
  • Significant Chinese representation (50–60%) in Google Robotics
  • Prediction: More Chinese leaders in Silicon Valley AI and robotics in coming years

---

14. Key Takeaways from Tan Jie’s Journey

  • Focus: Solve AGI in the physical world
  • Preferred Form Factor: Humanoid robots
  • Preferred Architecture: End-to-end unified models
  • Critical Bet: Scalable synthetic data
  • Collaboration between hardware-rich China and AI-rich U.S. could accelerate progress globally

---

Quick Recommendations

  • Books:
      • Start With Why
      • The 7 Habits of Highly Effective People
  • Key Papers:
      • Sim-to-Real: Learning Agile Locomotion for Quadruped Robots
      • RT-1, RT-2, RT-X series
      • Gemini Robotics 1.5

---

Closing Perspective

> “When a true generalist robot arrives, specialists will struggle to survive. Whether in robotics or AI content creation, scalable multi-modal data, strong foundation models, and the right collaborations will be key to reaching that point.”
