Fei-Fei Li’s Latest Essay: AI Is Hot, but Possibly Headed Off Track


From Words to Worlds

Why AI Needs Spatial Intelligence to Move Beyond Language


AI excels at talking, but struggles to truly understand the world.

Recently, Google unveiled Gemini 3 Pro, sparking a wave of online buzz. The questions came quickly: Does it have more parameters? Can it handle longer context windows? Are we getting closer to AGI (Artificial General Intelligence)?

Renowned computer scientist Fei-Fei Li — U.S. National Academy of Engineering member and Stanford professor — offers a reality check.

On November 10, she published a detailed article arguing:

> Bigger models and better algorithms aren’t enough. Without world understanding, AI will never reach true intelligence.

---

The Problem with Today’s Large Language Models


Like Well-Read Scholars Who’ve Never Gone Outside

Think of ChatGPT, Gemini, DeepSeek, or Doubao — all powered by Large Language Models (LLMs).

LLMs predict the next word (strictly, the next token) in a sequence.

Example: You say “床前明月光” (“Before my bed, the bright moonlight”); the model guesses that the next line is “疑是地上霜” (“I wonder if it is frost upon the ground”).
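To make “predict the next word” concrete, here is a toy bigram model: it only counts which word follows which in a tiny training text and then greedily picks the most frequent continuation. The corpus (two short English sentences echoing the poem) and the function names `train_bigram`, `predict_next`, and `complete` are illustrative assumptions; real LLMs use neural networks over subword tokens and long contexts, so this shows the objective, not how production models are built.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def train_bigram(corpus: list[str]) -> dict[str, Counter]:
    """Count, for every word, which words follow it and how often."""
    follows: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows


def predict_next(follows: dict[str, Counter], word: str) -> str | None:
    """Return the most frequent continuation of `word`, or None if it was never seen."""
    options = follows.get(word.lower())
    if not options:
        return None
    return options.most_common(1)[0][0]


def complete(follows: dict[str, Counter], prompt: str, max_words: int = 5) -> str:
    """Greedily append the most likely next word, one word at a time."""
    words = prompt.lower().split()
    for _ in range(max_words):
        nxt = predict_next(follows, words[-1])
        if nxt is None:  # no known continuation: stop
            break
        words.append(nxt)
    return " ".join(words)


corpus = [
    "before my bed the bright moonlight shines",
    "the bright moonlight looks like frost on the ground",
]
model = train_bigram(corpus)
print(predict_next(model, "the"))
print(complete(model, "frost", max_words=3))
```

After “the”, the model picks “bright” because it followed “the” twice versus once for “ground”. Greedily completing “frost” gives “frost on the bright”: every single step is locally plausible, yet nothing checks whether the whole phrase makes sense.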

With training on massive text datasets, this word-prediction ability enables LLMs to:

  • Pass professional exams
  • Solve complex math problems

But they falter at simple physical reasoning:

  • “How far is that car from the tree?”
  • “Will this box fit in the trunk?”

Sometimes, they make absurd predictions — e.g., assuming a cup will float upward.

LLMs may know formulas, but they lack physical common sense. Fei-Fei Li calls them “wordsmiths in the dark.”

---

Why Hallucinations Happen

LLMs rely on statistical patterns in text, not real-world experience.

They might assert that “The sun rises in the west” because the sentence is grammatical and each word is a statistically plausible continuation, even though physics rules it out.
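A toy bigram model makes this concrete. Trained on two true sentences, it scores a false sentence exactly as highly as a true one, because every word-to-word step in the false sentence also appears somewhere in the training text. The corpus and function names below are illustrative assumptions, and real LLMs are far more sophisticated, but the failure mode (local statistical plausibility with no check against the world) is the same in spirit.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def train_bigram(corpus):
    """Count word-pair transitions, with <s> and </s> marking sentence start and end."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def sentence_probability(counts, sentence):
    """Multiply P(next word | previous word) along the sentence, using exact fractions."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = Fraction(1)
    for prev, nxt in zip(tokens, tokens[1:]):
        total = sum(counts[prev].values())
        if total == 0:  # previous word never seen: the model assigns zero probability
            return Fraction(0)
        prob *= Fraction(counts[prev][nxt], total)
    return prob


model = train_bigram(["the sun rises in the east", "the wind blows from the west"])
p_true = sentence_probability(model, "the sun rises in the east")
p_false = sentence_probability(model, "the sun rises in the west")
print(p_true, p_false)
```

Both sentences come out at 1/16: after “the”, the words “sun”, “east”, “wind”, and “west” each occurred once, so the model has no basis for preferring “east” over “west”.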

They’ve read thousands of books, but never touched the world outside.

---

Language Can Fabricate — The Physical World Doesn’t Lie


Enter Spatial Intelligence

Fei-Fei Li believes AI must develop spatial intelligence — the ability to understand and interact with the physical world.

Example: Drinking coffee

  • Vision: Judging cup-to-mouth distance
  • Motor control: Adjusting grip based on weight
  • Touch: Avoiding burns
  • Balance & motion: Keeping cup level

This process uses perception, imagination, and action — not verbal commands.

Spatial intelligence is key because true intelligence requires the following, in changing, uncertain environments:

  • Prediction
  • Action
  • Goal achievement
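As a toy illustration only (not a method from Fei-Fei Li’s essay), the loop below ties these three ideas together for a cup moving along one axis toward the mouth. The agent reads the cup’s position, imagines each candidate action with an internal model, acts on the best one, and then observes that the real outcome differs from its prediction because of random noise. The internal model `predict`, the noise level, the step size, and all distances are hypothetical.

```python
import random


def predict(position: float, action: float) -> float:
    """Internal world model (assumed): an action moves the cup by exactly `action` metres."""
    return position + action


def choose_action(position: float, goal: float, max_step: float, candidates: int = 21) -> float:
    """Imagine each candidate action and keep the one whose predicted outcome lands closest to the goal."""
    best_error, best_action = float("inf"), 0.0
    for i in range(candidates):
        action = -max_step + 2 * max_step * i / (candidates - 1)
        error = abs(predict(position, action) - goal)
        if error < best_error:
            best_error, best_action = error, action
    return best_action


def reach(goal=0.30, start=0.0, max_step=0.05, noise=0.005, tol=0.01, max_steps=100, seed=0):
    """Perceive, predict, act; repeat until the cup is within `tol` of the goal (or give up)."""
    rng = random.Random(seed)
    position = start
    for step in range(1, max_steps + 1):
        action = choose_action(position, goal, max_step)  # prediction guides the choice
        # The real world is uncertain: the outcome deviates from the prediction by random noise.
        position += action + rng.gauss(0.0, noise)
        if abs(position - goal) <= tol:  # goal achieved
            return step, position
    return None, position


steps, final = reach()
print(steps, round(final, 3))
```

Because the agent re-reads its position after every move, noise does not accumulate into failure: each new prediction starts from where the cup actually is, not from where the agent expected it to be.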

---

Learning Through Interaction

  • Babies: Push over blocks → hear crash → learn cause & effect.
  • Scientists: Watson and Crick built physical DNA models to reveal the double helix; the insight came not from words but from spatial arrangement.

---

From Predicting Words to Predicting Frames


Fei-Fei Li advocates shifting AI from predicting the next word to predicting the next frame of reality.

Example: Letting go of a glass cup

  • The human mind predicts: fall → impact → shatter
  • Without reading about it, you know what happens.

Key Difference:

  • Word prediction = grammatical logic
  • Frame prediction = physical logic

This requires world models: simulated spaces in which gravity, light, occlusion, and other physical rules behave consistently.
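A minimal sketch of “predicting the next frame” for the falling-glass example, assuming a one-dimensional world containing only gravity and a floor, a hypothetical frame rate of 30 frames per second, and the simplifying rule that any floor contact shatters the glass. Here the physics is hand-coded; a learned world model would have to discover such dynamics (plus light, occlusion, and much more) from data. The names `CupState`, `next_frame`, and `rollout` are illustrative.

```python
from dataclasses import dataclass

GRAVITY = 9.81  # m/s^2
FPS = 30        # assumed frame rate


@dataclass
class CupState:
    height: float         # metres above the floor
    velocity: float       # m/s, negative means falling
    broken: bool = False


def next_frame(state: CupState, dt: float = 1 / FPS) -> CupState:
    """Predict the next frame: apply gravity, then check for impact with the floor."""
    if state.broken:
        return state  # a shattered glass stays shattered
    velocity = state.velocity - GRAVITY * dt  # semi-implicit Euler: update velocity first
    height = state.height + velocity * dt     # then position, using the new velocity
    if height <= 0.0:
        # Impact (assumption for illustration): any floor contact shatters the glass.
        return CupState(height=0.0, velocity=0.0, broken=True)
    return CupState(height, velocity)


def rollout(state: CupState, max_frames: int = 120) -> list:
    """Chain next-frame predictions: fall -> impact -> shatter."""
    frames = [state]
    while not frames[-1].broken and len(frames) <= max_frames:
        frames.append(next_frame(frames[-1]))
    return frames


frames = rollout(CupState(height=1.0, velocity=0.0))  # let go of a glass 1 m above the floor
print(len(frames) - 1, "frames until the glass shatters")
```

Dropped from 1 m, the glass shatters at frame 14, about 0.47 s, close to the analytic free-fall time of roughly 0.45 s; the gap comes from the coarse 30 fps time step.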

---

Challenges in Building World Models

Fei-Fei Li identifies two key obstacles:

  • Finding the formula: LLMs succeed with “predict next word” simplicity; can we find an equally elegant equation for spatial intelligence?
  • Finding the data: Requires massive spatial datasets; extracting 3D info from 2D video is an active research area.
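One reason extracting 3D information from 2D video is hard is that a single image throws depth away. The sketch below assumes an ideal pinhole camera with a hypothetical 500-pixel focal length and no lens distortion: two different 3D points land on exactly the same pixel, so one view alone cannot tell them apart. It then applies the standard rectified-stereo relation Z = f·B/d (focal length times baseline over disparity) to show how a second viewpoint resolves the ambiguity. This illustrates the geometry only; it is not how Marble or any particular system works.

```python
from __future__ import annotations

import math


def project(point: tuple[float, float, float], focal_length: float = 500.0) -> tuple[float, float]:
    """Ideal pinhole camera: map a 3-D point (X, Y, Z) in camera coordinates to image coordinates (u, v)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (focal_length * x / z, focal_length * y / z)


def depth_from_disparity(focal_length: float, baseline: float, disparity: float) -> float:
    """Rectified stereo: depth Z = f * B / d, where d is the horizontal pixel shift between two views."""
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    return focal_length * baseline / disparity


near_small = (0.1, 0.05, 1.0)  # a small object 1 m from the camera
far_large = (0.2, 0.10, 2.0)   # an object twice as large and twice as far away

u1, v1 = project(near_small)
u2, v2 = project(far_large)
print(math.isclose(u1, u2) and math.isclose(v1, v2))  # same pixel: depth is ambiguous from one view

# A second camera 0.1 m to the side sees the far point shifted by 25 px, which pins down its depth.
print(depth_from_disparity(500.0, 0.1, 25.0))
```

Both points project to pixel (50, 25), and the stereo relation recovers a depth of 2.0 m for the far point. Learning-based methods face the harder version of this problem: inferring depth from a single moving camera or from one photo, where no calibrated second view is available.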

---

Marble: A Glimpse of Spatial AI


Fei-Fei Li’s World Labs developed Marble — input text or a photo, get an explorable 3D space.

Testing it: from a single uploaded image, Marble inferred the chairs, desks, and room layout. The result was still rough, but promising.

---

Potential Impact of Spatial Intelligence

  • Robots at Home:
      • Avoid fragile vases
      • Dry wet floors before walking
      • Assist elderly care
  • Industrial Applications:
      • Controllable video generation for ads & film
      • Virtual production efficiency (Sony partner reported 40× gains with Marble)
  • Consumer Products:
      • Interactive interior design
      • 3D memory albums
      • VR therapy for phobias
  • Synthetic Data Markets:
      • “Textbooks” for robots: domain-specific task data

---

AI Monetization in the Spatial Era

As spatial AI matures, tools that combine creation + cross-platform publishing will become essential.

Example: AiToEarn (official website) offers:

  • AI content generation
  • Distribution across platforms like Douyin, Bilibili, Xiaohongshu, Instagram, YouTube, LinkedIn
  • Analytics and AI model rankings

This could help spatial AI creators share interactive environments instantly and monetize their innovations globally.

---

Final Takeaways

Why AI still makes simple mistakes:

It “thinks” in statistical patterns, not through cause-and-effect in reality.

Fei-Fei Li’s proposal:

Shift from predicting text to predicting reality, via spatial intelligence and world models.

Possible outcomes:

  • Household robots with genuine awareness
  • AI scientists finding new laws of nature
  • Fully controllable, physically consistent creative tools

Current status:

  • Marble is early-stage
  • Formula for world models unknown
  • Spatial datasets scarce

But now, the path to true intelligence looks clearer.

---

References

  • From Words to Worlds: Spatial Intelligence is AI’s Next Frontier
  • Google Developer Guide: Introduction to Large Language Models
