Fei-Fei Li’s Latest Essay: AI Is Hot, but Possibly Headed Off Track

Honghao Wang

24 Nov 2025 — 4 min read

From Words to Worlds

Why AI Needs Spatial Intelligence to Move Beyond Language

AI excels at talking, but struggles to truly understand the world.

Recently, Google unveiled Gemini 3 Pro, sparking a wave of online buzz. The questions came quickly: Does it have more parameters? Can it handle longer context windows? Are we getting closer to AGI (Artificial General Intelligence)?

Renowned computer scientist Fei-Fei Li — U.S. National Academy of Engineering member and Stanford professor — offers a reality check.

On November 10, she published a detailed article arguing:

> Bigger models and better algorithms aren’t enough. Without world understanding, AI will never reach true intelligence.

---

The Problem with Today’s Large Language Models

Like Well-Read Scholars Who’ve Never Gone Outside

Think of ChatGPT, Gemini, DeepSeek, or Doubao — all powered by Large Language Models (LLMs).

At their core, LLMs are trained to predict the next word (more precisely, the next token) in a sequence.

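Example: Given “床前明月光” (“Moonlight before my bed,” the opening line of Li Bai’s poem “Quiet Night Thought”), the model predicts the next line, “疑是地上霜” (“I took it for frost on the ground”).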
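To make next-word prediction concrete, here is a minimal sketch in Python. It is a toy bigram counter, not how a real LLM works internally (real models use neural networks over tokens). The corpus and the names `train_bigrams` and `predict_next` are made up for illustration.

```python
from collections import Counter, defaultdict

# Tiny illustrative corpus (an assumption made up for this sketch).
CORPUS = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the dog sat on the mat",
]

def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    if word not in follows:
        return None
    return follows[word].most_common(1)[0][0]

model = train_bigrams(CORPUS)
```

With this toy corpus, `predict_next(model, "sat")` returns `"on"` simply because "on" is the only word that ever followed "sat" in the training text. The prediction reflects counts, not understanding.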

This word-prediction ability, trained on massive text datasets, enables LLMs to:

Pass professional exams
Solve complex math problems

But they falter at simple, physical reasoning:

“How far is that car from the tree?”
“Will this box fit in the trunk?”

Sometimes they make absurd predictions, such as assuming that a released cup will float upward.

LLMs may know formulas, but they lack physical common sense. Fei-Fei Li calls them “wordsmiths in the dark.”
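For contrast, a question like “Will this box fit in the trunk?” is trivial once the geometry is explicit. The sketch below is a simplified illustration, assuming both the box and the trunk are rectangular and the box may only be placed axis-aligned. The function name `fits` and all dimensions are hypothetical.

```python
from itertools import permutations

def fits(box, trunk):
    """Return True if a rectangular box fits in a rectangular trunk
    in at least one axis-aligned orientation.

    Both arguments are (length, width, height) tuples in the same unit.
    Tilted or diagonal placements are deliberately ignored.
    """
    return any(
        all(b <= t for b, t in zip(orientation, trunk))
        for orientation in permutations(box)
    )

# Made-up dimensions in centimetres.
trunk = (50, 110, 45)
small_box = (100, 40, 30)
long_box = (120, 40, 30)
```

Here `fits(small_box, trunk)` is true because the box can be turned so its 100 cm side lies along the 110 cm side of the trunk, while `fits(long_box, trunk)` is false because no side of the trunk is 120 cm long. The hard part for an AI system is not this check but obtaining reliable dimensions from perception in the first place.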

---

Why Hallucinations Happen

LLMs rely on statistical patterns in text, not real-world experience.

An LLM can produce a sentence like “The sun rises in the west” whenever it is grammatical and statistically plausible in context, even though it is physically false.

They’ve read thousands of books, but never touched the world outside.
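A toy model shows how this can happen. The sketch below trains a bigram model on three true sentences and then scores a false one. It is only an analogy for LLM behaviour, not a description of any real system. The corpus and the names `train` and `sentence_probability` are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Made-up corpus: every sentence in it is true.
CORPUS = [
    "the sun rises in the east",
    "the sun sets in the west",
    "the moon rises in the east",
]

def train(sentences):
    """Count word-to-next-word transitions."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows

def sentence_probability(follows, sentence):
    """Product of bigram probabilities P(next | prev) over the sentence."""
    words = sentence.split()
    prob = 1.0
    for prev, nxt in zip(words, words[1:]):
        total = sum(follows[prev].values())
        if total == 0:
            return 0.0  # `prev` was never followed by anything in training
        prob *= follows[prev][nxt] / total
    return prob

model = train(CORPUS)
p_true = sentence_probability(model, "the sun rises in the east")
p_false = sentence_probability(model, "the sun rises in the west")
```

The false sentence never appears in the corpus, yet `p_false` is greater than zero because every word-to-word transition in it was seen somewhere. The model has no notion of which statements are physically possible, only of which word sequences are likely.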

---

Language Can Fabricate — The Physical World Doesn’t Lie

Enter Spatial Intelligence

Fei-Fei Li believes AI must develop spatial intelligence — the ability to understand and interact with the physical world.

Example: Drinking coffee

Vision: Judging cup-to-mouth distance
Motor control: Adjusting grip based on weight
Touch: Avoiding burns
Balance & motion: Keeping cup level

This process uses perception, imagination, and action — not verbal commands.

Spatial intelligence is key because true intelligence requires three things in changing, uncertain environments:

Prediction
Action
Goal achievement

---

Learning Through Interaction

Babies: Push over blocks → hear crash → learn cause & effect.
Scientists: Watson and Crick built physical DNA models to reveal the double helix — insight wasn’t in the “words” but in spatial arrangement.

---

From Predicting Words to Predicting Frames

Fei-Fei Li advocates shifting AI from predicting the next word → to predicting the next frame of reality.

Example: Letting go of a glass cup

Human mind predicts: fall → impact → shatter
Without reading about it, you know what happens.

Key Difference:

Word prediction = grammatical logic
Frame prediction = physical logic

This requires world models: internal representations of an environment in which gravity, light, occlusion, and other physics behave consistently.
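The dropped-cup example can be sketched as a tiny hand-written “next frame” predictor. This is only an illustration of physical logic, assuming a point-like cup, no air resistance, and a cup that always shatters on impact. A learned world model would have to infer such rules from data rather than have them coded in. The names `Frame`, `next_frame`, and `rollout` are hypothetical.

```python
from dataclasses import dataclass

GRAVITY = -9.81  # m/s^2, negative because it points down

@dataclass
class Frame:
    height: float      # metres above the floor
    velocity: float    # m/s, positive is up
    shattered: bool = False

def next_frame(frame, dt=0.05):
    """Advance the dropped cup by one time step (semi-implicit Euler)."""
    if frame.shattered:
        return frame  # nothing changes after the cup has broken
    velocity = frame.velocity + GRAVITY * dt
    height = frame.height + velocity * dt
    if height <= 0.0:
        # Impact: assume the glass cup always shatters on the floor.
        return Frame(height=0.0, velocity=0.0, shattered=True)
    return Frame(height=height, velocity=velocity)

def rollout(start, steps):
    """Predict a sequence of frames from a starting frame."""
    frames = [start]
    for _ in range(steps):
        frames.append(next_frame(frames[-1]))
    return frames

# Cup released from rest, one metre above the floor.
frames = rollout(Frame(height=1.0, velocity=0.0), steps=20)
```

Each predicted frame follows from the previous one by the same rule, so the sequence fall, impact, shatter comes out of the physics rather than out of any text about falling cups.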

---

Challenges in Building World Models

Fei-Fei Li identifies two key obstacles:

Finding the formula: LLMs succeed thanks to the simplicity of the “predict the next word” objective; is there an equally elegant objective for spatial intelligence?
Finding the data: World models need massive spatial datasets, and extracting 3D information from 2D video is still an active research area.
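To give a flavour of what recovering 3D information from 2D images involves, the sketch below shows two textbook building blocks from classical computer vision: depth from stereo disparity and pinhole back-projection. This is not a description of how World Labs or any particular system works. The function names and all camera parameters are assumptions chosen for illustration.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth of a point seen by two parallel cameras: Z = f * B / d.

    focal_px:     focal length in pixels
    baseline_m:   distance between the two cameras in metres
    disparity_px: horizontal pixel shift of the point between the images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

def backproject(u, v, depth, fx, fy, cx, cy):
    """Map pixel (u, v) with known depth to a 3D point in the camera
    frame, using the standard pinhole camera model."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# Made-up camera: 500 px focal length, principal point at (320, 240),
# two cameras 10 cm apart, and a point shifted 25 px between the images.
z = depth_from_disparity(focal_px=500, baseline_m=0.1, disparity_px=25)
point = backproject(u=420, v=240, depth=z, fx=500, fy=500, cx=320, cy=240)
```

The formulas are simple once the disparity and camera parameters are known. The research difficulty lies in estimating them reliably from ordinary video, at scale, without calibrated rigs.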

---

Marble: A Glimpse of Spatial AI

Fei-Fei Li’s World Labs developed Marble — input text or a photo, get an explorable 3D space.

In a test, uploading a single image was enough for Marble to infer chairs, desks, and the room layout (still rough, but promising).

---

Potential Impact of Spatial Intelligence

Robots at Home:

Avoid fragile vases
Dry wet floors before walking
Assist elderly care

Industrial Applications:

Controllable video generation for ads & film
Virtual production efficiency (Sony partner reported 40× gains with Marble)

Consumer Products:

Interactive interior design
3D memory albums
VR therapy for phobias

Synthetic Data Markets:

“Textbooks” for robots: domain-specific task data

---

AI Monetization in the Spatial Era

As spatial AI matures, tools that combine creation + cross-platform publishing will become essential.

Example: The AiToEarn official site offers:

AI content generation
Distribution across platforms like Douyin, Bilibili, Xiaohongshu, Instagram, YouTube, LinkedIn
Analytics and AI model ranking

This could help spatial AI creators share interactive environments instantly and monetize their innovations globally.

---

Final Takeaways

Why AI still makes simple mistakes:

It “thinks” in statistical patterns, not through cause-and-effect in reality.

Fei-Fei Li’s proposal:

Shift from predicting text → to predicting reality via spatial intelligence & world models.

Possible outcomes:

Household robots with genuine awareness
AI scientists finding new laws of nature
Fully controllable, physically consistent creative tools

Current status:

Marble is early-stage
Formula for world models unknown
Spatial datasets scarce

But now, the path to true intelligence looks clearer.

---

References

From Words to Worlds: Spatial Intelligence is AI’s Next Frontier
Google Developer Guide: Introduction to Large Language Models
