Will LLMs Be Another “Bitter Lesson”? Reinforcement Learning Pioneer Sutton Warns of a Trillion-Dollar AI Bubble

📌 Sutton’s Latest Interview: “Have LLMs Learned the Bitter Lesson?”

Richard Sutton, famed for his 2019 essay “The Bitter Lesson” and his earlier claim that “LLMs are a dead end,” expands his critique in a high-profile panel discussion.
Discussion Participants:
- Richard Sutton
- Sendhil Mullainathan — MacArthur Fellow, MIT Professor
- Niamh Gavin — Applied AI Scientist, CEO of Emergent Platforms
- Suzanne Gildert — Founder & CEO, Nirvanic Consciousness Technologies

---
🔍 Key Question: Have LLMs Applied the “Bitter Lesson”?
Sutton’s core claim: No.
His reasoning — LLMs over-rely on:
- Human imitation (internet text, code)
- Extensive fine-tuning and handcrafted alignment
This is at odds with the Bitter Lesson, which says:
> General methods that scale with computation, chiefly search and learning, beat approaches built on human knowledge in the long run.
He predicts a near-term ceiling for LLM progress and warns the hype may lead to a bubble burst.
---
📖 Understanding the “Bitter Lesson”
Definition: a pattern repeated across 70+ years of AI research:
- Early systems embed human knowledge directly.
- Eventually, general methods that scale with computation outperform them.
Two winning categories:
- Search: Systematically exploring options (e.g., AlphaGo’s Monte Carlo tree search; see the sketch below)
- Learning: Extracting patterns from raw data/environment without human rules
Core Insight:
Human-crafted knowledge does not scale — computation does.
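To make “search” concrete, here is a minimal sketch: exhaustive negamax on tic-tac-toe, a toy stand-in (AlphaGo actually pairs Monte Carlo tree search with learned evaluation). Note that no board heuristics are coded in; skill comes entirely from spending computation.

```python
# Illustration of "search" as a general method: exhaustive negamax
# over tic-tac-toe. No hand-crafted board heuristics; playing
# strength comes purely from computation.

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

def negamax(board, player):
    """Best (score, move) for `player`: +1 win, 0 draw, -1 loss."""
    opponent = "O" if player == "X" else "X"
    if winner(board) == opponent:
        return -1, None                 # opponent's last move already won
    if "." not in board:
        return 0, None                  # board full: draw
    best = (-2, None)
    for i, cell in enumerate(board):
        if cell == ".":
            child = board[:i] + player + board[i + 1:]
            score = -negamax(child, opponent)[0]
            if score > best[0]:
                best = (score, i)
    return best

# X to move; pure search finds that cell 2 completes the top row.
print(negamax("XX..O.O..", "X"))        # -> (1, 2)
```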
---
❌ Why Sutton Says LLMs Fail the “Bitter Lesson”
Fundamental Reliance on Human Data
- LLMs train mainly on existing text and code, a finite resource.
- The core training objective is next-token prediction: mimicking human linguistic patterns (see the sketch below).
- Post-training requires heavy human alignment (instruction fine-tuning, RLHF).
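To see how literal the mimicry is, here is a minimal sketch of the pre-training objective, using a bigram counter as a toy stand-in for a transformer (the corpus and names are illustrative):

```python
# Toy version of the LLM training objective: fit the distribution of
# human-written text. Real LLMs use transformers, but the target is the
# same: maximize the likelihood of the next token given the context.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate".split()

# "Training": count how often each token follows each context token.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_token_probs(prev):
    """P(next | prev), estimated purely from human-written data."""
    total = sum(counts[prev].values())
    return {tok: c / total for tok, c in counts[prev].items()}

print(next_token_probs("the"))   # ≈ {'cat': 0.67, 'mat': 0.33}
```

Everything the model can say is bounded by what humans already wrote; nothing in the objective rewards going beyond the corpus.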
Finite Data Ceiling
- As high-quality internet data is exhausted, scaling models further yields diminishing returns.
Prediction: Future progress will come from agents that (see the sketch after this list):
- Interact directly with the environment
- Continuously learn from experience
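A hedged sketch of what such an agent looks like in code: tabular Q-learning in a made-up 5-cell corridor. The environment, constants, and names are illustrative, not from the panel; the point is that the reward comes from the world, not from human demonstrations.

```python
# Experiential learning in miniature: a tabular Q-learning agent in a
# toy corridor. The reward signal comes from the environment itself,
# and learning continues for as long as the agent keeps acting.
import random

N_STATES, GOAL = 5, 4      # corridor cells 0..4, goal at cell 4
MOVES = (-1, +1)           # action 0 = step left, action 1 = step right
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.1, 0.9, 0.1

for episode in range(500):
    s = 0
    while s != GOAL:
        if random.random() < epsilon:
            a = random.randrange(2)                    # explore
        else:
            a = max((0, 1), key=lambda i: Q[s][i])     # exploit
        s2 = min(max(s + MOVES[a], 0), N_STATES - 1)   # environment step
        r = 1.0 if s2 == GOAL else 0.0                 # reward from the world
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# The greedy policy learns to march right toward the goal (the goal
# cell itself is never acted from, so its entry stays untrained).
print("greedy policy:",
      ["<>"[max((0, 1), key=lambda i: Q[s][i])] for s in range(N_STATES)])
```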
---
🆚 Reinforcement Learning vs. Imitation Learning
Suzanne Gildert’s analogy: Build an AI like a squirrel’s brain — independently adaptive.
Squirrel vs. Current AI
- Squirrel: Learns autonomously in new environments
- LLMs: Learning stops at deployment; abilities fixed by training data
---
⚙️ Reinforcement Learning Challenges
- Reward Function Definition: trivial for narrow tasks, still unsolved for general agents.
- Relapse to Imitation Learning: the difficulty of specifying broad rewards pushes researchers back toward mimicking experts (contrast sketched below).
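The contrast is easy to see in code. A hedged sketch with illustrative names: the narrow reward is a one-liner, while the general one has no closed form, so practice substitutes a learned reward model fit to human preferences, which is imitation by another route.

```python
# Narrow task: the reward function is trivial to write down.
def cartpole_reward(pole_upright: bool) -> float:
    return 1.0 if pole_upright else 0.0   # +1 for every balanced step

# General agent: no closed-form reward exists for "be a helpful,
# honest assistant". RLHF fills the gap with a reward model trained
# on human preference labels, quietly reintroducing imitation.
def general_reward(response: str) -> float:
    raise NotImplementedError(
        "no formula exists; practice substitutes a learned reward "
        "model trained on human preference comparisons"
    )
```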
---
🎯 Imitating Output vs. Imitating Action
Sendhil Mullainathan’s thesis, endorsed by Sutton:
- Humans imitate outputs: to reproduce an observed result, they must discover the actions that produce it, and that discovery builds deep internal models.
- LLMs imitate the actions themselves, copying token sequences straight from training data and bypassing discovery.
Implication:
LLMs may lack robust causal models of the world.
---
Human Imitation Examples
- Zebra finch: experiments with its own vocalizations until it reproduces the tutor’s song
- Algebra proof: a student sees only the finished proof and must reconstruct the reasoning steps
- Von Neumann’s fly puzzle: different solution paths (summing the infinite series vs. the simple shortcut) reach the same answer
Common thread: Discovery forces model-building.
---
LLM Imitation
- Predicts the next token, reproducing surface patterns without building causal world models
- Autoregression unrolls learned patterns rather than executing purposeful strategies (see the sketch below)
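A minimal sketch of that unrolling, with a hand-written probability table standing in for a trained model’s output distribution (all tokens illustrative): sample, append, repeat; nothing in the loop represents a goal or a world model.

```python
# Autoregressive generation just unrolls learned statistics.
import random

probs = {                                # illustrative P(next | prev)
    "<s>": [("the", 1.0)],
    "the": [("cat", 0.6), ("mat", 0.4)],
    "cat": [("sat", 0.7), ("ate", 0.3)],
    "sat": [("on", 1.0)],
    "on":  [("the", 1.0)],
    "mat": [("</s>", 1.0)],
    "ate": [("</s>", 1.0)],
}

def generate(max_tokens=10):
    token, out = "<s>", []
    for _ in range(max_tokens):
        choices, weights = zip(*probs[token])
        token = random.choices(choices, weights=weights)[0]
        if token == "</s>":
            break
        out.append(token)
    return " ".join(out)

print(generate())   # e.g. "the cat sat on the mat"
```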
---
💰 A Trillion-Dollar Clash
Moderator Ajay Agrawal:
Scientific debate meets economic stakes: research “fashions” in AI steer where talent and capital flow.
Capital-Driven Risk Factors
- LLMs = current dominant paradigm
- Extraordinary claims (“understanding emerges via imitation alone”) require extraordinary evidence
- Heavy investment pressures → bubble risk if returns stall
---
Innovator’s Dilemma in AI
Niamh Gavin’s view:
- The field favors engineering patches over fundamental redesign
- Fragility and overfitting increase
- Stuck in incremental improvements instead of paradigm shifts
---
✅ Recognizing LLM Achievements — With Proper Framing
Mullainathan’s Two Assessments:
- Potential future capabilities → AGI path?
- Actual current capabilities → already remarkable
Tragedy: Misplaced expectations obscure genuine achievement.
---
Miracle of Emergent Properties
- Large-scale imitation → reasoning, translation, code
- Historical analogy: Valuable tools aren’t necessarily “intelligent”
Suggestion: frame LLMs as powerful algorithmic tools rather than “AI”; the humbler label invites more objective evaluation.
---
🧭 Conclusion: Beyond the Hype
- LLMs may not fulfill AGI dreams — but they are extraordinary tools.
- The market’s fervor partly comes from conceptual confusion.
- Focus should shift from debating “intelligence” to:
  - Understanding capability origins
  - Exploring application boundaries
  - Recognizing their transformative, but specific, value
---
Bottom Line:
The “Bitter Lesson” warns against over-reliance on human priors.
LLMs, as they stand, are breathtaking in scope but inherently path-limited.
The next leap may require embracing scalable, environment-interactive learning — moving from mimicry toward autonomous discovery.