Will LLMs Be Another “Painful Lesson”? Reinforcement Learning Pioneer Sutton Warns of a Trillion-Dollar AI Bubble

📌 Sutton’s Latest Interview: “Have LLMs Learned the Bitter Lesson?”

Richard Sutton — famed for his “Bitter Lesson” concept and previous claim that “LLMs are a dead end” — expands his critique in a high-profile panel discussion.

Discussion Participants:

  • Richard Sutton
  • Sendhil Mullainathan — MacArthur Fellow, MIT Professor
  • Niamh Gavin — Applied AI Scientist, CEO of Emergent Platforms
  • Suzanne Gildert — Founder & CEO, Nirvanic Consciousness Technologies

---

🔍 Key Question: Have LLMs Applied the "Bitter Lesson"?

Sutton’s core claim: No.

His reasoning — LLMs over-rely on:

  • Human imitation (internet text, code)
  • Extensive fine-tuning and handcrafted alignment

This is at odds with the Bitter Lesson, which says:

> Scaling computation using general methods like search and autonomous learning beats human priors in the long run.

He predicts a near-term ceiling for LLM progress and warns the hype may lead to a bubble burst.

---

📖 Understanding the “Bitter Lesson”

Definition: A historical pattern over 70+ years of AI development:

  • Early AI embeds human knowledge into systems.
  • Eventually, general scalable methods leveraging massive computation outperform.

Two winning categories:

  • Search: Exhaustively exploring options (e.g., AlphaGo’s board search)
  • Learning: Extracting patterns from raw data/environment without human rules
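
The "search" category can be made concrete with a toy sketch. This is our own illustration under simplifying assumptions (a tiny Nim variant, nowhere near AlphaGo's Monte Carlo tree search, and `can_win` is a hypothetical helper): exhaustive search derives optimal play from the rules alone, with no human strategic knowledge baked in.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def can_win(heaps):
    """True if the player to move can force a win.

    Rules of this toy Nim variant: each turn, remove one or more
    objects from a single heap; whoever takes the last object wins.
    """
    for i, heap in enumerate(heaps):
        for take in range(1, heap + 1):
            child = list(heaps)
            child[i] -= take
            # A winning move is one that leaves the opponent in a losing position.
            if not can_win(tuple(sorted(child))):
                return True
    return False  # every move (or no move at all) hands the opponent a win
```

Classic Nim theory says a position is losing exactly when the XOR of the heap sizes is zero; the search rediscovers that fact without ever being told it.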

Core Insight:

Human-crafted knowledge does not scale — computation does.

---

❌ Why Sutton Says LLMs Fail the “Bitter Lesson”

Fundamental Reliance on Human Data

  • LLMs train mainly on existing text/code — finite resources.
  • Training objective: mimic human linguistic patterns.
  • Post-training requires heavy human alignment (instruction fine-tuning, RLHF).

Finite Data Ceiling

  • As high-quality internet data is exhausted, scaling models further yields diminishing returns.

Prediction: Future progress will come from agents that:

  • Interact directly with the environment
  • Continuously learn from experience

---

🆚 Reinforcement Learning vs. Imitation Learning

Suzanne Gildert’s analogy: Build an AI like a squirrel’s brain — independently adaptive.

Squirrel vs. Current AI

  • Squirrel: Learns autonomously in new environments
  • LLMs: Learning stops at deployment; abilities fixed by training data

---

⚙️ Reinforcement Learning Challenges

  • Reward Function Definition: Complex for general agents — trivial for narrow tasks.
  • Relapse to Imitation Learning: Difficulty of broad rewards pushes researchers to mimic experts instead.
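
The reward-definition point can be sketched in a few lines. This is our own toy example, not anything from the panel: for a narrow task (walk right along a 5-cell corridor), the reward function is a one-liner, and tabular Q-learning solves it; no comparably simple scalar exists for a general-purpose agent.

```python
import random

N_STATES = 5          # cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)    # step left, step right

def reward(state):
    # Trivial to specify for this narrow task: 1.0 at the goal, else 0.
    return 1.0 if state == N_STATES - 1 else 0.0

def q_learning(episodes=2000, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            if rng.random() < eps:
                a = rng.randrange(2)                    # explore
            else:
                a = max((0, 1), key=lambda i: q[s][i])  # exploit
            s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
            # Standard update toward reward plus discounted best next value.
            q[s][a] += alpha * (reward(s2) + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q
```

After training, the greedy policy points right in every non-goal cell; the hard part in general RL is not the update rule but writing `reward` for open-ended goals.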

---

🎯 Imitating Output vs. Imitating Action

Sendhil Mullainathan’s thesis — endorsed by Sutton:

  • Humans imitate outputs, but must discover actions to achieve them — building deep internal models.
  • LLMs imitate actions directly from training data, bypassing true discovery.

Implication:

LLMs may lack robust causal models of the world.

---

Human Imitation Examples

  • Zebra Finch: Experiment until reproducing song
  • Algebra Proof: Reverse-engineer reasoning steps
  • Von Neumann’s Fly Puzzle: Different solution paths — same output

Common thread: Discovery forces model-building.

---

LLM Imitation

  • Predicts next token → reproduces patterns without building causal world models
  • Autoregression unfolds patterns, not purposeful strategies
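
The mechanism can be sketched with a toy stand-in. This is an assumption-laden illustration (a bigram counter in place of a transformer; `train_bigram` and `generate` are hypothetical helpers), but it shares the objective of predicting the next token from observed text, and it reproduces surface patterns with no representation of what the words refer to.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count which token follows which in the training text."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, length):
    """Greedily emit the most frequent continuation, one token at a time."""
    out = [start]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break  # token never seen in a leading position; nothing to unfold
        out.append(followers.most_common(1)[0][0])
    return out
```

For example, a model trained on `"the cat sat on the cat"` and prompted with `"the"` simply replays the memorized sequence; scale changes how rich the patterns are, not the nature of the unfolding.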

---

💰 A Trillion-Dollar Clash

Moderator Ajay Agrawal:

Scientific debate meets economic stakes — AI “fashions” steer research direction.

Capital-Driven Risk Factors

  • LLMs = current dominant paradigm
  • Extraordinary claims (“understanding emerges via imitation alone”) require extraordinary proof
  • Heavy investment pressures → bubble risk if returns stall

---

Innovator’s Dilemma in AI

Niamh Gavin’s view:

  • Teams favor engineering patches over fundamental redesign
  • Fragility and overfitting grow as patches accumulate
  • The field stays stuck in incremental improvements instead of paradigm shifts

---

✅ Recognizing LLM Achievements — With Proper Framing

Mullainathan’s Two Assessments:

  • Potential future capabilities → AGI path?
  • Actual current capabilities → already remarkable

Tragedy: Misplaced expectations obscure genuine achievement.

---

Miracle of Emergent Properties

  • Large-scale imitation → reasoning, translation, code
  • Historical analogy: Valuable tools aren’t necessarily “intelligent”

Suggestion: Reframe LLMs as powerful algorithmic tools rather than “AI”, a label shift that would invite more objective evaluation.

---

🧭 Conclusion: Beyond the Hype

  • LLMs may not fulfill AGI dreams — but they are extraordinary tools.
  • The market’s fervor partly comes from conceptual confusion.
  • Focus should shift from debating “intelligence” to:
      • Understanding capability origins
      • Exploring application boundaries
      • Recognizing their transformative, but specific, value

---

Bottom Line:

The “Bitter Lesson” warns against over-reliance on human priors.

LLMs, as they stand, are breathtaking in scope but inherently path-limited.

The next leap may require embracing scalable, environment-interactive learning — moving from mimicry toward autonomous discovery.

By Honghao Wang