Father of LSTM Can’t Convince Altman, but Transformer’s Creator Has Left: One-Track Competition Is Pointless, Mindless Scaling Won’t Work

November 29, 2025 — Zhejiang

> “The Transformer architecture may be trapping the entire industry in a local bottleneck, preventing us from finding truly intelligent reasoning methods.”

> — Llion Jones, co‑inventor of the Transformer.

Although Llion is one of the original authors of the famous paper Attention Is All You Need, he has since stepped away from Transformer research. He argues that the architecture’s success has led the AI field to focus on incremental enhancements rather than exploring the next major breakthrough.

---

Background: Llion Jones

  • Origin: Wales, UK. Started coding at age 14 and built an early chatbot.
  • Google Career:
    • Joined Google in 2011 as a software engineer at YouTube. He initially applied for the London office but moved to San Francisco after being offered the California role.
    • In 2015, moved to Google Research to work on NLP during the deep learning revolution.
    • Co-authored Attention Is All You Need.
  • Later Work:
    • Continued Transformer and language-modeling work under futurist Ray Kurzweil.
    • Left Google in 2023, after 12 years, to co-found Sakana AI.

---

Sakana AI: Mission & Funding

  • Funding: 20 billion yen (~$135M USD) Series B led by MUFG, at a valuation of roughly $2.6B USD.
  • Philosophy:
    • Avoid chasing the massive compute races against the U.S. and China.
    • Develop efficient AI technologies for real-world applications.
  • Recent Achievement: Released the Continuous Thought Machine (CTM) at NeurIPS 2025, notable for native adaptive computation and recursive, human-like problem solving.

---

Podcast Insights: Moving Beyond Transformers

1. Why Shift Away from Transformers

  • Llion feels the research space is oversaturated: many small tweaks, little radical innovation.
  • The Transformer grew out of bottom-up, freedom-driven experimentation, unlike today's big-budget, goal-oriented research.
  • He has proposed scaling up evolutionary search approaches (a toy sketch follows this list) but has found little industry interest.
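
The interview does not spell out what a scaled-up evolutionary search would look like, so the following is only a minimal illustrative sketch: a population of candidate model configurations is repeatedly scored, selected, and mutated. The search space, mutation rule, and toy fitness function are assumptions made for illustration, not anything attributed to Jones or Sakana AI.

```python
import random

# Toy evolutionary search over model "configurations" (illustrative only).
# The search space, mutation rule, and fitness function are invented for clarity.

SEARCH_SPACE = {
    "depth": [2, 4, 8, 16],
    "width": [128, 256, 512, 1024],
    "activation": ["relu", "gelu", "silu"],
}

def random_config():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def mutate(config):
    child = dict(config)
    key = random.choice(list(SEARCH_SPACE))        # perturb one field at random
    child[key] = random.choice(SEARCH_SPACE[key])
    return child

def fitness(config):
    # Placeholder score; a real system would train and evaluate a model here.
    return config["depth"] * config["width"] / 1000 + random.random()

def evolve(pop_size=16, generations=10, keep=4):
    population = [random_config() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:keep]                               # selection
        children = [mutate(random.choice(parents))            # variation
                    for _ in range(pop_size - keep)]
        population = parents + children
    return max(population, key=fitness)

print(evolve())
```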

Key Quote:

> “I just want you to work on what you think is interesting and important… and I mean it.”

---

2. Escaping the "Local Optimum"

Challenges:

  • Industry “captured” by LLMs — success makes it hard to pivot.
  • Historical parallel: the incrementalism of the RNN era, before Transformers disrupted it with major gains.
  • Current tweaks to Transformer architectures risk wasting effort similarly.

Requirement for Replacement:

A new architecture must be overwhelmingly better; marginal gains won't justify abandoning the mature Transformer toolchain.

---

3. Scaling Effects: “Too Good” Is a Problem

Jagged Intelligence:

Models show sharp contrasts — solving hard problems yet making basic mistakes.

Example:

  • Video models can now draw hands with five fingers, but this may reflect memorization rather than a true internal representation.
  • A better architecture would capture such facts naturally.

---

4. Large Models in Scientific Research

Sakana's “AI Scientist” System:

  • End-to-end research: idea → code → experiments → analysis → paper (a skeleton of this pipeline is sketched after this list).
  • One fully AI-written paper has even been accepted.
  • Jones prefers interactive human–model collaboration to refine the research direction.
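
The pipeline above is only described at the stage level in the article, so the sketch below simply mirrors that structure in code. Every function is a stub standing in for an LLM call or a sandboxed experiment run; none of the names or interfaces come from Sakana AI's actual AI Scientist implementation.

```python
from dataclasses import dataclass, field

# Skeleton of an "idea -> code -> experiments -> analysis -> paper" loop.
# All stage bodies are stubs; a real system would call an LLM and an
# execution sandbox at each step, with optional human feedback in between.

@dataclass
class ResearchArtifact:
    idea: str = ""
    code: str = ""
    results: dict = field(default_factory=dict)
    analysis: str = ""
    paper: str = ""

def propose_idea(topic: str) -> str:
    return f"Hypothesis about {topic}"                    # stub: ideation

def write_code(idea: str) -> str:
    return f"# experiment code for: {idea}"               # stub: code generation

def run_experiments(code: str) -> dict:
    return {"metric": 0.0}                                # stub: sandboxed run

def analyze(results: dict) -> str:
    return f"Observed metric = {results['metric']}"       # stub: analysis

def write_paper(art: ResearchArtifact) -> str:
    return f"Title: {art.idea}\n\n{art.analysis}"         # stub: write-up

def research_pipeline(topic: str, human_feedback=None) -> ResearchArtifact:
    art = ResearchArtifact()
    art.idea = propose_idea(topic)
    if human_feedback is not None:                        # human-in-the-loop step
        art.idea = human_feedback(art.idea)
    art.code = write_code(art.idea)
    art.results = run_experiments(art.code)
    art.analysis = analyze(art.results)
    art.paper = write_paper(art)
    return art

print(research_pipeline("adaptive computation").paper)
```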

---

5. Continuous Thought Machine (CTM)

Three Core Innovations:

  • Internal Thought Dimension: Serialized reasoning for tasks like maze solving.
  • Neuron‑as‑Model: Each neuron is a small model for richer dynamics.
  • Thought Representation: Modeled across time via synchronization patterns.

Highlights:

  • Adaptive computation emerges naturally from the loss design.
  • Synchronization improves gradient propagation and enables a large state space (a toy sketch of these ideas follows this list).
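
To make the ideas above more concrete, here is a deliberately simplified toy rendering: the model iterates over internal "ticks", each neuron applies its own tiny network to its recent activation history, and the output representation is read from pairwise synchronization (here, correlation) of the neuron traces. This is an illustration based on the published description, not Sakana AI's CTM code; all dimensions and details are assumptions.

```python
import numpy as np

# Toy sketch of the three CTM ideas summarized above (not the official code):
#  1) internal thought dimension: iterate over T internal ticks,
#  2) neuron-as-model: each neuron has its own small 2-layer network applied
#     to its recent activation history,
#  3) thought representation: read the output from pairwise synchronization
#     (correlations) of neuron activations across ticks.

rng = np.random.default_rng(0)
N, HISTORY, TICKS = 8, 4, 16        # neurons, history length, internal ticks

# Per-neuron weights: a separate tiny MLP for every neuron (idea 2).
W1 = rng.standard_normal((N, HISTORY, HISTORY)) * 0.5
W2 = rng.standard_normal((N, HISTORY)) * 0.5

def neuron_step(history):
    """history: (N, HISTORY) recent activations -> (N,) new activations."""
    hidden = np.tanh(np.einsum("nh,nhk->nk", history, W1))
    return np.tanh(np.einsum("nk,nk->n", hidden, W2))

def ctm_forward(x):
    """x: (N,) input drive. Returns a synchronization-based representation."""
    history = np.tile(x[:, None], (1, HISTORY))   # initialize neuron histories
    trace = []
    for _ in range(TICKS):                        # internal thought dimension (idea 1)
        a = neuron_step(history) + 0.1 * x        # recurrent update plus input drive
        history = np.concatenate([history[:, 1:], a[:, None]], axis=1)
        trace.append(a)
    trace = np.stack(trace, axis=1)               # (N, TICKS) activation traces
    sync = np.corrcoef(trace)                     # (N, N) synchronization matrix (idea 3)
    return sync[np.triu_indices(N, k=1)]          # upper-triangular entries as features

print(ctm_forward(rng.standard_normal(N)).shape)  # (28,) = N*(N-1)/2 features
```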

---

6. Why CTM Outperforms Transformers

  • Integrates Chain‑of‑Thought internally and continuously.
  • Allows tasks to decompose into “easy” and “hard” parts naturally.
  • Shows excellent calibration without any special design (a standard way to measure calibration is sketched after this list).
  • Encourages following “interesting gradients” for emergent capabilities.
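
"Calibration" here means that the model's stated confidence tracks its actual accuracy. As general background (not something taken from the interview), the usual way to quantify this is the expected calibration error (ECE), a minimal version of which is sketched below.

```python
import numpy as np

# Minimal expected calibration error (ECE): bin predictions by confidence and
# compare average confidence with empirical accuracy in each bin. This is a
# generic metric for any classifier, not anything specific to CTM.

def expected_calibration_error(confidences, correct, n_bins=10):
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap   # weight each bin by its share of samples
    return ece

# Overconfident predictions produce a large ECE; well-calibrated ones a small one.
print(expected_calibration_error([0.9, 0.8, 0.95, 0.7], [1, 0, 1, 0]))  # ~0.41
```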

---

7. Next‑Generation LM Potential

Example Tasks:

  • Ambiguous mazes reveal different learned strategies under time constraints.
  • SudokuBench: variant Sudokus with diverse constraints force deep reasoning (an illustrative constraint check appears at the end of this section).

Aim:

Benchmark reasoning that resembles human thought trajectories, not brute‑force search.
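
SudokuBench is only described at a high level here. To illustrate what a "variant constraint" might look like, the checker below validates the standard row, column, and box rules plus one hypothetical extra rule (orthogonally adjacent cells may not hold consecutive digits). The extra rule is an invented example of a variant, not necessarily one of the benchmark's actual constraints.

```python
# Illustrative "variant Sudoku" checker: standard rules plus one hypothetical
# extra constraint. Grids are 9x9 lists of ints in 1..9.

def valid_standard(grid):
    """Check that rows, columns, and 3x3 boxes each contain the digits 1-9."""
    units = [list(row) for row in grid]                                    # rows
    units += [[grid[r][c] for r in range(9)] for c in range(9)]            # columns
    units += [[grid[br + r][bc + c] for r in range(3) for c in range(3)]   # boxes
              for br in (0, 3, 6) for bc in (0, 3, 6)]
    return all(sorted(u) == list(range(1, 10)) for u in units)

def valid_nonconsecutive(grid):
    """Hypothetical variant rule: orthogonal neighbours never differ by exactly 1."""
    for r in range(9):
        for c in range(9):
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < 9 and cc < 9 and abs(grid[r][c] - grid[rr][cc]) == 1:
                    return False
    return True

def valid_variant(grid):
    return valid_standard(grid) and valid_nonconsecutive(grid)
```

Each added rule of this kind shrinks the space of valid grids and rewards constraint propagation over memorized patterns, which is the kind of reasoning the section above says the benchmark aims to probe.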

---

Key Takeaways

  • Transformers’ dominance may hinder radical architectural innovation.
  • CTM brings biologically inspired time‑based neuron interactions and natural adaptive computation.
  • Future AI systems may collaborate with humans in open‑ended, path‑dependent reasoning.
  • Rich, fine‑grained reasoning datasets (like SudokuBench) could drive genuine AGI‑level advances.

---

---

Event Recommendation

AICon Global Artificial Intelligence Development and Application Conference

Dates: December 19–20, Beijing

Topics:

  • Large model training & inference
  • AI Agents
  • R&D paradigms
  • Organizational innovation

---

Original interview video:

YouTube: Machine Learning Street Talk

---

Compiled by InfoQ — opinions are the interviewees'. Reproduction without permission prohibited.
