Fei-Fei Li and LeCun’s Debate on World Models
The AGI Path Converges on the Battlefield of World Models
The race toward Artificial General Intelligence (AGI) is increasingly being fought over world models: AI systems that simulate how an environment looks and behaves.
Recently:
- Fei-Fei Li unveiled her first commercial world model, Marble.
- Almost simultaneously, Yann LeCun left Meta to start his own world model venture.
- Earlier, Google’s Genie 3 stirred excitement in the AI community.
Each heavyweight is betting on a different technical approach to what a "world model" should be.
---
Fei-Fei Li’s Marble: A Commercially Oriented World Model
After publishing a lengthy manifesto on spatial intelligence, Fei-Fei Li’s startup World Labs released its first commercial offering — Marble.

Key features:
- Persistent, downloadable 3D environments
- Export options: Gaussian splats, meshes, or direct video
- Significantly reduced scene deformation and detail inconsistency
- Native AI world editor called Chisel for one-prompt world transformation

Example workflow for VR/game developers:
- Enter a single prompt
- Generate a full 3D world
- Export to Unity in one click
---
Criticism: Is Marble Truly a "World Model"?
Industry feedback has been mixed:
> “Isn’t this just a Gaussian splat model? What does ‘world’ really mean in ‘world model’?” — Hacker News user
> “Converting images into 3D environments with Gaussian splatting, depth mapping, and inpainting is cool, but that’s a pipeline, not a robot’s brain.” (Reddit commenter)
---
Understanding Gaussian Splats
Gaussian splatting represents a 3D scene as a large collection (often hundreds of thousands to millions) of small, semi-transparent “blobs” positioned in 3D space:
- Each Gaussian: soft-edged, glowing bubble
- Clusters of Gaussians: coherent, rich 3D imagery
- Advantages: faster and simpler to produce than traditional photogrammetry
- Trade-off: less precise geometry
Marble’s core representation is Gaussian splats.
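
To make the "blob" idea concrete, here is a minimal sketch of how a handful of splats could be blended into one pixel. It is a toy under stated assumptions, not Marble's renderer: the `Splat` class and `shade_pixel` function are hypothetical names, the Gaussians are isotropic, and the projection is orthographic, whereas production splat renderers use anisotropic Gaussians and perspective projection.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Splat:
    """One soft-edged blob (hypothetical, simplified structure)."""
    center: Tuple[float, float, float]  # x, y, depth in camera space
    scale: float                        # isotropic standard deviation
    opacity: float                      # peak opacity in [0, 1]
    color: Tuple[float, float, float]   # RGB in [0, 1]


def shade_pixel(px, py, splats):
    """Blend splats front to back at image point (px, py)."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed
    for s in sorted(splats, key=lambda s: s.center[2]):  # nearest first
        dx, dy = px - s.center[0], py - s.center[1]
        # Opacity falls off smoothly with distance from the blob center.
        alpha = s.opacity * math.exp(-(dx * dx + dy * dy) / (2 * s.scale ** 2))
        for i in range(3):
            color[i] += transmittance * alpha * s.color[i]
        transmittance *= 1.0 - alpha
    return tuple(color), transmittance


red = Splat(center=(0.0, 0.0, 1.0), scale=1.0, opacity=0.5, color=(1.0, 0.0, 0.0))
blue = Splat(center=(0.0, 0.0, 2.0), scale=1.0, opacity=1.0, color=(0.0, 0.0, 1.0))
pixel, remaining = shade_pixel(0.0, 0.0, [blue, red])
```

Note that every field describes appearance only (position, size, opacity, color). Nothing in this structure encodes mass, friction, or forces, which is the gap critics point to.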
---
Why Marble May Not Serve Robotics Training
Limitation:
Marble captures visual details but omits physical laws — vital for robots.
Example:
A human knows a ball rolls downhill due to gravity, friction, and mass.
Marble’s representation lacks these causal properties, making it unsuitable for physics-based decision-making.
Even Marble’s own materials rarely mention robotics use cases.
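
As an illustration of the kind of causal knowledge the ball example refers to, the sketch below computes how a solid ball accelerates down a slope under textbook assumptions: rolling without slipping, a rigid uniform sphere, and no air resistance. The function names are hypothetical and the code has nothing to do with Marble's internals. It only shows what a physics-aware model would need to be able to predict.

```python
import math


def rolling_acceleration(slope_deg: float, g: float = 9.81) -> float:
    """Acceleration (m/s^2) of a solid ball rolling without slipping down a slope."""
    # A solid sphere has moment of inertia I = (2/5) m r^2, so
    # a = g * sin(theta) / (1 + 2/5) = (5/7) * g * sin(theta).
    return (5.0 / 7.0) * g * math.sin(math.radians(slope_deg))


def distance_after(slope_deg: float, seconds: float) -> float:
    """Distance (m) covered from rest, assuming constant acceleration."""
    return 0.5 * rolling_acceleration(slope_deg) * seconds ** 2


accel = rolling_acceleration(30.0)   # roughly 3.5 m/s^2
travelled = distance_after(30.0, 2.0)  # roughly 7.0 m
```

In this idealized case the mass cancels out and friction appears only as the no-slip condition. A purely visual representation has no way to express even this simplified relationship.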
---
Commercial Strength vs. AGI Ambitions
From a business perspective:
- Marble is ready-to-use for game/VR workflows.
- It avoids speculative AGI goals, focusing instead on asset generation.
From an AGI perspective:
True interactive world models for robots — like LeCun’s JEPA — take a different path.
---
LeCun’s JEPA: Abstract World Modeling
JEPA (Joint Embedding Predictive Architecture) philosophy:
- Rooted in control theory & cognitive science
- No need to output polished visuals
- Focus on predictive abstract representation
- Optimized for multi-step world prediction
Why it matters:
- Not visually dazzling, but acts like a robot’s brain
- Captures underlying world structure necessary for reasoning & robotics training
Contrast:
- Marble: front-end asset generator
- JEPA: back-end prediction engine
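
The idea of predicting in an abstract representation space rather than in pixels can be sketched in a few lines. This is a toy under loud assumptions: the encoder and predictor are plain linear maps, all names (`linear`, `jepa_loss`, `rollout`) are hypothetical, and no training loop is shown. Published JEPA variants use deep networks and typically a separate, slowly updated target encoder.

```python
def linear(weights, vec):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def jepa_loss(context_obs, target_obs, encoder, predictor):
    """Squared error between predicted and actual target embeddings."""
    s_context = linear(encoder, context_obs)  # abstract state of what was seen
    s_target = linear(encoder, target_obs)    # abstract state of what comes next
    s_pred = linear(predictor, s_context)     # prediction made in latent space
    # The loss compares embeddings; no pixels are ever reconstructed.
    return sum((p - t) ** 2 for p, t in zip(s_pred, s_target))


def rollout(state, predictor, steps):
    """Multi-step prediction: apply the predictor repeatedly in latent space."""
    for _ in range(steps):
        state = linear(predictor, state)
    return state


identity = [[1.0, 0.0], [0.0, 1.0]]
doubling = [[2.0, 0.0], [0.0, 2.0]]

perfect = jepa_loss([1.0, 2.0], [1.0, 2.0], identity, identity)
off = jepa_loss([1.0, 2.0], [2.0, 4.0], identity, identity)
future = rollout([1.0, 1.0], doubling, steps=3)
```

Because the loss lives entirely in embedding space, such a model never has to render an image, which is why it produces nothing visually impressive while still supporting multi-step planning.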
---
Google’s Genie 3: Interactive Video Worlds
In August 2025, Google DeepMind announced Genie 3:

Highlights:
- Generates interactive video environments from a single prompt
- Long-term visual consistency (buildings don’t vanish when turning around)
- Events: rain starts, night falls
- Limitations:
  - Built on video logic, not physics
  - Lower resolution than Marble
  - Less fundamental to robotics than JEPA
---
Comparing the Three Paradigms
| Model | Focus | Strengths | Limitations |
|-------|-------|-----------|-------------|
| Marble | Appearance | Editable, exportable 3D assets | Lacks physical causality |
| Genie 3 | Dynamics | Interactive video with events | No deep physics understanding |
| JEPA | Structure | Abstract state modeling for robots | No human-viewable rendering |
---
The "World Model Pyramid"

Zhao Hao’s pyramid:
- Base → Fei-Fei Li’s Marble: visual realism for humans
- Middle → Google’s Genie 3: dynamic video worlds
- Top → LeCun’s JEPA: abstract representations for robots
Observation:
- Higher: more abstract, robot-friendly
- Lower: more visual, human-friendly
---
Future: Combining Paradigms
A complete AI stack could integrate:
- Visual-rich worlds (Marble)
- Interactive dynamics (Genie 3)
- Abstract structural reasoning (JEPA)