Fei-Fei Li Unveils New World Model That Runs on a Single GPU

Fei-Fei Li’s World Model Startup — Latest Breakthrough

Just announced: RTFM (Real-Time Frame Model) from World Labs, the world-model startup founded by AI pioneer Fei-Fei Li.

This breakthrough model offers real-time operation, persistence, and 3D consistency — and remarkably:

> It runs on a single H100 GPU.

---

🌟 Three Core Design Principles of RTFM

1. Efficiency

  • Achieves interactive-level frame rates for real-time inference.
  • Requires only one H100 GPU to run.

2. Scalability

  • End-to-end framework learns directly from massive video datasets.
  • Scales naturally with data and compute growth.
  • Builds 3D world models without relying on explicit 3D representations.

3. Persistence

  • Supports indefinite interaction; scenes remain intact over time.
  • The persistent 3D world does not degrade or disappear when the viewpoint moves away and returns.

---

📈 Why This Matters

A robust world model can:

  • Reconstruct, generate, and simulate worlds in real time.
  • Maintain interaction with physical accuracy and persistence.
  • Transform industries — from media to robotics.

Generative video modeling progress has led to generative world modeling.

However, compute demands for these models are expected to exceed those of today’s LLMs.

---

⚠️ The Problem with Current Approaches

Directly applying existing video architectures means:

  • A 60 FPS interactive 4K stream requires generating over 100,000 tokens per second (roughly the length of Frankenstein or the first Harry Potter book, every second); see the arithmetic sketch below.
  • Interactions lasting an hour or more push the context past 100 million tokens.
  • Today's infrastructure cannot serve this efficiently.
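
A quick back-of-the-envelope check of those numbers; the tokens-per-frame figure below is an assumption chosen only to land in the same order of magnitude as the rates quoted above, not a published value.

```python
# Back-of-the-envelope token budget for naively streaming an interactive 4K world.
# TOKENS_PER_FRAME is an illustrative assumption, not a published figure.

FPS = 60                   # target interactive frame rate
TOKENS_PER_FRAME = 2_000   # assumed tokens to represent one 4K frame
SESSION_SECONDS = 60 * 60  # a one-hour interactive session

tokens_per_second = FPS * TOKENS_PER_FRAME
context_tokens = tokens_per_second * SESSION_SECONDS

print(f"~{tokens_per_second:,} tokens/s")                 # ~120,000 tokens/s (>100K)
print(f"~{context_tokens / 1e6:.0f}M tokens of context")  # ~432M tokens (>100M)
```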

Team insight:

> Simple methods that scale elegantly with compute will win long-term, benefiting from declining compute costs.

---

🎯 Their Goal: Efficient & Future-Ready

Design a world model that:

  • Runs today on a single H100 GPU.
  • Scales with future hardware.
  • Maintains interactive frame rates.
  • Keeps the world persistent and responsive.
  • Offers a high-fidelity preview today of what more capable future models will deliver.

How They Achieved It

  • Optimized the entire inference stack.
  • Innovations in architecture, model distillation, and inference optimization.

---

🔄 How RTFM Differs from Traditional 3D Pipelines

Old way: Explicit 3D representations (meshes, splats) — dominant for decades.

New way with RTFM:

  • Leverages breakthroughs in generative video modeling.
  • A single neural network handles the full pipeline:
    • Inputs: one or more 2D images of a scene.
    • Outputs: novel 2D views from new perspectives.
  • No explicit 3D geometry is needed.
  • Uses an autoregressive diffusion transformer across frame sequences (a minimal sketch follows this list).
  • Trained end-to-end to predict future frames.
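
To make the loop concrete, here is a minimal sketch of autoregressive, frame-by-frame generation. Every name in it (the toy `denoise_next_frame` function, the pose format, the frame size) is a hypothetical stand-in for illustration; RTFM's actual interfaces have not been published.

```python
# Hypothetical sketch of an autoregressive frame loop: each new frame is produced
# from previously generated frames plus a requested camera pose, then fed back in
# as context. The "denoiser" below is a toy stand-in, not a real diffusion model.
import numpy as np

H, W = 64, 64  # tiny frames keep the sketch runnable

def denoise_next_frame(context_frames, target_pose, steps=4):
    """Toy stand-in for a diffusion transformer: start from noise and
    iteratively pull the frame toward the context to mimic conditioning.
    (Pose conditioning is omitted in this toy version.)"""
    frame = np.random.randn(H, W, 3)
    context_mean = context_frames.mean(axis=0)
    for _ in range(steps):
        frame = 0.5 * frame + 0.5 * context_mean
    return frame

# Start from a single observed frame and step the camera forward.
context = [np.zeros((H, W, 3))]
for step in range(5):
    pose = {"position": (0.0, 0.0, float(step)), "yaw_deg": 0.0}
    context.append(denoise_next_frame(np.stack(context), pose))

print(f"generated {len(context) - 1} frames autoregressively")
```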

---

🖌️ A Learned Renderer

RTFM acts as a learned renderer:

  • Transforms image frames → network activations (KV cache).
  • Implicitly stores world representation.
  • Attention reads from this representation to render new consistent views.
  • Learns rendering effects (e.g., reflections, shadows) directly from training data; a minimal sketch of this data flow follows below.
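
Below is a minimal sketch of that data flow under simplifying assumptions: context frames are encoded once into key/value tokens (a KV cache), and a new view is produced by attending over that cache. The encoder, dimensions, and single attention read are illustrative toys, not RTFM's actual architecture.

```python
# Illustrative sketch: frames -> KV cache -> attention-based rendering of a new view.
# All shapes and "encoders" are toy stand-ins; only the data flow mirrors the idea above.
import numpy as np

D = 32          # toy feature dimension
N_TOKENS = 16   # tokens produced per encoded frame

def encode_frame(frame_seed):
    """Stand-in image encoder: returns per-frame key and value tokens."""
    rng = np.random.default_rng(frame_seed)
    return rng.normal(size=(N_TOKENS, D)), rng.normal(size=(N_TOKENS, D))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def render_view(kv_cache, pose_query):
    """Read the implicit world representation with attention to 'render' a view."""
    keys = np.concatenate([k for k, _ in kv_cache], axis=0)
    values = np.concatenate([v for _, v in kv_cache], axis=0)
    attn = softmax(pose_query @ keys.T / np.sqrt(D))  # (1, total_tokens)
    return attn @ values                              # (1, D) rendered feature, not pixels

# Build the cache once from a few input frames, then query new viewpoints cheaply.
kv_cache = [encode_frame(seed) for seed in range(3)]
novel_view_feature = render_view(kv_cache, pose_query=np.ones((1, D)))
print(novel_view_feature.shape)  # (1, 32)
```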

---

🔍 Reconstruction vs Generation

RTFM blurs the line:

  • Reconstruction: Interpolating between existing views (with abundant inputs).
  • Generation: Extrapolating unseen content (with scarce inputs).

---

🧭 Persistence via Spatial Memory

Problem in classic models:

Autoregressive systems need to reason over ever-growing frame sequences, raising costs and limiting memory capacity.

RTFM’s solution:

  • Each frame tied to a pose (3D position + orientation).
  • Pose annotations become spatial memory elements.
  • Soft prior: the model assumes a 3D Euclidean space without reconstructing it explicitly.
  • Retrieves nearby frames when generating new ones → reduces processing load.

Context Juggling:

  • Different spatial regions use different context frames.
  • Maintains large-scale persistent worlds without computational cost growing in step with world size (see the retrieval sketch below).
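
Here is a minimal sketch of how pose-indexed retrieval might work, under simple assumptions: each generated frame is stored with its camera position, and only the K nearest stored frames are pulled in as context for a new viewpoint. The Euclidean distance on positions and the fixed K are illustrative choices, not details confirmed by World Labs.

```python
# Toy spatial memory: keep (pose, frame_id) pairs and retrieve only nearby frames
# as context, so cost depends on the neighborhood size, not on the whole history.
import numpy as np

class SpatialMemory:
    def __init__(self, k=4):
        self.k = k
        self.poses = []      # camera positions, one per stored frame
        self.frame_ids = []  # handles to the frames (or their cached activations)

    def add(self, pose, frame_id):
        self.poses.append(np.asarray(pose, dtype=float))
        self.frame_ids.append(frame_id)

    def nearby_context(self, query_pose):
        """Return ids of the k stored frames closest to the query camera pose."""
        if not self.poses:
            return []
        dists = np.linalg.norm(np.stack(self.poses) - np.asarray(query_pose), axis=1)
        return [self.frame_ids[i] for i in np.argsort(dists)[: self.k]]

memory = SpatialMemory(k=4)
for i in range(100):                      # simulate a long session of generated frames
    memory.add(pose=(i * 0.5, 0.0, 0.0), frame_id=i)

print(memory.nearby_context(query_pose=(10.0, 0.0, 0.0)))  # only a few nearby frames
```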

---

🚀 Availability

RTFM is now in preview.

You can try it today — and share your feedback!

---

Reference Links:

  • https://x.com/drfeifei/status/1978840835341914164
  • https://x.com/theworldlabs/status/1978839175320186988
  • https://www.worldlabs.ai/blog/rtfm

---

🌐 Monetizing AI-Driven Worlds

Persistent and expansive interactive worlds need efficient tech and creative distribution.

Tools like AiToEarn help creators:

  • Generate, publish, and earn from AI content globally.
  • Publish simultaneously to:
    • Douyin, Kwai, WeChat, Bilibili, Xiaohongshu
    • Facebook, Instagram, LinkedIn, Threads
    • YouTube, Pinterest, X (Twitter)
  • Integrate AI generation, cross-platform scheduling, analytics, and model rankings.

Such ecosystems make it possible to bring RTFM-powered 3D or interactive creations to audiences worldwide — with minimal friction.

---

