No Need for DiT: ByteDance Generates 5s 720p Video in 1 Minute on Single GPU with Autoregression | NeurIPS’25 Oral
🎯 NeurIPS’25 Oral — InfinityStar vs. DiT
A new NeurIPS’25 Oral paper from the ByteDance Commercialization Technology Team delivers a strong challenge to the Diffusion Transformer (DiT), which has long dominated video generation.

---
Background — DiT’s Dominance and Drawbacks
Ever since its debut, DiT has been the standard in video generation.
However, it comes with well-known drawbacks:
- High computational complexity
- Heavy resource consumption
- Slow generation speeds
---
InfinityStar — Balancing Quality & Efficiency
InfinityStar addresses these issues by offering:
- High-quality video generation
- Significantly improved efficiency
- A unified architecture for diverse tasks

Example Outputs
[Demo videos: fun animation clips created by InfinityStar]

---
✨ Key Highlights of InfinityStar
- First discrete autoregressive video generator to outperform diffusion models such as HunyuanVideo on VBench.
- Dramatically faster video generation, with no long chain of iterative denoising steps.
- Multi-task capability in one model: text-to-image, text-to-video, image-to-video, and interactive long video generation.
---
🚀 Try InfinityStar Yourself
Step 1 — Join the Discord Community
- Go to: http://opensource.bytedance.com/discord/invite
- Log in and join
Step 2 — Explore Functions
Available options include:
- Text-to-video
- Image-to-video (the demo videos above used `i2v-generate-horizontal-1`)

---
🎨 Linked Workflow Example
Goal: a text-to-image → image-to-video pipeline.
- Text-to-Image command:
  > A hyper-detailed, ultra-realistic, cinematic portrait of a fluffy white Ragdoll cat with striking sapphire-blue eyes and long black eyelashes. ...
- Image-to-Video command:
  > The cat opened its mouth and made a sound, then licked its nose with its tongue.

---
💡 Creative Potential
With tools like the AiToEarn official site and the AiToEarn blog:
- Generate high-quality AI content
- Publish on multiple platforms simultaneously
- Analyze performance & monetize efficiently
- Platforms include Douyin, Kwai, WeChat, YouTube, Instagram, LinkedIn, Threads, Pinterest, X (Twitter) and more.

---
🏃 Sports & Motion Examples
InfinityStar also excels at generating complex motion, such as sports scenes.

---
📽️ Interactive Long Video
Feed InfinityStar two inputs:
- An initial 5-second video
- New prompts that describe how the video should continue

The model then extends the video according to each new prompt.

---
🔍 Core Architecture — Spatio-Temporal Pyramid Modeling
InfinityStar models a video as a sequence of pyramids:
- First frame: treated as an image and modeled from coarse to fine (static appearance).
- Subsequent clips: modeled with both temporal and spatial dimensions (dynamic motion).

Key benefit: static and dynamic information are decoupled. A spatio-temporal autoregressive Transformer models both intra-pyramid and inter-pyramid dependencies.
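To make the pyramid layout concrete, here is a minimal sketch of the generation order it implies: one image pyramid for the first frame, then one pyramid per clip. The scale sizes, the `Scale` and `build_pyramids` names, and the choice to keep the temporal length fixed across the scales of a clip are illustrative assumptions, not values taken from the paper or the released code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scale:
    pyramid: int  # 0 = first-frame pyramid, 1.. = subsequent clip pyramids
    t: int        # latent frames covered at this scale
    h: int        # token grid height
    w: int        # token grid width

    @property
    def num_tokens(self) -> int:
        return self.t * self.h * self.w


def build_pyramids(num_clips, clip_frames, spatial_sizes):
    """Return scales in generation order (hypothetical layout).

    The first frame gets an image pyramid (t = 1); every later clip gets
    its own pyramid that also spans `clip_frames` latent frames.
    """
    schedule = []
    for h, w in spatial_sizes:  # coarse to fine, static appearance
        schedule.append(Scale(pyramid=0, t=1, h=h, w=w))
    for clip in range(1, num_clips + 1):  # dynamic motion
        for h, w in spatial_sizes:
            schedule.append(Scale(pyramid=clip, t=clip_frames, h=h, w=w))
    return schedule


# Toy configuration: 3 spatial scales, 2 clips of 4 latent frames each.
schedule = build_pyramids(num_clips=2, clip_frames=4,
                          spatial_sizes=[(1, 1), (2, 2), (4, 4)])
total_tokens = sum(s.num_tokens for s in schedule)

for s in schedule:
    print(s, "tokens:", s.num_tokens)
print("total tokens:", total_tokens)
```

In this layout the autoregressive model predicts all tokens of one scale together, conditioned on the coarser scales of the same pyramid and on earlier pyramids, which is what "intra- and inter-pyramid dependencies" refers to.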

---
🛠 Key Technologies
1. Efficient Visual Tokenizer
Converts visual data to discrete tokens via multi-scale residual quantization.
Innovations:
- Knowledge Inheritance: Initializes the tokenizer from pretrained video VAE weights for faster convergence.
- Stochastic Quantizer Depth: Randomly drops fine-scale tokens to balance learning across scales.
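The idea behind multi-scale residual quantization can be sketched on a toy 1D signal: each scale quantizes whatever the coarser scales failed to explain. Everything below is an illustrative assumption (a scalar codebook, average pooling, nearest-neighbor upsampling, and a simplified `stochastic_depth` rule that truncates the finest scales); the real tokenizer operates on learned video latents and its exact dropping scheme may differ.

```python
import random

CODEBOOK = [-1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 1.0]  # toy scalar codebook


def downsample(x, size):
    """Average-pool a list to `size` entries (len(x) divisible by size)."""
    step = len(x) // size
    return [sum(x[i * step:(i + 1) * step]) / step for i in range(size)]


def upsample(x, size):
    """Nearest-neighbor upsample a list to `size` entries."""
    step = size // len(x)
    return [v for v in x for _ in range(step)]


def quantize(x):
    """Map each value to the index of its nearest codebook entry."""
    return [min(range(len(CODEBOOK)), key=lambda k: abs(CODEBOOK[k] - v))
            for v in x]


def encode(signal, scales, depth=None):
    """Quantize the running residual at each scale, coarse to fine."""
    depth = len(scales) if depth is None else depth
    residual = list(signal)
    recon = [0.0] * len(signal)
    tokens = []
    for size in scales[:depth]:
        idx = quantize(downsample(residual, size))
        up = upsample([CODEBOOK[k] for k in idx], len(signal))
        recon = [r + u for r, u in zip(recon, up)]
        residual = [r - u for r, u in zip(residual, up)]
        tokens.append(idx)
    return tokens, recon


def stochastic_depth(scales, drop_prob, rng):
    """Simplified assumption: sometimes keep only the first k scales."""
    if rng.random() < drop_prob:
        return rng.randint(1, len(scales) - 1)
    return len(scales)


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


signal = [0.9, 0.7, 0.2, -0.1, -0.6, -0.8, 0.3, 0.6]
scales = [1, 2, 4, 8]
tokens_full, recon_full = encode(signal, scales)
_, recon_coarse = encode(signal, scales, depth=1)
print("tokens per scale:", [len(t) for t in tokens_full])
print("error, coarsest scale only:", mse(signal, recon_coarse))
print("error, all scales:", mse(signal, recon_full))
print("sampled training depth:",
      stochastic_depth(scales, drop_prob=0.5, rng=random.Random(0)))
```

Training with a randomly truncated depth forces the coarse scales to carry a usable reconstruction on their own, rather than leaving most of the information to the finest scales.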

---
2. Optimized Spacetime Autoregressive Transformer
Enhancements for long-context video generation:
- Semantic Scale Repetition: Refines global semantics multiple times for smooth motion.
- Spacetime Sparse Attention: Uses only essential past context to cut memory use & speed up attention computation.
- Spacetime RoPE Position Encoding: Encodes each token's scale, time, height, and width position.
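One common way to encode several position axes with RoPE is to split the head dimension into one chunk per axis and rotate each chunk by that axis's position. The sketch below follows that generic recipe for the four axes named above. The function names, the equal split of dimensions, and the frequency base are assumptions for illustration, not the paper's exact formulation.

```python
import math


def rope_angles(pos, dim, base=10000.0):
    """Rotation angles for one axis; `dim` must be even."""
    return [pos / (base ** (2 * i / dim)) for i in range(dim // 2)]


def rotate(vec, angles):
    """Rotate consecutive pairs (vec[2i], vec[2i+1]) by angles[i]."""
    out = []
    for i, a in enumerate(angles):
        x, y = vec[2 * i], vec[2 * i + 1]
        out += [x * math.cos(a) - y * math.sin(a),
                x * math.sin(a) + y * math.cos(a)]
    return out


def spacetime_rope(vec, position, axis_dims):
    """Apply RoPE per axis; position = (scale, t, h, w)."""
    assert sum(axis_dims) == len(vec) and len(position) == len(axis_dims)
    out, start = [], 0
    for pos, dim in zip(position, axis_dims):
        out += rotate(vec[start:start + dim], rope_angles(pos, dim))
        start += dim
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.8, 0.5, -0.7, 1.1, 0.2, -0.4]
k = [1.0, 0.4, -0.6, 0.9, 0.3, -0.2, 0.7, 0.1]
dims = (2, 2, 2, 2)  # head-dim budget for (scale, t, h, w), an assumption

# Two query/key pairs with the same offset (1, 1, 2, 3) on every axis.
score_a = dot(spacetime_rope(q, (1, 2, 3, 4), dims),
              spacetime_rope(k, (0, 1, 1, 1), dims))
score_b = dot(spacetime_rope(q, (3, 5, 6, 8), dims),
              spacetime_rope(k, (2, 4, 4, 5), dims))
print(score_a, score_b)
```

The two scores agree up to floating-point error because the attention logit depends only on the relative offset along each axis, which is the property that makes RoPE a natural fit for tokens arranged on a scale, time, height, and width grid.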
---
📊 Experimental Highlights
- Order-of-magnitude faster than DiT — single-pass autoregression vs. 20–100 iterative denoising steps.
- T2I performance: Strong results in GenEval & DPG benchmarks.
- T2V performance: Outperforms previous autoregressive models and DiT-based models such as CogVideoX and HunyuanVideo.
- Human preference: InfinityStar-8B outranks HunyuanVideo-13B for instruction following.
- Speed: Generates 5s 720p video in <1 min on a single GPU.



---
📄 Resources
- Paper: https://arxiv.org/pdf/2511.04675
- Code: https://github.com/FoundationVision/InfinityStar
- Demo sign-up: http://opensource.bytedance.com/discord/invite
---
🌍 Monetization Opportunities
Combine InfinityStar with the AiToEarn official site:
- AI generation + multi-platform distribution
- Analytics + AI model ranking
- Turn creative outputs into scalable, profitable assets

---