No Need for DiT: ByteDance Generates 5s 720p Video in 1 Minute on Single GPU with Autoregression | NeurIPS’25 Oral

🎯 NeurIPS’25 Oral — InfinityStar vs. DiT

A new NeurIPS’25 Oral paper from the ByteDance Commercialization Technology Team delivers a strong challenge to the Diffusion Transformer (DiT), which has long dominated video generation.

---

Background — DiT’s Dominance and Drawbacks

Ever since its debut, DiT has been the standard in video generation.

However, it has notable drawbacks:

  • High computational complexity
  • Heavy resource consumption
  • Slow generation speeds

---

InfinityStar — Balancing Quality & Efficiency

InfinityStar addresses these issues by offering:

  • High-quality video generation
  • Significantly improved efficiency
  • A unified architecture for diverse tasks

Example Outputs

Fun animation clips created by InfinityStar:

Watch the demo

---

✨ Key Highlights of InfinityStar

  • First discrete autoregressive video generator to beat diffusion models on VBench.
  • Dramatically faster video generation: no long chain of iterative denoising steps.
  • Multi-task capability:
      • Text-to-image
      • Text-to-video
      • Image-to-video
      • Interactive long video generation

---

🚀 Try InfinityStar Yourself

Step 1 — Join the Discord Community

  • Go to: http://opensource.bytedance.com/discord/invite
  • Log in and join

Step 2 — Explore Functions

Options include:

  • Text-to-video
  • Image-to-video (the demo videos above used `i2v-generate-horizontal-1`)

---

🎨 Linked Workflow Example

Goal: Text-to-image → Image-to-video pipeline

  • Text-to-Image
      • Command:
        > A hyper-detailed, ultra-realistic, cinematic portrait of a fluffy white Ragdoll cat with striking sapphire-blue eyes and long black eyelashes. ...
  • Image-to-Video
      • Command:
        > The cat opened its mouth and made a sound, then licked its nose with its tongue.

Video link

---

💡 Creative Potential

With tools like the AiToEarn official site and the AiToEarn blog:

  • Generate high-quality AI content
  • Publish on multiple platforms simultaneously
  • Analyze performance & monetize efficiently
  • Platforms include Douyin, Kwai, WeChat, YouTube, Instagram, LinkedIn, Threads, Pinterest, X (Twitter) and more.

---

🏃 Sports & Motion Examples

InfinityStar also excels at complex movement generation:

Watch the demo

---

📽️ Interactive Long Video

To extend a video interactively, feed InfinityStar:

  • An initial 5-second video
  • New prompts to extend video content

Video link

---

🔍 Core Architecture — Spatio-Temporal Pyramid Modeling

InfinityStar’s unique approach:

  • First Frame
      • Treated as an image
      • Modeled from coarse to fine (static appearance)
  • Subsequent Clips
      • Include temporal + spatial dimensions (dynamic motion)

Key Benefit:

Decouples static & dynamic information.

A spatio-temporal autoregressive Transformer models both intra- and inter-pyramid dependencies.
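
The pyramid layout described above can be sketched as a generation schedule. This is a minimal illustration, not the InfinityStar implementation: the names `ScaleBlock` and `build_schedule`, the grid sizes, and the choice to keep the frame count fixed across the scales of a clip are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleBlock:
    clip: int   # 0 = first-frame pyramid, 1.. = subsequent clip pyramids
    scale: int  # 0 = coarsest scale
    t: int      # frames covered by this block
    h: int      # token-grid height
    w: int      # token-grid width

    @property
    def num_tokens(self) -> int:
        return self.t * self.h * self.w


def build_schedule(num_clips, frames_per_clip, spatial_scales):
    """Return blocks in generation order: pyramid by pyramid, coarse to fine."""
    # First frame: an image pyramid, so every block has t = 1.
    schedule = [ScaleBlock(0, s, 1, h, w) for s, (h, w) in enumerate(spatial_scales)]
    # Subsequent clips: each scale spans time as well as space (assumed fixed t).
    for clip in range(1, num_clips + 1):
        for s, (h, w) in enumerate(spatial_scales):
            schedule.append(ScaleBlock(clip, s, frames_per_clip, h, w))
    return schedule


# Hypothetical toy sizes, chosen only to keep the printout short.
schedule = build_schedule(num_clips=2, frames_per_clip=4,
                          spatial_scales=[(1, 1), (2, 2), (4, 4)])
for block in schedule:
    print(block, "tokens:", block.num_tokens)
```

In this toy layout the first-frame pyramid carries only appearance (t = 1), while every later pyramid carries motion across its frames, which is the static/dynamic split the section describes.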

---

🛠 Key Technologies

1. Efficient Visual Tokenizer

Converts visual data to discrete tokens via multi-scale residual quantization.

Innovations:

  • Knowledge Inheritance: Initializes the tokenizer from pre-trained video VAE weights for rapid convergence.
  • Stochastic Quantizer Depth: Randomly drops fine-scale tokens to balance learning across scales.
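
To make multi-scale residual quantization concrete, here is a toy 1D sketch in pure Python. It is an illustration under stated assumptions, not the paper's tokenizer: the scalar codebook, the average-pool and nearest-neighbour resampling, and the function names are hypothetical. `sample_scales` shows one way stochastic depth could work, assuming each of the last few (finest) scales is dropped independently with some probability during training.

```python
import random


def downsample(x, n):
    """Average-pool a list to length n (len(x) must be divisible by n)."""
    step = len(x) // n
    return [sum(x[i * step:(i + 1) * step]) / step for i in range(n)]


def upsample(x, n):
    """Nearest-neighbour upsample a list to length n."""
    rep = n // len(x)
    return [v for v in x for _ in range(rep)]


def quantize(x, codebook):
    """Index of the nearest codebook entry for each value."""
    return [min(range(len(codebook)), key=lambda i: abs(codebook[i] - v)) for v in x]


def residual_quantize(feature, scales, codebook):
    """Quantize coarse to fine; each scale encodes what earlier scales missed."""
    recon = [0.0] * len(feature)
    tokens = []
    for n in scales:
        residual = [f - r for f, r in zip(feature, recon)]
        idx = quantize(downsample(residual, n), codebook)
        approx = upsample([codebook[i] for i in idx], len(feature))
        recon = [r + a for r, a in zip(recon, approx)]
        tokens.append(idx)
    return tokens, recon


def sample_scales(scales, num_droppable, drop_prob, rng):
    """Assumed stochastic-depth rule: drop each of the last scales with drop_prob."""
    keep_until = len(scales) - num_droppable
    return [n for i, n in enumerate(scales)
            if i < keep_until or rng.random() >= drop_prob]


feature = [1.0, 1.0, -0.5, -0.25]
codebook = [-0.75, -0.25, 0.0, 0.25, 0.75]
scales = [1, 2, 4]

tokens, recon = residual_quantize(feature, scales, codebook)
coarse_tokens, coarse_recon = residual_quantize(feature, scales[:2], codebook)
train_scales = sample_scales(scales, num_droppable=1, drop_prob=0.5,
                             rng=random.Random(0))
print(tokens, recon, coarse_recon, train_scales)
```

Comparing `recon` with `coarse_recon` shows what the finest scale contributes. Training with some fine scales dropped forces the coarse scales to carry more of the signal, which is the balancing effect the bullet describes.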

---

2. Optimized Spacetime Autoregressive Transformer

Enhancements for long-context video generation:

  • Semantic Scale Repetition: Refines global semantics multiple times for smooth motion.
  • Spacetime Sparse Attention: Uses only essential past context to cut memory use & speed up attention computation.
  • Spacetime RoPE Position Encoding: Encodes each token's scale, time, height, and width position.
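
The last two ideas can be sketched at block level. This is a hedged illustration only: the article does not spell out the sparsity pattern, so the rule below (a block sees earlier scales of its own clip plus the finest scale of the previous clip) is an assumption, and `position_ids`, `visible_blocks`, and `context_sizes` are hypothetical names.

```python
from itertools import product


def position_ids(scale, t, h, w):
    """One (scale, time, row, col) index tuple per token of a t x h x w block."""
    return [(scale, ti, hi, wi)
            for ti, hi, wi in product(range(t), range(h), range(w))]


def visible_blocks(blocks, i):
    """Indices of blocks that block i may attend to under the assumed rule.

    blocks: list of (clip, scale, num_tokens) in generation order.
    """
    clip, _, _ = blocks[i]
    prev_finest = max((s for c, s, _ in blocks if c == clip - 1), default=None)
    visible = []
    for j in range(i + 1):
        c, s, _ = blocks[j]
        if c == clip or (c == clip - 1 and s == prev_finest):
            visible.append(j)
    return visible


def context_sizes(blocks, i):
    """(sparse, dense) number of context tokens for block i."""
    sparse = sum(blocks[j][2] for j in visible_blocks(blocks, i))
    dense = sum(b[2] for b in blocks[:i + 1])
    return sparse, dense


# Toy layout: a first-frame pyramid followed by two clip pyramids.
blocks = [(0, 0, 1), (0, 1, 4), (0, 2, 16),
          (1, 0, 4), (1, 1, 16), (1, 2, 64),
          (2, 0, 4), (2, 1, 16), (2, 2, 64)]

print(position_ids(scale=1, t=1, h=2, w=2))
print(visible_blocks(blocks, 8), context_sizes(blocks, 8))
```

With full causal attention the context grows with every clip, while under a rule like this it stays bounded by roughly two pyramids, which is where the memory and speed savings would come from.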

---

📊 Experimental Highlights

  • Order-of-magnitude faster than DiT — single-pass autoregression vs. 20–100 iterative denoising steps.
  • T2I performance: Strong results in GenEval & DPG benchmarks.
  • T2V performance: Beats all previous autoregressive & DiT-based models (CogVideoX, HunyuanVideo).
  • Human preference: InfinityStar-8B outranks HunyuanVideo-13B for instruction following.
  • Speed: Generates 5s 720p video in <1 min on a single GPU.

---

📄 Resources

  • Paper: https://arxiv.org/pdf/2511.04675
  • Code: https://github.com/FoundationVision/InfinityStar
  • Demo signup (Discord): http://opensource.bytedance.com/discord/invite

---

🌍 Monetization Opportunities

Combine InfinityStar with the AiToEarn official site:

  • AI generation + multi-platform distribution
  • Analytics + AI model ranking
  • Turn creative outputs into scalable, profitable assets
