End-to-End Training for Autoregressive Video Diffusion via Self-Resampling
Abstract
Resampling Forcing is introduced as a teacher-free framework to train autoregressive video diffusion models with improved temporal consistency using self-resampling and history routing.
Autoregressive video diffusion models hold promise for world simulation but are vulnerable to exposure bias arising from the train-test mismatch. While recent works address this via post-training, they typically rely on a bidirectional teacher model or online discriminator. To achieve an end-to-end solution, we introduce Resampling Forcing, a teacher-free framework that enables training autoregressive video models from scratch and at scale. Central to our approach is a self-resampling scheme that simulates inference-time model errors on history frames during training. Conditioned on these degraded histories, a sparse causal mask enforces temporal causality while enabling parallel training with frame-level diffusion loss. To facilitate efficient long-horizon generation, we further introduce history routing, a parameter-free mechanism that dynamically retrieves the top-k most relevant history frames for each query. Experiments demonstrate that our approach achieves performance comparable to distillation-based baselines while exhibiting superior temporal consistency on longer videos owing to native-length training.
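The abstract names two mechanisms concrete enough for a short illustration: the self-resampling scheme that degrades history frames with the model's own sampler, and history routing, a parameter-free top-k selection of history frames per query. The PyTorch snippet below is a minimal sketch under assumed interfaces and schedules (`denoise_step`, the intermediate noise level, frame-pooled relevance scores); it is not the paper's implementation.

```python
# Hedged sketch of two mechanisms named in the abstract. All function and
# argument names here are illustrative assumptions, not the paper's API.
import torch


@torch.no_grad()
def self_resample(denoise_step, clean_history, num_steps=4, t0=0.8):
    """Simulate inference-time errors on history frames during training.

    Instead of conditioning on ground-truth history, re-noise it to an
    assumed intermediate level t0 and denoise it back with the current
    model, so the training-time conditioning resembles inference output.

    denoise_step:  callable (x, t) -> x, one denoising update (assumed)
    clean_history: (B, F, C, H, W) ground-truth history frames (shape assumed)
    """
    x = (1 - t0) * clean_history + t0 * torch.randn_like(clean_history)
    for t in torch.linspace(t0, 0.0, num_steps + 1)[:-1]:
        x = denoise_step(x, t)
    return x  # degraded histories used as conditioning


def route_history(q, history_k, history_v, top_k=4):
    """Parameter-free top-k "history routing" over cached history frames.

    q:         (B, Tq, D)     query tokens of the frame being denoised
    history_k: (B, F, Tk, D)  keys of F cached history frames
    history_v: (B, F, Tk, D)  values of F cached history frames
    """
    B, F, Tk, D = history_k.shape
    top_k = min(top_k, F)
    # Relevance score per history frame: dot-product between the pooled
    # query and each frame's pooled key (one simple parameter-free choice).
    scores = torch.einsum("bd,bfd->bf", q.mean(dim=1), history_k.mean(dim=2))
    idx = scores.topk(top_k, dim=-1).indices                 # (B, top_k)
    idx = idx[:, :, None, None].expand(-1, -1, Tk, D)
    k_sel = history_k.gather(1, idx).reshape(B, top_k * Tk, D)
    v_sel = history_v.gather(1, idx).reshape(B, top_k * Tk, D)
    # Standard attention restricted to the routed history tokens.
    attn = torch.softmax(q @ k_sel.transpose(1, 2) / D ** 0.5, dim=-1)
    return attn @ v_sel
```

Pooling keys per frame keeps the routing score parameter-free and cheap relative to attending over all history tokens; only the top-k selected frames participate in the actual attention.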
Community
Hi!
The method is proposed to overcome limitations of Self-Forcing (reliance on a teacher model, GAN loss, etc.). Why does it still need a warmup that adopts the Self-Forcing objective? What do the results look like without this warmup?
The abstract claims the framework enables training AR models from scratch. Are there any results without pretrained weights?