Free-Lunch Long Video Generation via Layer-Adaptive O.O.D Correction

Published in CVPR 2026

This paper proposes a training-free, layer-adaptive framework for generating long videos with pre-trained video diffusion models. It corrects two out-of-distribution (O.O.D.) problems that arise when the sampling length exceeds the training length, namely frame-level relative positions and overall context length, by introducing multi-granularity video-based relative position re-encoding (VRPR) and tiered sparse attention (TSA) with an attention sink. The method achieves state-of-the-art performance in long video generation and integrates seamlessly into leading video diffusion models.
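To give a feel for the attention-sink idea mentioned above, here is a minimal NumPy sketch of single-head attention restricted to a causal local window plus a few always-visible "sink" tokens at the start of the sequence. This is an illustrative toy, not the paper's TSA: the actual method applies tiered sparsity patterns adaptively per layer, and the function name, window size, and sink count below are assumptions for the example.

```python
import numpy as np

def sink_window_attention(q, k, v, window=4, n_sink=1):
    """Toy single-head attention: each query attends only to the first
    `n_sink` tokens (the attention sink) plus a causal local window of
    recent tokens. Illustrative sketch, not the paper's exact TSA."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    mask = np.full((T, T), -np.inf)
    for i in range(T):
        mask[i, :n_sink] = 0.0           # sink tokens are always visible
        lo = max(0, i - window + 1)
        mask[i, lo:i + 1] = 0.0          # causal local window
    scores = scores + mask
    # numerically stable softmax over the unmasked positions
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ v
```

Because every query keeps the sink tokens in view, the softmax always has a stable anchor even as the local window slides far beyond the training context, which is the intuition behind using a sink to mitigate context-length O.O.D. effects.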

Recommended citation: J Tian, C Song, W Cheng, C Zhang. (2026). "Free-Lunch Long Video Generation via Layer-Adaptive O.O.D Correction." CVPR 2026.