Abstract
LIVE is a long-horizon video world model that uses cycle-consistency and diffusion loss to control error accumulation during extended video generation.
Autoregressive video world models predict future visual observations conditioned on actions. While effective over short horizons, these models often struggle with long-horizon generation, as small prediction errors accumulate over time. Prior methods alleviate this by introducing pre-trained teacher models and sequence-level distribution matching, which incur additional computational cost and fail to prevent error propagation beyond the training horizon. In this work, we propose LIVE, a Long-horizon Interactive Video world modEl that enforces bounded error accumulation via a novel cycle-consistency objective, thereby eliminating the need for teacher-based distillation. Specifically, LIVE first performs a forward rollout from ground-truth frames and then applies a reverse generation process to reconstruct the initial state. The diffusion loss is subsequently computed on the reconstructed terminal state, providing an explicit constraint on long-horizon error propagation. Moreover, we provide an unified view that encompasses different approaches and introduce progressive training curriculum to stabilize training. Experiments demonstrate that LIVE achieves state-of-the-art performance on long-horizon benchmarks, generating stable, high-quality videos far beyond training rollout lengths.
Community
Project Page: https://junchao-cs.github.io/LIVE-demo/
Technical Paper: https://arxiv.org/pdf/2602.03747
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- BAgger: Backwards Aggregation for Mitigating Drift in Autoregressive Video Diffusion Models (2025)
- End-to-End Training for Autoregressive Video Diffusion via Self-Resampling (2025)
- VideoAR: Autoregressive Video Generation via Next-Frame&Scale Prediction (2026)
- LongVie 2: Multimodal Controllable Ultra-Long Video World Model (2025)
- Memorize-and-Generate: Towards Long-Term Consistency in Real-Time Video Generation (2025)
- Endless World: Real-Time 3D-Aware Long Video Generation (2025)
- Knot Forcing: Taming Autoregressive Video Diffusion Models for Real-time Infinite Interactive Portrait Animation (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper