LDF-VFI: Towards Holistic Modeling for Video Frame Interpolation with Auto-regressive Diffusion Transformers

This repository contains the weights for LDF-VFI (Local Diffusion Forcing for Video Frame Interpolation), as introduced in the paper Towards Holistic Modeling for Video Frame Interpolation with Auto-regressive Diffusion Transformers.

[Paper] [Project Page] [GitHub]

Introduction

Existing video frame interpolation (VFI) methods often adopt a frame-centric approach, processing videos as independent short segments (e.g., triplets), which leads to temporal inconsistencies and motion artifacts. To overcome this, we propose a holistic, video-centric paradigm named Local Diffusion Forcing for Video Frame Interpolation (LDF-VFI).

Our framework is built upon an auto-regressive diffusion transformer that models the entire video sequence to ensure long-range temporal coherence. LDF-VFI incorporates sparse, local attention and tiled VAE encoding, enabling efficient processing of long sequences and generalization to arbitrary spatial resolutions (e.g., 4K) at inference without retraining.
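To make the tiling idea concrete, the sketch below computes an overlapping tile grid over a frame of arbitrary size, so that each tile can be encoded independently at a fixed size. It is only an illustration of tiling in general and not the repository's implementation: the function names, the 512-pixel tile size, and the 64-pixel overlap are assumptions chosen for the example.

```python
from typing import List, Tuple


def tile_starts(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets of tiles that cover [0, length) along one axis.

    Neighbouring tiles share at least `overlap` pixels; the last tile is
    shifted back so that it ends exactly at the border.
    """
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    if length <= tile:
        return [0]  # a single (possibly smaller) tile covers the axis
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # final tile flush with the border
    return starts


def tile_boxes(height: int, width: int,
               tile: int = 512, overlap: int = 64) -> List[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) boxes covering a height x width frame.

    The default tile size and overlap are hypothetical values.
    """
    boxes = []
    for top in tile_starts(height, tile, overlap):
        for left in tile_starts(width, tile, overlap):
            boxes.append((top, left,
                          min(top + tile, height), min(left + tile, width)))
    return boxes


# A 4K frame (2160 x 3840) is split into 5 rows x 9 columns of tiles.
boxes = tile_boxes(2160, 3840)
print(len(boxes), boxes[0], boxes[-1])
```

Because every tile has the same size regardless of the input resolution, the per-tile memory cost stays constant as the frame grows. A real pipeline would also need to blend the overlapping regions when stitching tile outputs back together, which is omitted here.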

Key Features

Auto-regressive Diffusion Transformer: Models the entire video sequence for long-range temporal coherence.
Skip-concatenate Sampling: A novel strategy to maintain temporal stability and mitigate error accumulation.
Resolution Generalization: Supports arbitrary spatial resolutions (including 4K) at inference time.
Enhanced Conditional VAE: Leverages multi-scale features from input videos to improve reconstruction fidelity.
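As a rough illustration of how auto-regressive modeling can be combined with local attention, the sketch below builds a causal sliding-window mask at frame granularity. The exact attention pattern used by LDF-VFI is not described in this card, so the function name, the window size, and the frame-level granularity are assumptions made for the example.

```python
from typing import List


def local_causal_mask(num_frames: int, window: int) -> List[List[bool]]:
    """mask[q][k] is True when query frame q may attend to key frame k.

    Each frame sees itself and at most `window - 1` preceding frames,
    and never any future frame.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [[0 <= q - k < window for k in range(num_frames)]
            for q in range(num_frames)]


# Visualise the mask for 6 frames with a hypothetical window of 3.
mask = local_causal_mask(6, 3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))

# Number of attended (query, key) pairs; bounded by num_frames * window.
attended = sum(sum(row) for row in mask)
```

With a fixed window, the number of attended pairs grows linearly with the number of frames rather than quadratically, which is the general reason local attention helps with long sequences.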

Usage

For installation and usage instructions, please refer to the official GitHub repository.
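As a starting point before following the GitHub instructions, the weights in this repository could be fetched with the `huggingface_hub` library, roughly as sketched below. The repository id is taken from this model page; the helper name `download_weights` and the target directory are hypothetical, and the layout the official code expects for the checkpoint files is not specified here.

```python
from pathlib import Path

REPO_ID = "onecat-ai/LDF-VFI"  # repository id as shown on this model page


def download_weights(local_dir: str = "checkpoints/LDF-VFI") -> Path:
    """Download all files of the model repository into `local_dir`.

    `local_dir` is a hypothetical default; adjust it to whatever path
    the official code expects.
    """
    # Imported lazily so the module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)


# Example call (requires network access and `pip install huggingface_hub`):
# weights_dir = download_weights()
```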

Citation

If you find this work helpful, please cite:

@misc{peng2026holisticmodelingvideoframe,
      title={Towards Holistic Modeling for Video Frame Interpolation with Auto-regressive Diffusion Transformers}, 
      author={Xinyu Peng and Han Li and Yuyang Huang and Ziyang Zheng and Yaoming Wang and Xin Chen and Wenrui Dai and Chenglin Li and Junni Zou and Hongkai Xiong},
      year={2026},
      eprint={2601.14959},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2601.14959}, 
}

Paper for onecat-ai/LDF-VFI: Towards Holistic Modeling for Video Frame Interpolation with Auto-regressive Diffusion Transformers (2601.14959, published Jan 21)