Improve model card and add metadata #1
Opened by nielsr (HF Staff)

README.md CHANGED
@@ -1,16 +1,47 @@
 ---
 license: apache-2.0
 ---

-

-
-Xin Chen, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong

-

-
-- [Github](https://github.com/xypeng9903/LDF-VFI)

-**
-
 ---
 license: apache-2.0
+library_name: diffusers
+pipeline_tag: image-to-video
+tags:
+- video-frame-interpolation
+- vfi
+- diffusion-transformer
 ---

+# LDF-VFI: Towards Holistic Modeling for Video Frame Interpolation with Auto-regressive Diffusion Transformers

+This repository contains the weights for **LDF-VFI** (Local Diffusion Forcing for Video Frame Interpolation), as introduced in the paper [Towards Holistic Modeling for Video Frame Interpolation with Auto-regressive Diffusion Transformers](https://huggingface.co/papers/2601.14959).

+[[Paper](https://arxiv.org/abs/2601.14959)] [[Project Page](https://xypeng9903.github.io/ldf-vfi-web/)] [[GitHub](https://github.com/xypeng9903/LDF-VFI)]

+## Introduction

+Existing video frame interpolation (VFI) methods often adopt a frame-centric approach, processing videos as independent short segments (e.g., triplets), which leads to temporal inconsistencies and motion artifacts. To overcome this, we propose a holistic, video-centric paradigm named **L**ocal **D**iffusion **F**orcing for **V**ideo **F**rame **I**nterpolation (LDF-VFI).
+
+Our framework is built upon an auto-regressive diffusion transformer that models the entire video sequence to ensure long-range temporal coherence. LDF-VFI incorporates sparse, local attention and tiled VAE encoding, enabling efficient processing of long sequences and generalization to arbitrary spatial resolutions (e.g., 4K) at inference without retraining.
+
+## Key Features
+
+- **Auto-regressive Diffusion Transformer**: Models the entire video sequence for long-range temporal coherence.
+- **Skip-concatenate Sampling**: A novel strategy to maintain temporal stability and mitigate error accumulation.
+- **Resolution Generalization**: Supports arbitrary spatial resolutions (including 4K) at inference time.
+- **Enhanced Conditional VAE**: Leverages multi-scale features from input videos to improve reconstruction fidelity.
+
+## Usage
+
+For installation and usage instructions, please refer to the [official GitHub repository](https://github.com/xypeng9903/LDF-VFI).
+
+## Citation
+
+If you find this work helpful, please cite:
+```bibtex
+@misc{peng2026holisticmodelingvideoframe,
+title={Towards Holistic Modeling for Video Frame Interpolation with Auto-regressive Diffusion Transformers},
+author={Xinyu Peng and Han Li and Yuyang Huang and Ziyang Zheng and Yaoming Wang and Xin Chen and Wenrui Dai and Chenglin Li and Junni Zou and Hongkai Xiong},
+year={2026},
+eprint={2601.14959},
+archivePrefix={arXiv},
+primaryClass={cs.CV},
+url={https://arxiv.org/abs/2601.14959},
+}
+```
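
The new Introduction mentions tiled VAE encoding as one reason the model generalizes to resolutions such as 4K, but the diff does not show how tiling works. As a rough illustration of the general idea only (this is not the LDF-VFI implementation), the sketch below computes overlapping tile boxes that cover a frame of any size. The names `tile_starts` and `tile_boxes`, and the default tile size and overlap, are hypothetical choices made for this example.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


def tile_starts(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets of overlapping tiles that together cover [0, length)."""
    if tile <= 0 or not 0 <= overlap < tile:
        raise ValueError("need tile > 0 and 0 <= overlap < tile")
    if length <= tile:
        return [0]  # a single (possibly smaller) tile is enough
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the border
    return starts


def tile_boxes(height: int, width: int, tile: int = 512, overlap: int = 64) -> List[Box]:
    """Overlapping tile boxes covering a height x width frame.

    Hypothetical helper: tile size and overlap are illustrative defaults,
    not values taken from the LDF-VFI code.
    """
    return [
        (top, left, min(top + tile, height), min(left + tile, width))
        for top in tile_starts(height, tile, overlap)
        for left in tile_starts(width, tile, overlap)
    ]


if __name__ == "__main__":
    # Count the tiles needed for one 4K (2160 x 3840) frame.
    print(len(tile_boxes(2160, 3840)))
```

Because every tile has a bounded size, the memory needed per tile does not grow with the frame resolution. For the actual tiling and blending procedure used by the model, refer to the official GitHub repository linked in the Usage section.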