---
pipeline_tag: image-to-video
library_name: diffusers
license: mit
---

# TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation

This repository contains the official model weights for **TLB-VFI**, an efficient video-based diffusion model presented in the paper [TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation](https://huggingface.co/papers/2507.04984).

- 🌐 **Project Page**: [https://zonglinl.github.io/tlbvfi_page](https://zonglinl.github.io/tlbvfi_page)
- 💻 **Code**: [https://github.com/ZonglinL/TLB-VFI](https://github.com/ZonglinL/TLB-VFI)

<div align="center">
  <img src="https://github.com/ZonglinL/TLB-VFI/raw/main/images/visual1.png" width=95%>
</div>

## Abstract

Video Frame Interpolation (VFI) aims to predict the intermediate frame $I_n$ (we use n to denote time in videos to avoid notation overload with the timestep $t$ in diffusion models) based on two consecutive neighboring frames $I_0$ and $I_1$. Recent approaches apply diffusion models (both image-based and video-based) in this task and achieve strong performance. However, image-based diffusion models are unable to extract temporal information and are relatively inefficient compared to non-diffusion methods. Video-based diffusion models can extract temporal information, but they are too large in terms of training scale, model size, and inference time. To mitigate the above issues, we propose Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation (TLB-VFI), an efficient video-based diffusion model. By extracting rich temporal information from video inputs through our proposed 3D-wavelet gating and temporal-aware autoencoder, our method achieves 20% improvement in FID on the most challenging datasets over recent SOTA of image-based diffusion models. Meanwhile, due to the existence of rich temporal information, our method achieves strong performance while having 3x fewer parameters. Such a parameter reduction results in a 2.3x speedup. By incorporating optical flow guidance, our method requires 9000x less training data and achieves over 20x fewer parameters than video-based diffusion models.

## Overview

TLB-VFI leverages temporal information extraction in the pixel space (3D wavelet) and latent space (3D convolution and attention) to improve the temporal consistency of the model.

<div align="center">
  <img src="https://github.com/ZonglinL/TLB-VFI/raw/main/images/overview.jpg" width=95%>
</div>

## Quantitative Results

Our method achieves state-of-the-art LPIPS, FloLPIPS, and FID scores compared with recent methods.

<div align="center">
  <img src="https://github.com/ZonglinL/TLB-VFI/raw/main/images/quant.png" width=95%>
</div>

## Qualitative Results

Our method achieves the best visual quality among recent state-of-the-art methods. For more visualizations, please refer to our [project page](https://zonglinl.github.io/tlbvfi_page).

<div align="center">
  <img src="https://github.com/ZonglinL/TLB-VFI/raw/main/images/visual3.png" width=95%>
</div>

## Usage

For detailed instructions on setup, training, and evaluation, please refer to the [official GitHub repository](https://github.com/ZonglinL/TLB-VFI).

### Inference Example

You can run inference with the scripts provided in the GitHub repository. Make sure you have downloaded the trained model weights first.

To interpolate 7 frames in between `frame0` and `frame1`:

```bash
python interpolate.py --resume_model path_to_model_weights --frame0 path_to_the_previous_frame --frame1 path_to_the_next_frame
```
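As a rough illustration of what "7 frames in between" means in terms of the time index $n$ from the abstract, the sketch below lists the normalized positions of the intermediate frames. It assumes the frames are evenly spaced between `frame0` ($n=0$) and `frame1` ($n=1$); that spacing and the helper name `intermediate_times` are assumptions made for illustration, not something taken from the repository's scripts.

```python
from fractions import Fraction


def intermediate_times(num_frames: int) -> list[Fraction]:
    """Return evenly spaced times strictly between 0 and 1.

    Hypothetical helper: assumes uniform temporal spacing of the
    interpolated frames, which the model card does not state explicitly.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    # num_frames intermediate frames split the interval into num_frames + 1 parts.
    return [Fraction(k, num_frames + 1) for k in range(1, num_frames + 1)]


print([str(t) for t in intermediate_times(7)])
# ['1/8', '1/4', '3/8', '1/2', '5/8', '3/4', '7/8']
```

Under this assumption, 7 intermediate frames correspond to an 8x increase in frame rate, and the single-frame case corresponds to $n = 1/2$.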

To interpolate 1 frame in between:

```bash
python interpolate_one.py --resume_model path_to_model_weights --frame0 path_to_the_previous_frame --frame1 path_to_the_next_frame
```
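To process a whole sequence rather than one pair, one option is to generate the command above for every pair of consecutive frames. The following is a minimal sketch that only builds the command strings; the helper name `build_commands` and the example file names are hypothetical, and it uses only the flags shown above (`--resume_model`, `--frame0`, `--frame1`).

```python
import shlex


def build_commands(frames, weights, script="interpolate_one.py"):
    """Build one interpolation command per pair of consecutive frames.

    Hypothetical helper: `frames` is an ordered list of image paths and
    `weights` is the path to the downloaded model weights.
    """
    commands = []
    for prev_frame, next_frame in zip(frames, frames[1:]):
        args = [
            "python", script,
            "--resume_model", weights,
            "--frame0", prev_frame,
            "--frame1", next_frame,
        ]
        # shlex.join quotes arguments that contain spaces or shell metacharacters.
        commands.append(shlex.join(args))
    return commands


# Example with placeholder paths (assumed names, not files from the repository).
example_frames = ["frames/000.png", "frames/001.png", "frames/002.png"]
for command in build_commands(example_frames, "path_to_model_weights"):
    print(command)
```

The strings can be pasted into a shell or passed to a job runner; passing `script="interpolate.py"` would build the multi-frame variant of the same command.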

## Citation

If you find this repository helpful for your research, please cite the paper:

```bibtex
@article{lyu2025tlbvfitemporalawarelatentbrownian,
      title={TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation}, 
      author={Zonglin Lyu and Chen Chen},
      year={2025},
      eprint={2507.04984},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
}
```