ZhengChen1999 committed
Commit 55ea86b · verified · 1 Parent(s): 8b458b6

Update README.md

Files changed (1):
  1. README.md (+21 −15)
README.md CHANGED

@@ -1,9 +1,8 @@
 <div align="center">
- <p align="center"> <img src="https://zheng-chen.cn/DOVE/assets/DOVE_logo.png" width="480px"> </p>
+ <p align="center"> <img src="assets/DOVE_logo.png" width="480px"> </p>
 </div>


-
 # DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution

 [Zheng Chen](https://zhengchen1999.github.io/), [Zichen Zou](https://github.com/zzctmd), [Kewei Zhang](), [Xiongfei Su](https://ieeexplore.ieee.org/author/37086348852), [Xin Yuan](https://en.westlake.edu.cn/faculty/xin-yuan.html), [Yong Guo](https://www.guoyongcs.com/), and [Yulun Zhang](http://yulunzhang.com/), "DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution", NeurIPS 2025
@@ -27,6 +26,8 @@
 > **Abstract:** Diffusion models have demonstrated promising performance in real-world video super-resolution (VSR). However, the dozens of sampling steps they require make inference extremely slow. Sampling acceleration techniques, particularly single-step ones, provide a potential solution. Nonetheless, achieving one step in VSR remains challenging due to the high training overhead on video data and stringent fidelity demands. To tackle the above issues, we propose DOVE, an efficient one-step diffusion model for real-world VSR. DOVE is obtained by fine-tuning a pretrained video diffusion model (*i.e.*, CogVideoX). To effectively train DOVE, we introduce the latent–pixel training strategy. The strategy employs a two-stage scheme to gradually adapt the model to the video super-resolution task.
 > Meanwhile, we design a video processing pipeline to construct a high-quality dataset tailored for VSR, termed HQ-VSR. Fine-tuning on this dataset further enhances the restoration capability of DOVE. Extensive experiments show that DOVE exhibits comparable or superior performance to multi-step diffusion-based VSR methods. It also offers outstanding inference efficiency, achieving up to a **28×** speed-up over existing methods such as MGLD-VSR.

+![](./assets/Compare.png)
+
 ---


@@ -44,17 +45,20 @@



+
 ---

 ### Training Strategy

-![](https://zheng-chen.cn/DOVE/assets/Strategy.png)
+![](./assets/Strategy.png)

 ---

 ### Video Processing Pipeline

-![](https://zheng-chen.cn/DOVE/assets/Pipeline.png)
+![](./assets/Pipeline.png)
+
+


 ## 🔖 TODO
@@ -257,12 +261,12 @@ We achieve state-of-the-art performance on real-world video super-resolution. Vi
 - Results in Tab. 2 of the main paper

 <p align="center">
-<img width="900" src="https://zheng-chen.cn/DOVE/assets/Quantitative.png">
+<img width="900" src="assets/Quantitative.png">
 </p>
 - Complexity Comparison in Tab. 2 of the supplementary material

 <p align="center">
-<img width="900" src="https://zheng-chen.cn/DOVE/assets/Quantitative-2.png">
+<img width="900" src="assets/Quantitative-2.png">
 </p>

 </details>
@@ -273,16 +277,18 @@ We achieve state-of-the-art performance on real-world video super-resolution. Vi
 - Results in Fig. 4 of the main paper

 <p align="center">
-<img width="900" src="https://zheng-chen.cn/DOVE/assets/Qualitative-1.png">
+<img width="900" src="assets/Qualitative-1.png">
 </p>
 <details>
 <summary>More Qualitative Results</summary>


+
+
 - More results in Fig. 3 of the supplementary material

 <p align="center">
-<img width="900" src="https://zheng-chen.cn/DOVE/assets/Qualitative-2-1.png">
+<img width="900" src="assets/Qualitative-2-1.png">
 </p>


@@ -290,31 +296,31 @@ We achieve state-of-the-art performance on real-world video super-resolution. Vi
 - More results in Fig. 4 of the supplementary material

 <p align="center">
-<img width="900" src="https://zheng-chen.cn/DOVE/assets/Qualitative-2-2.png">
+<img width="900" src="assets/Qualitative-2-2.png">
 </p>


 - More results in Fig. 5 of the supplementary material

 <p align="center">
-<img width="900" src="https://zheng-chen.cn/DOVE/assets/Qualitative-3-1.png">
-<img width="900" src="https://zheng-chen.cn/DOVE/assets/Qualitative-3-2.png">
+<img width="900" src="assets/Qualitative-3-1.png">
+<img width="900" src="assets/Qualitative-3-2.png">
 </p>


 - More results in Fig. 6 of the supplementary material

 <p align="center">
-<img width="900" src="https://zheng-chen.cn/DOVE/assets/Qualitative-4-1.png">
-<img width="900" src="https://zheng-chen.cn/DOVE/assets/Qualitative-4-2.png">
+<img width="900" src="assets/Qualitative-4-1.png">
+<img width="900" src="assets/Qualitative-4-2.png">
 </p>


 - More results in Fig. 7 of the supplementary material

 <p align="center">
-<img width="900" src="https://zheng-chen.cn/DOVE/assets/Qualitative-5-1.png">
-<img width="900" src="https://zheng-chen.cn/DOVE/assets/Qualitative-5-2.png">
+<img width="900" src="assets/Qualitative-5-1.png">
+<img width="900" src="assets/Qualitative-5-2.png">
 </p>

 </details>
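Nearly every hunk in this commit applies the same substitution: an absolute asset URL under `https://zheng-chen.cn/DOVE/assets/` becomes a repository-relative `assets/` path, so the images resolve from the repo itself. A minimal sketch of how such a bulk rewrite could be done (this is an illustration, not the author's actual workflow; it covers the URL substitutions only, not the hand-added `Compare.png` image or the `./`-prefixed variants in some hunks):

```shell
# Hypothetical bulk rewrite: absolute asset URL -> repo-relative path.
# Applied in place it would be:
#   sed -i 's#https://zheng-chen.cn/DOVE/assets/#assets/#g' README.md
# Demonstrated here on a single line taken from the diff:
printf '%s\n' '<img width="900" src="https://zheng-chen.cn/DOVE/assets/Quantitative.png">' |
  sed 's#https://zheng-chen.cn/DOVE/assets/#assets/#g'
# -> <img width="900" src="assets/Quantitative.png">
```

Using `#` as the `s` command delimiter avoids escaping the slashes in the URL.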