Video-to-Video
Diffusers
Safetensors
H-oliday committed on
Commit 288d99a · verified · 1 Parent(s): 6c931e0

Update README.md

  1. README.md +11 -24
README.md CHANGED
@@ -22,27 +22,22 @@ license: apache-2.0
22
  </p>
23
 
24
 
25
-
26
- ## Updates
27
  ---
 
28
  - [2026/06] Release the inference code and pretrained weights 🎉
29
 
30
 
31
 
32
 
33
-
34
- ## ✨ Highlights
35
  ---
36
-
37
  - **Mask-free shifted-window self-attention (MFSWA).** Each spatial window is **pre-gathered into a dense tensor**, so every attention call reduces to a single standard scaled dot-product attention (SDPA) call: *no attention mask, cyclic shift, or padding ever enters the graph*. This gives a **1.62× throughput gain over its full-attention teacher** at essentially identical quality, with **no dedicated sparse kernel**.
38
  - **Restoration-aware Autoencoder (ReAE).** A lightweight encoder–decoder jointly fine-tuned with the DiT in pixel space removes the heavy-3D-VAE / tiled-decoding bottleneck.
39
  - **Causal chunk-wise streaming.** A minimal causal protocol (no rolling KV cache, no overlapped DiT inference) bounds the temporal axis, confining the residual \(\mathcal{O}(N^2)\) cost to the spatial axes.
40
  - **Kernel-agnostic & portable.** The same checkpoint runs **bit-identically** across PyTorch SDPA, FlashAttention-2/3, SageAttention, and xFormers, with no retraining, weight conversion, or kernel rewrite.
41
 
42
-
43
- ## 📊 Results
44
  ---
45
-
46
  ### Efficiency at 2560×1440 (single H100, causal streaming, 24 frames)
47
 
48
  | Metric | SeedVR2-3B (tile)| DOVE (tile)| FlashVSR-Tiny | **RVR (Ours)** |
@@ -58,10 +53,8 @@ license: apache-2.0
58
  <img src="assets/qualitative.png" width="100%" alt="RVR teaser">
59
 
60
 
61
-
62
- ## 🛠 Installation
63
  ---
64
-
65
  ```bash
66
  git clone https://github.com/Holiday/RVR.git
67
  cd RVR
@@ -85,9 +78,8 @@ pip install -e .
85
  </details>
86
 
87
 
88
-
89
- ## 🗂 Model Zoo
90
  ---
 
91
  | Model Name | Date | Backbone | Link |
92
  |---|---|---|---|
93
  | RVR | 2026.06 | Wan2.2-TI2V-5B | [🤗 HuggingFace](https://huggingface.co/H-oliday/RVR) |
@@ -107,9 +99,8 @@ checkpoints/
107
  └── diffusion_pytorch_model.safetensors
108
  ```
109
 
110
-
111
- ## 🚀 Quick Start
112
  ---
 
113
  ### Python API
114
 
115
  ```python
@@ -153,9 +144,8 @@ Use `--png` to write a PNG sequence.
153
 
154
 
155
 
156
-
157
- ## 📁 Repository Structure
158
  ---
 
159
  ```
160
  RVR/
161
  ├── README.md
@@ -181,9 +171,8 @@ RVR/
181
 
182
 
183
 
184
-
185
- ## 🎬 More Visual Results
186
  ---
 
187
  > Full-length restored clips (low-quality input → RVR, played back to back).
188
 
189
 
@@ -193,8 +182,8 @@ RVR/
193
 
194
  <video src="https://huggingface.co/H-oliday/RVR/resolve/main/assets/demo_3.mp4" controls width="100%"></video>
195
 
196
- ## 📖 Citation
197
  ---
 
198
  ```bibtex
199
  @article{yan2026rvr,
200
  title = {RVR: One-step Generative Streaming Real-time Video Restoration},
@@ -205,15 +194,13 @@ RVR/
205
  ```
206
 
207
 
208
-
209
- ## 🙏 Acknowledgements
210
  ---
 
211
  RVR builds on [Wan2.2-TI2V-5B](https://github.com/Wan-Video), the lightweight autoencoder [TAEHV](https://github.com/madebyollin/taehv), and the [RealBasicVSR](https://github.com/ckkelvinchan/RealBasicVSR) degradation pipeline. We thank the authors of [SeedVR2](https://github.com/ByteDance-Seed/SeedVR), [DOVE](https://github.com/zhengchen1999/DOVE), and [FlashVSR](https://github.com/OpenImagingLab/FlashVSR) for releasing strong baselines, and the [UltraVideo](https://github.com/Tele-AI/UltraVideo) team for the training corpus.
212
 
213
 
214
-
215
- ## 📜 License
216
  ---
 
217
  Released under the [Apache 2.0 License](LICENSE). The Wan2.2 backbone and any third-party weights remain subject to their original licenses.
218
 
219
  <div align="center">
 
22
  </p>
23
 
24
 
 
 
25
  ---
26
+ ## Updates
27
  - [2026/06] Release the inference code and pretrained weights 🎉
28
 
29
 
30
 
31
 
 
 
32
  ---
33
+ ## ✨ Highlights
34
  - **Mask-free shifted-window self-attention (MFSWA).** Each spatial window is **pre-gathered into a dense tensor**, so every attention call reduces to a single standard scaled dot-product attention (SDPA) call: *no attention mask, cyclic shift, or padding ever enters the graph*. This gives a **1.62× throughput gain over its full-attention teacher** at essentially identical quality, with **no dedicated sparse kernel**.
35
  - **Restoration-aware Autoencoder (ReAE).** A lightweight encoder–decoder jointly fine-tuned with the DiT in pixel space removes the heavy-3D-VAE / tiled-decoding bottleneck.
36
  - **Causal chunk-wise streaming.** A minimal causal protocol (no rolling KV cache, no overlapped DiT inference) bounds the temporal axis, confining the residual \(\mathcal{O}(N^2)\) cost to the spatial axes.
37
  - **Kernel-agnostic & portable.** The same checkpoint runs **bit-identically** across PyTorch SDPA, FlashAttention-2/3, SageAttention, and xFormers, with no retraining, weight conversion, or kernel rewrite.
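
The two structural ideas in the highlights above, dense window gathering and causal chunk-wise streaming, can be illustrated with a small pure-Python sketch. This is not the RVR implementation: `gather_windows`, `stream_chunks`, and the `restore` callback are hypothetical names introduced here, the sketch covers only plain (unshifted) windows, and it assumes each chunk is restored without any state carried over from other chunks.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
Frame = TypeVar("Frame")


def gather_windows(grid: Sequence[Sequence[T]], win: int) -> List[List[T]]:
    """Split an H x W token grid into dense, flattened win x win windows.

    Hypothetical helper: every returned window holds exactly win * win
    tokens, so attention over it needs no mask and no padding.
    """
    height, width = len(grid), len(grid[0])
    if height % win or width % win:
        raise ValueError("grid size must be divisible by the window size")
    windows = []
    for top in range(0, height, win):
        for left in range(0, width, win):
            windows.append(
                [grid[r][c]
                 for r in range(top, top + win)
                 for c in range(left, left + win)]
            )
    return windows


def stream_chunks(
    frames: Iterable[Frame],
    chunk_size: int,
    restore: Callable[[List[Frame]], List[Frame]],
) -> Iterator[Frame]:
    """Restore a video in disjoint chunks, never reading future frames.

    Assumption: chunks do not overlap and no cache is shared between
    them, so per-call cost depends on chunk_size, not on video length.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunk: List[Frame] = []
    for frame in frames:
        chunk.append(frame)
        if len(chunk) == chunk_size:
            yield from restore(chunk)
            chunk = []
    if chunk:  # flush the final, possibly shorter, chunk
        yield from restore(chunk)


# Toy usage: a 4 x 4 grid of token ids and a stand-in "restore" function.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
windows = gather_windows(grid, 2)
restored = list(stream_chunks(range(5), 2, lambda c: [x * 10 for x in c]))
```

In a real model each gathered window would be stacked into a batch dimension and passed to one dense attention call, and `restore` would be the one-step restoration network applied to a chunk of frames.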
38
 
 
 
39
  ---
40
+ ## 📊 Results
41
  ### Efficiency at 2560×1440 (single H100, causal streaming, 24 frames)
42
 
43
  | Metric | SeedVR2-3B (tile)| DOVE (tile)| FlashVSR-Tiny | **RVR (Ours)** |
 
53
  <img src="assets/qualitative.png" width="100%" alt="RVR teaser">
54
 
55
 
 
 
56
  ---
57
+ ## 🛠 Installation
58
  ```bash
59
  git clone https://github.com/Holiday/RVR.git
60
  cd RVR
 
78
  </details>
79
 
80
 
 
 
81
  ---
82
+ ## 🗂 Model Zoo
83
  | Model Name | Date | Backbone | Link |
84
  |---|---|---|---|
85
  | RVR | 2026.06 | Wan2.2-TI2V-5B | [🤗 HuggingFace](https://huggingface.co/H-oliday/RVR) |
 
99
  └── diffusion_pytorch_model.safetensors
100
  ```
101
 
 
 
102
  ---
103
+ ## 🚀 Quick Start
104
  ### Python API
105
 
106
  ```python
 
144
 
145
 
146
 
 
 
147
  ---
148
+ ## 📁 Repository Structure
149
  ```
150
  RVR/
151
  ├── README.md
 
171
 
172
 
173
 
 
 
174
  ---
175
+ ## 🎬 More Visual Results
176
  > Full-length restored clips (low-quality input → RVR, played back to back).
177
 
178
 
 
182
 
183
  <video src="https://huggingface.co/H-oliday/RVR/resolve/main/assets/demo_3.mp4" controls width="100%"></video>
184
 
 
185
  ---
186
+ ## 📖 Citation
187
  ```bibtex
188
  @article{yan2026rvr,
189
  title = {RVR: One-step Generative Streaming Real-time Video Restoration},
 
194
  ```
195
 
196
 
 
 
197
  ---
198
+ ## 🙏 Acknowledgements
199
  RVR builds on [Wan2.2-TI2V-5B](https://github.com/Wan-Video), the lightweight autoencoder [TAEHV](https://github.com/madebyollin/taehv), and the [RealBasicVSR](https://github.com/ckkelvinchan/RealBasicVSR) degradation pipeline. We thank the authors of [SeedVR2](https://github.com/ByteDance-Seed/SeedVR), [DOVE](https://github.com/zhengchen1999/DOVE), and [FlashVSR](https://github.com/OpenImagingLab/FlashVSR) for releasing strong baselines, and the [UltraVideo](https://github.com/Tele-AI/UltraVideo) team for the training corpus.
200
 
201
 
 
 
202
  ---
203
+ ## 📜 License
204
  Released under the [Apache 2.0 License](LICENSE). The Wan2.2 backbone and any third-party weights remain subject to their original licenses.
205
 
206
  <div align="center">