Instructions to use H-oliday/SwiftVR with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Diffusers
How to use H-oliday/SwiftVR with Diffusers:
```bash
pip install -U diffusers transformers accelerate
```

```python
import torch
from diffusers import DiffusionPipeline

# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("H-oliday/SwiftVR", dtype=torch.bfloat16, device_map="cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
```
- Notebooks
- Google Colab
- Kaggle
Update README.md
README.md
CHANGED
@@ -22,27 +22,22 @@ license: apache-2.0
 </p>
 
 
-
-## Updates
 ---
+## Updates
 - [2026/06] Release the inference code and pretrained weights
 
 
 
 
-
-## ✨ Highlights
 ---
-
+## ✨ Highlights
 - **Mask-free shifted-window self-attention (MFSWA).** Each spatial window is **pre-gathered into a dense tensor**, so every attention call reduces to a single standard scaled-dot-product (SDPA) call: *no attention mask, cyclic shift, or padding ever enters the graph*. This gives a **1.62× throughput gain over its full-attention teacher** at essentially identical quality, with **no dedicated sparse kernel**.
 - **Restoration-aware Autoencoder (ReAE).** A lightweight encoder–decoder jointly fine-tuned with the DiT in pixel space removes the heavy-3D-VAE / tiled-decoding bottleneck.
 - **Causal chunk-wise streaming.** A minimal causal protocol (no rolling KV cache, no overlapped DiT inference) bounds the temporal axis, confining the residual \(\mathcal{O}(N^2)\) cost to the spatial axes.
 - **Kernel-agnostic & portable.** The same checkpoint runs **bit-identically** across PyTorch SDPA, FlashAttention-2/3, SageAttention, and xFormers: no retraining, weight conversion, or kernel rewrite.
 
-
-## Results
 ---
-
+## Results
 ### Efficiency at 2560×1440 (single H100, causal streaming, 24 frames)
 
 | Metric | SeedVR2-3B (tile)| DOVE (tile)| FlashVSR-Tiny | **RVR (Ours)** |
@@ -58,10 +53,8 @@ license: apache-2.0
 <img src="assets/qualitative.png" width="100%" alt="RVR teaser">
 
 
-
-## Installation
 ---
-
+## Installation
 ```bash
 git clone https://github.com/Holiday/RVR.git
 cd RVR
@@ -85,9 +78,8 @@ pip install -e .
 </details>
 
 
-
-## Model Zoo
 ---
+## Model Zoo
 | Model Name | Date | Backbone | Link |
 |---|---|---|---|
 | RVR | 2026.06 | Wan2.2-TI2V-5B | [🤗 HuggingFace](https://huggingface.co/H-oliday/RVR) |
@@ -107,9 +99,8 @@ checkpoints/
 └── diffusion_pytorch_model.safetensors
 ```
 
-
-## Quick Start
 ---
+## Quick Start
 ### Python API
 
 ```python
@@ -153,9 +144,8 @@ Use `--png` to write a PNG sequence.
 
 
 
-
-## Repository Structure
 ---
+## Repository Structure
 ```
 RVR/
 ├── README.md
@@ -181,9 +171,8 @@ RVR/
 
 
 
-
-## 🎬 More Visual Results
 ---
+## 🎬 More Visual Results
 > Full-length restored clips (low-quality input → RVR, played back to back).
 
 
@@ -193,8 +182,8 @@ RVR/
 
 <video src="https://huggingface.co/H-oliday/RVR/resolve/main/assets/demo_3.mp4" controls width="100%"></video>
 
-## Citation
 ---
+## Citation
 ```bibtex
 @article{yan2026rvr,
 title = {RVR: One-step Generative Streaming Real-time Video Restoration},
@@ -205,15 +194,13 @@ RVR/
 ```
 
 
-
-## Acknowledgements
 ---
+## Acknowledgements
 RVR builds on [Wan2.2-TI2V-5B](https://github.com/Wan-Video), the lightweight autoencoder [TAEHV](https://github.com/madebyollin/taehv), and the [RealBasicVSR](https://github.com/ckkelvinchan/RealBasicVSR) degradation pipeline. We thank the authors of [SeedVR2](https://github.com/ByteDance-Seed/SeedVR), [DOVE](https://github.com/zhengchen1999/DOVE), and [FlashVSR](https://github.com/OpenImagingLab/FlashVSR) for releasing strong baselines, and the [UltraVideo](https://github.com/Tele-AI/UltraVideo) team for the training corpus.
 
 
-
-## License
 ---
+## License
 Released under the [Apache 2.0 License](LICENSE). The Wan2.2 backbone and any third-party weights remain subject to their original licenses.
 
 <div align="center">
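
The MFSWA bullet in the README says each spatial window is pre-gathered into a dense block, so attention runs without a mask. The toy sketch below illustrates that idea in pure Python, and it is not the repository's implementation. The names `gather_windows` and `window_attention` are hypothetical, the window shift is omitted, and the grid is assumed to divide evenly by the window size. A real model would operate on tensors and call a fused SDPA kernel instead of the explicit loops.

```python
import math
from typing import List

Vector = List[float]


def gather_windows(grid: List[List[Vector]], ws: int) -> List[List[Vector]]:
    """Split an H x W grid of feature vectors into dense ws x ws windows.

    Every window holds exactly ws * ws tokens, so no padding or mask is needed.
    (Assumes H and W are multiples of ws; shifting is left out of this sketch.)
    """
    h, w = len(grid), len(grid[0])
    if h % ws or w % ws:
        raise ValueError("grid size must be a multiple of the window size")
    return [
        [grid[r][c] for r in range(top, top + ws) for c in range(left, left + ws)]
        for top in range(0, h, ws)
        for left in range(0, w, ws)
    ]


def window_attention(tokens: List[Vector]) -> List[Vector]:
    """Unmasked scaled dot-product self-attention over one window (q = k = v)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append(
            [sum(wt * v[i] for wt, v in zip(weights, tokens)) / total for i in range(d)]
        )
    return out


if __name__ == "__main__":
    # 4 x 4 grid of 2-d features, split into four 2 x 2 windows.
    grid = [[[float(r), float(c)] for c in range(4)] for r in range(4)]
    restored = [window_attention(win) for win in gather_windows(grid, 2)]
    print(len(restored), "windows,", len(restored[0]), "tokens each")
```

Because every window has the same token count, the windows can be stacked along a batch axis and passed to any standard attention kernel in one call.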
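
The causal chunk-wise streaming bullet describes processing the video in bounded temporal chunks, with no rolling KV cache and no overlapped inference. A minimal sketch of such a loop follows, under stated assumptions: `stream_restore` and `restore_chunk` are hypothetical names, chunks are treated as fully independent, and the README does not say whether the actual protocol passes any state between chunks.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

Frame = TypeVar("Frame")


def stream_restore(
    frames: Iterable[Frame],
    chunk_size: int,
    restore_chunk: Callable[[List[Frame]], List[Frame]],
) -> Iterator[Frame]:
    """Yield restored frames chunk by chunk, in arrival order.

    Only the current chunk is held in memory, so the temporal extent seen by
    the model is bounded by chunk_size regardless of the video length.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    chunk: List[Frame] = []
    for frame in frames:
        chunk.append(frame)
        if len(chunk) == chunk_size:
            yield from restore_chunk(chunk)  # emit as soon as the chunk is full
            chunk = []
    if chunk:
        yield from restore_chunk(chunk)  # flush the final, shorter chunk


if __name__ == "__main__":
    # Stand-in "model": doubles each integer frame.
    for restored in stream_restore(range(5), 2, lambda c: [f * 2 for f in c]):
        print(restored)
```

Writing the loop as a generator keeps memory use flat, and a caller can start displaying frames before the whole input has been read.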