H-oliday/SwiftVR

Video-to-Video

Diffusers

Safetensors


H-oliday committed 9 days ago

Commit 288d99a (verified) · 1 parent: 6c931e0

Update README.md


Files changed (1)

README.md +11 -24


@@ -22,27 +22,22 @@ license: apache-2.0
 </p>
-## Updates
 ---
 - [2026/06] Release the inference code and pretrained weights 🎉
-## ✨ Highlights
 ---
 - **Mask-free shifted-window self-attention (MFSWA).** Each spatial window is **pre-gathered into a dense tensor**, so every attention call reduces to a single standard scaled-dot-product (SDPA) call — *no attention mask, cyclic shift, or padding ever enters the graph*. This gives a **1.62× throughput gain over its full-attention teacher** at essentially identical quality, with **no dedicated sparse kernel**.
 - **Restoration-aware Autoencoder (ReAE).** A lightweight encoder–decoder jointly fine-tuned with the DiT in pixel space removes the heavy-3D-VAE / tiled-decoding bottleneck.
 - **Causal chunk-wise streaming.** A minimal causal protocol (no rolling KV cache, no overlapped DiT inference) bounds the temporal axis, confining the residual \(\mathcal{O}(N^2)\) cost to the spatial axes.
 - **Kernel-agnostic & portable.** The same checkpoint runs **bit-identically** across PyTorch SDPA, FlashAttention-2/3, SageAttention, and xFormers — no retraining, weight conversion, or kernel rewrite.
-## 📊 Results
 ---
 ### Efficiency at 2560×1440 (single H100, causal streaming, 24 frames)
 | Metric | SeedVR2-3B (tile)| DOVE (tile)| FlashVSR-Tiny | **RVR (Ours)** |
@@ -58,10 +53,8 @@ license: apache-2.0
 <img src="assets/qualitative.png" width="100%" alt="RVR teaser">
-## 🛠 Installation
 ---
 ```bash
 git clone https://github.com/Holiday/RVR.git
 cd RVR
@@ -85,9 +78,8 @@ pip install -e .
 </details>
-## 🗂 Model Zoo
 ---
 | Model Name | Date | Backbone | Link |
 |---|---|---|---|
 | RVR | 2026.06 | Wan2.2-TI2V-5B | [🤗 HuggingFace](https://huggingface.co/H-oliday/RVR) |
@@ -107,9 +99,8 @@ checkpoints/
     └── diffusion_pytorch_model.safetensors
 ```
-## 🚀 Quick Start
 ---
 ### Python API
 ```python
@@ -153,9 +144,8 @@ Use `--png` to write a PNG sequence.
-## 📁 Repository Structure
 ---
 ```
 RVR/
 ├── README.md
@@ -181,9 +171,8 @@ RVR/
-## 🎬 More Visual Results
 ---
 > Full-length restored clips (low-quality input → RVR, played back to back).
@@ -193,8 +182,8 @@ RVR/
 <video src="https://huggingface.co/H-oliday/RVR/resolve/main/assets/demo_3.mp4" controls width="100%"></video>
-## 📖 Citation
 ---
 ```bibtex
 @article{yan2026rvr,
   title   = {RVR: One-step Generative Streaming Real-time Video Restoration},
@@ -205,15 +194,13 @@ RVR/
 ```
-## 🙏 Acknowledgements
 ---
 RVR builds on [Wan2.2-TI2V-5B](https://github.com/Wan-Video), the lightweight autoencoder [TAEHV](https://github.com/madebyollin/taehv), and the [RealBasicVSR](https://github.com/ckkelvinchan/RealBasicVSR) degradation pipeline. We thank the authors of [SeedVR2](https://github.com/ByteDance-Seed/SeedVR), [DOVE](https://github.com/zhengchen1999/DOVE), and [FlashVSR](https://github.com/OpenImagingLab/FlashVSR) for releasing strong baselines, and the [UltraVideo](https://github.com/Tele-AI/UltraVideo) team for the training corpus.
-## 📜 License
 ---
 Released under the [Apache 2.0 License](LICENSE). The Wan2.2 backbone and any third-party weights remain subject to their original licenses.
 <div align="center">

 </p>
 ---
+## Updates
 - [2026/06] Release the inference code and pretrained weights 🎉
 ---
+## ✨ Highlights
 - **Mask-free shifted-window self-attention (MFSWA).** Each spatial window is **pre-gathered into a dense tensor**, so every attention call reduces to a single standard scaled-dot-product (SDPA) call — *no attention mask, cyclic shift, or padding ever enters the graph*. This gives a **1.62× throughput gain over its full-attention teacher** at essentially identical quality, with **no dedicated sparse kernel**.
 - **Restoration-aware Autoencoder (ReAE).** A lightweight encoder–decoder jointly fine-tuned with the DiT in pixel space removes the heavy-3D-VAE / tiled-decoding bottleneck.
 - **Causal chunk-wise streaming.** A minimal causal protocol (no rolling KV cache, no overlapped DiT inference) bounds the temporal axis, confining the residual \(\mathcal{O}(N^2)\) cost to the spatial axes.
 - **Kernel-agnostic & portable.** The same checkpoint runs **bit-identically** across PyTorch SDPA, FlashAttention-2/3, SageAttention, and xFormers — no retraining, weight conversion, or kernel rewrite.
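
The causal chunk-wise streaming bullet above can be illustrated with a minimal sketch. It assumes chunks are consecutive and non-overlapping and that no state is carried between them, which is one reading of "no rolling KV cache, no overlapped DiT inference". The names `iter_chunks`, `restore_stream`, `attention_pairs`, and the `restore_chunk` callback are hypothetical and are not taken from the RVR codebase. The pair count assumes dense attention inside each chunk, purely to show how the cost scales with clip length.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def iter_chunks(frames: Sequence[Frame], chunk_size: int) -> Iterator[Sequence[Frame]]:
    """Yield consecutive, non-overlapping chunks of at most `chunk_size` frames."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(frames), chunk_size):
        yield frames[start:start + chunk_size]


def restore_stream(
    frames: Sequence[Frame],
    restore_chunk: Callable[[Sequence[Frame]], List[Frame]],
    chunk_size: int,
) -> List[Frame]:
    """Restore a clip chunk by chunk, in temporal order.

    Assumption: each chunk is processed on its own, so nothing (no cache,
    no overlapping frames) is passed from one chunk to the next.
    """
    restored: List[Frame] = []
    for chunk in iter_chunks(frames, chunk_size):
        restored.extend(restore_chunk(chunk))
    return restored


def attention_pairs(num_frames: int, tokens_per_frame: int, chunk_size: int) -> int:
    """Count query-key pairs when attention is dense within each chunk only."""
    total = 0
    for start in range(0, num_frames, chunk_size):
        frames_in_chunk = min(chunk_size, num_frames - start)
        tokens = frames_in_chunk * tokens_per_frame
        total += tokens * tokens  # quadratic only in the per-chunk token count
    return total


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    doubled = restore_stream(list(range(5)), lambda c: [x * 2 for x in c], chunk_size=2)
    print(doubled)
    print(attention_pairs(num_frames=8, tokens_per_frame=10, chunk_size=4))
    print(attention_pairs(num_frames=16, tokens_per_frame=10, chunk_size=4))
```

With a fixed chunk size, doubling the number of frames doubles the pair count rather than quadrupling it, so the quadratic term depends only on the number of tokens in one chunk.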
 ---
+## 📊 Results
 ### Efficiency at 2560×1440 (single H100, causal streaming, 24 frames)
 | Metric | SeedVR2-3B (tile)| DOVE (tile)| FlashVSR-Tiny | **RVR (Ours)** |
 <img src="assets/qualitative.png" width="100%" alt="RVR teaser">
 ---
+## 🛠 Installation
 ```bash
 git clone https://github.com/Holiday/RVR.git
 cd RVR
 </details>
 ---
+## 🗂 Model Zoo
 | Model Name | Date | Backbone | Link |
 |---|---|---|---|
 | RVR | 2026.06 | Wan2.2-TI2V-5B | [🤗 HuggingFace](https://huggingface.co/H-oliday/RVR) |
     └── diffusion_pytorch_model.safetensors
 ```
 ---
+## 🚀 Quick Start
 ### Python API
 ```python
 ---
+## 📁 Repository Structure
 ```
 RVR/
 ├── README.md
 ---
+## 🎬 More Visual Results
 > Full-length restored clips (low-quality input → RVR, played back to back).
 <video src="https://huggingface.co/H-oliday/RVR/resolve/main/assets/demo_3.mp4" controls width="100%"></video>
 ---
+## 📖 Citation
 ```bibtex
 @article{yan2026rvr,
   title   = {RVR: One-step Generative Streaming Real-time Video Restoration},
 ```
 ---
+## 🙏 Acknowledgements
 RVR builds on [Wan2.2-TI2V-5B](https://github.com/Wan-Video), the lightweight autoencoder [TAEHV](https://github.com/madebyollin/taehv), and the [RealBasicVSR](https://github.com/ckkelvinchan/RealBasicVSR) degradation pipeline. We thank the authors of [SeedVR2](https://github.com/ByteDance-Seed/SeedVR), [DOVE](https://github.com/zhengchen1999/DOVE), and [FlashVSR](https://github.com/OpenImagingLab/FlashVSR) for releasing strong baselines, and the [UltraVideo](https://github.com/Tele-AI/UltraVideo) team for the training corpus.
 ---
+## 📜 License
 Released under the [Apache 2.0 License](LICENSE). The Wan2.2 backbone and any third-party weights remain subject to their original licenses.
 <div align="center">