Update README.md
README.md
CHANGED
|
@@ -1,3 +1,27 @@
|
|
| 1 |
-
---
|
| 2 |
-
license: mit
|
| 3 |
-
---
|
| 1 |
+
---
|
| 2 |
+
license: mit
|
| 3 |
+
---
|
| 4 |
+
|
| 5 |
+
<div align="center">
|
| 6 |
+
<h1>🚀 CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient</h1>
|
| 7 |
+
</div>
|
| 8 |
+
|
| 9 |
+
> **Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient**
|
| 10 |
+
> [Zigeng Chen](https://github.com/czg1225), [Xinyin Ma](https://horseee.github.io/), [Gongfan Fang](https://fangggf.github.io/), [Xinchao Wang](https://sites.google.com/site/sitexinchaowang/)
|
| 11 |
+
> [Learning and Vision Lab](http://lv-nus.org/), National University of Singapore
|
| 12 |
+
> 🥯 [[Paper]](https://arxiv.org/abs/2406.06911) 🎄 [[Project Page]](https://czg1225.github.io/asyncdiff_page/) 💻 [[GitHub]](https://github.com/czg1225/CoDe)
|
| 13 |
+
|
| 14 |
+
|
| 15 |
+
<div align="center">
|
| 16 |
+
<img src="assets/teaser.png" width="100%" ></img>
|
| 17 |
+
<br>
|
| 18 |
+
<em>
|
| 19 |
+
1.7x speedup and 0.5x memory consumption on ImageNet-256 generation. Top: original VAR-d30; bottom: CoDe N=8. Speed measurements do not include the VAE decoder.
|
| 20 |
+
</em>
|
| 21 |
+
</div>
|
| 22 |
+
<br>
|
| 23 |
+
|
| 24 |
+
## 💡 Introduction
|
| 25 |
+
We propose Collaborative Decoding (CoDe), a novel decoding strategy tailored to the VAR framework. CoDe capitalizes on two critical observations: the substantially reduced parameter demands at larger scales and the exclusive generation patterns across different scales. Based on these insights, we partition the multi-scale inference process into a seamless collaboration between a large model and a small model. This collaboration yields remarkable efficiency with minimal impact on quality: CoDe achieves a 1.7x speedup, reduces memory usage by around 50%, and preserves image quality with only a negligible FID increase from 1.95 to 1.98. When the number of drafting steps is further decreased, CoDe achieves a 2.9x acceleration, reaching over 41 images/s at 256x256 resolution on a single NVIDIA 4090 GPU, while preserving a commendable FID of 2.27.
|
| 26 |
+

|
| 27 |
+

|
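The partition described in the introduction can be illustrated with a minimal Python sketch. It is not the CoDe implementation: the stub models, the function names `collaborative_decode` and `token_share`, and the scale schedule `PATCH_NUMS` are all assumptions made for illustration, as is the reading of `N` as the number of early scales handled by the large model. The sketch only shows how the scales could be split between the two models and what fraction of tokens each would produce.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed scale schedule: side length of the token map at each step
# for 256x256 generation. Illustrative, not taken from this README.
PATCH_NUMS: Tuple[int, ...] = (1, 2, 3, 4, 5, 6, 8, 10, 13, 16)

# A "model" here is any callable mapping (previous token maps, side) -> tokens.
Model = Callable[[List[List[int]], int], List[int]]


def make_stub(token_id: int) -> Model:
    """Hypothetical placeholder model: emits side*side copies of token_id."""
    def predict(prefix: List[List[int]], side: int) -> List[int]:
        return [token_id] * (side * side)
    return predict


def collaborative_decode(
    drafter: Model,
    refiner: Model,
    n_draft: int,
    patch_nums: Sequence[int] = PATCH_NUMS,
) -> Tuple[List[List[int]], List[str]]:
    """Run the first n_draft scales with the large drafter, the rest with the small refiner."""
    if not 0 <= n_draft <= len(patch_nums):
        raise ValueError("n_draft must lie between 0 and the number of scales")
    token_maps: List[List[int]] = []
    owners: List[str] = []
    for step, side in enumerate(patch_nums):
        use_drafter = step < n_draft
        model = drafter if use_drafter else refiner
        # Each scale is predicted conditioned on all previously generated scales.
        token_maps.append(model(token_maps, side))
        owners.append("drafter" if use_drafter else "refiner")
    return token_maps, owners


def token_share(n_draft: int, patch_nums: Sequence[int] = PATCH_NUMS) -> float:
    """Fraction of all tokens produced by the drafter."""
    total = sum(s * s for s in patch_nums)
    drafted = sum(s * s for s in patch_nums[:n_draft])
    return drafted / total


if __name__ == "__main__":
    maps, owners = collaborative_decode(make_stub(1), make_stub(2), n_draft=8)
    print(owners)
    print(f"drafter token share: {token_share(8):.3f}")
```

Under the assumed schedule, the last two scales hold 425 of the 680 tokens, so with `n_draft=8` the small model would generate most of the tokens while the large model handles only the early, low-resolution scales.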