Update README.md

README.md (changed):

````diff
@@ -14,19 +14,19 @@ pipeline_tag: any-to-any
 
 
 ## Abstract
 
-Parameter generation has long struggled to match the scale of today
-models, curbing its broader utility. In this paper, we introduce **R**ecurrent Diffusion for Large-Scale
-**P**arameter **G**eneration (**RPG**), a novel framework that generates full neural network parameters—up
-to **hundreds of millions**—on a **single GPU**. Our approach first partitions a network
-into non-overlapping
-mechanism then learns the inter-token relationships, producing
-for a diffusion process that ultimately synthesizes the
-architectures and tasks—including ResNets, ConvNeXts and ViTs on ImageNet-1K and COCO,
-and even LoRA-based LLMs—RPG achieves performance on par with fully trained networks while
-avoiding excessive memory overhead. Notably, it generalizes beyond its training set to generate
-valid parameters for previously unseen tasks, highlighting its flexibility in
+Parameter generation has long struggled to match the scale of today's large vision and language
+models, curbing its broader utility. In this paper, we introduce **R**ecurrent Diffusion for Large-Scale
+**P**arameter **G**eneration (**RPG**), a novel framework that generates full neural network parameters—up
+to **hundreds of millions**—on a **single GPU**. Our approach first partitions a network's parameters
+into non-overlapping 'tokens', each corresponding to a distinct portion of the model. A recurrent
+mechanism then learns the inter-token relationships, producing 'prototypes' which serve as conditions
+for a diffusion process that ultimately synthesizes the parameters. Across a spectrum of
+architectures and tasks—including ResNets, ConvNeXts and ViTs on ImageNet-1K and COCO,
+and even LoRA-based LLMs—RPG achieves performance on par with fully trained networks while
+avoiding excessive memory overhead. Notably, it generalizes beyond its training set to generate
+valid parameters for previously unseen tasks, highlighting its flexibility in open-ended
 scenarios. By overcoming the longstanding memory and scalability barriers,
-RPG serves as a critical advance in
+RPG serves as a critical advance in 'AI generating AI', potentially
 enabling efficient weight generation at scales previously deemed infeasible.
 
 
@@ -149,8 +149,7 @@ We thank
 [Mingjia Shi](bdemo.github.io/homepage),
 [Zangwei Zheng](https://zhengzangw.github.io/),
 [Ziheng Qin](https://henryqin1997.github.io/ziheng_qin/),
-[Tianlong Chen](https://tianlong-chen.github.io/)
-and [Zhangyang Wang](https://www.ece.utexas.edu/people/faculty/atlas-wang)
+and [Tianlong Chen](https://tianlong-chen.github.io/)
 for valuable discussions and feedbacks.
 This research is supported by the National Research Foundation,
 Singapore under its AI Singapore Programme
@@ -160,8 +159,8 @@ Singapore under its AI Singapore Programme
 ## Citation
 ```
 @misc{wang2025recurrent,
-    title={
-    author={Wang, Kai and Tang, Dongwen and Zhao, Wangbo and You, Yang},
+    title={Scaling Up Parameter Generation: A Recurrent Diffusion Approach},
+    author={Wang, Kai and Tang, Dongwen and Zhao, Wangbo and Schürholt, Konstantin and Wang, Zhangyang and You, Yang},
 year={2025},
 }
 ```
````