Update README.md
README.md
CHANGED
@@ -41,7 +41,7 @@ Welcome to the official repository for our paper: "STAR: STacked AutoRegressive
 
 
 ## **Abstract**
-Multimodal large language models (MLLMs) play a pivotal role in advancing the quest for general artificial intelligence. However, achieving unified target for multimodal understanding and generation remains challenging due to optimization conflicts and performance trade-offs. To effectively enhance generative performance while preserving existing comprehension capabilities, we introduce ***STAR***:
+Multimodal large language models (MLLMs) play a pivotal role in advancing the quest for general artificial intelligence. However, achieving unified target for multimodal understanding and generation remains challenging due to optimization conflicts and performance trade-offs. To effectively enhance generative performance while preserving existing comprehension capabilities, we introduce ***STAR***: a **ST**acked **A**uto**R**egressive scheme for task-progressive unified multimodal learning. This approach decomposes multimodal learning into multiple stages: understanding, generation, and editing. By freezing the parameters of the fundamental autoregressive (AR) model and progressively stacking isomorphic AR modules, it avoids cross-task interference while expanding the model's capabilities. Concurrently, we introduce a high-capacity VQ to enhance the granularity of image representations and employ an implicit reasoning mechanism to improve generation quality under complex conditions. Experiments demonstrate that STAR achieves state-of-the-art performance on GenEval (**0.91**), DPG-Bench (**87.44**), and ImgEdit (**4.34**), validating its efficacy for unified multimodal learning.
 
 <div align="center">
 <img src="assets/teaser.png" width=100%></img>