Improve model card with pipeline tag, library name, and extended content
#1 opened by nielsr (HF Staff)
README.md
CHANGED
|
@@ -1,27 +1,112 @@
|
|
| 1 |
---
|
| 2 |
license: apache-2.0
|
| 3 |
---
|
| 4 |
|
| 5 |
## 🧠 Method
|
| 6 |
|
| 7 |
-
[](https://
|
| 8 |
[](https://arxiv.org/abs/2509.07295)
|
| 9 |
[](https://github.com/HorizonWind2004/reconstruction-alignment)
|
| 10 |
[](https://huggingface.co/collections/sanaka87/realign-68ad2176380355a3dcedc068)
|
| 11 |
[](https://huggingface.co/spaces/sanaka87/BAGEL-ReAlign)
|
| 12 |
[](https://reconstruction-alignment.github.io/)
|
| 13 |
|
| 14 |
|
| 15 |
## ✍️ Citation
|
| 16 |
|
| 17 |
-
If you find
|
| 18 |
|
| 19 |
@misc{xie2025reconstructionalignmentimprovesunified,
|
| 20 |
-
title={Reconstruction Alignment Improves Unified Multimodal Models},
|
| 21 |
author={Ji Xie and Trevor Darrell and Luke Zettlemoyer and XuDong Wang},
|
| 22 |
year={2025},
|
| 23 |
eprint={2509.07295},
|
| 24 |
archivePrefix={arXiv},
|
| 25 |
primaryClass={cs.CV},
|
| 26 |
-
url={https://arxiv.org/abs/2509.07295},
|
| 27 |
-
}
|
| 1 |
---
|
| 2 |
license: apache-2.0
|
| 3 |
+
library_name: diffusers
|
| 4 |
+
pipeline_tag: text-to-image
|
| 5 |
---
|
| 6 |
|
| 7 |
+
# Reconstruction Alignment Improves Unified Multimodal Models
|
| 8 |
+
|
| 9 |
+
The model was presented in the paper [Reconstruction Alignment Improves Unified Multimodal Models](https://huggingface.co/papers/2509.07295).
|
| 10 |
+
|
| 11 |
+
**Abstract:**
|
| 12 |
+
Unified multimodal models (UMMs) unify visual understanding and generation within a single architecture. However, conventional training relies on image-text pairs (or sequences) whose captions are typically sparse and miss fine-grained visual details--even when they use hundreds of words to describe a simple image. We introduce Reconstruction Alignment (RecA), a resource-efficient post-training method that leverages visual understanding encoder embeddings as dense "text prompts," providing rich supervision without captions. Concretely, RecA conditions a UMM on its own visual understanding embeddings and optimizes it to reconstruct the input image with a self-supervised reconstruction loss, thereby realigning understanding and generation. Despite its simplicity, RecA is broadly applicable: across autoregressive, masked-autoregressive, and diffusion-based UMMs, it consistently improves generation and editing fidelity. With only 27 GPU-hours, post-training with RecA substantially improves image generation performance on GenEval (0.73$\rightarrow$0.90) and DPGBench (80.93$\rightarrow$88.15), while also boosting editing benchmarks (ImgEdit 3.38$\rightarrow$3.75, GEdit 6.94$\rightarrow$7.25). Notably, RecA surpasses much larger open-source models and applies broadly across diverse UMM architectures, establishing it as an efficient and general post-training alignment strategy for UMMs.
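
The reconstruction objective described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the frozen "understanding encoder" and the trainable "generator" are hypothetical linear maps, and mean squared error stands in for whatever reconstruction loss the underlying UMM actually uses. It only shows the shape of the loop: embed the image, condition generation on that embedding instead of a caption, and update the generator to reproduce the input.

```python
import random

def encode(image, enc):
    """Frozen 'understanding encoder': a fixed linear map (toy stand-in)."""
    return [sum(w * x for w, x in zip(row, image)) for row in enc]

def generate(embedding, gen):
    """Trainable 'generator': a linear map from the embedding back to pixels."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in gen]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def reca_step(image, enc, gen, lr=0.05):
    """One self-supervised step: condition on the image's own embedding,
    reconstruct the image, and update only the generator weights."""
    emb = encode(image, enc)      # dense "prompt"; no caption is involved
    recon = generate(emb, gen)
    n = len(image)
    for i in range(n):
        grad_out = 2.0 * (recon[i] - image[i]) / n   # d(mse) / d(recon[i])
        for j in range(len(emb)):
            gen[i][j] -= lr * grad_out * emb[j]      # gradient step on gen only
    return mse(recon, image)      # loss measured before this step's update

random.seed(0)
dim = 4
# Assumed toy setup: an invertible frozen encoder and a zero-initialised generator.
enc = [[1.0 if i == j else 0.1 for j in range(dim)] for i in range(dim)]
gen = [[0.0] * dim for _ in range(dim)]
images = [[random.random() for _ in range(dim)] for _ in range(8)]

losses = []
for epoch in range(200):
    epoch_loss = sum(reca_step(img, enc, gen) for img in images) / len(images)
    losses.append(epoch_loss)

print(f"first epoch loss: {losses[0]:.4f}, last epoch loss: {losses[-1]:.4f}")
```

In this sketch the encoder weights are never touched, so the only way to lower the loss is for the generator to make better use of the embedding.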
|
| 13 |
+
|
| 14 |
## 🧠 Method
|
| 15 |
|
| 16 |
+
[](https://huggingface.co/papers/2509.07295)
|
| 17 |
[](https://arxiv.org/abs/2509.07295)
|
| 18 |
[](https://github.com/HorizonWind2004/reconstruction-alignment)
|
| 19 |
[](https://huggingface.co/collections/sanaka87/realign-68ad2176380355a3dcedc068)
|
| 20 |
[](https://huggingface.co/spaces/sanaka87/BAGEL-ReAlign)
|
| 21 |
[](https://reconstruction-alignment.github.io/)
|
| 22 |
|
| 23 |
+
<div align="center">
|
| 24 |
+
<img src="https://github.com/HorizonWind2004/reconstruction-alignment/raw/main/assets/DEMO.jpg" alt="" style="width: 100%; margin: 20px 0;">
|
| 25 |
+
</div>
|
| 26 |
+
|
| 27 |
+
## 🔥 News
|
| 28 |
+
|
| 29 |
+
- **2025.9.10**: BAGEL training code is released! Harmon training code will be released soon.
|
| 30 |
+
- **2025.9.9**: Our [finetuned weights](https://huggingface.co/collections/sanaka87/realign-68ad2176380355a3dcedc068) and [arXiv paper](https://arxiv.org/abs/2509.07295) are available! We expect to release the training code tomorrow.
|
| 31 |
+
|
| 32 |
+
## 🍭 Results
|
| 33 |
+
|
| 34 |
+
**RecA** achieves state-of-the-art performance on generation benchmarks with remarkable efficiency. With only 1.5B parameters, the RecA-tuned Harmon model surpasses models with 7B-24B parameters, achieving GenEval **0.86** and DPGBench **87.21** without GPT-4o distillation data or reinforcement learning. RecA also improves BAGEL's editing performance significantly across all categories. A further two-stage fine-tuning with GPT-4o-Image distillation data raises these scores to **0.90** and **88.15**, respectively.
|
| 35 |
+
|
| 36 |
+
<div align="center">
|
| 37 |
+
<img src="https://github.com/HorizonWind2004/reconstruction-alignment/raw/main/assets/main.jpg" alt="" style="width: 100%; margin: 20px 0;">
|
| 38 |
+
</div>
|
| 39 |
+
|
| 40 |
+
<div align="center">
|
| 41 |
+
<img src="https://github.com/HorizonWind2004/reconstruction-alignment/raw/main/assets/edit_result.jpg" alt="" style="width: 100%; margin: 20px 0;">
|
| 42 |
+
</div>
|
| 43 |
+
|
| 44 |
+
We've tested RecA on various base architectures, including Show-o, OpenUni, Harmon, and BAGEL, consistently observing significant performance improvements across all models and benchmarks.
|
| 45 |
+
|
| 46 |
+
|
| 47 |
+
<div align="center">
|
| 48 |
+
<img src="https://github.com/HorizonWind2004/reconstruction-alignment/raw/main/assets/t2i_result.jpg" alt="" style="width: 100%; margin: 20px 0;">
|
| 49 |
+
</div>
|
| 50 |
+
|
| 51 |
+
|
| 52 |
+
## 🏆 Model Zoo
|
| 53 |
+
|
| 54 |
+
A collection of RecA models on Hugging Face with benchmark performance:
|
| 55 |
+
|
| 56 |
+
| Model Name | Parameters | GenEval | DPGBench | ImgEdit | GEdit |
|
| 57 |
+
|------------|------------|---------|----------|---------|-------|
|
| 58 |
+
| [BAGEL-RecA](https://huggingface.co/sanaka87/BAGEL-RecA) | 14B | 82.4 (+3.6) | 85.29 (+1.26) | 3.75 (+0.37) | 7.27 (+0.33) |
|
| 59 |
+
| [Harmon-0.5B-RecA](https://huggingface.co/sanaka87/Harmon-0.5B-RecA) | 0.5B | 78.7 (+11.1) | 84.67 (+4.55) | - | - |
|
| 60 |
+
| [Harmon-1.5B-RecA](https://huggingface.co/sanaka87/Harmon-1.5B-RecA) | 1.5B | 85.7 (+12.8) | 87.21 (+6.28) | - | - |
|
| 61 |
+
| [Show-o-RecA](https://huggingface.co/sanaka87/Show-o-RecA) | 1.3B | 61.9 (+5.3) | 75.70 (+5.05) | - | - |
|
| 62 |
+
| [Show-o-512x512-RecA](https://huggingface.co/sanaka87/Show-o-512x512-RecA) | 1.3B | 72.3 (+6.1) | 84.94 (+2.73) | - | - |
|
| 63 |
+
| [Harmon-1.5B-RecA-plus](https://huggingface.co/sanaka87/Harmon-1.5B-RecA-plus) | 1.5B | 90.0 | 88.15 | - | - |
|
| 64 |
+
| [OpenUni-RecA](https://huggingface.co/sanaka87/OpenUni-RecA) | 3.6B | 74.1 (+12.2) | 82.75 (+3.73) | - | - |
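
To read the table, it can help to recover each base model's score from the reported value and its parenthesised gain. The snippet below is a small sketch that assumes the `(+x)` figures are improvements over the corresponding base model; the helper name `baseline` and the dictionary are illustrative, and the GenEval cells are copied from the table above.

```python
import re

# Matches cells such as "82.4 (+3.6)"; cells like "-" or "90.0" do not match.
CELL = re.compile(r"^\s*([\d.]+)\s*\(\+([\d.]+)\)\s*$")

def baseline(cell):
    """Return (score, gain, score - gain) for a cell like '82.4 (+3.6)',
    or None when the cell carries no reported gain."""
    m = CELL.match(cell)
    if m is None:
        return None
    score, gain = float(m.group(1)), float(m.group(2))
    return score, gain, round(score - gain, 2)

# GenEval column, copied from the Model Zoo table.
geneval = {
    "BAGEL-RecA": "82.4 (+3.6)",
    "Harmon-0.5B-RecA": "78.7 (+11.1)",
    "Harmon-1.5B-RecA": "85.7 (+12.8)",
    "Show-o-RecA": "61.9 (+5.3)",
    "Show-o-512x512-RecA": "72.3 (+6.1)",
    "Harmon-1.5B-RecA-plus": "90.0",
    "OpenUni-RecA": "74.1 (+12.2)",
}

for name, cell in geneval.items():
    parsed = baseline(cell)
    if parsed is None:
        print(f"{name}: {cell} (no gain reported)")
    else:
        score, gain, base = parsed
        print(f"{name}: base {base} -> {score} (+{gain})")
```

Under that assumption, the Harmon-1.5B base works out to 72.9, which appears to correspond to the 0.73 GenEval starting point quoted in the abstract, with the table using a 0-100 scale.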
|
| 65 |
+
|
| 66 |
+
|
| 67 |
+
## ✨ Getting Started
|
| 68 |
+
|
| 69 |
+
For detailed instructions on installation, training, and evaluation, please refer to the respective repository READMEs:
|
| 70 |
+
|
| 71 |
+
- **[BAGEL Training Guide](https://github.com/HorizonWind2004/reconstruction-alignment/tree/main/BAGEL/README.md)**: Complete guide for BAGEL model training and evaluation.
|
| 72 |
+
|
| 73 |
+
- **[Benchmark Evaluation Guide](https://github.com/HorizonWind2004/reconstruction-alignment/tree/main/Benchmark/README.md)**: Multi-benchmark evaluation scripts and setup instructions.
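
Before following those guides, the released checkpoints can be fetched from the Hub. The sketch below assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function; the repository names come from the Model Zoo table, while the helper names `repo_id` and `download` are illustrative and not part of the RecA codebase.

```python
# Checkpoint names listed in the Model Zoo table, all under the same Hub account.
HUB_USER = "sanaka87"
MODEL_ZOO = (
    "BAGEL-RecA",
    "Harmon-0.5B-RecA",
    "Harmon-1.5B-RecA",
    "Harmon-1.5B-RecA-plus",
    "Show-o-RecA",
    "Show-o-512x512-RecA",
    "OpenUni-RecA",
)

def repo_id(model_name):
    """Build the Hub repository id for a checkpoint named in the table."""
    if model_name not in MODEL_ZOO:
        raise KeyError(f"unknown RecA checkpoint: {model_name}")
    return f"{HUB_USER}/{model_name}"

def download(model_name, local_dir=None):
    """Download every file of the checkpoint and return the local path.

    The import is done lazily so that repo_id() works even when
    huggingface_hub is not installed.
    """
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id(model_name), local_dir=local_dir)
```

Calling `download("Harmon-0.5B-RecA")` would pull the smallest checkpoint into the local Hub cache; how the weights are then loaded depends on each base model's codebase, as described in the guides above.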
|
| 74 |
+
|
| 75 |
+
## 🚧 TODO
|
| 76 |
+
|
| 77 |
+
- [x] Release our model weights on Hugging Face.
|
| 78 |
+
- [x] Release BAGEL training code.
|
| 79 |
+
- [ ] Release Harmon training code.
|
| 80 |
+
- [ ] Release Show-o and OpenUni training code.
|
| 81 |
+
- [ ] Further scale up BAGEL training.
|
| 82 |
+
- [ ] Add support for new UMM architectures like Show-o2.
|
| 83 |
+
|
| 84 |
+
## 📮 Contact
|
| 85 |
+
|
| 86 |
+
For questions, feedback, or collaboration opportunities, feel free to reach out!
|
| 87 |
|
| 88 |
## ✍️ Citation
|
| 89 |
|
| 90 |
+
If you find RecA useful for your research, please consider citing:
|
| 91 |
|
| 92 |
+
```bibtex
|
| 93 |
@misc{xie2025reconstructionalignmentimprovesunified,
|
| 94 |
+
title={Reconstruction Alignment Improves Unified Multimodal Models},
|
| 95 |
author={Ji Xie and Trevor Darrell and Luke Zettlemoyer and XuDong Wang},
|
| 96 |
year={2025},
|
| 97 |
eprint={2509.07295},
|
| 98 |
archivePrefix={arXiv},
|
| 99 |
primaryClass={cs.CV},
|
| 100 |
+
url={https://arxiv.org/abs/2509.07295},
|
| 101 |
+
}
|
| 102 |
+
```
|
| 103 |
+
|
| 104 |
+
---
|
| 105 |
+
|
| 106 |
+
<div align="center">
|
| 107 |
+
|
| 108 |
+
⭐ **If you find this project helpful, please consider giving it a star!** ⭐
|
| 109 |
+
|
| 110 |
+
[](https://www.star-history.com/#HorizonWind2004/reconstruction-alignment&Date)
|
| 111 |
+
|
| 112 |
+
</div>
|