Improve model card with pipeline tag, library name, and extended content

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +90 -5
README.md CHANGED
@@ -1,27 +1,112 @@
1
  ---
2
  license: apache-2.0
3
  ---
4
 
5
  ## 🧠 Method
6
 
7
- [![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/pdf/2509.07295)
8
  [![ArXiv](https://img.shields.io/badge/arXiv-A42C25?style=for-the-badge&logo=arxiv&logoColor=white&color=blue)](https://arxiv.org/abs/2509.07295)
9
  [![Github](https://img.shields.io/badge/RecA-000000?style=for-the-badge&logo=github&logoColor=000&logoColor=white)](https://github.com/HorizonWind2004/reconstruction-alignment)
10
  [![Hugging Face Collection](https://img.shields.io/badge/HF_Models-fcd022?style=for-the-badge&logo=huggingface&logoColor=000)](https://huggingface.co/collections/sanaka87/realign-68ad2176380355a3dcedc068)
11
  [![HF Demo](https://img.shields.io/badge/Demo_(BAGEL)-fcd022?style=for-the-badge&logo=huggingface&logoColor=000)](https://huggingface.co/spaces/sanaka87/BAGEL-ReAlign)
12
  [![Project Page](https://img.shields.io/badge/Project_Page-00CED1?style=for-the-badge&logo=web&logoColor=white)](https://reconstruction-alignment.github.io/)
13
 
14
 
15
  ## ✍️ Citation
16
 
17
- If you find our work inspiring or use our codebase in your research, please consider giving a star ⭐ and a citation~
18
 
 
19
  @misc{xie2025reconstructionalignmentimprovesunified,
20
- title={Reconstruction Alignment Improves Unified Multimodal Models},
21
  author={Ji Xie and Trevor Darrell and Luke Zettlemoyer and XuDong Wang},
22
  year={2025},
23
  eprint={2509.07295},
24
  archivePrefix={arXiv},
25
  primaryClass={cs.CV},
26
- url={https://arxiv.org/abs/2509.07295},
27
- }
 
1
  ---
2
  license: apache-2.0
3
+ library_name: diffusers
4
+ pipeline_tag: text-to-image
5
  ---
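The two added metadata fields live in the YAML front matter at the top of README.md. As a minimal sketch of how a script might check them, the following parses flat `key: value` front matter using only the Python standard library. The helper name `read_front_matter` and the embedded `card` string are illustrative assumptions, not part of this PR or of any Hugging Face API; nested YAML would need a real YAML parser.

```python
def read_front_matter(text):
    """Return the flat key/value pairs between the leading '---' delimiters."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter at all
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta  # closing delimiter reached
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # opening delimiter without a closing one


# Hypothetical card text mirroring the header added in this diff.
card = """---
license: apache-2.0
library_name: diffusers
pipeline_tag: text-to-image
---

# Reconstruction Alignment Improves Unified Multimodal Models
"""

metadata = read_front_matter(card)
print(metadata["pipeline_tag"], metadata["library_name"])
```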
6
 
7
+ # Reconstruction Alignment Improves Unified Multimodal Models
8
+
9
+ The model was presented in the paper [Reconstruction Alignment Improves Unified Multimodal Models](https://huggingface.co/papers/2509.07295).
10
+
11
+ **Abstract:**
12
+ Unified multimodal models (UMMs) unify visual understanding and generation within a single architecture. However, conventional training relies on image-text pairs (or sequences) whose captions are typically sparse and miss fine-grained visual details--even when they use hundreds of words to describe a simple image. We introduce Reconstruction Alignment (RecA), a resource-efficient post-training method that leverages visual understanding encoder embeddings as dense "text prompts," providing rich supervision without captions. Concretely, RecA conditions a UMM on its own visual understanding embeddings and optimizes it to reconstruct the input image with a self-supervised reconstruction loss, thereby realigning understanding and generation. Despite its simplicity, RecA is broadly applicable: across autoregressive, masked-autoregressive, and diffusion-based UMMs, it consistently improves generation and editing fidelity. With only 27 GPU-hours, post-training with RecA substantially improves image generation performance on GenEval (0.73$\rightarrow$0.90) and DPGBench (80.93$\rightarrow$88.15), while also boosting editing benchmarks (ImgEdit 3.38$\rightarrow$3.75, GEdit 6.94$\rightarrow$7.25). Notably, RecA surpasses much larger open-source models and applies broadly across diverse UMM architectures, establishing it as an efficient and general post-training alignment strategy for UMMs.
13
+
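To make the reconstruction objective in the abstract concrete, here is a toy sketch, not the authors' implementation. It assumes a frozen 2x2 linear map standing in for the visual understanding encoder, a trainable 2x2 linear map standing in for the generator, 2-D vectors standing in for images, and mean squared error as the reconstruction loss. All names (`reca_toy`, `encoder`, `decoder`) are hypothetical.

```python
import random


def matvec(matrix, vector):
    """Multiply a small dense matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def reca_toy(steps=500, lr=0.1, seed=0):
    """Train the toy decoder to reconstruct inputs from their embeddings."""
    rng = random.Random(seed)
    encoder = [[1.0, 0.5], [-0.5, 1.0]]  # frozen "understanding" encoder (assumption)
    decoder = [[0.0, 0.0], [0.0, 0.0]]   # trainable "generator" (assumption)
    images = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(16)]
    n = len(images)
    losses = []
    for _ in range(steps):
        grad = [[0.0, 0.0], [0.0, 0.0]]
        loss = 0.0
        for x in images:
            z = matvec(encoder, x)        # dense embedding used as the "prompt"
            x_hat = matvec(decoder, z)    # reconstruction conditioned on z
            err = [p - t for p, t in zip(x_hat, x)]
            loss += sum(e * e for e in err) / n
            for i in range(2):
                for j in range(2):
                    grad[i][j] += 2.0 * err[i] * z[j] / n
        # Plain gradient descent on the decoder only; no captions are involved.
        for i in range(2):
            for j in range(2):
                decoder[i][j] -= lr * grad[i][j]
        losses.append(loss)
    return losses


losses = reca_toy()
print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.2e}")
```

The point of the toy is only the data flow: the supervision signal is the input image itself, and the conditioning comes from the model's own embedding of that image rather than from a caption.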
14
  ## 🧠 Method
15
 
16
+ [![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://huggingface.co/papers/2509.07295)
17
  [![ArXiv](https://img.shields.io/badge/arXiv-A42C25?style=for-the-badge&logo=arxiv&logoColor=white&color=blue)](https://arxiv.org/abs/2509.07295)
18
  [![Github](https://img.shields.io/badge/RecA-000000?style=for-the-badge&logo=github&logoColor=000&logoColor=white)](https://github.com/HorizonWind2004/reconstruction-alignment)
19
  [![Hugging Face Collection](https://img.shields.io/badge/HF_Models-fcd022?style=for-the-badge&logo=huggingface&logoColor=000)](https://huggingface.co/collections/sanaka87/realign-68ad2176380355a3dcedc068)
20
  [![HF Demo](https://img.shields.io/badge/Demo_(BAGEL)-fcd022?style=for-the-badge&logo=huggingface&logoColor=000)](https://huggingface.co/spaces/sanaka87/BAGEL-ReAlign)
21
  [![Project Page](https://img.shields.io/badge/Project_Page-00CED1?style=for-the-badge&logo=web&logoColor=white)](https://reconstruction-alignment.github.io/)
22
 
23
+ <div align="center">
24
+ <img src="https://github.com/HorizonWind2004/reconstruction-alignment/raw/main/assets/DEMO.jpg" alt="" style="width: 100%; margin: 20px 0;">
25
+ </div>
26
+
27
+ ## 🔥 News
28
+
29
+ - **2025.9.10**: BAGEL training code is released! Harmon training code will be released soon.
30
+ - **2025.9.9**: Our [finetuned weights](https://huggingface.co/collections/sanaka87/realign-68ad2176380355a3dcedc068) and [arXiv paper](https://arxiv.org/abs/2509.07295) are available! We expect to release the training code tomorrow.
31
+
32
+ ## 🍭 Results
33
+
34
+ **RecA** achieves state-of-the-art performance on generation benchmarks with remarkable efficiency. A 1.5B-parameter model post-trained with RecA surpasses models with 7B-24B parameters, reaching GenEval **0.86** and DPGBench **87.21** without GPT-4o distillation data or reinforcement learning. RecA also improves BAGEL's editing performance significantly across all categories. A further two-stage fine-tuning with GPT-4o-Image distillation data raises GenEval to **0.90** and DPGBench to **88.15**.
35
+
36
+ <div align="center">
37
+ <img src="https://github.com/HorizonWind2004/reconstruction-alignment/raw/main/assets/main.jpg" alt="" style="width: 100%; margin: 20px 0;">
38
+ </div>
39
+
40
+ <div align="center">
41
+ <img src="https://github.com/HorizonWind2004/reconstruction-alignment/raw/main/assets/edit_result.jpg" alt="" style="width: 100%; margin: 20px 0;">
42
+ </div>
43
+
44
+ We've tested RecA on various base architectures, including Show-o, OpenUni, Harmon, and BAGEL, consistently observing significant performance improvements across all models and benchmarks.
45
+
46
+
47
+ <div align="center">
48
+ <img src="https://github.com/HorizonWind2004/reconstruction-alignment/raw/main/assets/t2i_result.jpg" alt="" style="width: 100%; margin: 20px 0;">
49
+ </div>
50
+
51
+
52
+ ## 🏆 Model Zoo
53
+
54
+ A collection of RecA models on Hugging Face with benchmark performance:
55
+
56
+ | Model Name | Parameters | GenEval | DPGBench | ImgEdit | GEdit |
57
+ |------------|------------|---------|----------|---------|-------|
58
+ | [BAGEL-RecA](https://huggingface.co/sanaka87/BAGEL-RecA) | 14B | 82.4 (+3.6) | 85.29 (+1.26) | 3.75 (+0.37) | 7.27 (+0.33) |
59
+ | [Harmon-0.5B-RecA](https://huggingface.co/sanaka87/Harmon-0.5B-RecA) | 0.5B | 78.7 (+11.1) | 84.67 (+4.55) | - | - |
60
+ | [Harmon-1.5B-RecA](https://huggingface.co/sanaka87/Harmon-1.5B-RecA) | 1.5B | 85.7 (+12.8) | 87.21 (+6.28) | - | - |
61
+ | [Show-o-RecA](https://huggingface.co/sanaka87/Show-o-RecA) | 1.3B | 61.9 (+5.3) | 75.70 (+5.05) | - | - |
62
+ | [Show-o-512x512-RecA](https://huggingface.co/sanaka87/Show-o-512x512-RecA) | 1.3B | 72.3 (+6.1) | 84.94 (+2.73) | - | - |
63
+ | [Harmon-1.5B-RecA-plus](https://huggingface.co/sanaka87/Harmon-1.5B-RecA-plus) | 1.5B | 90.0 | 88.15 | - | - |
64
+ | [OpenUni-RecA](https://huggingface.co/sanaka87/OpenUni-RecA) | 3.6B | 74.1 (+12.2) | 82.75 (+3.73) | - | - |
65
+
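Assuming each parenthesized value in the table is the absolute gain over the corresponding base model, the base scores can be recovered by subtraction. The sketch below does that for the GenEval and DPGBench columns. The `rows` dictionary simply copies numbers from the table, and `base_scores` is a hypothetical helper name.

```python
# (GenEval score, GenEval gain, DPGBench score, DPGBench gain), copied from the table.
rows = {
    "BAGEL-RecA": (82.4, 3.6, 85.29, 1.26),
    "Harmon-0.5B-RecA": (78.7, 11.1, 84.67, 4.55),
    "Harmon-1.5B-RecA": (85.7, 12.8, 87.21, 6.28),
    "Show-o-RecA": (61.9, 5.3, 75.70, 5.05),
    "Show-o-512x512-RecA": (72.3, 6.1, 84.94, 2.73),
    "OpenUni-RecA": (74.1, 12.2, 82.75, 3.73),
}


def base_scores(geneval, geneval_gain, dpg, dpg_gain):
    """Subtract the reported gains to get the implied base-model scores."""
    return round(geneval - geneval_gain, 2), round(dpg - dpg_gain, 2)


baselines = {name: base_scores(*values) for name, values in rows.items()}
for name, (geneval, dpg) in baselines.items():
    print(f"{name}: GenEval {geneval}, DPGBench {dpg}")
```

For Harmon-1.5B-RecA this gives 72.9 and 80.93, consistent with the 0.73 and 80.93 starting points quoted in the abstract (the table reports GenEval on a 0-100 scale).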
66
+
67
+ ## ✨ Getting Started
68
+
69
+ For detailed instructions on installation, training, and evaluation, please refer to the respective repository READMEs:
70
+
71
+ - **[BAGEL Training Guide](https://github.com/HorizonWind2004/reconstruction-alignment/tree/main/BAGEL/README.md)**: Complete guide for BAGEL model training and evaluation.
72
+
73
+ - **[Benchmark Evaluation Guide](https://github.com/HorizonWind2004/reconstruction-alignment/tree/main/Benchmark/README.md)**: Multi-benchmark evaluation scripts and setup instructions.
74
+
75
+ ## 🚧 TODO
76
+
77
+ - [x] Release our model weights on Hugging Face.
78
+ - [x] Release BAGEL training code.
79
+ - [ ] Release Harmon training code.
80
+ - [ ] Release Show-o and OpenUni training code.
81
+ - [ ] Further scale up BAGEL training.
82
+ - [ ] Add support for new UMM architectures like Show-o2.
83
+
84
+ ## 📮 Contact
85
+
86
+ For questions, feedback, or collaboration opportunities, feel free to reach out!
87
 
88
  ## ✍️ Citation
89
 
90
+ If you find RecA useful for your research, please consider citing:
91
 
92
+ ```bibtex
93
  @misc{xie2025reconstructionalignmentimprovesunified,
94
+ title={Reconstruction Alignment Improves Unified Multimodal Models},
95
  author={Ji Xie and Trevor Darrell and Luke Zettlemoyer and XuDong Wang},
96
  year={2025},
97
  eprint={2509.07295},
98
  archivePrefix={arXiv},
99
  primaryClass={cs.CV},
100
+ url={https://arxiv.org/abs/2509.07295},
101
+ }
102
+ ```
103
+
104
+ ---
105
+
106
+ <div align="center">
107
+
108
+ ⭐ **If you find this project helpful, please consider giving it a star!** ⭐
109
+
110
+ [![Star History Chart](https://api.star-history.com/svg?repos=HorizonWind2004/reconstruction-alignment&type=Date)](https://www.star-history.com/#HorizonWind2004/reconstruction-alignment&Date)
111
+
112
+ </div>