---
base_model:
- ByteDance-Seed/BAGEL-7B-MoT
datasets:
- jackyhate/text-to-image-2M
language:
- en
- zh
license: apache-2.0
pipeline_tag: any-to-any
library_name: transformers
---
# BAGEL-RecA
**🚀 Just 6 × 80GB A100s × 4.5 hours to boost BAGEL performance across all tasks! Outperforms FLUX-Kontext in image editing capabilities!**
> A self-supervised training framework that aligns understanding and generation with modest compute, yielding large **zero-shot** gains in generation and editing capability.
## Paper
[Reconstruction Alignment Improves Unified Multimodal Models](https://huggingface.co/papers/2509.07295)
## Project Page
https://reconstruction-alignment.github.io/
## Code
https://github.com/HorizonWind2004/reconstruction-alignment
This repository hosts the model weights (NF4, INT8, BF16) for **BAGEL-RecA**. We fine-tuned BAGEL on six 80GB NVIDIA A800 GPUs for only 27 GPU hours. Understanding capability remains unchanged, while our ReAlign method brings **zero-shot improvements** of +3.6 on GenEval, +1.26 on DPGBench, +0.37 on ImgEdit, and +0.33 on GEdit.
For installation, usage instructions, and further documentation, please visit [our repo](https://github.com/HorizonWind2004/reconstruction-alignment) and BAGEL's original [GitHub repo](https://github.com/bytedance-seed/BAGEL).
A [DF11 version of BAGEL-RecA](https://huggingface.co/theunlikely/BAGEL-RecA-DF11/tree/main) is also available. Many thanks to @theunlikely!
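
To fetch the weights hosted here, something like the following minimal sketch may help. It uses `snapshot_download` from the `huggingface_hub` package. The repo id `sanaka87/BAGEL-RecA` and the local directory name are assumptions inferred from the author's Hugging Face namespace, not values stated in this README, so check them against the model page. Loading and inference are covered by the linked GitHub repos and are not shown.

```python
from pathlib import Path

# Assumption: repo id inferred from the author's Hugging Face namespace.
# Verify it on the model page before relying on it.
REPO_ID = "sanaka87/BAGEL-RecA"


def download_weights(local_dir: str = "models/BAGEL-RecA",
                     repo_id: str = REPO_ID) -> Path:
    """Download every file of the model repository into local_dir."""
    # Imported lazily so this module can be imported even when
    # huggingface_hub is not installed (pip install huggingface_hub).
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=repo_id, local_dir=local_dir)
    return Path(path)
```

Calling `download_weights()` pulls the whole repository, including every quantized variant, so expect a large download. `snapshot_download` also accepts an `allow_patterns` argument to restrict which files are fetched once the file layout is known.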
## 🧠 Method
[![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/pdf/2509.07295)
[![ArXiv](https://img.shields.io/badge/arXiv-A42C25?style=for-the-badge&logo=arxiv&logoColor=white&color=blue)](https://arxiv.org/abs/2509.07295)
[![Github](https://img.shields.io/badge/RecA-000000?style=for-the-badge&logo=github&logoColor=000&logoColor=white)](https://github.com/HorizonWind2004/reconstruction-alignment)
[![Hugging Face Collection](https://img.shields.io/badge/HF_Models-fcd022?style=for-the-badge&logo=huggingface&logoColor=000)](https://huggingface.co/collections/sanaka87/realign-68ad2176380355a3dcedc068)
[![HF Demo](https://img.shields.io/badge/Demo_(BAGEL)-fcd022?style=for-the-badge&logo=huggingface&logoColor=000)](https://huggingface.co/spaces/sanaka87/BAGEL-ReAlign)
[![Project Page](https://img.shields.io/badge/Project_Page-00CED1?style=for-the-badge&logo=web&logoColor=white)](https://reconstruction-alignment.github.io/)
## 📊 Benchmarks
### 1. Text-to-Image Generation
All results are evaluated at 1024x1024 resolution.
| Model | GenEval ↑ | DPGBench ↑ | WISE ↑ |
| ------------ | --------- | --------- | --------- |
| **BAGEL** | 0.787 | 84.03 | 0.50 |
| **BAGEL-RecA** | **0.824** | **85.29** | **0.52** |
### 2. Image Editing
| Model | GEdit-Bench-EN (SC) ↑ | GEdit-Bench-EN (PQ) ↑ | GEdit-Bench-EN (O) ↑ | ImgEdit ↑ |
| ------------- | --------------------- | --------------------- | ------------------- | ------------------ |
| **BAGEL** | 7.96 | 6.64 | 6.94 | 3.38 |
| **BAGEL-NHR** | 8.04 | 6.87 | 7.08 | 3.48 |
| **BAGEL-RecA** | **8.24** | 6.87 | **7.27** | **3.75** |
| **FLUX Kontext** | 6.95 | **7.30** | 6.27 | 3.59 |
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e99fc07e2ec711a7138262/lGur0scJWaCGkAwH2AHxy.png)
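
The headline gains quoted in the introduction can be recomputed from the two tables above. The sketch below is only an arithmetic check: the scores are copied from the tables, and the names `SCORES`, `gains`, `GAINS`, and `GPU_HOURS` are illustrative. GenEval is reported on a 0 to 1 scale in the table, so its delta comes out as 0.037. The small gap to the +3.6 quoted earlier may come from rounding in the table, which is an assumption rather than something the source states.

```python
# (BAGEL, BAGEL-RecA) scores copied from the benchmark tables in this README.
SCORES = {
    "GenEval": (0.787, 0.824),
    "DPGBench": (84.03, 85.29),
    "WISE": (0.50, 0.52),
    "GEdit-Bench-EN (O)": (6.94, 7.27),
    "ImgEdit": (3.38, 3.75),
}


def gains(scores):
    """Return the BAGEL-RecA minus BAGEL difference for each benchmark."""
    # Round to three decimals to hide floating-point noise.
    return {name: round(reca - base, 3) for name, (base, reca) in scores.items()}


GAINS = gains(SCORES)

# Training budget from the headline: 6 GPUs for 4.5 hours each.
GPU_HOURS = 6 * 4.5

if __name__ == "__main__":
    for name, delta in GAINS.items():
        print(f"{name}: +{delta}")
    print(f"GPU hours: {GPU_HOURS}")
```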
## License
BAGEL-RecA is licensed under the Apache 2.0 license.
## ✍️ Citation
If you find our work inspiring or use our codebase in your research, please consider giving the repo a star ⭐ and citing our paper:
```
@article{xie2025reconstruction,
title={Reconstruction Alignment Improves Unified Multimodal Models},
author={Xie, Ji and Darrell, Trevor and Zettlemoyer, Luke and Wang, XuDong},
journal={arXiv preprint arXiv:2509.07295},
year={2025}
}
```