| | --- |
| | base_model: |
| | - ByteDance-Seed/BAGEL-7B-MoT |
| | datasets: |
| | - jackyhate/text-to-image-2M |
| | language: |
| | - en |
| | - zh |
| | license: apache-2.0 |
| | pipeline_tag: any-to-any |
| | library_name: transformers |
| | --- |
| | |
| | # BAGEL-RecA |
| |
|
| | **Just 6 × 80GB A100s × 4.5 hours to boost BAGEL performance across all tasks! Outperforms FLUX-Kontext in image editing capabilities!**
| |
|
| | > A self-supervised training framework that aligns understanding and generation with modest compute, delivering large **zero-shot** gains in generation and editing.
| |
|
| | ## Paper |
| | [Reconstruction Alignment Improves Unified Multimodal Models](https://huggingface.co/papers/2509.07295) |
| |
|
| | ## Project Page |
| | https://reconstruction-alignment.github.io/ |
| |
|
| | ## Code |
| |
|
| | https://github.com/HorizonWind2004/reconstruction-alignment |
| |
|
| | This repository hosts the model weights (NF4, INT8, BF16) for **BAGEL-RecA**. We fine-tuned BAGEL on six 80GB NVIDIA A800 GPUs for only 27 GPU hours. Understanding capability remains unchanged, while our ReAlign method brings a **zero-shot improvement** of +3.6 on GenEval, +1.26 on DPGBench, +0.37 on ImgEdit, and +0.33 on GEdit.
| |
|
| | For installation, usage instructions, and further documentation, please visit [our repo](https://github.com/HorizonWind2004/reconstruction-alignment) and BAGEL's original [GitHub repo](https://github.com/bytedance-seed/BAGEL).
| |
|
| | A [DF11 version of BAGEL-RecA](https://huggingface.co/theunlikely/BAGEL-RecA-DF11/tree/main) is also available. Many thanks to @theunlikely!
| |
|
| | ## Method
| |
|
| | [Paper (PDF)](https://arxiv.org/pdf/2509.07295)
| | [arXiv](https://arxiv.org/abs/2509.07295)
| | [Code](https://github.com/HorizonWind2004/reconstruction-alignment)
| | [Hugging Face collection](https://huggingface.co/collections/sanaka87/realign-68ad2176380355a3dcedc068)
| | [Hugging Face Space](https://huggingface.co/spaces/sanaka87/BAGEL-ReAlign)
| | [Project page](https://reconstruction-alignment.github.io/)
| |
|
| | ## Benchmarks
| |
|
| | ### 1. Text-to-Image Generation |
| |
|
| | All models are evaluated at 1024×1024 resolution.
| |
|
| | | Model | GenEval ↑ | DPGBench ↑ | WISE ↑ |
| | | ------------ | --------- | --------- | --------- | |
| | | **BAGEL** | 0.787 | 84.03 | 0.50 | |
| | | **BAGEL-RecA** | **0.824** | **85.29** | **0.52** | |
| |
|
| | ### 2. Image Editing |
| |
|
| | | Model | GEdit-Bench-EN (SC) ↑ | GEdit-Bench-EN (PQ) ↑ | GEdit-Bench-EN (O) ↑ | ImgEdit ↑ |
| | | ------------- | --------------------- | --------------------- | ------------------- | ------------------ | |
| | | **BAGEL** | 7.96 | 6.64 | 6.94 | 3.38 | |
| | | **BAGEL-NHR** | 8.04 | 6.87 | 7.08 | 3.48 | |
| | | **BAGEL-RecA** | **8.24** | 6.87 | **7.27** | **3.75** | |
| | | **FLUX Kontext** | 6.95 | **7.30** | 6.27 | 3.59 | |
| |
|
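The gains over the BAGEL baseline can be recomputed directly from the two tables above. The snippet below is only an illustrative sketch: the `SCORES` dictionary and the `gains` helper are hypothetical names introduced here, and the numbers are copied by hand from the tables rather than loaded from any evaluation output.

```python
# Scores copied from the benchmark tables in this card,
# stored as (BAGEL baseline, BAGEL-RecA). Names here are illustrative only.
SCORES = {
    "GenEval": (0.787, 0.824),
    "DPGBench": (84.03, 85.29),
    "WISE": (0.50, 0.52),
    "GEdit-Bench-EN (O)": (6.94, 7.27),
    "ImgEdit": (3.38, 3.75),
}


def gains(scores, ndigits=3):
    """Return {benchmark: RecA score minus baseline score}, rounded."""
    return {
        name: round(reca - base, ndigits)
        for name, (base, reca) in scores.items()
    }


if __name__ == "__main__":
    # Print one signed delta per benchmark.
    for name, delta in gains(SCORES).items():
        print(f"{name}: {delta:+.3f}")
```

With the table values, the deltas come out to 1.26 on DPGBench, 0.37 on ImgEdit, and 0.33 on GEdit-Bench-EN (O), matching the figures quoted earlier in this card. GenEval is reported on a 0 to 1 scale in the table, so its delta of 0.037 corresponds to roughly 3.7 points; the small difference from the quoted +3.6 presumably comes from rounding of the table entries.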
| |
|
| |  |
| |
|
| | ## License |
| |
|
| | BAGEL-RecA is licensed under the Apache 2.0 license. |
| |
|
| | ## Citation
| |
|
| | If you find our work inspiring or use our codebase in your research, please consider giving us a star ⭐ and a citation:
| |
|
| |
|
| | ``` |
| | @article{xie2025reconstruction, |
| | title={Reconstruction Alignment Improves Unified Multimodal Models}, |
| | author={Xie, Ji and Darrell, Trevor and Zettlemoyer, Luke and Wang, XuDong}, |
| | journal={arXiv preprint arXiv:2509.07295}, |
| | year={2025} |
| | } |
| | ``` |
| |
|