---
license: apache-2.0
pipeline_tag: image-to-image
library_name: transformers
---

# UniREdit-Bagel: A Unified Reasoning-based Image Editing Model

This repository hosts **UniREdit-Bagel**, a model developed as part of the research presented in the paper [UniREditBench: A Unified Reasoning-based Image Editing Benchmark](https://arxiv.org/abs/2511.01295).

- **Project Page:** https://maplebb.github.io/UniREditBench/
- **Code Repository:** https://github.com/Maplebb/UniREditBench


## Introduction

We propose UniREditBench, a unified benchmark for reasoning-based image editing evaluation with broader evaluation dimension coverage and a robust evaluation pipeline. We also design an automated multi-scenario data synthesis pipeline and construct UniREdit-Data-100K, a large-scale synthetic dataset with high-quality chain-of-thought (CoT) reasoning annotations. We fine-tune Bagel on this dataset and develop UniREdit-Bagel, demonstrating substantial improvements in both in-domain and out-of-distribution settings.


## ✨ Highlights

- **Broader Scenario and Reasoning Dimension Coverage:** UniREditBench contains 2,700 high-quality samples organized into 8 primary reasoning dimensions and 18 sub-categories, spanning both real-world and game-world image editing tasks.
- **Reliable Dual-Reference Evaluation:** For each sample, we provide both a textual reference and a ground-truth (GT) image reference. This multi-modal reference enables vision-language model (VLM) evaluators to compare generated images against the references directly and at a fine-grained level, on both the textual and visual axes, leading to more reliable evaluation.
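As a rough illustration of how a dual-reference score might be combined, the sketch below averages a VLM's text-level and image-level comparison scores with a tunable weight. The function and field names, and the weighting itself, are illustrative assumptions, not the benchmark's actual evaluation protocol.

```python
# Hypothetical sketch of dual-reference scoring. Field and function
# names are illustrative, not the benchmark's actual API.
from dataclasses import dataclass


@dataclass
class DualReference:
    textual_reference: str  # expected edit, described in text
    gt_image_path: str      # ground-truth image reference


def aggregate_scores(text_score: float, image_score: float,
                     text_weight: float = 0.5) -> float:
    """Combine a VLM's text-level and image-level comparison scores.

    Both scores are assumed to lie in [0, 1]; the 50/50 default weight
    is an illustrative choice, not the paper's protocol.
    """
    if not (0.0 <= text_score <= 1.0 and 0.0 <= image_score <= 1.0):
        raise ValueError("scores must be in [0, 1]")
    return text_weight * text_score + (1.0 - text_weight) * image_score
```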

πŸš€ Sample Usage

To perform image editing with reasoning using UniREdit-Bagel, follow the steps below. This section is adapted from the official GitHub repository.

### 1. Set Up Environment

```shell
conda create -n uniredit python=3.10 -y
conda activate uniredit
pip install -r requirements.txt
pip install flash_attn==2.7.0.post1 --no-build-isolation
```

You can also install flash_attn from a prebuilt wheel:

```shell
# for cuda11 torch2.5.x
pip install "https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.0.post1/flash_attn-2.7.0.post1+cu11torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl"

# for cuda12 torch2.5.x
pip install "https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.0.post1/flash_attn-2.7.0.post1+cu12torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl"
```
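The two prebuilt wheels above differ only in their CUDA major version (both target torch 2.5.x, CPython 3.10, and linux x86_64). A small helper can pick the matching URL; anything outside these two cases should fall back to building from source:

```python
# Pick the prebuilt flash_attn 2.7.0.post1 wheel matching the local
# CUDA major version. Only the two wheels listed above are covered.
FLASH_ATTN_WHEELS = {
    "11": "https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.0.post1/flash_attn-2.7.0.post1+cu11torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl",
    "12": "https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.0.post1/flash_attn-2.7.0.post1+cu12torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl",
}


def flash_attn_wheel(cuda_version: str) -> str:
    """Return the wheel URL for a CUDA version string like '12.1'."""
    major = cuda_version.split(".")[0]
    try:
        return FLASH_ATTN_WHEELS[major]
    except KeyError:
        raise ValueError(
            f"no prebuilt wheel for CUDA {cuda_version}; "
            "build flash_attn from source instead"
        ) from None
```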

### 2. Benchmark and Checkpoint Preparation

First, prepare the UniREditBench benchmark dataset:

```shell
huggingface-cli download --resume-download maplebb/UniREditBench --local-dir ./UniREditBench
cd UniREditBench
unzip original_image.zip
unzip reference_image.zip
cd ..
```
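Once unzipped, the benchmark items can be read from `data.json`. The sketch below assumes `data.json` is a JSON list whose records carry fields such as `original_image` and `instruction`; these field names are guesses about the schema, so check the actual file before relying on them.

```python
# Sketch: load the benchmark metadata and resolve image paths to
# absolute paths under the benchmark root. The field names used here
# ("original_image", "instruction") are assumptions about the schema.
import json
from pathlib import Path


def load_benchmark(root: str) -> list[dict]:
    """Read data.json and prefix relative image paths with the root."""
    root_path = Path(root)
    with open(root_path / "data.json", encoding="utf-8") as f:
        records = json.load(f)
    for rec in records:
        rec["original_image"] = str(root_path / rec["original_image"])
    return records
```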

Then, prepare the UniREdit-Bagel checkpoint:

```shell
huggingface-cli download --resume-download maplebb/UniREdit-Bagel --local-dir ./ckpt

pip install safetensors

python merge_ckpt.py
```

(Note: The merge_ckpt.py script is part of the UniREditBench GitHub repository and should be run from its root directory after cloning and downloading the checkpoint.)
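Conceptually, merging a sharded checkpoint boils down to taking the union of the per-shard state dicts while guarding against key collisions. The sketch below illustrates that idea with plain dicts; it is not the repository's `merge_ckpt.py`, which should be used as-is.

```python
# Illustrative sketch of what a shard-merge step does conceptually:
# combine per-shard state dicts into one, refusing silent overwrites.
# This is NOT the repository's merge_ckpt.py.
def merge_shards(shards: list[dict]) -> dict:
    """Union per-shard state dicts, raising on duplicate parameter keys."""
    merged: dict = {}
    for shard in shards:
        for key, tensor in shard.items():
            if key in merged:
                raise KeyError(f"duplicate parameter {key!r} across shards")
            merged[key] = tensor
    return merged
```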

### 3. Inference

Once the environment and checkpoints are prepared, you can run inference:

```shell
GPUS=8
model_path=./ckpt
input_path=./UniREditBench
output_path=./output_images

# Image Editing with Reasoning
torchrun \
    --nnodes=1 \
    --nproc_per_node=$GPUS \
    gen_images_mp_uniredit.py \
    --input_dir $input_path \
    --output_dir $output_path \
    --metadata_file ./UniREditBench/data.json \
    --max_latent_size 64 \
    --model-path $model_path \
    --think
```

This command generates edited images from the benchmark instructions and saves them to the directory given by `output_path` (here `./output_images`). The `--think` flag enables the model's chain-of-thought reasoning before editing.
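The launch above runs `gen_images_mp_uniredit.py` under `torchrun` with one process per GPU. A common pattern for such multi-process generation scripts is to split the benchmark items round-robin by rank; the sketch below is a hypothetical illustration of that split, not the script's actual partitioning logic.

```python
# Hypothetical round-robin work split across torchrun ranks. The actual
# partitioning used by gen_images_mp_uniredit.py may differ.
def shard_for_rank(items: list, rank: int, world_size: int) -> list:
    """Return the subset of items this rank should process."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    # Slice with stride world_size starting at this rank's offset, so
    # every item is assigned to exactly one rank.
    return items[rank::world_size]
```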

πŸ“§ Contact

If you have any comments or questions, please open a new issue on the GitHub repository or feel free to contact Feng Han and Yibin Wang.

## ⭐ Citation

If you find our work helpful or inspiring, please consider citing it:

```bibtex
@article{han2025unireditbench,
  title={UniREditBench: A Unified Reasoning-based Image Editing Benchmark},
  author={Han, Feng and Wang, Yibin and Li, Chenglin and Liang, Zheming and Wang, Dianyi and Jiao, Yang and Wei, Zhipeng and Gong, Chao and Jin, Cheng and Chen, Jingjing and Wang, Jiaqi},
  journal={arXiv preprint arXiv:2511.01295},
  year={2025}
}
```