---
language:
- en
base_model:
- BAAI/Emu3-Stage1
---

# EARL - SFT (S) (8B)

**Model Name:** `mair-lab/sft-simple`

**Model Size:** 8B parameters

**Base Model:** [BAAI/Emu3-Stage1](https://huggingface.co/BAAI/Emu3-Stage1)

**Training Method:** Supervised Fine-Tuning (SFT)

**Dataset:** Simple Edit (S)

This model is part of the EARL benchmark effort introduced in our paper:
👉 [EARL: The Promise of RL for Autoregressive Image Editing](https://arxiv.org/abs/2508.01119)

## Model Summary

This model is fine-tuned from Emu3 with supervised fine-tuning (SFT) on the Simple Edit (S) dataset. It is optimized for general-purpose autoregressive image editing and does not require intermediate reasoning steps. Among the models compared in the benchmark table in this card, it obtains the highest OmniEdit score (5.73) and competitive scores on the other benchmarks.

➡️ **Inference script and usage:** [GitHub Repo](https://github.com/saba96/EARL?tab=readme-ov-file)

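Before running the inference script from the repository, the checkpoint files need to be available locally. The following is a minimal sketch of one way to fetch them, assuming the `huggingface_hub` package is installed and that the weights are hosted under the model name listed in this card. The helper name `download_checkpoint` is hypothetical and is not part of the EARL repository.

```python
from typing import Optional

# Model name as listed in this card.
MODEL_ID = "mair-lab/sft-simple"


def download_checkpoint(model_id: str = MODEL_ID,
                        local_dir: Optional[str] = None) -> str:
    """Download all files of the model repo and return the local folder path.

    Hypothetical helper for illustration; not part of the EARL codebase.
    """
    # Imported lazily so this module can be imported even when
    # huggingface_hub is not installed.
    from huggingface_hub import snapshot_download

    # snapshot_download fetches the whole repository and returns the local
    # path; when local_dir is None it uses the default Hugging Face cache.
    return snapshot_download(repo_id=model_id, local_dir=local_dir)
```

Calling `download_checkpoint()` returns a local folder path. How that path is passed to the inference script depends on the repository's own instructions, which remain the authoritative reference for usage.
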
## Benchmark Results (Avg Score Across Benchmarks)

| Model          | Base Model | OmniEdit | EmuEdit | AURORA | MB   | VisMin | I2EBench | **AVG** |
|----------------|------------|----------|---------|--------|------|--------|----------|---------|
| Magicbrush     | SD v1.5    | 3.43     | 3.28    | 3.01   | 3.64 | 3.48   | 3.06     | 3.32    |
| InstructPix2Pix| SD v1.5    | 3.97     | 3.24    | 3.05   | 3.12 | 2.94   | 3.23     | 3.26    |
| Aurora         | SD v1.5    | 4.50     | 4.40    | 4.12   | 4.62 | 3.82   | 3.58     | 4.17    |
| Omnigen*       | -          | 5.68     | 5.00    | 4.10   | 4.68 | 4.09   | 4.68     | 4.70    |
| **SFT (S)**    | Emu3       | 5.73     | 3.66    | 3.58   | 3.19 | 3.57   | 3.59     | **3.88** |

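As a quick sanity check on the table, the sketch below recomputes each model's average from its six per-benchmark scores and finds the top model on a given benchmark. It assumes that the AVG column is the unweighted mean of the six benchmark scores, which the section title suggests but the card does not state explicitly. The names `RESULTS` and `best_on` are illustrative only.

```python
from statistics import mean

BENCHMARKS = ["OmniEdit", "EmuEdit", "AURORA", "MB", "VisMin", "I2EBench"]

# model -> (scores in BENCHMARKS order, reported AVG), copied from the table.
RESULTS = {
    "Magicbrush":      ([3.43, 3.28, 3.01, 3.64, 3.48, 3.06], 3.32),
    "InstructPix2Pix": ([3.97, 3.24, 3.05, 3.12, 2.94, 3.23], 3.26),
    "Aurora":          ([4.50, 4.40, 4.12, 4.62, 3.82, 3.58], 4.17),
    "Omnigen*":        ([5.68, 5.00, 4.10, 4.68, 4.09, 4.68], 4.70),
    "SFT (S)":         ([5.73, 3.66, 3.58, 3.19, 3.57, 3.59], 3.88),
}


def best_on(benchmark: str) -> str:
    """Return the model with the highest score on the given benchmark."""
    idx = BENCHMARKS.index(benchmark)
    return max(RESULTS, key=lambda model: RESULTS[model][0][idx])


# Assumption: AVG is the unweighted mean of the six benchmark scores.
for model, (scores, reported) in RESULTS.items():
    recomputed = mean(scores)
    gap = abs(recomputed - reported)
    print(f"{model:16s} recomputed={recomputed:.3f} "
          f"reported={reported:.2f} gap={gap:.3f}")

print("Best on OmniEdit:", best_on("OmniEdit"))
```

With the values in this table, every recomputed mean stays within 0.01 of the reported AVG. The small remaining gaps would be consistent with the reported averages having been computed from unrounded scores, though that is an assumption rather than something the card confirms.
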
> 📈 **Note:** The Emu3-based SFT (S) model achieves top results among all open-source supervised models on **OmniEdit** and competitive performance across other benchmarks.

## Use Cases

- Open-ended and instruction-guided image editing
- Object, attribute, style, and environment changes