---
license: apache-2.0
library_name: diffusers
pipeline_tag: image-to-image
tags:
  - image-editing
  - multi-image
  - diffusers
  - joyai
base_model:
  - Qwen/Qwen3-VL-8B-Instruct
---

# JoyAI-Image Edit Plus

JoyAI-Image Edit Plus is a multi-image, instruction-guided editing model from the [JoyAI-Image](https://github.com/jd-opensource/JoyAI-Image) family. It takes **multiple reference images** and a text instruction, and generates a new image that combines elements from the references as the instruction describes.

## Model Architecture

| Component | Model | Size |
|-----------|-------|------|
| Text Encoder | Qwen3-VL-8B-Instruct | 8B |
| Transformer (MMDiT) | JoyImageEditPlusTransformer3DModel | 16B |
| VAE | AutoencoderKLWan | 240M |
| Scheduler | FlowMatchEulerDiscreteScheduler | - |
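
The sizes in the table give a rough lower bound on the memory needed just to hold the weights. The sketch below is a back-of-the-envelope estimate, assuming bfloat16 storage (2 bytes per parameter), the approximate parameter counts listed above, and 1 GB = 1e9 bytes. The helper `weight_memory_gb` is a hypothetical name used only for illustration; activations, the CUDA context, and any framework overhead are not included.

```python
# Parameter counts from the architecture table, in billions (approximate).
PARAMS_BILLION = {
    "text_encoder": 8.0,
    "transformer": 16.0,
    "vae": 0.24,
}
BYTES_PER_PARAM_BF16 = 2  # bfloat16 stores each parameter in 2 bytes


def weight_memory_gb(params_billion: dict, bytes_per_param: int = BYTES_PER_PARAM_BF16) -> float:
    """Estimate weight memory in GB (1 GB = 1e9 bytes), ignoring activations."""
    total_params = sum(params_billion.values()) * 1e9
    return total_params * bytes_per_param / 1e9


total_gb = weight_memory_gb(PARAMS_BILLION)
print(f"Approximate weight memory in bf16: {total_gb:.2f} GB")
```

Under these assumptions the weights alone come to roughly 48.5 GB, so actual peak usage during inference will be higher.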

## Installation

`JoyImageEditPlusPipeline` has not yet been merged into an official diffusers release. Until it is available in a stable version, install diffusers from the PR branch:

```bash
pip install git+https://github.com/tangyanf/diffusers.git@add-joyimage-edit-plus
```

If diffusers is already installed, uninstall it first so that the PR branch replaces the existing copy:

```bash
pip uninstall diffusers -y
pip install git+https://github.com/tangyanf/diffusers.git@add-joyimage-edit-plus
```

Once the PR is merged and included in an official diffusers release, you can switch back to the standard installation:

```bash
pip install diffusers --upgrade
```
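
To confirm which installation is active, it can help to check whether the installed package actually exposes the pipeline class. The following is a minimal sketch using only the standard library; `pipeline_available` is a hypothetical helper name, not part of diffusers.

```python
import importlib


def pipeline_available(module_name: str = "diffusers",
                       class_name: str = "JoyImageEditPlusPipeline") -> bool:
    """Return True if `module_name` can be imported and exposes `class_name`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package is not installed at all.
        return False
    # hasattr returns False when the attribute lookup raises AttributeError.
    return hasattr(module, class_name)
```

Calling `pipeline_available()` with no arguments should return `False` on an installation that lacks the pipeline, in which case the PR branch install above is still needed.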

## Usage

```python
import torch
from PIL import Image
from diffusers import JoyImageEditPlusPipeline

pipe = JoyImageEditPlusPipeline.from_pretrained(
    "jdopensource/JoyAI-Image-Edit-Plus-Diffusers",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Load reference images
images = [
    Image.open("reference_0.png").convert("RGB"),
    Image.open("reference_1.png").convert("RGB"),
]

# Determine the output resolution from the last reference image.
# Note: _get_bucket_size is underscore-prefixed (private by convention).
target_h, target_w = pipe._get_bucket_size(images[-1])

# Generate
result = pipe(
    images=images,
    prompt="Combine the person from the second image with the scene from the first image.",
    negative_prompt="low quality, blurry, deformed",
    height=target_h,
    width=target_w,
    num_inference_steps=30,
    guidance_scale=4.0,
    generator=torch.Generator(device="cuda").manual_seed(42),
)
result.images[0].save("output.png")
```

## Example

**Prompt:** "The woman is lovingly holding the cute puppy in her arms"

| Input 0 | Input 1 | Output |
|---------|---------|--------|
| ![input_0](examples/input_0.png) | ![input_1](examples/input_1.png) | ![output](examples/output.png) |

## Recommended Parameters

| Parameter | Value |
|-----------|-------|
| `num_inference_steps` | 30 |
| `guidance_scale` | 4.0 |
| `torch_dtype` | `torch.bfloat16` |
| Resolution | Auto-detected via `_get_bucket_size()` (1024-base buckets) |
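
For intuition about resolution buckets, the sketch below shows one common form of aspect-ratio bucketing: pick, from a fixed list of sizes with area near 1024×1024, the one whose aspect ratio is closest to the input image. This is an illustration only. The bucket list and the `nearest_bucket` helper are hypothetical and are not taken from `_get_bucket_size()`, whose actual buckets and selection rule may differ.

```python
# Hypothetical bucket list: (height, width) pairs with area close to 1024 * 1024.
# These values are illustrative and are NOT taken from the pipeline.
BUCKETS = [
    (1024, 1024),
    (896, 1152),
    (1152, 896),
    (768, 1344),
    (1344, 768),
]


def nearest_bucket(height: int, width: int, buckets=BUCKETS) -> tuple[int, int]:
    """Pick the bucket whose aspect ratio (h / w) is closest to the input's."""
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    ratio = height / width
    return min(buckets, key=lambda hw: abs(hw[0] / hw[1] - ratio))


print(nearest_bucket(1080, 1920))  # landscape 16:9 input
```

With this illustrative list, a 1080×1920 (height × width) input maps to the `(768, 1344)` bucket, and a square input maps to `(1024, 1024)`.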

## CLI Inference

```bash
python inference.py \
    --model_path jdopensource/JoyAI-Image-Edit-Plus-Diffusers \
    --images examples/input_0.png examples/input_1.png \
    --prompt "The woman is lovingly holding the cute puppy in her arms" \
    --num_inference_steps 30 \
    --guidance_scale 4.0 \
    --seed 42 \
    --output output.png
```
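
The `inference.py` script is not reproduced in this card. As a rough guide to what the flags above imply, here is a minimal sketch of an argument parser that accepts them. The function name `build_parser`, the help strings, and the default values (borrowed from the recommended parameters) are assumptions, not the script's actual code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags used in the CLI example (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Multi-image editing inference (sketch)")
    parser.add_argument("--model_path", required=True, help="Hub ID or local path of the model")
    # nargs="+" collects one or more image paths into a list.
    parser.add_argument("--images", nargs="+", required=True, help="Reference image paths")
    parser.add_argument("--prompt", required=True, help="Editing instruction")
    parser.add_argument("--num_inference_steps", type=int, default=30)
    parser.add_argument("--guidance_scale", type=float, default=4.0)
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--output", default="output.png", help="Where to save the result")
    return parser
```

Parsing the command shown above with this sketch would yield `args.images` as a list of two paths, which could then be opened with PIL and passed to the pipeline as in the Usage section.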

## Model Details

- **Developed by**: JD.com
- **License**: Apache-2.0
- **Diffusers version**: >= 0.39.0
- **Framework**: PyTorch

## Citation

```bibtex
@misc{joyai-image-2025,
  title={JoyAI-Image: A Unified Multimodal Foundation Model for Image Understanding, Generation, and Editing},
  author={Joy Future Academy, JD},
  year={2025},
  url={https://github.com/jd-opensource/JoyAI-Image}
}
```