---
license: apache-2.0
library_name: diffusers
pipeline_tag: image-to-image
tags:
  - image-editing
  - multi-image
  - diffusers
  - joyai
base_model:
  - Qwen/Qwen3-VL-8B-Instruct
---

# JoyAI-Image Edit Plus

JoyAI-Image Edit Plus is a multi-image, instruction-guided editing model from the [JoyAI-Image](https://github.com/jd-opensource/JoyAI-Image) family. It takes **multiple reference images** and a text instruction, and generates a new image that combines elements of the references as the instruction directs.

## Model Architecture

| Component | Model | Size |
|-----------|-------|------|
| Text Encoder | Qwen3-VL-8B-Instruct | 8B |
| Transformer (MMDiT) | JoyImageEditPlusTransformer3DModel | 16B |
| VAE | AutoencoderKLWan | 240M |
| Scheduler | FlowMatchEulerDiscreteScheduler | - |
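
As a rough guide to hardware requirements, the sizes in the table can be turned into a weight-memory estimate. The sketch below assumes every component is loaded in `torch.bfloat16` (2 bytes per parameter) and treats the table sizes as exact parameter counts, which they are not; activations, the scheduler, and framework overhead are ignored, so real usage will be higher.

```python
# Rough weight-memory estimate from the component sizes listed above.
# Assumption: all weights are stored in bfloat16 (2 bytes per parameter).
BYTES_PER_PARAM_BF16 = 2

# Parameter counts taken from the table (rounded figures, not exact counts).
COMPONENT_PARAMS = {
    "text_encoder": 8_000_000_000,   # Qwen3-VL-8B-Instruct
    "transformer": 16_000_000_000,   # JoyImageEditPlusTransformer3DModel
    "vae": 240_000_000,              # AutoencoderKLWan
}


def weight_bytes(num_params: int, bytes_per_param: int = BYTES_PER_PARAM_BF16) -> int:
    """Return the memory needed to hold num_params weights."""
    return num_params * bytes_per_param


total_bytes = sum(weight_bytes(n) for n in COMPONENT_PARAMS.values())

for name, n in COMPONENT_PARAMS.items():
    print(f"{name:>12}: {weight_bytes(n) / 1e9:6.2f} GB")
print(f"{'total':>12}: {total_bytes / 1e9:6.2f} GB")
```

Under these assumptions the weights alone come to roughly 48.5 GB, which suggests that a single 24 GB or 40 GB GPU would need some form of offloading.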

## Installation

`JoyImageEditPlusPipeline` has not yet been merged into an official diffusers release. Until it ships in a stable version, install diffusers from the PR branch:

```bash
pip install git+https://github.com/tangyanf/diffusers.git@add-joyimage-edit-plus
```

If diffusers is already installed, uninstall it first so the PR branch replaces it cleanly:

```bash
pip uninstall diffusers -y
pip install git+https://github.com/tangyanf/diffusers.git@add-joyimage-edit-plus
```

Once the PR is merged into the official diffusers repository, you can switch back to the standard installation:

```bash
pip install diffusers --upgrade
```
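
To confirm that the installed diffusers build actually exposes the pipeline, a quick check along these lines may help. The helper name `pipeline_available` is hypothetical and not part of diffusers; it only tests whether the class name can be resolved on the top-level package.

```python
def pipeline_available(class_name: str = "JoyImageEditPlusPipeline") -> bool:
    """Return True if the installed diffusers package exposes class_name.

    Hypothetical helper for illustration; it is not part of diffusers.
    """
    try:
        import diffusers
    except ImportError:
        # diffusers is not installed at all.
        return False
    try:
        return hasattr(diffusers, class_name)
    except Exception:
        # Lazy imports can fail for other reasons (for example a missing torch).
        return False


if pipeline_available():
    print("JoyImageEditPlusPipeline is available.")
else:
    print("JoyImageEditPlusPipeline not found; install diffusers from the PR branch.")
```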

## Usage

```python
import torch
from PIL import Image
from diffusers import JoyImageEditPlusPipeline

pipe = JoyImageEditPlusPipeline.from_pretrained(
    "jdopensource/JoyAI-Image-Edit-Plus-Diffusers",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Load reference images
images = [
    Image.open("reference_0.png").convert("RGB"),
    Image.open("reference_1.png").convert("RGB"),
]

# Determine output resolution from the last reference image
target_h, target_w = pipe._get_bucket_size(images[-1])

# Generate
result = pipe(
    images=images,
    prompt="Combine the person from the second image with the scene from the first image.",
    negative_prompt="low quality, blurry, deformed",
    height=target_h,
    width=target_w,
    num_inference_steps=30,
    guidance_scale=4.0,
    generator=torch.Generator(device="cuda").manual_seed(42),
)
result.images[0].save("output.png")
```

## Example

**Prompt:** "The woman is lovingly holding the cute puppy in her arms"

| Input 0 | Input 1 | Output |
|---------|---------|--------|
| ![input_0](examples/input_0.png) | ![input_1](examples/input_1.png) | ![output](examples/output.png) |

## Recommended Parameters

| Parameter | Value |
|-----------|-------|
| `num_inference_steps` | 30 |
| `guidance_scale` | 4.0 |
| `torch_dtype` | `torch.bfloat16` |
| Resolution | Auto-detected via `_get_bucket_size()` (1024-base buckets) |
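
For intuition about what "1024-base buckets" could mean, the sketch below illustrates one common aspect-ratio bucketing scheme. It is not the implementation of `_get_bucket_size()`: the 64-pixel step, the aspect-ratio range, and the names `make_buckets` and `nearest_bucket` are all assumptions made for this example.

```python
import math


def make_buckets(base: int = 1024, step: int = 64) -> list[tuple[int, int]]:
    """Enumerate (height, width) pairs with area close to base * base.

    Assumption: sides are multiples of `step`, heights span base/2 to base*2.
    """
    area = base * base
    buckets = []
    for height in range(base // 2, base * 2 + 1, step):
        # Pick the width (a multiple of step) that keeps the area near the target.
        width = round(area / height / step) * step
        buckets.append((height, width))
    return buckets


def nearest_bucket(height: int, width: int, buckets: list[tuple[int, int]]) -> tuple[int, int]:
    """Return the bucket whose aspect ratio is closest to the input, in log space."""
    target = math.log(width / height)
    return min(buckets, key=lambda hw: abs(math.log(hw[1] / hw[0]) - target))


buckets = make_buckets()
print(nearest_bucket(1000, 1000, buckets))  # square input
print(nearest_bucket(1080, 1920, buckets))  # 16:9 landscape input
```

Comparing aspect ratios in log space treats portrait and landscape deviations symmetrically. In practice, rely on the value returned by the pipeline rather than on this sketch.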

## CLI Inference

```bash
python inference.py \
    --model_path jdopensource/JoyAI-Image-Edit-Plus-Diffusers \
    --images examples/input_0.png examples/input_1.png \
    --prompt "The woman is lovingly holding the cute puppy in her arms" \
    --num_inference_steps 30 \
    --guidance_scale 4.0 \
    --seed 42 \
    --output output.png
```
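
The flags above map one-to-one onto the pipeline arguments shown in the Usage section. As an illustration only, a script accepting these flags could parse them as follows; this is a hypothetical sketch and not the contents of `inference.py`, and the defaults are assumptions taken from the recommended parameters.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags shown in the CLI example (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Multi-image editing inference")
    parser.add_argument("--model_path", required=True, help="Hub ID or local path")
    parser.add_argument("--images", nargs="+", required=True, help="Reference image paths")
    parser.add_argument("--prompt", required=True, help="Editing instruction")
    # Defaults below are assumed from the recommended parameters.
    parser.add_argument("--num_inference_steps", type=int, default=30)
    parser.add_argument("--guidance_scale", type=float, default=4.0)
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--output", default="output.png")
    return parser


args = build_parser().parse_args([
    "--model_path", "jdopensource/JoyAI-Image-Edit-Plus-Diffusers",
    "--images", "examples/input_0.png", "examples/input_1.png",
    "--prompt", "The woman is lovingly holding the cute puppy in her arms",
])
print(args.images, args.num_inference_steps, args.guidance_scale)
```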

## Model Details

- **Developed by**: JD.com
- **License**: Apache-2.0
- **Diffusers version**: >= 0.39.0
- **Framework**: PyTorch

## Citation

```bibtex
@misc{joyai-image-2025,
  title={JoyAI-Image: A Unified Multimodal Foundation Model for Image Understanding, Generation, and Editing},
  author={Joy Future Academy, JD},
  year={2025},
  url={https://github.com/jd-opensource/JoyAI-Image}
}
```