---
license: apache-2.0
library_name: diffusers
pipeline_tag: image-to-image
tags:
- image-editing
- multi-image
- diffusers
- joyai
base_model:
- Qwen/Qwen3-VL-8B-Instruct
---
# JoyAI-Image Edit Plus

JoyAI-Image Edit Plus is a multi-image, instruction-guided editing model from the [JoyAI-Image](https://github.com/jd-opensource/JoyAI-Image) family. It accepts **multiple reference images** and a text instruction, and generates a new image that combines elements from the references according to the instruction.
## Model Architecture

| Component | Model | Size |
|-----------|-------|------|
| Text Encoder | Qwen3-VL-8B-Instruct | 8B |
| Transformer (MMDiT) | JoyImageEditPlusTransformer3DModel | 16B |
| VAE | AutoencoderKLWan | 240M |
| Scheduler | FlowMatchEulerDiscreteScheduler | - |
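The sizes in the table give a rough lower bound on the memory needed just to hold the weights. The following is a back-of-the-envelope sketch, assuming the nominal parameter counts above and 2 bytes per parameter for `torch.bfloat16`. It ignores activations, buffers, and framework overhead, so real usage will be higher.

```python
# Nominal parameter counts taken from the component table above.
PARAMS = {
    "text_encoder": 8e9,   # Qwen3-VL-8B-Instruct
    "transformer": 16e9,   # JoyImageEditPlusTransformer3DModel
    "vae": 240e6,          # AutoencoderKLWan
}

BYTES_PER_PARAM_BF16 = 2  # bfloat16 stores each parameter in 2 bytes


def weight_memory_gib(params: dict[str, float], bytes_per_param: int) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    total_bytes = sum(params.values()) * bytes_per_param
    return total_bytes / 2**30


# Weights only; this is an estimate, not a measured figure.
print(f"{weight_memory_gib(PARAMS, BYTES_PER_PARAM_BF16):.1f} GiB")
```

Under these assumptions the weights alone come to roughly 45 GiB in bfloat16.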
## Installation

`JoyImageEditPlusPipeline` has not yet been merged into an official diffusers release. Until it is available in a stable version, install diffusers from the PR branch:

```bash
pip install git+https://github.com/tangyanf/diffusers.git@add-joyimage-edit-plus
```

If diffusers is already installed, uninstall it first:

```bash
pip uninstall diffusers -y
pip install git+https://github.com/tangyanf/diffusers.git@add-joyimage-edit-plus
```

Once the PR is merged into the official diffusers repository, you can switch back to the standard installation:

```bash
pip install diffusers --upgrade
```
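To confirm which build is active after installing, you can check whether the installed package exposes the pipeline class. This is a minimal sketch using only the standard library. The helper name `exposes` is made up for illustration and is not part of diffusers.

```python
import importlib
import importlib.util


def exposes(module_name: str, attr: str) -> bool:
    """Return True if `module_name` is importable and exposes `attr`."""
    try:
        if importlib.util.find_spec(module_name) is None:
            return False  # package is not installed at all
        module = importlib.import_module(module_name)
        return hasattr(module, attr)
    except Exception:
        # Broken installs or failing lazy imports count as "not available".
        return False


# True only when the installed diffusers build exports the pipeline class.
print(exposes("diffusers", "JoyImageEditPlusPipeline"))
```

If this prints `False`, the environment is most likely still using a diffusers build that does not include the PR branch.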
## Usage

```python
import torch
from PIL import Image
from diffusers import JoyImageEditPlusPipeline

pipe = JoyImageEditPlusPipeline.from_pretrained(
    "jdopensource/JoyAI-Image-Edit-Plus-Diffusers",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Load reference images
images = [
    Image.open("reference_0.png").convert("RGB"),
    Image.open("reference_1.png").convert("RGB"),
]

# Determine output resolution from the last reference image
target_h, target_w = pipe._get_bucket_size(images[-1])

# Generate
result = pipe(
    images=images,
    prompt="Combine the person from the second image with the scene from the first image.",
    negative_prompt="low quality, blurry, deformed",
    height=target_h,
    width=target_w,
    num_inference_steps=30,
    guidance_scale=4.0,
    generator=torch.Generator(device="cuda").manual_seed(42),
)
result.images[0].save("output.png")
```
## Example

**Prompt:** "The woman is lovingly holding the cute puppy in her arms"

| Input 0 | Input 1 | Output |
|---------|---------|--------|
| | | |
## Recommended Parameters

| Parameter | Value |
|-----------|-------|
| `num_inference_steps` | 30 |
| `guidance_scale` | 4.0 |
| `torch_dtype` | `torch.bfloat16` |
| Resolution | Auto-detected via `_get_bucket_size()` (1024-base buckets) |
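The internals of `_get_bucket_size()` are not documented here. As a purely illustrative sketch of what resolution bucketing generally means, the snippet below picks, from a fixed list of sizes near a 1024x1024 pixel budget, the one whose aspect ratio is closest to the input image. The bucket list and the `nearest_bucket` helper are assumptions made for this example and are not taken from the pipeline.

```python
import math

# Hypothetical (height, width) buckets around a 1024x1024 pixel budget.
# These values are illustrative only, not the pipeline's real table.
BUCKETS = [
    (1024, 1024),
    (896, 1152),
    (1152, 896),
    (768, 1344),
    (1344, 768),
]


def nearest_bucket(height: int, width: int, buckets=BUCKETS) -> tuple[int, int]:
    """Pick the bucket whose aspect ratio is closest to the input's.

    Ratios are compared in log space so that portrait and landscape
    deviations of the same factor are treated symmetrically.
    """
    target = math.log(height / width)
    return min(buckets, key=lambda hw: abs(math.log(hw[0] / hw[1]) - target))


# A 1080x1920 (height x width) landscape image maps to the widest bucket.
print(nearest_bucket(1080, 1920))
```

In practice, prefer the values returned by the pipeline itself, as in the usage example, over any hand-rolled bucket list.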
## CLI Inference

```bash
python inference.py \
    --model_path jdopensource/JoyAI-Image-Edit-Plus-Diffusers \
    --images examples/input_0.png examples/input_1.png \
    --prompt "The woman is lovingly holding the cute puppy in her arms" \
    --num_inference_steps 30 \
    --guidance_scale 4.0 \
    --seed 42 \
    --output output.png
```
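The source of `inference.py` is not shown in this card. For readers writing their own wrapper, here is a minimal sketch of an argument parser that accepts the same flags as the command above. The defaults mirror the recommended parameters and the example command, and are assumptions, not the script's confirmed behavior.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the flags used in the command above."""
    parser = argparse.ArgumentParser(description="Multi-image editing (sketch)")
    parser.add_argument("--model_path", required=True, help="Hub ID or local path")
    parser.add_argument("--images", nargs="+", required=True, help="Reference images")
    parser.add_argument("--prompt", required=True, help="Editing instruction")
    # Assumed defaults, taken from the recommended parameters in this card.
    parser.add_argument("--num_inference_steps", type=int, default=30)
    parser.add_argument("--guidance_scale", type=float, default=4.0)
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--output", default="output.png")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args([
    "--model_path", "jdopensource/JoyAI-Image-Edit-Plus-Diffusers",
    "--images", "examples/input_0.png", "examples/input_1.png",
    "--prompt", "The woman is lovingly holding the cute puppy in her arms",
])
print(args.images, args.num_inference_steps, args.guidance_scale)
```

The parsed values can then be passed to the pipeline call shown in the Usage section.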
## Model Details

- **Developed by**: JD.com
- **License**: Apache-2.0
- **Diffusers version**: >= 0.39.0
- **Framework**: PyTorch
## Citation

```bibtex
@misc{joyai-image-2025,
  title={JoyAI-Image: A Unified Multimodal Foundation Model for Image Understanding, Generation, and Editing},
  author={Joy Future Academy, JD},
  year={2025},
  url={https://github.com/jd-opensource/JoyAI-Image}
}
```