---
license: apache-2.0
library_name: diffusers
pipeline_tag: image-to-image
tags:
  - image-editing
  - multi-image
  - diffusers
  - joyai
base_model:
  - Qwen/Qwen3-VL-8B-Instruct
---

# JoyAI-Image Edit Plus

JoyAI-Image Edit Plus is a multi-image instruction-guided editing model from the [JoyAI-Image](https://github.com/jd-opensource/JoyAI-Image) family. It accepts **multiple reference images** and a text instruction to generate a new image that combines elements from the references according to the instruction.

## Model Architecture

| Component | Model | Size |
|-----------|-------|------|
| Text Encoder | Qwen3-VL-8B-Instruct | 8B |
| Transformer (MMDiT) | JoyImageEditPlusTransformer3DModel | 16B |
| VAE | AutoencoderKLWan | 240M |
| Scheduler | FlowMatchEulerDiscreteScheduler | - |

## Installation

`JoyImageEditPlusPipeline` has not yet been merged into the official diffusers release. Before it is available in a stable version, you need to install diffusers from the PR branch:

```bash
pip install git+https://github.com/tangyanf/diffusers.git@add-joyimage-edit-plus
```

If you have already installed diffusers, make sure to uninstall it first:

```bash
pip uninstall diffusers -y
pip install git+https://github.com/tangyanf/diffusers.git@add-joyimage-edit-plus
```

Once the PR is merged into the official diffusers repository, you can switch back to the standard installation:

```bash
pip install diffusers --upgrade
```

## Usage

```python
import torch
from PIL import Image
from diffusers import JoyImageEditPlusPipeline

pipe = JoyImageEditPlusPipeline.from_pretrained(
    "jdopensource/JoyAI-Image-Edit-Plus-Diffusers",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Load reference images
images = [
    Image.open("reference_0.png").convert("RGB"),
    Image.open("reference_1.png").convert("RGB"),
]

# Determine output resolution from the last reference image
target_h, target_w = pipe._get_bucket_size(images[-1])

# Generate
result = pipe(
    images=images,
    prompt="Combine the person from the second image with the scene from the first image.",
    negative_prompt="low quality, blurry, deformed",
    height=target_h,
    width=target_w,
    num_inference_steps=30,
    guidance_scale=4.0,
    generator=torch.Generator(device="cuda").manual_seed(42),
)
result.images[0].save("output.png")
```

## Example

**Prompt:** "The woman is lovingly holding the cute puppy in her arms"

| Input 0 | Input 1 | Output |
|---------|---------|--------|
| ![input_0](examples/input_0.png) | ![input_1](examples/input_1.png) | ![output](examples/output.png) |

## Recommended Parameters

| Parameter | Value |
|-----------|-------|
| `num_inference_steps` | 30 |
| `guidance_scale` | 4.0 |
| `torch_dtype` | `torch.bfloat16` |
| Resolution | Auto-detected via `_get_bucket_size()` (1024-base buckets) |

## CLI Inference

```bash
python inference.py \
  --model_path jdopensource/JoyAI-Image-Edit-Plus-Diffusers \
  --images examples/input_0.png examples/input_1.png \
  --prompt "The woman is lovingly holding the cute puppy in her arms" \
  --num_inference_steps 30 \
  --guidance_scale 4.0 \
  --seed 42 \
  --output output.png
```

## Model Details

- **Developed by**: JD.com
- **License**: Apache-2.0
- **Diffusers version**: >= 0.39.0
- **Framework**: PyTorch

## Citation

```bibtex
@misc{joyai-image-2025,
  title={JoyAI-Image: A Unified Multimodal Foundation Model for Image Understanding, Generation, and Editing},
  author={Joy Future Academy, JD},
  year={2025},
  url={https://github.com/jd-opensource/JoyAI-Image}
}
```
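
## Resolution Buckets (Illustrative)

The Recommended Parameters table says the output resolution is auto-detected from "1024-base buckets". This card does not document how `_get_bucket_size()` chooses a bucket, so the following is only a rough sketch of one common approach: keep the reference image's aspect ratio, target roughly 1024 × 1024 pixels of area, and snap both sides to a fixed multiple. The function name `nearest_bucket`, the `multiple=16` default, and the rounding rule are all assumptions made for illustration and may not match the pipeline's actual behavior.

```python
import math


def nearest_bucket(width: int, height: int, base: int = 1024, multiple: int = 16):
    """Hypothetical helper: pick a (height, width) near base*base pixels.

    This is NOT the pipeline's `_get_bucket_size`; it only illustrates the
    general idea of aspect-ratio-preserving resolution buckets.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    aspect = width / height
    # Solve w * h = base**2 subject to w / h = aspect.
    target_h = math.sqrt(base * base / aspect)
    target_w = target_h * aspect

    def snap(value: float) -> int:
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(target_h), snap(target_w)


if __name__ == "__main__":
    # A square input maps to the square base resolution.
    print(nearest_bucket(1024, 1024))
    # A 16:9 input keeps roughly the same pixel budget in landscape shape.
    print(nearest_bucket(1920, 1080))
```

For actual inference, prefer the value returned by `pipe._get_bucket_size(...)` as shown in the Usage section, since the model was presumably trained on its own bucket set.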