---
license: apache-2.0
library_name: diffusers
pipeline_tag: image-to-image
tags:
- image-editing
- multi-image
- diffusers
- joyai
base_model:
- Qwen/Qwen3-VL-8B-Instruct
---
# JoyAI-Image Edit Plus
JoyAI-Image Edit Plus is a multi-image, instruction-guided editing model from the [JoyAI-Image](https://github.com/jd-opensource/JoyAI-Image) family. It takes **multiple reference images** and a text instruction, and generates a new image that combines elements from the references according to the instruction.
## Model Architecture
| Component | Model | Size |
|-----------|-------|------|
| Text Encoder | Qwen3-VL-8B-Instruct | 8B |
| Transformer (MMDiT) | JoyImageEditPlusTransformer3DModel | 16B |
| VAE | AutoencoderKLWan | 240M |
| Scheduler | FlowMatchEulerDiscreteScheduler | - |
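
As a rough guide to hardware needs, the sizes in the table can be turned into a weight-memory estimate. The sketch below assumes the rounded parameter counts listed above and 2 bytes per parameter (bfloat16). It covers weights only and ignores activations, latents, and framework overhead, so actual usage will be higher.

```python
# Rounded parameter counts taken from the table above (approximate by assumption).
PARAM_COUNTS = {
    "text_encoder": 8_000_000_000,   # Qwen3-VL-8B-Instruct
    "transformer": 16_000_000_000,   # JoyImageEditPlusTransformer3DModel
    "vae": 240_000_000,              # AutoencoderKLWan
}
BYTES_PER_PARAM_BF16 = 2  # bfloat16 stores each parameter in 2 bytes


def weight_memory_gb(param_counts, bytes_per_param=BYTES_PER_PARAM_BF16):
    """Return the approximate weight footprint in decimal gigabytes."""
    total_bytes = sum(param_counts.values()) * bytes_per_param
    return total_bytes / 1e9


print(f"~{weight_memory_gb(PARAM_COUNTS):.2f} GB of weights in bfloat16")
```

Under these assumptions the weights alone come to about 48.5 GB, which is worth keeping in mind when choosing a GPU.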
## Installation
`JoyImageEditPlusPipeline` has not yet been merged into an official diffusers release. Until it is available in a stable version, install diffusers from the PR branch:
```bash
pip install git+https://github.com/tangyanf/diffusers.git@add-joyimage-edit-plus
```
If diffusers is already installed, uninstall it first so the PR branch replaces it cleanly:
```bash
pip uninstall diffusers -y
pip install git+https://github.com/tangyanf/diffusers.git@add-joyimage-edit-plus
```
Once the PR is merged into the official diffusers repository, you can switch back to the standard installation:
```bash
pip install diffusers --upgrade
```
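
To confirm which build is active, one option is to check whether the installed package exposes the pipeline class. This is a minimal sketch using only the standard library; the helper name `pipeline_available` is hypothetical and not part of diffusers.

```python
import importlib


def pipeline_available(module_name="diffusers", class_name="JoyImageEditPlusPipeline"):
    """Return True if `module_name` can be imported and exposes `class_name`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package is not installed at all.
        return False
    return hasattr(module, class_name)


if pipeline_available():
    print("JoyImageEditPlusPipeline is available.")
else:
    print("JoyImageEditPlusPipeline not found; install diffusers from the PR branch.")
```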
## Usage
```python
import torch
from PIL import Image
from diffusers import JoyImageEditPlusPipeline

pipe = JoyImageEditPlusPipeline.from_pretrained(
    "jdopensource/JoyAI-Image-Edit-Plus-Diffusers",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Load reference images
images = [
    Image.open("reference_0.png").convert("RGB"),
    Image.open("reference_1.png").convert("RGB"),
]

# Determine output resolution from the last reference image
target_h, target_w = pipe._get_bucket_size(images[-1])

# Generate
result = pipe(
    images=images,
    prompt="Combine the person from the second image with the scene from the first image.",
    negative_prompt="low quality, blurry, deformed",
    height=target_h,
    width=target_w,
    num_inference_steps=30,
    guidance_scale=4.0,
    generator=torch.Generator(device="cuda").manual_seed(42),
)

# Save the first (and only) generated image
result.images[0].save("output.png")
```
## Example
**Prompt:** "The woman is lovingly holding the cute puppy in her arms"
| Input 0 | Input 1 | Output |
|---------|---------|--------|
| ![input_0](examples/input_0.png) | ![input_1](examples/input_1.png) | ![output](examples/output.png) |
## Recommended Parameters
| Parameter | Value |
|-----------|-------|
| `num_inference_steps` | 30 |
| `guidance_scale` | 4.0 |
| `torch_dtype` | `torch.bfloat16` |
| Resolution | Auto-detected via `_get_bucket_size()` (1024-base buckets) |
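
For intuition about what resolution bucketing means, the sketch below picks, from a fixed list of roughly 1024x1024-pixel sizes, the one whose aspect ratio is closest to the input image. It is an illustration only: the bucket list and the `nearest_bucket` helper are hypothetical, and the actual behavior of `_get_bucket_size()` may differ.

```python
import math

# Hypothetical (height, width) buckets, each with roughly 1024 * 1024 pixels.
# The pipeline's real bucket list is not documented here and may differ.
EXAMPLE_BUCKETS = [
    (1024, 1024),
    (896, 1152),
    (1152, 896),
    (768, 1344),
    (1344, 768),
]


def nearest_bucket(height, width, buckets=EXAMPLE_BUCKETS):
    """Return the bucket whose aspect ratio is closest to height / width.

    Ratios are compared in log space so that portrait and landscape
    deviations are treated symmetrically.
    """
    target = math.log(height / width)
    return min(buckets, key=lambda hw: abs(math.log(hw[0] / hw[1]) - target))


# A 1080x1920 (height x width) landscape image maps to the widest example bucket.
print(nearest_bucket(1080, 1920))
```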
## CLI Inference
```bash
python inference.py \
  --model_path jdopensource/JoyAI-Image-Edit-Plus-Diffusers \
  --images examples/input_0.png examples/input_1.png \
  --prompt "The woman is lovingly holding the cute puppy in her arms" \
  --num_inference_steps 30 \
  --guidance_scale 4.0 \
  --seed 42 \
  --output output.png
```
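
The contents of `inference.py` are not shown in this card. As a sketch of how the flags in the command above could be parsed, assuming a plain `argparse` interface, the following uses the recommended parameters as defaults. The function name `build_parser`, the defaults, and which flags are required are all assumptions, not the script's confirmed behavior.

```python
import argparse


def build_parser():
    """Build a parser for the flags used in the CLI example (illustrative only)."""
    parser = argparse.ArgumentParser(description="Multi-image editing (sketch)")
    parser.add_argument("--model_path", required=True, help="Hub ID or local path")
    parser.add_argument("--images", nargs="+", required=True, help="Reference image paths")
    parser.add_argument("--prompt", required=True, help="Editing instruction")
    # Defaults below mirror the recommended parameters; they are assumptions.
    parser.add_argument("--num_inference_steps", type=int, default=30)
    parser.add_argument("--guidance_scale", type=float, default=4.0)
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--output", default="output.png")
    return parser


# Parse the same arguments as the shell command above.
args = build_parser().parse_args([
    "--model_path", "jdopensource/JoyAI-Image-Edit-Plus-Diffusers",
    "--images", "examples/input_0.png", "examples/input_1.png",
    "--prompt", "The woman is lovingly holding the cute puppy in her arms",
])
print(args.images, args.num_inference_steps, args.guidance_scale)
```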
## Model Details
- **Developed by**: JD.com
- **License**: Apache-2.0
- **Diffusers version**: >= 0.39.0
- **Framework**: PyTorch
## Citation
```bibtex
@misc{joyai-image-2025,
title={JoyAI-Image: A Unified Multimodal Foundation Model for Image Understanding, Generation, and Editing},
author={Joy Future Academy, JD},
year={2025},
url={https://github.com/jd-opensource/JoyAI-Image}
}
```