---
license: apache-2.0
---

# Qwen-Image Precise Region Control Model

## Model Introduction

This is the V2 release of the precise region control model built on [Qwen-Image](https://www.modelscope.cn/models/Qwen/Qwen-Image). It is implemented as a LoRA and controls the position and shape of each entity: for every entity you provide a text description and a regional condition (a mask map). The model was trained with the [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio) framework on the [Qwen-Image-Self-Generated-Dataset](https://www.modelscope.cn/datasets/DiffSynth-Studio/Qwen-Image-Self-Generated-Dataset).

Unlike [V1](https://www.modelscope.cn/models/DiffSynth-Studio/Qwen-Image-EliGen), this version is trained on a dataset generated by Qwen-Image itself, so the style of its outputs is more consistent with the base model.

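As a rough illustration of the input contract described above, each entity contributes one prompt and one mask, and the two lists are matched by position. The helper below is a hypothetical sketch: `validate_entity_inputs` is not part of DiffSynth-Studio. It assumes every mask exposes a PIL-style `.size` attribute and that all masks should share one resolution (1328×1328 here, taken from the inference example in this card).

```python
from typing import Sequence, Tuple


def validate_entity_inputs(
    entity_prompts: Sequence[str],
    entity_masks: Sequence[object],
    expected_size: Tuple[int, int] = (1328, 1328),
) -> None:
    """Raise ValueError if prompts and masks do not line up one-to-one.

    Hypothetical helper, not part of DiffSynth-Studio. Assumes each mask
    has a PIL-style ``.size`` attribute holding (width, height).
    """
    if len(entity_prompts) != len(entity_masks):
        raise ValueError(
            f"{len(entity_prompts)} prompts but {len(entity_masks)} masks"
        )
    for index, (prompt, mask) in enumerate(zip(entity_prompts, entity_masks)):
        if not prompt.strip():
            raise ValueError(f"entity {index} has an empty prompt")
        if tuple(mask.size) != expected_size:
            raise ValueError(
                f"mask {index} is {tuple(mask.size)}, expected {expected_size}"
            )
```

Running a check like this before the pipeline call turns a mismatched list or a stray mask resolution into an explicit error message.
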
## Result Demonstration

|Entity Control Condition|Generated Image 1|Generated Image 2|Generated Image 3|
|-|-|-|-|
|||||
|||||

## Inference Code

```
git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .
```

```python
from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
from modelscope import dataset_snapshot_download, snapshot_download
import torch
from PIL import Image
```

```python
# Load the Qwen-Image base model (transformer, text encoder, VAE) and its tokenizer.
pipe = QwenImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
    ],
    tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
)

# Download the EliGen V2 LoRA weights and load them into the DiT.
snapshot_download("DiffSynth-Studio/Qwen-Image-EliGen-V2", local_dir="models/DiffSynth-Studio/Qwen-Image-EliGen-V2", allow_file_pattern="model.safetensors")
pipe.load_lora(pipe.dit, "models/DiffSynth-Studio/Qwen-Image-EliGen-V2/model.safetensors")

# The global prompt describes the whole image; each entity prompt describes one masked region.
global_prompt = "Poster for the Qwen-Image-EliGen Magic Café, featuring two magical coffees—one emitting flames and the other emitting ice spikes—against a light blue misty background. The poster includes text: 'Qwen-Image-EliGen Magic Café' and 'New Product Launch'"
entity_prompts = [
    "A red magic coffee with flames rising from the cup",
    "A red magic coffee surrounded by ice spikes",
    "Text: 'New Product Launch'",
    "Text: 'Qwen-Image-EliGen Magic Café'",
]

# Download the example masks (0.png, 1.png, ...), one per entity prompt, and resize each to 1328x1328.
dataset_snapshot_download(dataset_id="DiffSynth-Studio/examples_in_diffsynth", local_dir="./", allow_file_pattern="data/examples/eligen/qwen-image/example_6/*.png")
masks = [Image.open(f"./data/examples/eligen/qwen-image/example_6/{i}.png").convert('RGB').resize((1328, 1328)) for i in range(len(entity_prompts))]

# Generate the image, passing entity prompts and masks in matching order.
image = pipe(
    prompt=global_prompt,
    seed=0,
    eligen_entity_prompts=entity_prompts,
    eligen_entity_masks=masks,
)
image.save("image.jpg")
```

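The example above loads ready-made masks from a dataset. To author simple rectangular masks instead, something like the following sketch could work. The names `box_to_pixels` and `make_rect_mask` are hypothetical helpers, not part of DiffSynth-Studio. The white-region-on-black convention is an assumption and should be checked against the example masks before use.

```python
from typing import Tuple

# (left, top, right, bottom), each a fraction of the canvas in [0, 1]
Box = Tuple[float, float, float, float]


def box_to_pixels(box: Box, size: int = 1328) -> Tuple[int, int, int, int]:
    """Convert a normalized box to integer pixel coordinates on a size x size canvas."""
    left, top, right, bottom = box
    if not (0.0 <= left < right <= 1.0 and 0.0 <= top < bottom <= 1.0):
        raise ValueError(f"invalid normalized box: {box}")
    return (
        round(left * size),
        round(top * size),
        round(right * size),
        round(bottom * size),
    )


def make_rect_mask(box: Box, size: int = 1328):
    """Draw a white rectangle on a black RGB canvas (assumed mask convention)."""
    # Pillow is imported lazily so box_to_pixels stays dependency-free.
    from PIL import Image, ImageDraw

    mask = Image.new("RGB", (size, size), "black")
    ImageDraw.Draw(mask).rectangle(box_to_pixels(box, size), fill="white")
    return mask
```

A list such as `[make_rect_mask(b) for b in boxes]`, with one box per entity prompt and in the same order, could then be passed as `eligen_entity_masks`, provided the convention assumed here matches the one the model was trained with.
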
## Citation

If you find our work helpful, please consider citing our research:

```
@article{zhang2025eligen,
  title={Eligen: Entity-level Controlled Image Generation with Regional Attention},
  author={Zhang, Hong and Duan, Zhongjie and Wang, Xingjun and Chen, Yingda and Zhang, Yu},
  journal={arXiv preprint arXiv:2501.01097},
  year={2025}
}
```