	---
	license: apache-2.0
	---
	# Qwen-Image Full Distillation Accelerated Model

	![](./assets/title.jpg)

	## Model Introduction

	This model is a distilled, accelerated version of [Qwen-Image](https://www.modelscope.cn/models/Qwen/Qwen-Image). The original model requires 40 inference steps with classifier-free guidance (CFG), for a total of 80 forward passes. The distilled model requires only 15 inference steps without CFG, for a total of 15 forward passes, which is roughly a 5x speedup. The number of inference steps can be reduced further if needed, though this may degrade generation quality.
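
The forward-pass counts above can be checked with a few lines of arithmetic. The sketch below assumes that CFG costs two forward passes per step (one conditional, one unconditional) and that CFG is inactive when the scale is 1; the function name `forward_passes` is hypothetical, and the CFG scale of 4 is taken from the comparison table.

```python
def forward_passes(num_inference_steps: int, cfg_scale: float) -> int:
    """Count denoiser forward passes needed for one image.

    Assumption: CFG is active whenever cfg_scale != 1 and costs two passes
    per step (conditional + unconditional); otherwise one pass per step.
    """
    passes_per_step = 2 if cfg_scale != 1 else 1
    return num_inference_steps * passes_per_step


original = forward_passes(40, 4)   # original model: 40 steps with CFG
distilled = forward_passes(15, 1)  # distilled model: 15 steps without CFG
speedup = original / distilled

print(original, distilled, round(speedup, 2))  # 80 15 5.33
```

This ratio only counts forward passes; measured wall-clock speedup may differ because text encoding and VAE decoding are not included.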

	The training framework is built on [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio). The training data consists of 16,000 images generated by the original model from prompts randomly sampled from [DiffusionDB](https://www.modelscope.cn/datasets/AI-ModelScope/diffusiondb). Training ran on 8 MI308X GPUs and took approximately one day.

	## Performance Comparison

	||Original Model (40 steps, CFG)|Original Model (15 steps, no CFG)|Accelerated Model|
	|-|-|-|-|
	|Inference Steps|40|15|15|
	|CFG Scale|4|1|1|
	|Forward Passes|80|15|15|
	|Example 1|![](./assets/image_1_full.jpg)|![](./assets/image_1_original.jpg)|![](./assets/image_1_ours.jpg)|
	|Example 2|![](./assets/image_2_full.jpg)|![](./assets/image_2_original.jpg)|![](./assets/image_2_ours.jpg)|
	|Example 3|![](./assets/image_3_full.jpg)|![](./assets/image_3_original.jpg)|![](./assets/image_3_ours.jpg)|

	## Inference Code

	```shell
	git clone https://github.com/modelscope/DiffSynth-Studio.git
	cd DiffSynth-Studio
	pip install -e .
	```

	```python
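	from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
	import torch


	# Load the distilled diffusion model; the text encoder, VAE, and tokenizer
	# come from the original Qwen-Image repository.
	pipe = QwenImagePipeline.from_pretrained(
	    torch_dtype=torch.bfloat16,
	    device="cuda",
	    model_configs=[
	        ModelConfig(model_id="DiffSynth-Studio/Qwen-Image-Distill-Full", origin_file_pattern="diffusion_pytorch_model*.safetensors"),
	        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
	        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
	    ],
	    tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
	)

	# Prompt (English): "Exquisite portrait, underwater girl, flowing blue dress,
	# hair gently drifting, clear light and shadow, surrounded by bubbles,
	# serene face, fine details, dreamy and beautiful."
	prompt = "精致肖像，水下少女，蓝裙飘逸，发丝轻扬，光影透澈，气泡环绕，面容恬静，细节精致，梦幻唯美。"

	# 15 steps with cfg_scale=1 (no CFG), matching the settings in the table above.
	image = pipe(prompt, seed=0, num_inference_steps=15, cfg_scale=1)
	image.save("image.jpg")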
	```
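
Since the step count can be lowered further at some cost in quality, one way to pick a value is to render the same prompt and seed at several step counts and compare the results. The helper below is a minimal sketch, not part of DiffSynth-Studio: `sweep_steps` is a hypothetical name, and it only assumes that `pipe` accepts the `seed`, `num_inference_steps`, and `cfg_scale` keyword arguments used in the example above.

```python
from typing import Any, Callable, Dict, Iterable


def sweep_steps(
    pipe: Callable[..., Any],
    prompt: str,
    step_counts: Iterable[int],
    seed: int = 0,
) -> Dict[int, Any]:
    """Generate one image per step count with a fixed seed and no CFG.

    Returns a dict mapping each step count to whatever `pipe` returns,
    so the outputs can be saved or compared side by side.
    """
    images = {}
    for steps in step_counts:
        # Same call shape as the inference example; only the step count varies.
        images[steps] = pipe(prompt, seed=seed, num_inference_steps=steps, cfg_scale=1)
    return images


# Usage with the pipeline and prompt loaded above (requires a GPU):
# for steps, image in sweep_steps(pipe, prompt, [15, 10, 8]).items():
#     image.save(f"image_{steps}_steps.jpg")
```

Keeping the seed fixed means differences between the outputs come from the step count rather than from the initial noise.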