---
license: apache-2.0
---
<div align="center">

<h1>lyraDiff: An Out-of-the-box Acceleration Engine for Diffusion and DiT Models</h1>

</div>

`lyraDiff` introduces a **recompilation-free** inference engine for Diffusion and DiT models, achieving **state-of-the-art speed**, **extensive model support**, and **pixel-level image consistency**.

## Highlights

- **State-of-the-art Inference Speed**: `lyraDiff` combines several techniques, including **Quantization**, **Fused GEMM Kernels**, **Flash Attention**, and **NHWC & Fused GroupNorm**, to achieve up to a **6.1x** inference speedup.
- **Memory Efficiency**: `lyraDiff` uses a buffer-based DRAM reuse strategy and multiple quantization types (FP8/INT8/INT4) to reduce DRAM usage by **10-40%**.
- **Extensive Model Support**: `lyraDiff` supports a wide range of leading generative and super-resolution (SR) models, such as **SD1.5, SDXL, FLUX, and S3Diff**, as well as the most commonly used plugins, such as **LoRA, ControlNet, and IP-Adapter**.
- **Zero Compilation Deployment**: Unlike **TensorRT** or **AITemplate**, which take minutes to compile, `lyraDiff` eliminates runtime recompilation overhead, even for model inputs with dynamic shapes.
- **Image Gen Consistency**: The outputs of `lyraDiff` match those of [HF diffusers](https://github.com/huggingface/diffusers) at the pixel level, even when switching LoRAs in quantization mode.
- **Fast Plugin Hot-swap**: `lyraDiff` provides **very fast hot-swapping of ControlNet and LoRA models**, which can greatly benefit real-time image generation services.
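
The pixel-level consistency highlighted in the list can be spot-checked by generating the same prompt with the same seed in both backends and comparing the raw pixel buffers. The snippet below is a minimal sketch and not part of lyraDiff: `max_abs_pixel_diff` is a hypothetical helper, and it assumes both images have already been flattened to equal-length sequences of 8-bit channel values (for example with PIL's `Image.tobytes()`).

```python
def max_abs_pixel_diff(pixels_a, pixels_b):
    """Return the largest per-channel absolute difference between two images.

    Both arguments are flat sequences of 8-bit channel values (e.g. `bytes`).
    This helper is illustrative only; it is not a lyraDiff or diffusers API.
    """
    if len(pixels_a) != len(pixels_b):
        raise ValueError("images must have the same size and mode")
    return max((abs(a - b) for a, b in zip(pixels_a, pixels_b)), default=0)


# Toy 2x1 RGB buffers standing in for real image data
reference = bytes([10, 20, 30, 40, 50, 60])
candidate = bytes([10, 21, 30, 40, 50, 58])
print(max_abs_pixel_diff(reference, candidate))  # prints 2
```

A result of 0 means the two buffers are identical. Which tolerance counts as "aligned" is not stated here, so any threshold is a choice left to the reader.
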

## Usage

`lyraDiff-IP-Adapters` contains the standard [IP-Adapter](https://huggingface.co/h94/IP-Adapter) weights, converted with this [script](https://github.com/TMElyralab/lyraDiff/blob/main/lyradiff/convert_model_scripts/convert_ipadapter.py) to be compatible with [lyraDiff](https://github.com/TMElyralab/lyraDiff). Both the SD1.5 and SDXL versions of the converted IP-Adapter are included.

We provide a reference implementation of the lyraDiff versions of SD1.5/SDXL, along with sampling code, in a dedicated [GitHub repository](https://github.com/TMElyralab/lyraDiff).

### Example

The following minimal [script](https://github.com/TMElyralab/lyraDiff/blob/main/examples/SDXL/ipadapter_demo.py) runs an SDXL model with IP-Adapter using lyraDiff:

```python
import os
import time

import torch
from diffusers import EulerAncestralDiscreteScheduler, StableDiffusionXLPipeline
from diffusers.utils import load_image
from transformers import CLIPTextModel, CLIPTextModelWithProjection, CLIPTokenizer

from lyradiff.lyradiff_model.lyradiff_unet_model import LyraDiffUNet2DConditionModel
from lyradiff.lyradiff_model.lyradiff_vae_model import LyraDiffVaeModel
from lyradiff.lyradiff_model.module.lyradiff_ip_adapter import LyraIPAdapter

model_path = "/path/to/sdxl/model/"
vae_model_path = "/path/to/sdxl/sdxl-vae-fp16-fix"

# Text encoders and tokenizers are loaded from the standard diffusers SDXL checkpoint
text_encoder = CLIPTextModel.from_pretrained(model_path, subfolder="text_encoder").to(torch.float16).to(torch.device("cuda"))
text_encoder_2 = CLIPTextModelWithProjection.from_pretrained(model_path, subfolder="text_encoder_2").to(torch.float16).to(torch.device("cuda"))
tokenizer = CLIPTokenizer.from_pretrained(model_path, subfolder="tokenizer")
tokenizer_2 = CLIPTokenizer.from_pretrained(model_path, subfolder="tokenizer_2")

# The UNet and VAE are replaced by their lyraDiff counterparts
unet = LyraDiffUNet2DConditionModel(is_sdxl=True)
vae = LyraDiffVaeModel(scaling_factor=0.13025, is_upcast=False)

unet.load_from_diffusers_model(os.path.join(model_path, "unet"))
vae.load_from_diffusers_model(vae_model_path)

scheduler = EulerAncestralDiscreteScheduler.from_pretrained(model_path, subfolder="scheduler", timestep_spacing="linspace")

pipe = StableDiffusionXLPipeline(
    vae=vae,
    unet=unet,
    text_encoder=text_encoder,
    text_encoder_2=text_encoder_2,
    tokenizer=tokenizer,
    tokenizer_2=tokenizer_2,
    scheduler=scheduler,
)

ip_ckpt = "/path/to/sdxl/ip_ckpt/ip-adapter-plus_sdxl_vit-h.bin"
image_encoder_path = "/path/to/sdxl/ip_ckpt/image_encoder"

# Create the LyraIPAdapter
ip_adapter = LyraIPAdapter(
    unet_model=unet.model,
    sdxl=True,
    device=torch.device("cuda"),
    ip_ckpt=ip_ckpt,
    ip_plus=True,
    image_encoder_path=image_encoder_path,
    num_ip_tokens=16,
    ip_projection_dim=1024,
)

# Load the IP-Adapter reference image
ip_image = load_image("https://cdn-uploads.huggingface.co/production/uploads/6461b412846a6c8c8305319d/8U6yNHTPLaOC3gIWJZWGL.png")
ip_scale = 0.5

# Compute the IP image embedding that is passed to the pipeline
ip_image_embedding = [ip_adapter.get_image_embeds_lyradiff(ip_image)['ip_hidden_states']]

# Set the IP-Adapter scale directly on the UNet object, because it cannot be
# set through the diffusers pipeline
unet.set_ip_adapter_scale(ip_scale)

for i in range(3):
    # Re-seed on every run so that all three outputs use the same noise
    generator = torch.Generator("cuda").manual_seed(123)
    start = time.perf_counter()
    images = pipe(
        prompt="a beautiful girl, cartoon style",
        height=1024,
        width=1024,
        num_inference_steps=20,
        num_images_per_prompt=1,
        guidance_scale=7.5,
        negative_prompt="NSFW",
        generator=generator,
        ip_adapter_image_embeds=ip_image_embedding,
    )[0]
    print(f"run {i}: {time.perf_counter() - start:.2f} s")
    images[0].save(f"sdxl_ip_{i}.png")
```
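
The example runs the pipeline several times in a loop, which is helpful because the first call on a GPU typically includes one-off setup cost. A minimal sketch of a timing helper that separates warm-up runs from measured runs is shown below. `time_runs` and its parameters are hypothetical names, not part of lyraDiff or diffusers, and the sketch assumes the callable blocks until its result is ready (for GPU work, for example, by calling `torch.cuda.synchronize()` inside it).

```python
import time


def time_runs(fn, warmup=1, runs=3):
    """Call `fn` repeatedly and return per-run wall-clock latencies in seconds.

    The first `warmup` calls are executed but not recorded.
    Illustrative helper only; not a lyraDiff or diffusers API.
    """
    for _ in range(warmup):
        fn()
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    return latencies


# Stand-in workload; replace with a lambda that calls the pipeline
latencies = time_runs(lambda: sum(range(100_000)), warmup=1, runs=3)
print(f"mean latency: {sum(latencies) / len(latencies):.6f} s")
```

Reporting the individual latencies alongside the mean makes it easier to spot a slow outlier run.
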

## Citation

```bibtex
@Misc{lyraDiff_2025,
  author =       {Kangjian Wu and Zhengtao Wang and Yibo Lu and Haoxiong Su and Sa Xiao and Qiwen Mao and Mian Peng and Bin Wu and Wenjiang Zhou},
  title =        {lyraDiff: Accelerating Diffusion Models with best flexibility},
  howpublished = {\url{https://github.com/TMElyralab/lyraDiff}},
  year =         {2025}
}