	---
	license: apache-2.0
	---
	<div align="center">

	<h1> lyraDiff: An Out-of-the-box Acceleration Engine for Diffusion and DiT Models</h1>

	</div>


	`lyraDiff` introduces a recompilation-free inference engine for Diffusion and DiT models, achieving state-of-the-art speed, extensive model support, and pixel-level image consistency.

	## Highlights
	- State-of-the-art Inference Speed: `lyraDiff` combines several techniques to achieve up to a 6.1x inference speedup, including Quantization, Fused GEMM Kernels, Flash Attention, and NHWC & Fused GroupNorm.
	- Memory Efficiency: `lyraDiff` uses a buffer-based DRAM reuse strategy and several types of quantization (FP8/INT8/INT4) to save 10-40% of DRAM usage.
	- Extensive Model Support: `lyraDiff` supports a wide range of top Generative/SR models such as SD1.5, SDXL, FLUX, S3Diff, etc., and the most commonly used plugins such as LoRA, ControlNet and IP-Adapter.
	- Zero Compilation Deployment: Unlike TensorRT or AITemplate, which take minutes to compile, `lyraDiff` eliminates runtime recompilation overhead even with model inputs of dynamic shapes.
	- Image Gen Consistency: The outputs of `lyraDiff` are aligned with those of [HF diffusers](https://github.com/huggingface/diffusers) at the pixel level, even when switching LoRAs in quantization mode.
	- Fast Plugin Hot-swap: `lyraDiff` provides very fast hot-swapping of ControlNet and LoRA, which can greatly benefit real-time image generation services.
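
One way to check the pixel-level consistency claim on your own setup is to compare an image produced by `diffusers` with the image `lyraDiff` produces for the same seed and settings. The helper below is a minimal sketch and not part of `lyraDiff`; the name `max_pixel_diff` is hypothetical, and it assumes both images are 8-bit arrays of the same shape.

```python
import numpy as np


def max_pixel_diff(reference, candidate):
    """Return the largest absolute per-channel difference between two images.

    Hypothetical helper for illustration; accepts NumPy arrays or PIL images.
    """
    # Widen to int16 first: subtracting uint8 arrays directly would wrap around.
    ref = np.asarray(reference, dtype=np.int16)
    out = np.asarray(candidate, dtype=np.int16)
    if ref.shape != out.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {out.shape}")
    return int(np.abs(ref - out).max())


# Synthetic stand-ins for two generated images (assumption: 8-bit RGB).
reference = np.zeros((4, 4, 3), dtype=np.uint8)
candidate = reference.copy()
candidate[0, 0, 0] = 2

print(max_pixel_diff(reference, reference))  # 0
print(max_pixel_diff(reference, candidate))  # 2
```

In practice you would pass the two generated images (for example, loaded with `PIL.Image.open`) instead of the synthetic arrays. A result of 0 means the outputs are identical; small non-zero values indicate rounding-level differences.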

	## Usage

	`lyraDiff-IP-Adapters` contains the standard [IP-Adapter](https://huggingface.co/h94/IP-Adapter) weights, converted with this [script](https://github.com/TMElyralab/lyraDiff/blob/main/lyradiff/convert_model_scripts/convert_ipadapter.py) to be compatible with [lyraDiff](https://github.com/TMElyralab/lyraDiff). It includes both the SD1.5 and SDXL versions of the converted IP-Adapter.

	We provide a reference implementation of the lyraDiff versions of SD1.5/SDXL, as well as sampling code, in a dedicated [GitHub repository](https://github.com/TMElyralab/lyraDiff).

	### Example
	We provide a minimal [script](https://github.com/TMElyralab/lyraDiff/blob/main/examples/SDXL/ipadapter_demo.py) for running SDXL models with IP-Adapter on lyraDiff:

	```python
	import os
	import time

	import torch
	from diffusers import EulerAncestralDiscreteScheduler, StableDiffusionXLPipeline
	from diffusers.utils import load_image
	from transformers import CLIPTextModel, CLIPTextModelWithProjection, CLIPTokenizer

	from lyradiff.lyradiff_model.lyradiff_unet_model import LyraDiffUNet2DConditionModel
	from lyradiff.lyradiff_model.lyradiff_vae_model import LyraDiffVaeModel
	from lyradiff.lyradiff_model.module.lyradiff_ip_adapter import LyraIPAdapter

	model_path = "/path/to/sdxl/model/"
	vae_model_path = "/path/to/sdxl/sdxl-vae-fp16-fix"

	text_encoder = CLIPTextModel.from_pretrained(model_path, subfolder="text_encoder").to(torch.float16).to(torch.device("cuda"))
	text_encoder_2 = CLIPTextModelWithProjection.from_pretrained(model_path, subfolder="text_encoder_2").to(torch.float16).to(torch.device("cuda"))
	tokenizer = CLIPTokenizer.from_pretrained(model_path, subfolder="tokenizer")
	tokenizer_2 = CLIPTokenizer.from_pretrained(model_path, subfolder="tokenizer_2")

	unet = LyraDiffUNet2DConditionModel(is_sdxl=True)
	vae = LyraDiffVaeModel(scaling_factor=0.13025, is_upcast=False)

	unet.load_from_diffusers_model(os.path.join(model_path, "unet"))
	vae.load_from_diffusers_model(vae_model_path)

	scheduler = EulerAncestralDiscreteScheduler.from_pretrained(model_path, subfolder="scheduler", timestep_spacing="linspace")

	pipe = StableDiffusionXLPipeline(
	    vae=vae,
	    unet=unet,
	    text_encoder=text_encoder,
	    text_encoder_2=text_encoder_2,
	    tokenizer=tokenizer,
	    tokenizer_2=tokenizer_2,
	    scheduler=scheduler,
	)

	ip_ckpt = "/path/to/sdxl/ip_ckpt/ip-adapter-plus_sdxl_vit-h.bin"
	image_encoder_path = "/path/to/sdxl/ip_ckpt/image_encoder"

	# Create the LyraIPAdapter
	ip_adapter = LyraIPAdapter(
	    unet_model=unet.model,
	    sdxl=True,
	    device=torch.device("cuda"),
	    ip_ckpt=ip_ckpt,
	    ip_plus=True,
	    image_encoder_path=image_encoder_path,
	    num_ip_tokens=16,
	    ip_projection_dim=1024,
	)

	# Load the IP-Adapter reference image
	ip_image = load_image("https://cdn-uploads.huggingface.co/production/uploads/6461b412846a6c8c8305319d/8U6yNHTPLaOC3gIWJZWGL.png")
	ip_scale = 0.5

	# Compute the IP image embedding; it is passed to the pipeline below
	ip_image_embedding = [ip_adapter.get_image_embeds_lyradiff(ip_image)['ip_hidden_states']]
	# Set the IP-Adapter scale directly on the UNet object, since it cannot be set through the diffusers pipeline
	unet.set_ip_adapter_scale(ip_scale)

	for i in range(3):
	    # Re-seed on every iteration so that all runs generate the same image
	    generator = torch.Generator("cuda").manual_seed(123)
	    start = time.perf_counter()
	    images = pipe(
	        prompt="a beautiful girl, cartoon style",
	        height=1024,
	        width=1024,
	        num_inference_steps=20,
	        num_images_per_prompt=1,
	        guidance_scale=7.5,
	        negative_prompt="NSFW",
	        generator=generator,
	        ip_adapter_image_embeds=ip_image_embedding,
	    )[0]
	    # Report the wall-clock time of this run
	    print(f"run {i}: {time.perf_counter() - start:.2f} s")
	    images[0].save(f"sdxl_ip_{i}.png")
	```
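
The example calls the pipeline three times, and the first call typically includes one-off setup cost. To measure steady-state latency, the timing can be wrapped in a small helper like the one below. This is a minimal sketch and not part of `lyraDiff`: the `benchmark` function and its `warmup` and `sync` parameters are hypothetical names chosen for illustration.

```python
import time
from statistics import mean


def benchmark(fn, runs=3, warmup=1, sync=None):
    """Call fn() `warmup + runs` times; return latencies (seconds) of the timed runs.

    Hypothetical helper for illustration. `sync` is an optional callable that
    blocks until pending asynchronous work has finished.
    """
    latencies = []
    for i in range(warmup + runs):
        if sync is not None:
            sync()  # make sure earlier work does not leak into this measurement
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for the work launched by fn() to complete
        elapsed = time.perf_counter() - start
        if i >= warmup:  # discard warm-up iterations
            latencies.append(elapsed)
    return latencies


# Stand-in workload; replace with a call to the pipeline.
latencies = benchmark(lambda: time.sleep(0.01), runs=3, warmup=1)
print(f"mean latency over {len(latencies)} runs: {mean(latencies):.3f} s")
```

For GPU inference, passing `sync=torch.cuda.synchronize` and a lambda that calls `pipe(...)` with the arguments from the example keeps asynchronous CUDA work from skewing the measurement.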


	## Citation
	```bibtex
	@Misc{lyraDiff_2025,
	  author =       {Kangjian Wu and Zhengtao Wang and Yibo Lu and Haoxiong Su and Sa Xiao and Qiwen Mao and Mian Peng and Bin Wu and Wenjiang Zhou},
	  title =        {lyraDiff: Accelerating Diffusion Models with best flexibility},
	  howpublished = {\url{https://github.com/TMElyralab/lyraDiff}},
	  year =         {2025}
	}
	```