	# T-GATE

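	[T-GATE](https://github.com/HaozheLiu-ST/T-GATE/tree/main) accelerates inference for [Stable Diffusion](../api/pipelines/stable_diffusion/overview), [PixArt](../api/pipelines/pixart), and [Latent Consistency Model](../api/pipelines/latent_consistency_models.md) pipelines by skipping the cross-attention calculation once it converges. The method requires no additional training and can speed up inference by 10-50%. T-GATE is also compatible with other optimization methods such as [DeepCache](./deepcache).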
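To make the gating idea concrete, here is a minimal conceptual sketch in plain Python. It is not T-GATE's implementation: `GatedCrossAttention` and `fake_attention` are hypothetical names, and the real method may build its cached value differently. The sketch only shows the control flow of computing cross-attention up to a gate step and reusing a cached result afterwards.

```python
from typing import Callable, List, Optional


class GatedCrossAttention:
    """Toy stand-in for a cross-attention layer with temporal gating.

    Hypothetical class for illustration only; it is not part of the tgate package.
    """

    def __init__(self, attend: Callable[[List[float]], List[float]], gate_step: int):
        self.attend = attend        # the expensive cross-attention computation
        self.gate_step = gate_step  # step index from which the cache is reused
        self.cache: Optional[List[float]] = None
        self.calls = 0              # how many times `attend` actually ran

    def __call__(self, hidden: List[float], step: int) -> List[float]:
        if step < self.gate_step or self.cache is None:
            # Before the gate step: compute cross-attention and remember the result.
            self.cache = self.attend(hidden)
            self.calls += 1
        # From the gate step onward, the cached output is returned unchanged.
        return self.cache


def fake_attention(hidden: List[float]) -> List[float]:
    # Placeholder for the real attention math.
    return [2.0 * h for h in hidden]


layer = GatedCrossAttention(fake_attention, gate_step=8)
for step in range(25):
    out = layer([float(step)], step)

print(f"cross-attention computed in {layer.calls} of 25 steps")
```

With a gate step of 8 and 25 sampling steps, the expensive call runs only during the first 8 steps, which is where the savings come from.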

	Before you begin, make sure T-GATE is installed.

	```bash
	pip install tgate
	pip install -U torch diffusers transformers accelerate DeepCache
	```

	To use T-GATE with a pipeline, you need to use its corresponding loader.

	| Pipeline | T-GATE Loader |
	|---|---|
	| PixArt | TgatePixArtLoader |
	| Stable Diffusion XL | TgateSDXLLoader |
	| Stable Diffusion XL + DeepCache | TgateSDXLDeepCacheLoader |
	| Stable Diffusion | TgateSDLoader |
	| Stable Diffusion + DeepCache | TgateSDDeepCacheLoader |
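If the pipeline family is only known at runtime, the table above can be expressed as a lookup. This is a small sketch: the loader names are copied from the table, while `TGATE_LOADERS` and `loader_name` are hypothetical helpers that do not exist in the `tgate` package.

```python
# Loader class names copied from the table above,
# keyed by (pipeline family, whether DeepCache is used).
TGATE_LOADERS = {
    ("PixArt", False): "TgatePixArtLoader",
    ("Stable Diffusion XL", False): "TgateSDXLLoader",
    ("Stable Diffusion XL", True): "TgateSDXLDeepCacheLoader",
    ("Stable Diffusion", False): "TgateSDLoader",
    ("Stable Diffusion", True): "TgateSDDeepCacheLoader",
}


def loader_name(family: str, deepcache: bool = False) -> str:
    """Return the loader class name for a pipeline family (hypothetical helper)."""
    try:
        return TGATE_LOADERS[(family, deepcache)]
    except KeyError:
        # The table lists no PixArt + DeepCache loader, for example.
        raise ValueError(
            f"no T-GATE loader listed for {family!r} (DeepCache={deepcache})"
        ) from None


print(loader_name("Stable Diffusion XL", deepcache=True))
```

The returned name can then be resolved on the installed package, for example with `getattr(tgate, name)`, since the loaders are imported directly from `tgate` in the examples of this guide.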

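	Next, create a `TgateLoader` with the pipeline, the gate step (the time step at which cross-attention stops being calculated), and the number of inference steps. Then call the `tgate` method on the pipeline with a prompt, the gate step, and the number of inference steps.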
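Because the gate step and the number of inference steps are passed twice, once to the loader and once to `tgate`, it can help to keep them in one place. The sketch below is one possible way to do that. `GateSettings` is a hypothetical helper, and the range check reflects an assumption (that the gate step should fall inside the sampling schedule) rather than documented T-GATE behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateSettings:
    """Hypothetical container for the values shared by the loader and `tgate` call."""

    gate_step: int
    num_inference_steps: int

    def __post_init__(self):
        # Assumption: the gate step should fall inside the sampling schedule.
        if not 0 < self.gate_step <= self.num_inference_steps:
            raise ValueError("gate_step must be between 1 and num_inference_steps")

    def as_kwargs(self) -> dict:
        # Keyword names match the ones used in this guide's examples.
        return {
            "gate_step": self.gate_step,
            "num_inference_steps": self.num_inference_steps,
        }

    @property
    def gated_fraction(self) -> float:
        # Approximate share of steps that run after the gate step.
        return 1 - self.gate_step / self.num_inference_steps


settings = GateSettings(gate_step=10, num_inference_steps=25)
print(settings.as_kwargs())
print(f"about {settings.gated_fraction:.0%} of steps run after the gate step")
```

The same dictionary can then be unpacked in both places, for example `TgateSDXLLoader(pipe, **settings.as_kwargs())` followed by `pipe.tgate(prompt, **settings.as_kwargs())`.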

	Let's see how to enable this for several different pipelines.

	<hfoptions id="pipelines">
	<hfoption id="PixArt">

	Accelerate `PixArtAlphaPipeline` with T-GATE:

	```py
	import torch
	from diffusers import PixArtAlphaPipeline
	from tgate import TgatePixArtLoader

	pipe = PixArtAlphaPipeline.from_pretrained("PixArt-alpha/PixArt-XL-2-1024-MS", torch_dtype=torch.float16)

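	gate_step = 8        # step at which cross-attention stops being computed
	inference_step = 25  # total number of denoising steps
	# Wrap the pipeline with the loader that matches its architecture.
	pipe = TgatePixArtLoader(
	    pipe,
	    gate_step=gate_step,
	    num_inference_steps=inference_step,
	).to("cuda")

	# Pass the same gate step and step count when generating.
	image = pipe.tgate(
	    "An alpaca made of colorful building blocks, cyberpunk.",
	    gate_step=gate_step,
	    num_inference_steps=inference_step,
	).images[0]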
	```
	</hfoption>
	<hfoption id="Stable Diffusion XL">

	Accelerate `StableDiffusionXLPipeline` with T-GATE:

	```py
	import torch
	from diffusers import StableDiffusionXLPipeline
	from diffusers import DPMSolverMultistepScheduler
	from tgate import TgateSDXLLoader

	pipe = StableDiffusionXLPipeline.from_pretrained(
	    "stabilityai/stable-diffusion-xl-base-1.0",
	    torch_dtype=torch.float16,
	    variant="fp16",
	    use_safetensors=True,
	)
	pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

	gate_step = 10       # step at which cross-attention stops being computed
	inference_step = 25  # total number of denoising steps
	pipe = TgateSDXLLoader(
	    pipe,
	    gate_step=gate_step,
	    num_inference_steps=inference_step,
	).to("cuda")

	# Pass the same gate step and step count when generating.
	image = pipe.tgate(
	    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k.",
	    gate_step=gate_step,
	    num_inference_steps=inference_step,
	).images[0]
	```
	</hfoption>
	<hfoption id="StableDiffusionXL with DeepCache">

	Accelerate `StableDiffusionXLPipeline` with [DeepCache](https://github.com/horseee/DeepCache) and T-GATE:

	```py
	import torch
	from diffusers import StableDiffusionXLPipeline
	from diffusers import DPMSolverMultistepScheduler
	from tgate import TgateSDXLDeepCacheLoader

	pipe = StableDiffusionXLPipeline.from_pretrained(
	    "stabilityai/stable-diffusion-xl-base-1.0",
	    torch_dtype=torch.float16,
	    variant="fp16",
	    use_safetensors=True,
	)
	pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

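	gate_step = 10       # step at which cross-attention stops being computed
	inference_step = 25  # total number of denoising steps
	# The DeepCache loader takes the DeepCache settings; the gate settings go to `tgate`.
	pipe = TgateSDXLDeepCacheLoader(
	    pipe,
	    cache_interval=3,
	    cache_branch_id=0,
	).to("cuda")

	image = pipe.tgate(
	    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k.",
	    gate_step=gate_step,
	    num_inference_steps=inference_step,
	).images[0]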
	```
	</hfoption>
	<hfoption id="Latent Consistency Model">

	Accelerate `latent-consistency/lcm-sdxl` with T-GATE:

	```py
	import torch
	from diffusers import StableDiffusionXLPipeline
	from diffusers import UNet2DConditionModel, LCMScheduler
	from tgate import TgateSDXLLoader

	unet = UNet2DConditionModel.from_pretrained(
	    "latent-consistency/lcm-sdxl",
	    torch_dtype=torch.float16,
	    variant="fp16",
	)
	pipe = StableDiffusionXLPipeline.from_pretrained(
	    "stabilityai/stable-diffusion-xl-base-1.0",
	    unet=unet,
	    torch_dtype=torch.float16,
	    variant="fp16",
	)
	pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

	gate_step = 1       # LCM uses very few steps, so the gate step is small as well
	inference_step = 4  # total number of denoising steps
	pipe = TgateSDXLLoader(
	    pipe,
	    gate_step=gate_step,
	    num_inference_steps=inference_step,
	    lcm=True,  # mark the pipeline as a Latent Consistency Model
	).to("cuda")

	image = pipe.tgate(
	    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k.",
	    gate_step=gate_step,
	    num_inference_steps=inference_step,
	).images[0]
	```
	</hfoption>
	</hfoptions>

	T-GATE also supports [`StableDiffusionPipeline`] and [PixArt-alpha/PixArt-LCM-XL-2-1024-MS](https://hf.co/PixArt-alpha/PixArt-LCM-XL-2-1024-MS).

	## Benchmarks

	| Model | MACs | Params | Latency | Zero-shot 10K-FID on MS-COCO |
	|-----------------------|----------|-----------|---------|---------------------------|
	| SD-1.5 | 16.938T | 859.520M | 7.032s | 23.927 |
	| SD-1.5 w/ T-GATE | 9.875T | 815.557M | 4.313s | 20.789 |
	| SD-2.1 | 38.041T | 865.785M | 16.121s | 22.609 |
	| SD-2.1 w/ T-GATE | 22.208T | 815.433M | 9.878s | 19.940 |
	| SD-XL | 149.438T | 2.570B | 53.187s | 24.628 |
	| SD-XL w/ T-GATE | 84.438T | 2.024B | 27.932s | 22.738 |
	| Pixart-Alpha | 107.031T | 611.350M | 61.502s | 38.669 |
	| Pixart-Alpha w/ T-GATE | 65.318T | 462.585M | 37.867s | 35.825 |
	| DeepCache (SD-XL) | 57.888T | - | 19.931s | 23.755 |
	| DeepCache w/ T-GATE | 43.868T | - | 14.666s | 23.999 |
	| LCM (SD-XL) | 11.955T | 2.570B | 3.805s | 25.044 |
	| LCM w/ T-GATE | 11.171T | 2.024B | 3.533s | 25.028 |
	| LCM (Pixart-Alpha) | 8.563T | 611.350M | 4.733s | 36.086 |
	| LCM w/ T-GATE | 7.623T | 462.585M | 4.543s | 37.048 |

	Latency is measured on an NVIDIA 1080TI, MACs and Params are calculated with [calflops](https://github.com/MrYxJ/calculate-flops.pytorch), and FID is calculated with [PytorchFID](https://github.com/mseitzer/pytorch-fid).
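To read the benchmark table as relative savings, the short script below computes the percentage reduction in MACs and latency for each baseline and its T-GATE counterpart. The numbers are copied from the table; the variable and function names are illustrative only.

```python
# (baseline, with T-GATE) pairs copied from the benchmark table.
# Each entry is (MACs in T, latency in seconds).
benchmarks = {
    "SD-1.5": ((16.938, 7.032), (9.875, 4.313)),
    "SD-2.1": ((38.041, 16.121), (22.208, 9.878)),
    "SD-XL": ((149.438, 53.187), (84.438, 27.932)),
    "Pixart-Alpha": ((107.031, 61.502), (65.318, 37.867)),
    "DeepCache (SD-XL)": ((57.888, 19.931), (43.868, 14.666)),
    "LCM (SD-XL)": ((11.955, 3.805), (11.171, 3.533)),
    "LCM (Pixart-Alpha)": ((8.563, 4.733), (7.623, 4.543)),
}


def reduction(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


macs_reduction = {}
latency_reduction = {}
for model, ((macs_base, lat_base), (macs_gate, lat_gate)) in benchmarks.items():
    macs_reduction[model] = reduction(macs_base, macs_gate)
    latency_reduction[model] = reduction(lat_base, lat_gate)
    print(
        f"{model:<20} MACs -{macs_reduction[model]:.1f}%  "
        f"latency -{latency_reduction[model]:.1f}%"
    )
```

Running it shows that the relative gain differs considerably across rows, with the few-step LCM configurations saving the least.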