# Caching Acceleration for Diffusion Models

SGLang provides multiple caching acceleration strategies for Diffusion Transformer (DiT) models. These strategies can significantly reduce inference time by skipping redundant computation.

## Overview

SGLang supports two complementary caching approaches:

| Strategy | Scope | Mechanism | Best For |
|----------|-------|-----------|----------|
| **Cache-DiT** | Block-level | Skip individual transformer blocks dynamically | Advanced, higher speedup |
| **TeaCache** | Timestep-level | Skip entire denoising steps based on L1 similarity | Simple, built-in |

## Cache-DiT

[Cache-DiT](https://github.com/vipshop/cache-dit) provides block-level caching with advanced strategies such as DBCache and TaylorSeer. It can achieve up to a **1.69x speedup**.

See [cache_dit.md](cache_dit.md) for detailed configuration.

### Quick Start

```bash
# Enable Cache-DiT through its environment variable, then run generation as usual.
SGLANG_CACHE_DIT_ENABLED=true \
sglang generate --model-path Qwen/Qwen-Image \
    --prompt "A beautiful sunset over the mountains"
```

### Key Features

- **DBCache**: Dynamic block-level caching based on residual differences
- **TaylorSeer**: Taylor expansion-based calibration for optimized caching
- **SCM**: Step-level computation masking for additional speedup

## TeaCache

TeaCache (Timestep Embedding Aware Cache) accelerates diffusion inference by detecting when consecutive denoising steps are similar enough that the transformer computation for a step can be skipped entirely.

See [teacache.md](teacache.md) for detailed documentation.

### Quick Overview

- Tracks the L1 distance between modulated inputs across timesteps
- Reuses the cached residual when the accumulated distance stays below a threshold
- Supports CFG by keeping separate caches for the positive and negative branches

### Supported Models

- Wan (wan2.1, wan2.2)
- Hunyuan (HunyuanVideo)
- Z-Image

For Flux and Qwen models, TeaCache is automatically disabled when CFG is enabled.

## References

- [Cache-DiT Repository](https://github.com/vipshop/cache-dit)
- [TeaCache Paper](https://arxiv.org/abs/2411.14324)