# Caching Acceleration for Diffusion Models

SGLang provides multiple caching acceleration strategies for Diffusion Transformer (DiT) models. These strategies can significantly reduce inference time by skipping redundant computation.

## Overview

SGLang supports two complementary caching approaches:

| Strategy | Scope | Mechanism | Best For |
|----------|-------|-----------|----------|
| **Cache-DiT** | Block-level | Skips individual transformer blocks dynamically | Advanced tuning, higher speedup |
| **TeaCache** | Timestep-level | Skips entire denoising steps based on L1 similarity | Simple setup, built-in |

## Cache-DiT

[Cache-DiT](https://github.com/vipshop/cache-dit) provides block-level caching with advanced strategies such as DBCache and TaylorSeer. It can achieve up to **1.69x speedup**. See [cache_dit.md](cache_dit.md) for detailed configuration.

### Quick Start

```bash
SGLANG_CACHE_DIT_ENABLED=true \
sglang generate --model-path Qwen/Qwen-Image \
    --prompt "A beautiful sunset over the mountains"
```

### Key Features

- **DBCache**: Dynamic block-level caching based on residual differences (see the block-level sketch in the appendix below)
- **TaylorSeer**: Taylor expansion-based calibration for optimized caching
- **SCM**: Step-level computation masking for additional speedup

## TeaCache

TeaCache (Timestep Embedding Aware Cache) accelerates diffusion inference by detecting when consecutive denoising steps are similar enough to skip computation entirely. See [teacache.md](teacache.md) for detailed documentation.

### Quick Overview

- Tracks the L1 distance between modulated inputs across timesteps
- When the accumulated distance falls below a threshold, reuses the cached residual instead of recomputing the step (see the timestep-level sketch in the appendix below)
- Supports classifier-free guidance (CFG) with separate positive/negative caches

### Supported Models

- Wan (wan2.1, wan2.2)
- Hunyuan (HunyuanVideo)
- Z-Image

For Flux and Qwen models, TeaCache is automatically disabled when CFG is enabled.

## References

- [Cache-DiT Repository](https://github.com/vipshop/cache-dit)
- [TeaCache Paper](https://arxiv.org/abs/2411.14324)
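## Appendix: Illustrative Sketches

### Block-level caching (Cache-DiT-style)

For intuition, here is a minimal, hypothetical sketch of the residual-difference idea behind block-level caching. It is not the Cache-DiT implementation; the function, cache keys, and threshold are illustrative assumptions only.

```python
import torch

def forward_with_block_cache(blocks, hidden, cache, rel_l1_threshold=0.05):
    """Illustrative block-level caching: if the first block's output changes
    little relative to the previous denoising step, reuse the cached residual
    of the remaining blocks instead of recomputing them.

    `cache` is a dict that persists across steps; all names here are
    hypothetical, not Cache-DiT's actual API.
    """
    # Always compute the first block; its output serves as a cheap probe.
    probe = blocks[0](hidden)

    prev_probe = cache.get("probe")
    if prev_probe is not None and "residual" in cache:
        # Relative L1 change of the probe between consecutive steps.
        rel_diff = (probe - prev_probe).abs().mean() / (prev_probe.abs().mean() + 1e-8)
        if rel_diff < rel_l1_threshold:
            # Steps are similar enough: skip the remaining blocks and
            # reapply the residual they contributed last time.
            cache["probe"] = probe
            return probe + cache["residual"]

    # Otherwise run the remaining blocks and cache their contribution.
    out = probe
    for block in blocks[1:]:
        out = block(out)
    cache["probe"] = probe
    cache["residual"] = out - probe
    return out
```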
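### Timestep-level caching (TeaCache-style)

Similarly, a minimal sketch of the skip decision TeaCache makes at each denoising step, following the bullet points in the Quick Overview: accumulate the relative L1 distance of the modulated input across timesteps and skip while the total stays under a threshold. Class and field names are hypothetical, and the polynomial rescaling used by the real implementation is omitted.

```python
import torch

class TeaCacheState:
    """Illustrative per-branch timestep cache; names are hypothetical."""

    def __init__(self, threshold=0.1):
        self.threshold = threshold
        self.accumulated = 0.0
        self.prev_modulated = None
        self.cached_residual = None  # set by the caller after a computed step

    def should_skip(self, modulated_input: torch.Tensor) -> bool:
        """Accumulate the relative L1 distance of the modulated input across
        timesteps; skip the step while the running total stays under the
        threshold, and reset the total whenever a full step is computed."""
        if self.prev_modulated is None or self.cached_residual is None:
            skip = False  # first step: nothing cached yet, must compute
        else:
            rel_l1 = ((modulated_input - self.prev_modulated).abs().mean()
                      / (self.prev_modulated.abs().mean() + 1e-8)).item()
            self.accumulated += rel_l1
            if self.accumulated < self.threshold:
                skip = True  # steps still similar: reuse cached residual
            else:
                skip = False
                self.accumulated = 0.0  # recompute, restart accumulation
        self.prev_modulated = modulated_input
        return skip
```

In a transformer forward pass, a skipped step would return `hidden + state.cached_residual`, while a computed step refreshes `state.cached_residual = output - hidden`. With CFG enabled, two such states would be kept, one each for the positive and negative branches, matching the separate positive/negative caches noted above.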