
	# TeaCache Acceleration

	> Note: This is one of two caching strategies available in SGLang.
	> For an overview of all caching options, see [caching](../index.md).

	TeaCache (Timestep Embedding Aware Cache) accelerates diffusion inference by detecting when the inputs to consecutive denoising steps are similar enough that the transformer computation can be skipped and a cached residual reused instead.

	## Overview

	TeaCache works by:
	1. Tracking the L1 distance between modulated inputs across consecutive timesteps
	2. Accumulating the rescaled L1 distance over steps
	3. Reusing the cached residual while the accumulated distance stays below a threshold
	4. Supporting CFG (Classifier-Free Guidance) with separate positive/negative caches

	## How It Works

	### L1 Distance Tracking

	At each denoising step, TeaCache computes the relative L1 distance between the current and previous modulated inputs:

	```
	rel_l1 = |current - previous|.mean() / |previous|.mean()
	```
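The formula above can be sketched in plain Python. This is a minimal illustration, not the SGLang implementation: flat lists of floats stand in for the modulated-input tensors, and `relative_l1` is a hypothetical helper name.

```python
def relative_l1(current, previous):
    """Relative L1 distance between two equally sized flat sequences of floats."""
    if len(current) != len(previous):
        raise ValueError("inputs must have the same length")
    n = len(previous)
    # Mean absolute change between the two steps
    mean_abs_diff = sum(abs(c - p) for c, p in zip(current, previous)) / n
    # Mean magnitude of the previous input, used for normalization
    mean_abs_prev = sum(abs(p) for p in previous) / n
    return mean_abs_diff / mean_abs_prev


previous = [1.0, -2.0, 3.0, -4.0]
current = [1.1, -2.2, 2.7, -4.4]
rel_l1 = relative_l1(current, previous)  # about 0.1: mean change 0.25 over mean magnitude 2.5
```

Note that the ratio is undefined when the previous input is all zeros; a real implementation would need to guard against that case.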

	This distance is then rescaled using polynomial coefficients and accumulated:

	```
	accumulated += poly(coefficients)(rel_l1)
	```
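As a rough sketch of the rescaling step, the polynomial can be evaluated with Horner's rule. The coefficient ordering (highest degree first, as in `numpy.poly1d`) is an assumption here, and the coefficient values are made up for illustration rather than tuned for any model.

```python
def poly_eval(coefficients, x):
    """Evaluate a polynomial with Horner's rule.

    Assumes coefficients are ordered highest degree first.
    """
    result = 0.0
    for c in coefficients:
        result = result * x + c
    return result


# Hypothetical coefficients: 2*x**2 + 0.5*x
coefficients = [2.0, 0.5, 0.0]

accumulated = 0.0
for rel_l1 in [0.1, 0.2]:
    # Each step adds the rescaled distance to the running total
    accumulated += poly_eval(coefficients, rel_l1)
# accumulated is now about 0.07 + 0.18 = 0.25
```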

	### Cache Decision

	- If `accumulated >= threshold`: Force computation, reset accumulator
	- If `accumulated < threshold`: Skip computation, use cached residual
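The decision rule can be sketched as a small state machine. `TeaCacheState` is a hypothetical helper, not an SGLang class, and forcing computation on the first step (when there is nothing cached yet) is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class TeaCacheState:
    """Hypothetical accumulator that decides whether a step must be computed."""
    threshold: float
    accumulated: float = 0.0
    has_cache: bool = False

    def should_compute(self, rescaled_distance: float) -> bool:
        if not self.has_cache:
            # Assumption: the first step always computes, since no residual exists yet
            self.has_cache = True
            self.accumulated = 0.0
            return True
        self.accumulated += rescaled_distance
        if self.accumulated >= self.threshold:
            # Too much drift since the last full computation: compute and reset
            self.accumulated = 0.0
            return True
        # Still close enough: reuse the cached residual
        return False


state = TeaCacheState(threshold=0.1)
distances = [0.0, 0.04, 0.04, 0.04, 0.02, 0.2]
decisions = [state.should_compute(d) for d in distances]
# decisions == [True, False, False, True, False, True]
```

Steps 2 and 3 are skipped because the running total (0.04, then 0.08) stays under 0.1; step 4 pushes it to 0.12, which forces a computation and resets the accumulator.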

	### CFG Support

	For models that support CFG cache separation (Wan, Hunyuan, Z-Image), TeaCache maintains separate caches for positive and negative branches:
	- `previous_modulated_input` / `previous_residual` for positive branch
	- `previous_modulated_input_negative` / `previous_residual_negative` for negative branch

	For models that don't support CFG separation (Flux, Qwen), TeaCache is automatically disabled when CFG is enabled.
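One way to picture the separation is a pair of independent per-branch states. The sketch below is illustrative only: `BranchCache`, `CfgCaches`, and `teacache_active` are hypothetical names, and grouping the fields per branch is a layout chosen for the example rather than the `_negative`-suffixed attributes listed above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BranchCache:
    """Cache state for one CFG branch (hypothetical container)."""
    previous_modulated_input: Optional[List[float]] = None
    previous_residual: Optional[List[float]] = None
    accumulated: float = 0.0


@dataclass
class CfgCaches:
    """Keeps positive and negative branch state apart."""
    positive: BranchCache = field(default_factory=BranchCache)
    negative: BranchCache = field(default_factory=BranchCache)

    def for_branch(self, is_negative: bool) -> BranchCache:
        return self.negative if is_negative else self.positive


def teacache_active(supports_cfg_separation: bool, cfg_enabled: bool) -> bool:
    """TeaCache stays on unless CFG is enabled on a model without cache separation."""
    return supports_cfg_separation or not cfg_enabled


caches = CfgCaches()
# Updating the positive branch leaves the negative branch untouched
caches.for_branch(is_negative=False).previous_residual = [0.5, -0.5]
positive_residual = caches.positive.previous_residual
negative_residual = caches.negative.previous_residual  # still None
```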

	## Configuration

	TeaCache is configured via `TeaCacheParams` in the sampling parameters:

	```python
	from sglang.multimodal_gen.configs.sample.teacache import TeaCacheParams

	params = TeaCacheParams(
	    teacache_thresh=0.1,  # Threshold for accumulated L1 distance
	    coefficients=[1.0, 0.0, 0.0],  # Polynomial coefficients for L1 rescaling
	)
	```

	### Parameters

	| Parameter | Type | Description |
	|-----------|------|-------------|
	| `teacache_thresh` | float | Threshold for accumulated L1 distance. Higher = more caching, faster but potentially lower quality |
	| `coefficients` | list[float] | Polynomial coefficients for L1 rescaling. Model-specific tuning |
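To make the effect of the threshold concrete, the following sketch counts skipped steps for a fixed sequence of rescaled distances. It is a toy simulation under stated assumptions: `count_skipped` is a hypothetical helper, the distances are invented, and the first step is assumed to always compute.

```python
def count_skipped(rescaled_distances, threshold):
    """Count steps that reuse the cache for a given threshold."""
    accumulated = 0.0
    skipped = 0
    for i, distance in enumerate(rescaled_distances):
        if i == 0:
            # Assumption: the first step always computes
            continue
        accumulated += distance
        if accumulated >= threshold:
            accumulated = 0.0  # compute and reset
        else:
            skipped += 1  # reuse cached residual
    return skipped


distances = [0.04] * 10  # invented, constant per-step drift
skipped_by_threshold = {t: count_skipped(distances, t) for t in (0.05, 0.1, 0.3)}
# skipped_by_threshold == {0.05: 5, 0.1: 6, 0.3: 8}
```

A larger threshold lets the accumulator run longer before forcing a computation, so more steps reuse the cache.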

	### Model-Specific Configurations

	Different models may have different optimal configurations. The coefficients are typically tuned per-model to balance speed and quality.

	## Supported Models

	TeaCache targets the following model families. Only Wan is fully supported today; the others are planned:

	| Model Family | CFG Cache Separation | Notes |
	|--------------|---------------------|-------|
	| Wan (wan2.1, wan2.2) | Yes | Full support |
	| Hunyuan (HunyuanVideo) | Yes | To be supported |
	| Z-Image | Yes | To be supported |
	| Flux | No | To be supported |
	| Qwen | No | To be supported |


	## References

	- [TeaCache: Accelerating Diffusion Models with Temporal Similarity](https://arxiv.org/abs/2411.14324)