	# SGLang Diffusion

	SGLang Diffusion is an inference framework for accelerated image and video generation using diffusion models. It provides an end-to-end unified pipeline with optimized kernels and an efficient scheduler loop.

	## Key Features

	- Broad Model Support: Wan series, FastWan series, Hunyuan, Qwen-Image, Qwen-Image-Edit, Flux, Z-Image, GLM-Image, and more
	- Fast Inference: Optimized kernels, efficient scheduler loop, and Cache-DiT acceleration
	- Ease of Use: OpenAI-compatible API, CLI, and Python SDK
	- Multi-Platform: NVIDIA GPUs (H100, H200, A100, B200, 4090), AMD GPUs (MI300X, MI325X), and Ascend NPUs (A2, A3)

	---

	## Quick Start

	### Installation

	```bash
	uv pip install "sglang[diffusion]" --prerelease=allow
	```

	See the [Installation Guide](installation.md) for other installation methods and ROCm-specific instructions.

	### Basic Usage

	Generate an image with the CLI:

	```bash
	sglang generate --model-path Qwen/Qwen-Image \
	--prompt "A beautiful sunset over the mountains" \
	--save-output
	```

	Or start a server with the OpenAI-compatible API:

	```bash
	sglang serve --model-path Qwen/Qwen-Image --port 30010
	```
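
Once the server is running, a client can send it a generation request. The following is a minimal sketch using only the Python standard library, not a definitive client. It assumes the server listens on `localhost:30010` as started above and that it exposes the OpenAI-style image route `/v1/images/generations`; the route, the `size` and `n` fields, and the helper names `build_image_request` and `generate_image` are assumptions for illustration. Check the [OpenAI API](api/openai_api.md) page for the routes and fields that are actually supported.

```python
import json
import urllib.request

# Assumption: the server from `sglang serve ... --port 30010` runs locally.
BASE_URL = "http://localhost:30010"


def build_image_request(prompt, model="Qwen/Qwen-Image", size="1024x1024",
                        base_url=BASE_URL):
    """Build (but do not send) a POST request asking for one image.

    The route and payload fields follow the OpenAI Images API convention and
    are assumed, not confirmed, to match what the server accepts.
    """
    payload = {"model": model, "prompt": prompt, "n": 1, "size": size}
    return urllib.request.Request(
        url=f"{base_url}/v1/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_image(prompt, timeout=600):
    """Send the request and return the decoded JSON response body."""
    request = build_image_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example call (requires a running server):
# result = generate_image("A beautiful sunset over the mountains")
# print(list(result.keys()))
```

Building the request separately from sending it makes it easy to inspect the payload before anything goes over the network. The generous timeout reflects that a single generation can take a while; adjust it to your model and hardware.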

	---

	## Documentation

	### Getting Started

	- [Installation](installation.md) - Install SGLang Diffusion via pip, uv, Docker, or from source
	- [Compatibility Matrix](compatibility_matrix.md) - Supported models and optimization compatibility

	### Usage

	- [CLI Documentation](api/cli.md) - Command-line interface for `sglang generate` and `sglang serve`
	- [OpenAI API](api/openai_api.md) - OpenAI-compatible API for image/video generation and LoRA management

	### Performance Optimization

	- [Performance Overview](performance/index.md) - Overview of all performance optimization strategies
	- [Attention Backends](performance/attention_backends.md) - Available attention backends (FlashAttention, SageAttention, etc.)
	- [Caching Strategies](performance/cache/) - Cache-DiT and TeaCache acceleration
	- [Profiling](performance/profiling.md) - Profiling techniques with PyTorch Profiler and Nsight Systems

	### Reference

	- [Environment Variables](environment_variables.md) - Configuration via environment variables
	- [Support New Models](support_new_models.md) - Guide for adding new diffusion models
	- [Contributing](contributing.md) - Contribution guidelines and commit message conventions
	- [CI Performance](ci_perf.md) - Performance baseline generation script

	---

	## CLI Quick Reference

	### Generate (one-off generation)

	```bash
	sglang generate --model-path <MODEL> --prompt "<PROMPT>" --save-output
	```

	### Serve (HTTP server)

	```bash
	sglang serve --model-path <MODEL> --port 30010
	```

	### Enable Cache-DiT acceleration

	```bash
	SGLANG_CACHE_DIT_ENABLED=true sglang generate --model-path <MODEL> --prompt "<PROMPT>"
	```
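
The commands above can also be driven from a script, for example to generate images for several prompts with Cache-DiT switched on or off. The sketch below uses only the flags and the environment variable shown in this section; the helper names `build_generate_command`, `build_env`, and `generate_all` are hypothetical and not part of SGLang.

```python
import os
import subprocess


def build_generate_command(model, prompt):
    """Argument list for one `sglang generate` run, using only flags shown above."""
    return ["sglang", "generate", "--model-path", model,
            "--prompt", prompt, "--save-output"]


def build_env(cache_dit=False):
    """Copy of the current environment, optionally enabling Cache-DiT."""
    env = dict(os.environ)
    if cache_dit:
        env["SGLANG_CACHE_DIT_ENABLED"] = "true"
    return env


def generate_all(model, prompts, cache_dit=False):
    """Run one generation per prompt; raises CalledProcessError if a run fails."""
    env = build_env(cache_dit)
    for prompt in prompts:
        subprocess.run(build_generate_command(model, prompt), env=env, check=True)


# Example call (requires sglang to be installed and a supported accelerator):
# generate_all("Qwen/Qwen-Image", ["A red fox in snow", "A city at dusk"], cache_dit=True)
```

Each `sglang generate` call is a separate process, so the model is loaded again for every prompt. For more than a handful of requests, starting `sglang serve` once and sending requests to it is likely the better fit.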

	---

	## References

	- [SGLang GitHub](https://github.com/sgl-project/sglang)
	- [Cache-DiT](https://github.com/vipshop/cache-dit)
	- [FastVideo](https://github.com/hao-ai-lab/FastVideo)
	- [xDiT](https://github.com/xdit-project/xDiT)
	- [Diffusers](https://github.com/huggingface/diffusers)