# SGLang Diffusion

SGLang Diffusion is an inference framework for accelerated image and video generation using diffusion models. It provides an end-to-end unified pipeline with optimized kernels and an efficient scheduler loop.

## Key Features

- **Broad Model Support**: Wan series, FastWan series, Hunyuan, Qwen-Image, Qwen-Image-Edit, Flux, Z-Image, GLM-Image, and more
- **Fast Inference**: Optimized kernels, efficient scheduler loop, and Cache-DiT acceleration
- **Ease of Use**: OpenAI-compatible API, CLI, and Python SDK
- **Multi-Platform**: NVIDIA GPUs (H100, H200, A100, B200, RTX 4090), AMD GPUs (MI300X, MI325X), and Ascend NPUs (A2, A3)

---

## Quick Start

### Installation

```bash
uv pip install "sglang[diffusion]" --prerelease=allow
```

See the [Installation Guide](installation.md) for other installation methods and ROCm-specific instructions.

### Basic Usage

Generate an image with the CLI:

```bash
sglang generate --model-path Qwen/Qwen-Image \
    --prompt "A beautiful sunset over the mountains" \
    --save-output
```

Or start a server with the OpenAI-compatible API:

```bash
sglang serve --model-path Qwen/Qwen-Image --port 30010
```

---

## Documentation

### Getting Started

- **[Installation](installation.md)** - Install SGLang Diffusion via pip, uv, Docker, or from source
- **[Compatibility Matrix](compatibility_matrix.md)** - Supported models and optimization compatibility

### Usage

- **[CLI Documentation](api/cli.md)** - Command-line interface for `sglang generate` and `sglang serve`
- **[OpenAI API](api/openai_api.md)** - OpenAI-compatible API for image/video generation and LoRA management

### Performance Optimization

- **[Performance Overview](performance/index.md)** - Overview of all performance optimization strategies
- **[Attention Backends](performance/attention_backends.md)** - Available attention backends (FlashAttention, SageAttention, etc.)
- **[Caching Strategies](performance/cache/)** - Cache-DiT and TeaCache acceleration
- **[Profiling](performance/profiling.md)** - Profiling techniques with PyTorch Profiler and Nsight Systems

### Reference

- **[Environment Variables](environment_variables.md)** - Configuration via environment variables
- **[Support New Models](support_new_models.md)** - Guide for adding new diffusion models
- **[Contributing](contributing.md)** - Contribution guidelines and commit message conventions
- **[CI Performance](ci_perf.md)** - Performance baseline generation script

---

## CLI Quick Reference

### Generate (one-off generation)

```bash
sglang generate --model-path <MODEL> --prompt "<PROMPT>" --save-output
```

### Serve (HTTP server)

```bash
sglang serve --model-path <MODEL> --port 30010
```

### Enable Cache-DiT acceleration

```bash
SGLANG_CACHE_DIT_ENABLED=true sglang generate --model-path <MODEL> --prompt "<PROMPT>"
```

---

## References

- [SGLang GitHub](https://github.com/sgl-project/sglang)
- [Cache-DiT](https://github.com/vipshop/cache-dit)
- [FastVideo](https://github.com/hao-ai-lab/FastVideo)
- [xDiT](https://github.com/xdit-project/xDiT)
- [Diffusers](https://github.com/huggingface/diffusers)