| | --- |
| | library_name: tensorrt |
| | license: openrail++ |
| | base_model: stabilityai/stable-diffusion-xl-base-1.0 |
| | language: |
| | - en |
| | tags: |
| | - stable-diffusion |
| | - stable-diffusion-xl |
| | - stable-diffusion-xl-lcm |
| | - stable-diffusion-xl-lcmlora |
| | - tensorrt |
| | - text-to-image |
| | --- |
| | |
| | # Stable Diffusion XL 1.0 TensorRT |
| |
|
| | ## Introduction |
| |
|
| | This repository hosts the TensorRT versions(sdxl, sdxl-lcm, sdxl-lcmlora) of **Stable Diffusion XL 1.0** created in collaboration with [NVIDIA](https://huggingface.co/nvidia). The optimized versions give substantial improvements in speed and efficiency. |
| |
|
| | See the [usage instructions](#usage-example) for how to run the SDXL pipeline with the ONNX files hosted in this repository. |
| |
|
| |
|
| |  |
| |
|
| | ## Model Description |
| |
|
| | - **Developed by:** Stability AI |
| | - **Model type:** Diffusion-based text-to-image generative model |
| | - **License:** [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/blob/main/LICENSE.md) |
| | - **Model Description:** This is a conversion of the [SDXL base 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) and [SDXL refiner 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0) models for [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) optimized inference |
| |
|
| |
|
| | ## Performance Comparison |
| |
|
| | #### Timings for 30 steps at 1024x1024 |
| |
|
| | | Accelerator | Baseline (non-optimized) | NVIDIA TensorRT (optimized) | Percentage improvement | |
| | |-------------|--------------------------|-----------------------------|------------------------| |
| | | A10 | 9399 ms | 8160 ms | ~13% | |
| | | A100 | 3704 ms | 2742 ms | ~26% | |
| | | H100 | 2496 ms | 1471 ms | ~41% | |
| |
|
| | #### Image throughput for 30 steps at 1024x1024 |
| |
|
| | | Accelerator | Baseline (non-optimized) | NVIDIA TensorRT (optimized) | Percentage improvement | |
| | |-------------|--------------------------|-----------------------------|------------------------| |
| | | A10 | 0.10 images/sec | 0.12 images/sec | ~20% | |
| | | A100 | 0.27 images/sec | 0.36 images/sec | ~33% | |
| | | H100 | 0.40 images/sec | 0.68 images/sec | ~70% | |
| |
|
| | #### Timings for Latent Consistency Model(LCM) version for 4 steps at 1024x1024 |
| |
|
| | | Accelerator | CLIP | Unet | VAE |Total | |
| | |-------------|--------------------------|-----------------------------|------------------------|------------------------| |
| | | A100 | 1.08 ms | 192.02 ms | 228.34 ms | 426.16 ms | |
| | | H100 | 0.78 ms | 102.8 ms | 126.95 ms | 234.22 ms | |
| |
|
| |
|
| | ## Usage Example |
| |
|
| | 1. Following the [setup instructions](https://github.com/rajeevsrao/TensorRT/blob/release/9.2/demo/Diffusion/README.md) on launching a TensorRT NGC container. |
| | ```shell |
| | git clone https://github.com/rajeevsrao/TensorRT.git |
| | cd TensorRT |
| | git checkout release/9.2 |
| | docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:23.11-py3 /bin/bash |
| | ``` |
| |
|
| | 2. Download the SDXL TensorRT files from this repo |
| | ```shell |
| | git lfs install |
| | git clone https://huggingface.co/stabilityai/stable-diffusion-xl-1.0-tensorrt |
| | cd stable-diffusion-xl-1.0-tensorrt |
| | git lfs pull |
| | cd .. |
| | ``` |
| |
|
| | 3. Install libraries and requirements |
| | ```shell |
| | cd demo/Diffusion |
| | python3 -m pip install --upgrade pip |
| | pip3 install -r requirements.txt |
| | python3 -m pip install --pre --upgrade --extra-index-url https://pypi.nvidia.com tensorrt |
| | ``` |
| |
|
| | 4. Perform TensorRT optimized inference: |
| |
|
| | - **SDXL** |
| | |
| | The first invocation produces plan files in `engine_xl_base` and `engine_xl_refiner` specific to the accelerator being run on and are reused for later invocations. |
| | |
| | ``` |
| | python3 demo_txt2img_xl.py \ |
| | "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" \ |
| | --build-static-batch \ |
| | --use-cuda-graph \ |
| | --num-warmup-runs 1 \ |
| | --width 1024 \ |
| | --height 1024 \ |
| | --denoising-steps 30 \ |
| | --onnx-base-dir /workspace/stable-diffusion-xl-1.0-tensorrt/sdxl-1.0-base \ |
| | --onnx-refiner-dir /workspace/stable-diffusion-xl-1.0-tensorrt/sdxl-1.0-refiner |
| | ``` |
| | |
| | - **SDXL-LCM** |
| | |
| | The first invocation produces plan files in --engine-dir specific to the accelerator being run on and are reused for later invocations. |
| | ``` |
| | python3 demo_txt2img_xl.py \ |
| | ""Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"" \ |
| | --version=xl-1.0 \ |
| | --onnx-dir /workspace/stable-diffusion-xl-1.0-tensorrt/lcm \ |
| | --engine-dir /workspace/stable-diffusion-xl-1.0-tensorrt/lcm/engine-sdxl-lcm-nocfg \ |
| | --scheduler LCM \ |
| | --denoising-steps 4 \ |
| | --guidance-scale 0.0 \ |
| | --seed 42 |
| | |
| | ``` |
| | - **SDXL-LCMLORA** |
| | |
| | The first invocation produces plan files in --engine-dir specific to the accelerator being run on and are reused for later invocations. |
| | |
| | ``` |
| | python3 demo_txt2img_xl.py \ |
| | ""Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"" \ |
| | --version=xl-1.0 \ |
| | --onnx-dir /workspace/stable-diffusion-xl-1.0-tensorrt/lcmlora \ |
| | --engine-dir /workspace/stable-diffusion-xl-1.0-tensorrt/lcm/engine-sdxl-lcmlora-nocfg \ |
| | --scheduler LCM \ |
| | --lora-path latent-consistency/lcm-lora-sdxl \ |
| | --lora-scale 1.0 \ |
| | --denoising-steps 4 \ |
| | --guidance-scale 0.0 \ |
| | --seed 42 |
| | |
| | ``` |