# Introduction

This demo application ("demoDiffusion") showcases the acceleration of the [Stable Diffusion](https://huggingface.co/CompVis/stable-diffusion-v1-4) pipeline using TensorRT plugins.

# Setup

### Clone the TensorRT OSS repository

```bash
git clone git@github.com:NVIDIA/TensorRT.git -b release/8.5 --single-branch
cd TensorRT
git submodule update --init --recursive
```

### Launch TensorRT NGC container

Install nvidia-docker using [these instructions](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker).

```bash
docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/tensorrt:22.10-py3 /bin/bash
```

### (Optional) Install latest TensorRT release

```bash
python3 -m pip install --upgrade pip
python3 -m pip install --upgrade tensorrt
```

> NOTE: Alternatively, you can download and install TensorRT packages from [NVIDIA TensorRT Developer Zone](https://developer.nvidia.com/tensorrt).

### Build TensorRT plugins library

Build the TensorRT plugins library using the [TensorRT OSS build instructions](https://github.com/NVIDIA/TensorRT/blob/main/README.md#building-tensorrt-oss).

```bash
export TRT_OSSPATH=/workspace
cd $TRT_OSSPATH

mkdir -p build && cd build
cmake .. -DTRT_OUT_DIR=$PWD/out
cd plugin
make -j$(nproc)

export PLUGIN_LIBS="$TRT_OSSPATH/build/out/libnvinfer_plugin.so"
```

### Install required packages

```bash
cd $TRT_OSSPATH/demo/Diffusion
pip3 install -r requirements.txt

# Create output directories
mkdir -p onnx engine output
```

> NOTE: demoDiffusion has been tested on systems with NVIDIA A100, RTX3090, and RTX4090 GPUs, and the following software configuration.

```
cuda-python         11.8.1
diffusers           0.7.2
onnx                1.12.0
onnx-graphsurgeon   0.3.25
onnxruntime         1.13.1
polygraphy          0.43.1
tensorrt            8.5.1.7
tokenizers          0.13.2
torch               1.12.0+cu116
transformers        4.24.0
```

> NOTE: Optionally install the HuggingFace [accelerate](https://pypi.org/project/accelerate/) package for faster and less memory-intensive model loading.
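
The tested configuration above can be compared against the current environment with a short script. The following is a minimal sketch and not part of demoDiffusion: `TESTED_VERSIONS` and `compare_versions` are hypothetical names, and the script assumes each package is registered with pip under the name shown in the list. It uses exact string matching, so a newer release is reported as a mismatch even though it may work fine.

```python
from importlib import metadata

# Versions copied from the tested software configuration in this README.
TESTED_VERSIONS = {
    "cuda-python": "11.8.1",
    "diffusers": "0.7.2",
    "onnx": "1.12.0",
    "onnx-graphsurgeon": "0.3.25",
    "onnxruntime": "1.13.1",
    "polygraphy": "0.43.1",
    "tensorrt": "8.5.1.7",
    "tokenizers": "0.13.2",
    "torch": "1.12.0+cu116",
    "transformers": "4.24.0",
}


def compare_versions(tested, lookup=metadata.version):
    """Return {package: (tested, installed_or_None)} for every mismatch.

    `lookup` is injectable so the function can be exercised without
    touching the real environment.
    """
    mismatches = {}
    for name, expected in tested.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in compare_versions(TESTED_VERSIONS).items():
        print(f"{name}: tested {expected}, found {installed or 'not installed'}")
```

Running the script inside the container prints one line per package whose installed version differs from the tested one, and prints nothing when the environment matches.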
# Running demoDiffusion

### Review usage instructions

```bash
python3 demo-diffusion.py --help
```

### HuggingFace user access token

To download the model checkpoints for the Stable Diffusion pipeline, you will need a `read` access token. See [instructions](https://huggingface.co/docs/hub/security-tokens).

```bash
export HF_TOKEN=
```

### Generate an image guided by a single text prompt

```bash
LD_PRELOAD=${PLUGIN_LIBS} python3 demo-diffusion.py "a beautiful photograph of Mt. Fuji during cherry blossom" --hf-token=$HF_TOKEN -v
```

# Restrictions

- Up to 16 simultaneous prompts (maximum batch size) per inference.
- To generate images of dynamic shapes without rebuilding the engines, use `--force-dynamic-shape`.
- Supports image sizes between 256x256 and 1024x1024.
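
The batch-size and image-size limits listed under Restrictions can be checked before launching a run. The sketch below is illustrative only and is not taken from `demo-diffusion.py`: `check_request` and the constant names are hypothetical, and it assumes that the size limits apply to height and width independently and that both bounds are inclusive.

```python
# Limits taken from the Restrictions list in this README.
MAX_BATCH_SIZE = 16
MIN_IMAGE_DIM = 256
MAX_IMAGE_DIM = 1024


def check_request(prompts, height, width):
    """Return a list of restriction violations (empty when the request is OK)."""
    problems = []
    if not 1 <= len(prompts) <= MAX_BATCH_SIZE:
        problems.append(
            f"expected 1 to {MAX_BATCH_SIZE} prompts, got {len(prompts)}"
        )
    # Assumption: each dimension is bounded separately, bounds inclusive.
    for label, value in (("height", height), ("width", width)):
        if not MIN_IMAGE_DIM <= value <= MAX_IMAGE_DIM:
            problems.append(
                f"{label} {value} is outside {MIN_IMAGE_DIM}..{MAX_IMAGE_DIM}"
            )
    return problems
```

For example, `check_request(["a beautiful photograph of Mt. Fuji during cherry blossom"], 512, 512)` returns an empty list, while a request with 17 prompts or a 128-pixel height returns one message per violated limit.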