# Introduction
This demo application ("demoDiffusion") showcases the acceleration of the [Stable Diffusion](https://huggingface.co/CompVis/stable-diffusion-v1-4) pipeline using TensorRT plugins.
# Setup
### Clone the TensorRT OSS repository
```bash
git clone git@github.com:NVIDIA/TensorRT.git -b release/8.5 --single-branch
cd TensorRT
git submodule update --init --recursive
```
### Launch TensorRT NGC container
Install nvidia-docker using [these instructions](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker).
```bash
docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/tensorrt:22.10-py3 /bin/bash
```
### (Optional) Install latest TensorRT release
```bash
python3 -m pip install --upgrade pip
python3 -m pip install --upgrade tensorrt
```
> NOTE: Alternatively, you can download and install TensorRT packages from [NVIDIA TensorRT Developer Zone](https://developer.nvidia.com/tensorrt).
### Build TensorRT plugins library
Build the TensorRT plugins library using the [TensorRT OSS build instructions](https://github.com/NVIDIA/TensorRT/blob/main/README.md#building-tensorrt-oss).
```bash
export TRT_OSSPATH=/workspace
cd $TRT_OSSPATH
mkdir -p build && cd build
cmake .. -DTRT_OUT_DIR=$PWD/out
cd plugin
make -j$(nproc)
export PLUGIN_LIBS="$TRT_OSSPATH/build/out/libnvinfer_plugin.so"
```
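Before running the demo, it can help to confirm that the freshly built library actually loads. The following is a minimal sketch, assuming `PLUGIN_LIBS` has been exported as above; `load_plugin_library` is a hypothetical helper written for illustration and is not part of the demo.

```python
import ctypes
import os


def load_plugin_library(path):
    """Load the shared library at `path`, failing early with a clear error.

    Hypothetical helper for illustration; not part of demoDiffusion.
    """
    if not path or not os.path.isfile(path):
        raise FileNotFoundError(f"Plugin library not found: {path!r}")
    # ctypes.CDLL raises OSError if the library, or one of the shared
    # libraries it depends on, cannot be resolved by the dynamic loader.
    return ctypes.CDLL(path)
```

Inside the container, calling `load_plugin_library(os.environ["PLUGIN_LIBS"])` should return a handle if the build succeeded; an `OSError` usually points to a missing dependency rather than a missing file.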
### Install required packages
```bash
cd $TRT_OSSPATH/demo/Diffusion
pip3 install -r requirements.txt
# Create output directories
mkdir -p onnx engine output
```
> NOTE: demoDiffusion has been tested on systems with NVIDIA A100, RTX3090, and RTX4090 GPUs, using the following software configuration:
```
cuda-python 11.8.1
diffusers 0.7.2
onnx 1.12.0
onnx-graphsurgeon 0.3.25
onnxruntime 1.13.1
polygraphy 0.43.1
tensorrt 8.5.1.7
tokenizers 0.13.2
torch 1.12.0+cu116
transformers 4.24.0
```
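To compare an environment against the tested configuration, a short script along these lines may help. It is a sketch only: it assumes each entry in the list above is also the name of the installed Python distribution, which may not hold for every install method, and `find_mismatches` is a hypothetical helper rather than something shipped with the demo.

```python
from importlib import metadata

# Versions copied from the tested configuration listed above.
TESTED_VERSIONS = {
    "cuda-python": "11.8.1",
    "diffusers": "0.7.2",
    "onnx": "1.12.0",
    "onnx-graphsurgeon": "0.3.25",
    "onnxruntime": "1.13.1",
    "polygraphy": "0.43.1",
    "tensorrt": "8.5.1.7",
    "tokenizers": "0.13.2",
    "torch": "1.12.0+cu116",
    "transformers": "4.24.0",
}


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Return {package: (expected, found)} for every package that differs."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in find_mismatches(TESTED_VERSIONS).items():
        print(f"{package}: tested with {wanted}, found {found or 'not installed'}")
```

A mismatch does not necessarily mean the demo will fail; it only flags where the environment differs from the one that was tested.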
> NOTE: Optionally install the HuggingFace [accelerate](https://pypi.org/project/accelerate/) package for faster and less memory-intensive model loading.
# Running demoDiffusion
### Review usage instructions
```bash
python3 demo-diffusion.py --help
```
### HuggingFace user access token
To download the model checkpoints for the Stable Diffusion pipeline, you will need a `read` access token. See [instructions](https://huggingface.co/docs/hub/security-tokens).
```bash
export HF_TOKEN=<your access token>
```
### Generate an image guided by a single text prompt
```bash
LD_PRELOAD=${PLUGIN_LIBS} python3 demo-diffusion.py "a beautiful photograph of Mt. Fuji during cherry blossom" --hf-token=$HF_TOKEN -v
```
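The same invocation can be driven from Python, which makes it easier to fail early when `PLUGIN_LIBS` or `HF_TOKEN` is missing. This is a sketch under stated assumptions: `build_launch` and `run_demo` are hypothetical helpers, and only the arguments shown in the command above (`--hf-token` and `-v`) are used.

```python
import os
import subprocess


def build_launch(prompt, env=os.environ):
    """Return (argv, child_env) mirroring the single-prompt command above.

    Hypothetical helper for illustration; not part of demoDiffusion.
    """
    plugin_libs = env.get("PLUGIN_LIBS")
    hf_token = env.get("HF_TOKEN")
    if not plugin_libs:
        raise RuntimeError("PLUGIN_LIBS is not set; build the plugin library first.")
    if not hf_token:
        raise RuntimeError("HF_TOKEN is not set; export your access token first.")

    argv = ["python3", "demo-diffusion.py", prompt, f"--hf-token={hf_token}", "-v"]
    # Copy the environment and preload the plugin library for the child only.
    child_env = dict(env)
    child_env["LD_PRELOAD"] = plugin_libs
    return argv, child_env


def run_demo(prompt):
    """Run the demo; expects the current directory to contain demo-diffusion.py."""
    argv, child_env = build_launch(prompt)
    subprocess.run(argv, env=child_env, check=True)
```

For example, `run_demo("a beautiful photograph of Mt. Fuji during cherry blossom")` would be called from `$TRT_OSSPATH/demo/Diffusion`. Setting `LD_PRELOAD` only in the child environment keeps the plugin library out of the launching process.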
# Restrictions
- Up to 16 simultaneous prompts (maximum batch size) per inference.
- To generate images of dynamic shapes without rebuilding the engines, use `--force-dynamic-shape`.
- Supports image sizes between 256x256 and 1024x1024.
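
The batch-size and image-size limits above can be checked before launching a run. The sketch below is an illustration only: it assumes the size restriction means each dimension must lie between 256 and 1024 inclusive, and `check_request` is a hypothetical helper. The demo itself may enforce further constraints that are not listed here.

```python
MAX_BATCH_SIZE = 16    # "Up to 16 simultaneous prompts"
MIN_IMAGE_DIM = 256    # assumed inclusive lower bound per dimension
MAX_IMAGE_DIM = 1024   # assumed inclusive upper bound per dimension


def check_request(prompts, height, width):
    """Return a list of problems; an empty list means the request fits the limits."""
    problems = []
    if not 1 <= len(prompts) <= MAX_BATCH_SIZE:
        problems.append(
            f"expected 1 to {MAX_BATCH_SIZE} prompts, got {len(prompts)}"
        )
    for name, value in (("height", height), ("width", width)):
        if not MIN_IMAGE_DIM <= value <= MAX_IMAGE_DIM:
            problems.append(
                f"{name} {value} is outside {MIN_IMAGE_DIM}..{MAX_IMAGE_DIM}"
            )
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every violation at once.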