peterfeltermuff pharmapsychotic committed on
Commit
10baa6f
·
0 Parent(s):

Duplicate from stabilityai/stable-diffusion-xl-1.0-tensorrt

Co-authored-by: pharmapsychotic <pharmapsychotic@users.noreply.huggingface.co>

.gitattributes ADDED
@@ -0,0 +1,37 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ lcm/unetxl.opt/dbf91c42-985c-11ee-9041-0242ac110002 filter=lfs diff=lfs merge=lfs -text
+ lcmlora/unetxl-8c8ce9e8b00b259425e5f3eaa4b1d705-1.00.opt/1376e228-9608-11ee-9b07-0242ac110002 filter=lfs diff=lfs merge=lfs -text
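The `.gitattributes` rules above decide which paths are stored through Git LFS. As a rough illustration only, the sketch below approximates that matching in Python with `fnmatchcase`. It assumes the simplified rule that a pattern without a slash is compared against the file name and a pattern with a slash against the whole path; real gitattributes matching has more cases (for example `**`), and the helper name `is_lfs_tracked` and the pattern subset are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the patterns added to .gitattributes in this commit.
LFS_PATTERNS = [
    "*.onnx",
    "*.safetensors",
    "*.tar.*",
    "*tfevents*",
    "lcm/unetxl.opt/dbf91c42-985c-11ee-9041-0242ac110002",
]


def is_lfs_tracked(path, patterns=LFS_PATTERNS):
    """Approximate check: would `path` match one of the LFS patterns?

    Assumption: patterns without "/" are matched against the file name,
    patterns with "/" against the full repository-relative path.
    """
    name = PurePosixPath(path).name
    for pattern in patterns:
        target = path if "/" in pattern else name
        if fnmatchcase(target, pattern):
            return True
    return False


if __name__ == "__main__":
    for candidate in ("lcm/clip.opt/model.onnx", "README.md", "examples.jpg"):
        print(candidate, is_lfs_tracked(candidate))
```

Under this approximation the ONNX graphs and the explicitly listed UNet weight blobs are LFS-tracked, while `README.md` is not matched by any of the listed patterns.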
README.md ADDED
@@ -0,0 +1,139 @@
+ ---
+ library_name: tensorrt
+ license: openrail++
+ base_model: stabilityai/stable-diffusion-xl-base-1.0
+ language:
+ - en
+ tags:
+ - stable-diffusion
+ - stable-diffusion-xl
+ - stable-diffusion-xl-lcm
+ - stable-diffusion-xl-lcmlora
+ - tensorrt
+ - text-to-image
+ ---
+
+ # Stable Diffusion XL 1.0 TensorRT
+
+ ## Introduction
+
+ This repository hosts the TensorRT versions (sdxl, sdxl-lcm, sdxl-lcmlora) of **Stable Diffusion XL 1.0**, created in collaboration with [NVIDIA](https://huggingface.co/nvidia). The optimized versions cut 30-step latency at 1024x1024 by roughly 13% to 41% on the accelerators listed under [Performance Comparison](#performance-comparison).
+
+ See the [usage instructions](#usage-example) for how to run the SDXL pipeline with the ONNX files hosted in this repository.
+
+
+ ![examples](./examples.jpg)
+
+ ## Model Description
+
+ - **Developed by:** Stability AI
+ - **Model type:** Diffusion-based text-to-image generative model
+ - **License:** [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/blob/main/LICENSE.md)
+ - **Model Description:** This is a conversion of the [SDXL base 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) and [SDXL refiner 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0) models for [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) optimized inference.
+
+
+ ## Performance Comparison
+
+ #### Timings for 30 steps at 1024x1024
+
+ | Accelerator | Baseline (non-optimized) | NVIDIA TensorRT (optimized) | Percentage improvement |
+ |-------------|--------------------------|-----------------------------|------------------------|
+ | A10 | 9399 ms | 8160 ms | ~13% |
+ | A100 | 3704 ms | 2742 ms | ~26% |
+ | H100 | 2496 ms | 1471 ms | ~41% |
+
+ #### Image throughput for 30 steps at 1024x1024
+
+ | Accelerator | Baseline (non-optimized) | NVIDIA TensorRT (optimized) | Percentage improvement |
+ |-------------|--------------------------|-----------------------------|------------------------|
+ | A10 | 0.10 images/sec | 0.12 images/sec | ~20% |
+ | A100 | 0.27 images/sec | 0.36 images/sec | ~33% |
+ | H100 | 0.40 images/sec | 0.68 images/sec | ~70% |
+
+ #### Timings for the Latent Consistency Model (LCM) version for 4 steps at 1024x1024
+
+ | Accelerator | CLIP | Unet | VAE | Total |
+ |-------------|--------------------------|-----------------------------|------------------------|------------------------|
+ | A100 | 1.08 ms | 192.02 ms | 228.34 ms | 426.16 ms |
+ | H100 | 0.78 ms | 102.8 ms | 126.95 ms | 234.22 ms |
+
+
+ ## Usage Example
+
+ 1. Follow the [setup instructions](https://github.com/rajeevsrao/TensorRT/blob/release/9.2/demo/Diffusion/README.md) to launch a TensorRT NGC container:
+ ```shell
+ git clone https://github.com/rajeevsrao/TensorRT.git
+ cd TensorRT
+ git checkout release/9.2
+ docker run --rm -it --gpus all -v "$PWD":/workspace nvcr.io/nvidia/pytorch:23.11-py3 /bin/bash
+ ```
+
+ 2. Download the SDXL TensorRT files from this repo:
+ ```shell
+ git lfs install
+ git clone https://huggingface.co/stabilityai/stable-diffusion-xl-1.0-tensorrt
+ cd stable-diffusion-xl-1.0-tensorrt
+ git lfs pull
+ cd ..
+ ```
+
+ 3. Install libraries and requirements:
+ ```shell
+ cd demo/Diffusion
+ python3 -m pip install --upgrade pip
+ pip3 install -r requirements.txt
+ python3 -m pip install --pre --upgrade --extra-index-url https://pypi.nvidia.com tensorrt
+ ```
+
+ 4. Perform TensorRT optimized inference:
+
+ - **SDXL**
+
+ The first invocation builds plan files in `engine_xl_base` and `engine_xl_refiner` that are specific to the accelerator in use; later invocations reuse them.
+
+ ```shell
+ python3 demo_txt2img_xl.py \
+ "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" \
+ --build-static-batch \
+ --use-cuda-graph \
+ --num-warmup-runs 1 \
+ --width 1024 \
+ --height 1024 \
+ --denoising-steps 30 \
+ --onnx-base-dir /workspace/stable-diffusion-xl-1.0-tensorrt/sdxl-1.0-base \
+ --onnx-refiner-dir /workspace/stable-diffusion-xl-1.0-tensorrt/sdxl-1.0-refiner
+ ```
+
+ - **SDXL-LCM**
+
+ The first invocation builds plan files in the `--engine-dir` directory that are specific to the accelerator in use; later invocations reuse them.
+ ```shell
+ python3 demo_txt2img_xl.py \
+ "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" \
+ --version=xl-1.0 \
+ --onnx-dir /workspace/stable-diffusion-xl-1.0-tensorrt/lcm \
+ --engine-dir /workspace/stable-diffusion-xl-1.0-tensorrt/lcm/engine-sdxl-lcm-nocfg \
+ --scheduler LCM \
+ --denoising-steps 4 \
+ --guidance-scale 0.0 \
+ --seed 42
+
+ ```
+ - **SDXL-LCMLORA**
+
+ The first invocation builds plan files in the `--engine-dir` directory that are specific to the accelerator in use; later invocations reuse them.
+
+ ```shell
+ python3 demo_txt2img_xl.py \
+ "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" \
+ --version=xl-1.0 \
+ --onnx-dir /workspace/stable-diffusion-xl-1.0-tensorrt/lcmlora \
+ --engine-dir /workspace/stable-diffusion-xl-1.0-tensorrt/lcm/engine-sdxl-lcmlora-nocfg \
+ --scheduler LCM \
+ --lora-path latent-consistency/lcm-lora-sdxl \
+ --lora-scale 1.0 \
+ --denoising-steps 4 \
+ --guidance-scale 0.0 \
+ --seed 42
+
+ ```
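The "Percentage improvement" columns in the README's performance tables can be sanity-checked with a few lines of Python. The README does not say how the percentages were derived, so the two formulas below are assumptions: latency improvement is taken as the relative reduction in time, and throughput improvement as the relative increase in images per second. The function names are hypothetical.

```python
def latency_reduction_pct(baseline_ms, optimized_ms):
    """Relative reduction in latency, in percent (assumed definition)."""
    return (baseline_ms - optimized_ms) / baseline_ms * 100.0


def throughput_gain_pct(baseline_ips, optimized_ips):
    """Relative increase in throughput, in percent (assumed definition)."""
    return (optimized_ips - baseline_ips) / baseline_ips * 100.0


# (baseline, optimized) pairs copied from the README tables.
TIMINGS_MS = {"A10": (9399, 8160), "A100": (3704, 2742), "H100": (2496, 1471)}
THROUGHPUT_IPS = {"A10": (0.10, 0.12), "A100": (0.27, 0.36), "H100": (0.40, 0.68)}

if __name__ == "__main__":
    for gpu, (base, opt) in TIMINGS_MS.items():
        print(f"{gpu}: latency reduced by ~{round(latency_reduction_pct(base, opt))}%")
    for gpu, (base, opt) in THROUGHPUT_IPS.items():
        print(f"{gpu}: throughput increased by ~{round(throughput_gain_pct(base, opt))}%")
```

With these definitions the rounded results agree with the table entries, which suggests the two tables use different baselines for "improvement" (time saved versus images gained) rather than contradicting each other.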
examples.jpg ADDED
lcm/clip.opt/model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6c342a572b89967ec14697c57655f2b81d27172eed6de07b6f7ee91e3b914514
+ size 322531134
lcm/clip2.opt/model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1d85ae80d928b8a02b56572374a0faad41d5f4b82da473d973ebda9fbd89d970
+ size 1517189726
lcm/unetxl.opt/dbf91c42-985c-11ee-9041-0242ac110002 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b146e7cda628219dfaf6c9924716e1cb94dc6c74bbf964761da7da7929a615f9
+ size 5136090880
lcm/unetxl.opt/model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e6470c2fbe084e33c7672401c269469d06330fc51a419c8c5e24bac44d78a0ef
+ size 3369087
lcm/vae.opt/model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f7045d54982c1bb7c8898d38a71e6fa9bbd0aaac5222fefafe49842ccb016507
+ size 99186612
lcmlora/clip.opt/model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6c342a572b89967ec14697c57655f2b81d27172eed6de07b6f7ee91e3b914514
+ size 322531134
lcmlora/clip2.opt/model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1d85ae80d928b8a02b56572374a0faad41d5f4b82da473d973ebda9fbd89d970
+ size 1517189726
lcmlora/unetxl-8c8ce9e8b00b259425e5f3eaa4b1d705-1.00.opt/1376e228-9608-11ee-9b07-0242ac110002 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e096cfd4fa4f0eac4b554b0d3f80356f413533c4af657fc9f9d532493813271b
+ size 5136090880
lcmlora/unetxl-8c8ce9e8b00b259425e5f3eaa4b1d705-1.00.opt/model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:44abca57a01d64c79a7fb5d14e9e043b83d1358f8116defe3147e5241a3d3936
+ size 3369087
lcmlora/vae.opt/model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f7045d54982c1bb7c8898d38a71e6fa9bbd0aaac5222fefafe49842ccb016507
+ size 99186612
sdxl-1.0-base/clip.opt/model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2bcd2a625e64a43bd8d78168178c1383891c540a022d7984d86974a2b4661aba
+ size 322531134
sdxl-1.0-base/clip2.opt/model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fb3e48f933c5dfe6cf8ae9d2121818d37f239215b113df3405242254bae732a2
+ size 1517189726
sdxl-1.0-base/unetxl.opt/435d4c0a-2d32-11ee-8476-0242c0a80101 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dbdd1938d37406e9ea9889dbffdcd38f74da588fda7eb63b9351c491fd573853
+ size 5136090880
sdxl-1.0-base/unetxl.opt/model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dbd5a42d8c38934068eccc8ec08f5a63aca8eba7cda06b717dde6f3b665829bf
+ size 6136637
sdxl-1.0-refiner/clip2.opt/model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fb3e48f933c5dfe6cf8ae9d2121818d37f239215b113df3405242254bae732a2
+ size 1517189726
sdxl-1.0-refiner/unetxl.opt/6e186582-2d74-11ee-8aa7-0242c0a80102 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9b127bd75f1b4ae08c8c4915948ba4bb76ea489fbb611102e46d9470656900d5
+ size 4519958016
sdxl-1.0-refiner/unetxl.opt/6ed855ee-2d70-11ee-af8e-0242c0a80101 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:63cb1dfb3ccb10ef89ab20dd0edfaaae26635737586ab8ad21d97f195a8cc12b
+ size 847120896
sdxl-1.0-refiner/unetxl.opt/model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0448a6ee66e6a46bd396f11003ff7075a53fda7bfeef43854cf2acdc894d3ba1
+ size 4040948
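Each model file in this commit is stored as a three-line Git LFS pointer (`version`, `oid`, `size`) rather than as the binary itself. The sketch below shows one way such a pointer could be parsed and used to check downloaded bytes. It is a minimal illustration that assumes the pointer contains exactly the three keys shown in the hunks above; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and a real multi-gigabyte file would be hashed in chunks rather than loaded into memory.

```python
import hashlib

# Pointer text copied from the lcm/clip.opt/model.onnx hunk above.
EXAMPLE_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:6c342a572b89967ec14697c57655f2b81d27172eed6de07b6f7ee91e3b914514
size 322531134
"""


def parse_lfs_pointer(text):
    """Parse 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True if `data` (bytes) has the size and SHA-256 the pointer records."""
    if pointer["hash_algo"] != "sha256":
        raise ValueError("only sha256 object IDs are handled in this sketch")
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


if __name__ == "__main__":
    pointer = parse_lfs_pointer(EXAMPLE_POINTER)
    print(pointer["hash_algo"], pointer["size"])
```

Identical `oid` values across hunks (for example `lcm/clip.opt/model.onnx` and `lcmlora/clip.opt/model.onnx`) indicate that the two paths point to byte-identical files.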