license: gfdl
language:
- en
pipeline_tag: text-to-image
tags:
- list
- comparison
Pops' Stable Diffusion Speed List
A hand-curated list of generation speeds for various hardware and models.
Use the ComfyUI workflow above to start testing.
Methodology
All settings use:
- Euler sampler
- Normal scheduler
- CFG 8
- SD1.5: jzli/Hassaku-1.3
- SDXL: OnomaAIResearch/Illustrious-XL-v2.0
- Lumina 2: neta-art/NetaLumina_Alpha Round NNNN EP6 S127716
The OS is Arch Linux unless stated otherwise. All DirectML results were measured on Windows 11 22H2.
Raw generation times are not recorded because they vary with the number of steps. Instead, iterations per second (it/s) and its inverse, seconds per iteration (s/it), are given, since they are independent of the step count.
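Because it/s does not depend on the step count, a rough sampling time for any step count can be recovered from a table row. The sketch below assumes a step count of 20 purely for illustration; the function name is hypothetical, and the estimate covers sampling only, not model loading, text encoding, or VAE decoding.

```python
def estimate_sampling_seconds(steps: int, it_per_s: float) -> float:
    """Estimate sampling time in seconds from a step count and an it/s figure."""
    if steps < 0:
        raise ValueError("steps must be non-negative")
    if it_per_s <= 0:
        raise ValueError("it_per_s must be positive")
    return steps / it_per_s


# RTX 3090 on SDXL, using the 4.00it/s row, at an assumed 20 steps
print(estimate_sampling_seconds(20, 4.00))  # 5.0
```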
The measured speed value (it/s or s/it) is recorded as given, and the other value is derived from it using the formula 1/speed. Values are normally shown to two decimal places; if a value is under 0.01, it is expanded to four decimal places.
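The conversion and rounding rule above can be sketched as follows. The function names are hypothetical. The cutoff is left as a parameter because the text states 0.01, while many table rows appear to use four decimal places for values under 0.1.

```python
def format_speed(value: float, threshold: float = 0.01) -> str:
    """Two decimal places normally, four when the value is below the threshold."""
    places = 4 if value < threshold else 2
    return f"{value:.{places}f}"


def convert(value: float, unit: str, threshold: float = 0.01) -> tuple:
    """Return (it/s, s/it) strings given one measured value and its unit."""
    if value <= 0:
        raise ValueError("value must be positive")
    if unit not in ("it/s", "s/it"):
        raise ValueError("unit must be 'it/s' or 's/it'")
    other = 1.0 / value  # the formula 1/speed
    it_s, s_it = (value, other) if unit == "it/s" else (other, value)
    return (f"{format_speed(it_s, threshold)}it/s",
            f"{format_speed(s_it, threshold)}s/it")


print(convert(4.0, "it/s"))                    # ('4.00it/s', '0.25s/it')
print(convert(123.34, "s/it"))                 # ('0.0081it/s', '123.34s/it')
print(convert(16.69, "s/it", threshold=0.1))   # ('0.0599it/s', '16.69s/it')
```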
If you can contribute to the list, please do. Let's make the most comprehensive, curated list of local image generation speeds!
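Before submitting a row, a quick consistency check can catch unit slips or missing cells. This is only a sketch, not part of any official tooling: the function names are hypothetical, the six-column layout is taken from the tables in this list, and the 5% tolerance is an arbitrary assumption to allow for rounding.

```python
def parse_row(row: str) -> list:
    """Split a markdown table row into stripped cell strings."""
    return [cell.strip() for cell in row.strip().strip("|").split("|")]


def check_row(row: str, tolerance: float = 0.05) -> list:
    """Return a list of problems found in a benchmark row (empty if none)."""
    cells = parse_row(row)
    problems = []
    if len(cells) != 6:
        problems.append(f"expected 6 cells, found {len(cells)}")
    if len(cells) < 3:
        return problems
    it_s, s_it = cells[1], cells[2]
    if not it_s.endswith("it/s"):
        problems.append("second cell should end in it/s")
    if not s_it.endswith("s/it"):
        problems.append("third cell should end in s/it")
    if it_s.endswith("it/s") and s_it.endswith("s/it"):
        try:
            a = float(it_s[:-len("it/s")])
            b = float(s_it[:-len("s/it")])
        except ValueError:
            problems.append("could not parse the speed values")
            return problems
        # it/s and s/it are reciprocals, so their product should be close to 1
        if abs(a * b - 1.0) > tolerance:
            problems.append("it/s and s/it are not reciprocals")
    return problems


# Made-up rows for illustration only
print(check_row("| Example GPU | 2.00it/s | 0.50s/it | CUDA | ComfyUI | |"))  # []
print(check_row("| Example GPU | 2.00it/s | 0.50it/s | CUDA | ComfyUI | |"))
```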
Models Used
The models listed under Methodology are the ones used for testing. You may test with a different model, as long as it has the same architecture as one of the tested models.
Benchmarks
Lumina 2
1536px
| Chip | it/s | s/it | Backend | App | Notes |
|---|---|---|---|---|---|
| NVIDIA RTX 4090 | 1.29it/s | 0.78s/it | CUDA 12.9 | ComfyUI | Windows 11 24H2 |
| NVIDIA RTX 3090 | 0.41it/s | 2.40s/it | CUDA 12.6 | ComfyUI | |
1024px
| Chip | it/s | s/it | Backend | App | Notes |
|---|---|---|---|---|---|
| NVIDIA RTX 5090 | 2.58it/s | 0.39s/it | CUDA 12.8 | ComfyUI | Windows 11 24H2 |
| NVIDIA RTX 4090 | 2.22it/s | 0.45s/it | CUDA 12.9 | ComfyUI | Windows 11 24H2 |
| NVIDIA RTX 3090 | 1.14it/s | 0.87s/it | CUDA 12.9 | ComfyUI | Sage Attention |
| NVIDIA RTX 3090 | 1.00it/s | 1.00s/it | CUDA 12.9 | ComfyUI | |
| NVIDIA GTX 980 | 0.0599it/s | 16.69s/it | CUDA 12.4 | ComfyUI | FP32 CPU TE |
| AMD Ryzen 5800X | 0.0102it/s | 97.86s/it | CPU | ComfyUI | |
512px
| Chip | it/s | s/it | Backend | App | Notes |
|---|---|---|---|---|---|
| NVIDIA RTX 5090 | 8.85it/s | 0.11s/it | CUDA 12.8 | ComfyUI | Windows 11 24H2 |
| NVIDIA RTX 5090 | 8.04it/s | 0.12s/it | CUDA 12.9 | ComfyUI | Windows 11 24H2 |
| NVIDIA RTX 3090 | 4.35it/s | 0.23s/it | CUDA 12.9 | ComfyUI | Sage Attention |
| NVIDIA GTX 980 | 0.28it/s | 3.57s/it | CUDA 12.4 | ComfyUI | FP8 CPU TE |
| NVIDIA GTX 980 | 0.25it/s | 3.99s/it | CUDA 12.4 | ComfyUI | FP32 CPU TE |
| AMD Ryzen 5800X | 0.0649it/s | 15.42s/it | CPU | ComfyUI | |
256px
| Chip | it/s | s/it | Backend | App | Notes |
|---|---|---|---|---|---|
| NVIDIA RTX 5090 | 13.42it/s | 0.0745s/it | CUDA 12.8 | ComfyUI | Windows 11 24H2 |
| NVIDIA RTX 4090 | 14.92it/s | 0.0670s/it | CUDA 12.9 | ComfyUI | Windows 11 24H2 |
| NVIDIA RTX 3090 | 11.37it/s | 0.0880s/it | CUDA 12.9 | ComfyUI | Sage Attention |
| NVIDIA GTX 980 | 0.78it/s | 1.27s/it | CUDA 12.4 | ComfyUI | FP8 CPU TE |
| NVIDIA GTX 980 | 0.59it/s | 1.68s/it | CUDA 12.4 | ComfyUI | FP32 CPU TE |
| AMD Ryzen 5800X | 0.25it/s | 3.98s/it | CPU | ComfyUI | |
SDXL
1536px
| Chip | it/s | s/it | Backend | App | Notes |
|---|---|---|---|---|---|
| NVIDIA RTX 5090 | 3.38it/s | 0.29s/it | CUDA 12.8 | ComfyUI | Windows 11 24H2 |
| NVIDIA RTX 4090 | 3.11it/s | 0.32s/it | CUDA 12.9 | ComfyUI | Windows 11 24H2 |
| NVIDIA RTX 3090 | 1.63it/s | 0.61s/it | CUDA 12.9 | ComfyUI | |
1024px
Runs on 2GB of VRAM with tiled VAE.
| Chip | it/s | s/it | Backend | App | Notes |
|---|---|---|---|---|---|
| NVIDIA RTX 5090 | 8.95it/s | 0.11s/it | CUDA 12.8 | ComfyUI | Windows 11 24H2 |
| NVIDIA RTX 4090 | 7it/s | 0.14s/it | CUDA 12.9 | ComfyUI | Windows 11 24H2 |
| NVIDIA RTX 3090 | 4.00it/s | 0.25s/it | CUDA 12.9 | ComfyUI | |
| NVIDIA GTX 980 | 0.18it/s | 5.35s/it | CUDA 12.4 | ComfyUI | |
| AMD Pro W5500 | 0.13it/s | 7.35s/it | Vulkan | KoboldCPP | |
| AMD Pro W5500 | 0.0699it/s | 14.31s/it | DirectML | ComfyUI | |
| AMD Ryzen 5800X | 0.0365it/s | 27.42s/it | CPU | ComfyUI | |
| AMD Pro WX 4100 | 0.0247it/s | 40.50s/it | DirectML | ComfyUI | |
| AMD Pro W5500 | 0.0147it/s | 68.04s/it | Vulkan | KoboldCPP | Windows 11 |
512px
| Chip | it/s | s/it | Backend | App | Notes |
|---|---|---|---|---|---|
| NVIDIA RTX 5090 | 21.52it/s | 0.0465s/it | CUDA 12.8 | ComfyUI | Windows 11 24H2 |
| NVIDIA RTX 4090 | 18.5it/s | 0.05s/it | CUDA 12.9 | ComfyUI | Windows 11 24H2 |
| NVIDIA RTX 3090 | 12.39it/s | 0.0807s/it | CUDA 12.9 | ComfyUI | |
| NVIDIA GTX 980 | 0.69it/s | 1.45s/it | CUDA 12.4 | ComfyUI | |
| AMD Pro W5500 | 0.54it/s | 1.85s/it | Vulkan | KoboldCPP | Windows 11 |
| AMD Pro W5500 | 0.42it/s | 2.38s/it | DirectML | ComfyUI | |
| AMD Pro W5500 | 0.20it/s | 5.06s/it | Vulkan | KoboldCPP | |
| AMD Ryzen 5800X | 0.19it/s | 5.32s/it | CPU | ComfyUI | |
| AMD HD 7790 | 0.11it/s | 9.39s/it | DirectML | ComfyUI | |
| AMD Pro WX 4100 | 0.1043it/s | 9.59s/it | DirectML | ComfyUI | |
SD1.5
512px
| Chip | it/s | s/it | Backend | App | Notes |
|---|---|---|---|---|---|
| NVIDIA RTX 3090 | 20.58it/s | 0.0486s/it | CUDA 12.9 | ComfyUI | |
| NVIDIA GTX 980 | 1.59it/s | 0.63s/it | CUDA 12.4 | ComfyUI | |
| AMD Pro W5500 | 1.01it/s | 0.99s/it | Vulkan | KoboldCPP | |
| AMD Pro W5500 | 0.78it/s | 1.27s/it | Vulkan | KoboldCPP | Windows 11 |
| AMD Pro W5500 | 0.75it/s | 1.32s/it | DirectML | ComfyUI | |
| AMD Pro WX 4100 | 0.24it/s | 4.07s/it | DirectML | ComfyUI | |
| AMD Pro WX 4100 | 0.22it/s | 4.38s/it | Vulkan | KoboldCPP | Windows |
| AMD Ryzen 5800X | 0.22it/s | 4.73s/it | CPU | ComfyUI | |
| AMD RX 550 | 0.0651it/s | 15.35s/it | Vulkan | KoboldCPP | 64 bit bus |
| Intel i7-3770 | 0.0569it/s | 17.57s/it | CPU | ComfyUI | |
| Intel i5-6300U | 0.0295it/s | 33.85s/it | CPU | KoboldCPP | |
| Intel i5-4300U | 0.0117it/s | 85.34s/it | CPU | KoboldCPP | |
256px
Runnable even on 1GB of VRAM!
| Chip | it/s | s/it | Backend | App | Notes |
|---|---|---|---|---|---|
| NVIDIA RTX 3090 | 33.85it/s | 0.0295s/it | CUDA 12.9 | ComfyUI | |
| NVIDIA GTX 980 | 4.43it/s | 0.23s/it | CUDA 12.4 | ComfyUI | |
| AMD Pro W5500 | 3.84it/s | 0.26s/it | Vulkan | KoboldCPP | |
| AMD Pro W5500 | 2.84it/s | 0.35s/it | Vulkan | KoboldCPP | Windows 11 |
| AMD Pro W5500 | 2.05it/s | 0.48s/it | DirectML | ComfyUI | |
| AMD Ryzen 5800X | 1.02it/s | 0.98s/it | CPU | ComfyUI | |
| AMD Pro WX 4100 | 0.71it/s | 1.41s/it | Vulkan | KoboldCPP | Windows 11 |
| AMD Pro WX 4100 | 0.66it/s | 1.50s/it | DirectML | ComfyUI | |
| AMD HD 7790 | 0.60it/s | 1.65s/it | DirectML | ComfyUI | |
| AMD HD 7750 | 0.48it/s | 2.08s/it | DirectML | ComfyUI | |
| Intel i7-4790K | 0.40it/s | 2.46s/it | CPU | ComfyUI | |
| Intel i7-3770 | 0.26it/s | 3.84s/it | CPU | ComfyUI | |
| Intel i5-6300U | 0.14it/s | 6.98s/it | CPU | KoboldCPP | |
| Intel i7-3770 | 0.0316it/s | 31.68s/it | CPU | KoboldCPP Old CPU | |
| Intel Core 2 Quad Q9300 | 0.0081it/s | 123.34s/it | CPU | KoboldCPP Failsafe | |
| Intel Core 2 Duo T9300 | 0.0049it/s | 204.41s/it | CPU | KoboldCPP Failsafe | |
How do I make my gens faster?
- Use simple samplers such as Euler instead of samplers that evaluate the model twice per step, such as DPM2 or Heun.
- Lower your image sizes. SDXL can work coherently down to 384px and SD1.5 can go down to 128px.
- Use addons such as TeaCache
- Use low-step LoRAs such as DMD2.
- As a last resort, disable CFG by setting it to 1. This disables your negative prompt and severely affects output quality, but it drastically increases speed, since each step no longer needs a separate pass for the negative prompt.
- Upgrade your potato with a new GPU if all else fails.
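To see why the sampler choice and CFG matter for speed, here is a minimal sketch of the standard classifier-free guidance combination together with a simplified count of model passes. The function names are hypothetical. It assumes the frontend skips the negative-prompt pass when CFG is 1, and it ignores batching, so real-world speedups may be smaller than the count suggests.

```python
def cfg_combine(cond: list, uncond: list, cfg: float) -> list:
    """Classifier-free guidance: uncond + cfg * (cond - uncond), element-wise."""
    return [u + cfg * (c - u) for c, u in zip(cond, uncond)]


def model_calls(steps: int, cfg: float, evals_per_step: int = 1) -> int:
    """Simplified count of model passes for one generation.

    evals_per_step is 1 for Euler-style samplers and 2 for samplers that
    evaluate the model twice per step. At cfg == 1 the guided result equals
    the positive-prompt prediction, so the negative-prompt pass is assumed
    to be skipped.
    """
    passes_per_eval = 1 if cfg == 1 else 2
    return steps * evals_per_step * passes_per_eval


# At CFG 1 the negative prompt has no effect on the result
print(cfg_combine([1.0, 2.0], [0.5, 0.0], 1.0))  # [1.0, 2.0]

print(model_calls(20, 8))                    # 40
print(model_calls(20, 8, evals_per_step=2))  # 80
print(model_calls(20, 1))                    # 20
```

Under these assumptions, 20 steps at CFG 8 with a two-evaluation sampler cost four times as many passes as 20 Euler steps at CFG 1.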