license: gfdl
language:
  - en
pipeline_tag: text-to-image
tags:
  - list
  - comparison

Pops' Stable Diffusion Speed List

A hand-curated list of generation speeds for various hardware and models.

Use the ComfyUI workflow above to start testing.

Methodology

All settings use:

Euler sampler
Normal scheduler
CFG 8

The OS is Arch Linux unless stated otherwise. All DirectML runs were done on Windows 11 22H2.

Raw generation times are not recorded because they vary with the number of steps. Instead, iterations per second (and its inverse, seconds per iteration) are given, since they are independent of the step count.
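To make the step independence concrete, here is a minimal Python sketch that estimates sampling time as steps multiplied by s/it. The function name `estimate_sampling_seconds` is made up for this example, and the estimate assumes every step takes the same time and ignores model loading, text encoding, and VAE decoding, so real generation times will be somewhat higher.

```python
def estimate_sampling_seconds(steps: int, s_per_it: float) -> float:
    """Rough sampling time: number of steps multiplied by seconds per iteration."""
    if steps < 0 or s_per_it < 0:
        raise ValueError("steps and s_per_it must be non-negative")
    return steps * s_per_it


# 0.39 s/it is the RTX 5090 figure for Lumina 2 at 1024px from the tables.
for steps in (10, 20, 30):
    seconds = estimate_sampling_seconds(steps, 0.39)
    print(f"{steps} steps: about {seconds:.1f}s of sampling")
```

The same s/it value covers any step count, which is why the list records it instead of a total time.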

Whichever speed value the app reports (it/s or s/it) is recorded, and the other is derived from it using the formula 1/speed. If a value is under 0.01, it is given to four decimal places instead of the usual two.
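The conversion and rounding rule described above can be sketched in Python as follows. The helper names (`format_value`, `row_from_it_per_s`, `row_from_s_per_it`) and the `small` threshold parameter are illustrative and not part of any tool used for this list; the 0.01 default simply mirrors the rule as stated.

```python
def format_value(value: float, unit: str, small: float = 0.01) -> str:
    """Format with four decimals below the `small` threshold, otherwise two."""
    digits = 4 if value < small else 2
    return f"{value:.{digits}f}{unit}"


def row_from_it_per_s(it_per_s: float, small: float = 0.01) -> tuple[str, str]:
    """Build the (it/s, s/it) pair when the app reported iterations per second."""
    if it_per_s <= 0:
        raise ValueError("speed must be positive")
    return (
        format_value(it_per_s, "it/s", small),
        format_value(1.0 / it_per_s, "s/it", small),
    )


def row_from_s_per_it(s_per_it: float, small: float = 0.01) -> tuple[str, str]:
    """Build the (it/s, s/it) pair when the app reported seconds per iteration."""
    if s_per_it <= 0:
        raise ValueError("speed must be positive")
    return (
        format_value(1.0 / s_per_it, "it/s", small),
        format_value(s_per_it, "s/it", small),
    )


print(row_from_it_per_s(2.58))    # fast GPU: app reports it/s
print(row_from_s_per_it(123.34))  # slow CPU: app reports s/it
```

Because each row is rounded after the inversion, multiplying the two published columns will not always give exactly 1.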

If you can contribute to the list, please do. Let's make the most comprehensive, curated list of local image generation speeds!

Models Used

The following models were used for testing. You can test with other models as long as they share an architecture with one of these.

SD1.5: jzli/Hassaku-1.3
SDXL: OnomaAIResearch/Illustrious-XL-v2.0
Lumina 2: neta-art/NetaLumina_Alpha Round NNNN EP6 S127716

Benchmarks

Lumina 2

1536px

Chip	it/s	s/it	Backend	App	Notes
NVIDIA RTX 4090	1.29it/s	0.78s/it	CUDA 12.9	ComfyUI	Windows 11 24H2
NVIDIA RTX 3090	0.41it/s	2.40s/it	CUDA 12.6	ComfyUI

1024px

Chip	it/s	s/it	Backend	App	Notes
NVIDIA RTX 5090	2.58it/s	0.39s/it	CUDA 12.8	ComfyUI	Windows 11 24H2
NVIDIA RTX 4090	2.22it/s	0.45s/it	CUDA 12.9	ComfyUI	Windows 11 24H2
NVIDIA RTX 3090	1.14it/s	0.87s/it	CUDA 12.9	ComfyUI	Sage Attention
NVIDIA RTX 3090	1.00it/s	1.00s/it	CUDA 12.9	ComfyUI
NVIDIA GTX 980	0.0599it/s	16.69s/it	CUDA 12.4	ComfyUI	FP32 CPU TE
AMD Ryzen 5800X	0.0102it/s	97.86s/it	CPU	ComfyUI

512px

Chip	it/s	s/it	Backend	App	Notes
NVIDIA RTX 5090	8.85it/s	0.11s/it	CUDA 12.8	ComfyUI	Windows 11 24H2
NVIDIA RTX 5090	8.04it/s	0.12s/it	CUDA 12.9	ComfyUI	Windows 11 24H2
NVIDIA RTX 3090	4.35it/s	0.23s/it	CUDA 12.9	ComfyUI	Sage Attention
NVIDIA GTX 980	0.28it/s	3.57s/it	CUDA 12.4	ComfyUI	FP8 CPU TE
NVIDIA GTX 980	0.25it/s	3.99s/it	CUDA 12.4	ComfyUI	FP32 CPU TE
AMD Ryzen 5800X	0.0649it/s	15.42s/it	CPU	ComfyUI

256px

Chip	it/s	s/it	Backend	App	Notes
NVIDIA RTX 5090	13.42it/s	0.0745s/it	CUDA 12.8	ComfyUI	Windows 11 24H2
NVIDIA RTX 4090	14.92it/s	0.0670s/it	CUDA 12.9	ComfyUI	Windows 11 24H2
NVIDIA RTX 3090	11.37it/s	0.0880s/it	CUDA 12.9	ComfyUI	Sage Attention
NVIDIA GTX 980	0.78it/s	1.27s/it	CUDA 12.4	ComfyUI	FP8 CPU TE
NVIDIA GTX 980	0.59it/s	1.68s/it	CUDA 12.4	ComfyUI	FP32 CPU TE
AMD Ryzen 5800X	0.25it/s	3.98s/it	CPU	ComfyUI

SDXL

1536px

Chip	it/s	s/it	Backend	App	Notes
NVIDIA RTX 5090	3.38it/s	0.29s/it	CUDA 12.8	ComfyUI	Windows 11 24H2
NVIDIA RTX 4090	3.11it/s	0.32s/it	CUDA 12.9	ComfyUI	Windows 11 24H2
NVIDIA RTX 3090	1.63it/s	0.61s/it	CUDA 12.9	ComfyUI

1024px

Runs on 2GB of VRAM with tiled VAE.

Chip	it/s	s/it	Backend	App	Notes
NVIDIA RTX 5090	8.95it/s	0.11s/it	CUDA 12.8	ComfyUI	Windows 11 24H2
NVIDIA RTX 4090	7it/s	0.14s/it	CUDA 12.9	ComfyUI	Windows 11 24H2
NVIDIA RTX 3090	4.00it/s	0.25s/it	CUDA 12.9	ComfyUI
NVIDIA GTX 980	0.18it/s	5.35s/it	CUDA 12.4	ComfyUI
AMD Pro W5500	0.13it/s	7.35s/it	Vulkan	KoboldCPP
AMD Pro W5500	0.0699it/s	14.31s/it	DirectML	ComfyUI
AMD Ryzen 5800X	0.0365it/s	27.42s/it	CPU	ComfyUI
AMD Pro WX 4100	0.0247it/s	40.50s/it	DirectML	ComfyUI
AMD Pro W5500	0.0147it/s	68.04s/it	Vulkan	KoboldCPP	Windows 11

512px

Chip	it/s	s/it	Backend	App	Notes
NVIDIA RTX 5090	21.52it/s	0.0465s/it	CUDA 12.8	ComfyUI	Windows 11 24H2
NVIDIA RTX 4090	18.5it/s	0.05s/it	CUDA 12.9	ComfyUI	Windows 11 24H2
NVIDIA RTX 3090	12.39it/s	0.0807s/it	CUDA 12.9	ComfyUI
NVIDIA GTX 980	0.69it/s	1.45s/it	CUDA 12.4	ComfyUI
AMD Pro W5500	0.54it/s	1.85s/it	Vulkan	KoboldCPP	Windows 11
AMD Pro W5500	0.42it/s	2.38s/it	DirectML	ComfyUI
AMD Pro W5500	0.20it/s	5.06s/it	Vulkan	KoboldCPP
AMD Ryzen 5800X	0.19it/s	5.32s/it	CPU	ComfyUI
AMD HD 7790	0.11it/s	9.39s/it	DirectML	ComfyUI
AMD Pro WX 4100	0.1043it/s	9.59s/it	DirectML	ComfyUI

SD1.5

512px

Chip	it/s	s/it	Backend	App	Notes
NVIDIA RTX 3090	20.58it/s	0.0486s/it	CUDA 12.9	ComfyUI
NVIDIA GTX 980	1.59it/s	0.63s/it	CUDA 12.4	ComfyUI
AMD Pro W5500	1.01it/s	0.99s/it	Vulkan	KoboldCPP
AMD Pro W5500	0.78it/s	1.27s/it	Vulkan	KoboldCPP	Windows 11
AMD Pro W5500	0.75it/s	1.32s/it	DirectML	ComfyUI
AMD Pro WX 4100	0.24it/s	4.07s/it	DirectML	ComfyUI
AMD Pro WX 4100	0.22it/s	4.38s/it	Vulkan	KoboldCPP	Windows
AMD Ryzen 5800X	0.22it/s	4.73s/it	CPU	ComfyUI
AMD RX 550	0.0651it/s	15.35s/it	Vulkan	KoboldCPP	64-bit bus
Intel i7-3770	0.0569it/s	17.57s/it	CPU	ComfyUI
Intel i5-6300U	0.0295it/s	33.85s/it	CPU	KoboldCPP
Intel i5-4300U	0.0117it/s	85.34s/it	CPU	KoboldCPP

256px

Runnable even on 1GB of VRAM!

Chip	it/s	s/it	Backend	App	Notes
NVIDIA RTX 3090	33.85it/s	0.0295s/it	CUDA 12.9	ComfyUI
NVIDIA GTX 980	4.43it/s	0.23s/it	CUDA 12.4	ComfyUI
AMD Pro W5500	3.84it/s	0.26s/it	Vulkan	KoboldCPP
AMD Pro W5500	2.84it/s	0.35s/it	Vulkan	KoboldCPP	Windows 11
AMD Pro W5500	2.05it/s	0.48s/it	DirectML	ComfyUI
AMD Ryzen 5800X	1.02it/s	0.98s/it	CPU	ComfyUI
AMD Pro WX 4100	0.71it/s	1.41s/it	Vulkan	KoboldCPP	Windows 11
AMD Pro WX 4100	0.66it/s	1.50s/it	DirectML	ComfyUI
AMD HD 7790	0.60it/s	1.65s/it	DirectML	ComfyUI
AMD HD 7750	0.48it/s	2.08s/it	DirectML	ComfyUI
Intel i7-4790K	0.40it/s	2.46s/it	CPU	ComfyUI
Intel i7-3770	0.26it/s	3.84s/it	CPU	ComfyUI
Intel i5-6300U	0.14it/s	6.98s/it	CPU	KoboldCPP
Intel i7-3770	0.0316it/s	31.68s/it	CPU	KoboldCPP	Old CPU
Intel Core 2 Quad Q9300	0.0081it/s	123.34s/it	CPU	KoboldCPP	Failsafe
Intel Core 2 Duo T9300	0.0049it/s	204.41s/it	CPU	KoboldCPP	Failsafe

How do I make my gens faster?

Use simple samplers such as Euler instead of samplers that evaluate the model twice per step, such as Heun or DPM2.
Lower your image sizes. SDXL can work coherently down to 384px and SD1.5 can go down to 128px.
Use addons such as TeaCache.
Use low-step LoRAs such as DMD2.
As a last resort, disable CFG by setting it to 1. This disables your negative prompt and drastically increases speed, but it also severely affects output quality.
Upgrade your potato with a new GPU if all else fails.
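
Why CFG 1 is faster can be seen from the standard classifier-free guidance formula, sketched here in plain Python. This is a simplified illustration, not the code of ComfyUI or KoboldCPP: `cfg_combine` and `model_evals` are hypothetical helpers, and the assumption that a frontend skips the negative-prompt pass when CFG is exactly 1 should be checked against the app you use.

```python
def cfg_combine(cond, uncond, scale):
    """Standard CFG mix, element-wise: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def model_evals(steps, scale):
    """Model evaluations per image, ASSUMING the negative pass is skipped at scale 1."""
    passes_per_step = 1 if scale == 1 else 2
    return steps * passes_per_step


# Toy stand-ins for the positive-prompt and negative-prompt predictions.
cond = [1.0, 2.0]
uncond = [0.5, 0.5]

print(cfg_combine(cond, uncond, 1.0))  # identical to cond: the negative prompt drops out
print(cfg_combine(cond, uncond, 8.0))  # pushed away from uncond, as with CFG 8
print(model_evals(20, 8.0), model_evals(20, 1.0))
```

At scale 1 the negative-prompt prediction cancels out of the result, so it has no effect on the image and the second pass per step is unnecessary.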