---
license: mit
language:
- pt
- en
- es
- fr
- ar
tags:
- vulkan
- amd
- rx580
- local-ai
- llama-cpp
- stable-diffusion
- gguf
- flux
- openwebui
- polaris
- gcn4
- hardware-revival
- windows
- wsl2
pretty_name: RX 580 Local AI Complete Stack (AIVisionsLab)
---

# RX 580 Local AI — Complete Stack

**AIVisionsLab Studios** · São Paulo, Brazil 🇧🇷

> Running SOTA AI on 2017 hardware in 2026. No CUDA. No ROCm. No cloud.

---

## What this is

This repository documents the complete stack for running local AI on an **AMD RX 580 8GB** using the **Vulkan API** as the GPU backend — bypassing the need for CUDA or ROCm entirely.

AMD officially dropped ROCm support for Polaris/GCN4 in v5.x, and both DirectML and OpenVINO failed on this setup.  
This project shows the hardware is still capable: the problem was always the software stack, not the GPU.

**Full master documentation (PT/EN/ES/FR/AR):**  
🌐 [setup-ia-local-rx580-vulkan.web.app](https://setup-ia-local-rx580-vulkan.web.app/)

---

## Hardware

| Component | Spec |
|-----------|------|
| GPU | AMD RX 580 **2048SP** 8GB GDDR5 (Polaris / GCN4) |
| CPU | Intel Xeon **E5-2690 v3** — 12c/24t · 3.5GHz boost (2014) |
| RAM | **32GB DDR4 REG ECC** Quad Channel RDIMM |
| Storage | **NVMe 1TB**, 1.7–3.5 GB/s (storage speed is a critical bottleneck for model loading) |
| OS | Windows 10 Pro + WSL2 Ubuntu 22.04.5 |
| Vulkan SDK | 1.4.341.1 |
| AMD Driver | 31.0.21924.61 |

---

## Performance (real logs, not synthetic benchmarks)

### LLM — llama.cpp with Vulkan

| Model | Quantization | Speed | VRAM |
|-------|-------------|-------|------|
| Mistral 7B Instruct | Q4_K_M | **~9 tok/s** | ~6GB |
| Llama 3 8B Instruct | Q4_K_M | **~7 tok/s** | ~6.8GB |
| Qwen2.5 7B | Q4_K_M | **~8 tok/s** | ~6.2GB |
| DeepSeek R1 8B | Q4_K_M | **~7 tok/s** | ~6.8GB |

> CPU baseline (Xeon, no GPU): 3–5 tok/s. Against the 7–9 tok/s measured above, the Vulkan uplift is roughly **1.4–3×**.
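
As a sanity check, the uplift can be derived from the table and the CPU baseline quoted above. This is a minimal sketch: the numbers are copied from this section, and the helper name `uplift_range` is illustrative, not part of any tool in this stack.

```python
# Vulkan throughput per model, copied from the table above (tok/s).
vulkan_tok_s = {
    "Mistral 7B Instruct": 9.0,
    "Llama 3 8B Instruct": 7.0,
    "Qwen2.5 7B": 8.0,
    "DeepSeek R1 8B": 7.0,
}

# CPU-only baseline range quoted above (tok/s).
CPU_LOW, CPU_HIGH = 3.0, 5.0


def uplift_range(gpu_tok_s, cpu_low=CPU_LOW, cpu_high=CPU_HIGH):
    """Return (worst_case, best_case) speedup of GPU over CPU."""
    return gpu_tok_s / cpu_high, gpu_tok_s / cpu_low


for model, speed in vulkan_tok_s.items():
    worst, best = uplift_range(speed)
    print(f"{model}: {worst:.1f}x to {best:.1f}x")
```

The worst case pairs each GPU figure with the fastest CPU run, the best case with the slowest, so the spread reflects the uncertainty in the baseline rather than in the Vulkan measurements.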

### Image Generation — stable-diffusion.cpp with Vulkan

| Model | Resolution | Steps | Time | Backend |
|-------|------------|-------|------|---------|
| DreamShaper 8 (SD 1.5 GGUF) | 512×512 | 20 | **~72s** | RX 580 Vulkan |
| FLUX.1 Schnell q4_k | 1024×1024 | 4 | **~14 min** | GPU+CPU hybrid |
| FLUX.1 Schnell fp8 (16GB) | 1024×1024 | 4 | **~24 min** | Xeon CPU / WSL2 |

### Storage impact

| Operation | HDD | NVMe | Improvement |
|-----------|-----|------|-------------|
| LLM 7B load | ~25 min | **~4 min** | 6× faster |
| FLUX 16GB load | ~25 min | **~30s** | **50× faster** |
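
The improvement column follows directly from the load times. The sketch below recomputes it and, as an extra assumption not measured in this project, estimates the ideal sequential read time for a 16GB file at the NVMe throughput listed in the hardware table. The function names are illustrative.

```python
def speedup(before_seconds, after_seconds):
    """How many times faster the second measurement is."""
    return before_seconds / after_seconds


def ideal_read_seconds(size_gb, throughput_gb_s):
    """Lower bound on load time if the drive streamed at full speed."""
    return size_gb / throughput_gb_s


# Load times from the table above, converted to seconds.
llm_hdd, llm_nvme = 25 * 60, 4 * 60
flux_hdd, flux_nvme = 25 * 60, 30

print(f"LLM 7B:    {speedup(llm_hdd, llm_nvme):.2f}x")
print(f"FLUX 16GB: {speedup(flux_hdd, flux_nvme):.0f}x")

# Ideal read time for 16GB at 1.7 and 3.5 GB/s (hardware table values).
for rate in (1.7, 3.5):
    print(f"16GB at {rate} GB/s: {ideal_read_seconds(16, rate):.1f}s")
```

The ideal read time comes out well under the observed ~30s, which suggests that part of the load time is spent on something other than raw disk reads. This document does not measure where that time goes.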

---

## Models used

### For sd-server (stable-diffusion.cpp)

> ⚠️ **Critical:** Only use **leejet** GGUF models for sd-server.  
> city96 GGUF models are ComfyUI-only. Using them returns `new_sd_ctx_t failed`.

| Model | Source | Use |
|-------|--------|-----|
| `flux1-schnell-q4_k.gguf` | [leejet/FLUX.1-schnell-gguf](https://huggingface.co/leejet/FLUX.1-schnell-gguf) | FLUX GPU hybrid |
| `flux1-schnell-Q3_K_S.gguf` | [leejet/FLUX.1-schnell-gguf](https://huggingface.co/leejet/FLUX.1-schnell-gguf) | FLUX lighter (~5.2GB) |
| `DreamShaper_8.safetensors` | Civitai | SD 1.5 production |

### For ComfyUI (city96 compatible)

| Model | Source | Use |
|-------|--------|-----|
| `flux1-schnell-Q4_K_S.gguf` | [city96/FLUX.1-schnell-gguf](https://huggingface.co/city96/FLUX.1-schnell-gguf) | ComfyUI only |
| `flux1-schnell-fp8.safetensors` | Comfy-Org | Full 16GB CPU |
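
Because the leejet and city96 file names differ only slightly, a small guard in a launcher script can catch a mismatch before the server fails. This is a hypothetical helper, not part of sd-server or ComfyUI: the backend keys and `check_model` are names invented here, and the file lists are copied from the two tables above.

```python
# Model files listed in this document, grouped by the backend that accepts them.
COMPATIBLE_MODELS = {
    "sd-server": {
        "flux1-schnell-q4_k.gguf",       # leejet
        "flux1-schnell-Q3_K_S.gguf",     # leejet
        "DreamShaper_8.safetensors",     # Civitai
    },
    "comfyui": {
        "flux1-schnell-Q4_K_S.gguf",     # city96
        "flux1-schnell-fp8.safetensors", # Comfy-Org
    },
}


def check_model(backend, filename):
    """Raise ValueError if the file is not listed for the given backend."""
    if backend not in COMPATIBLE_MODELS:
        raise ValueError(f"unknown backend: {backend!r}")
    if filename not in COMPATIBLE_MODELS[backend]:
        raise ValueError(
            f"{filename!r} is not listed for {backend}; "
            "check whether it is a leejet or city96 build"
        )
    return True
```

For example, `check_model("sd-server", "flux1-schnell-Q4_K_S.gguf")` raises, because that file is the city96 build intended for ComfyUI. The comparison is case-sensitive on purpose, since the two sources use different capitalization.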

### VAE / CLIP / T5XXL (required for FLUX)

| File | Purpose | Size and placement |
|------|---------|----------------|
| `ae.safetensors` | VAE decoder | ~160MB CPU |
| `clip_l.safetensors` | CLIP encoder | ~235MB GPU |
| `t5xxl_fp16.safetensors` | T5 encoder | ~9.3GB CPU |
| `t5xxl_fp8.safetensors` | T5 encoder (lighter) | ~5GB CPU |

---

## Architecture

```
OpenWebUI (Docker :3000)
        │
        ├──► LLM: llama-server.exe (:8081) — RX 580 Vulkan
        │         └── fallback: Ollama (:11434) — CPU
        │
        └──► Images:
              ├──► SD 1.5 GGUF: sd-server.exe (:7860) — RX 580 Vulkan
              └──► FLUX.1 16GB: ComfyUI (:8188) — Xeon CPU WSL2
```
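
The LLM fallback order in the diagram can be sketched as a simple port probe. This is an illustration only: OpenWebUI handles its own routing, the function names are invented here, and the host `127.0.0.1` is an assumption. The ports are the ones shown in the diagram.

```python
import socket

# LLM backends in preference order, using the ports from the diagram.
LLM_BACKENDS = (
    ("llama-server (Vulkan)", 8081),
    ("Ollama (CPU)", 11434),
)


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_llm_backend(probe=is_listening, host="127.0.0.1"):
    """Return (name, port) of the first reachable backend, or None."""
    for name, port in LLM_BACKENDS:
        if probe(host, port):
            return name, port
    return None
```

Calling `pick_llm_backend()` returns the Vulkan server when it is up and falls back to Ollama otherwise. The `probe` parameter exists so the selection logic can be exercised without any server running.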

### FLUX memory segmentation

| Component | File | Allocation | Size |
|-----------|------|------------|------|
| Diffusion model | flux1-schnell-q4_k.gguf | **GPU VRAM** | ~6.5GB |
| VAE | ae.safetensors | **CPU RAM** | ~160MB |
| CLIP L | clip_l.safetensors | **GPU VRAM** | ~235MB |
| T5XXL | t5xxl_fp16.safetensors | **CPU RAM** | ~9.3GB |
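
The split above can be checked against the 8GB of VRAM and 32GB of system RAM from the hardware table. This is a rough budget sketch using the approximate sizes listed here; it ignores activations, the OS, and driver overhead, which this document does not quantify.

```python
# (component, device, approximate size in GB), copied from the table above.
COMPONENTS = [
    ("Diffusion model", "gpu", 6.5),
    ("VAE", "cpu", 0.16),
    ("CLIP L", "gpu", 0.235),
    ("T5XXL fp16", "cpu", 9.3),
]

# Capacity per device in GB, from the hardware table.
CAPACITY_GB = {"gpu": 8.0, "cpu": 32.0}


def usage_by_device(components):
    """Sum the listed sizes per device."""
    totals = {}
    for _name, device, size_gb in components:
        totals[device] = totals.get(device, 0.0) + size_gb
    return totals


def headroom(components, capacity):
    """Capacity minus listed usage for each device, in GB."""
    used = usage_by_device(components)
    return {dev: capacity[dev] - used.get(dev, 0.0) for dev in capacity}


for device, free_gb in headroom(COMPONENTS, CAPACITY_GB).items():
    print(f"{device}: {free_gb:.2f} GB free")
```

With these figures the GPU side leaves a little over 1GB free, which is why the T5XXL encoder and the VAE are kept in system RAM.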

---

## What failed (documented with root cause)

| Attempt | Error | Root cause |
|---------|-------|------------|
| DirectML | `OpaqueTensorImpl` | Microsoft's DirectML backend encapsulates tensors, so ComfyUI can't read them |
| ROCm | Kernel panics | GCN4/Polaris dropped in v5.x — permanent |
| OpenVINO + Forge | `No module 'ldm'` | Extension targets A1111 — incompatible with Forge |
| CPU + HDD | ~19 min/image | Zero GPU utilization + I/O bottleneck |

Full analysis: [docs/what-failed.md](https://github.com/aivisionslab-studios/rx580-local-ai-guide/blob/main/docs/what-failed.md)

---

## Community & Credits

This work builds on independent research from:

| Author | Publication | Contribution |
|--------|-------------|-------------|
| [艾米心 Amihart](https://medium.com/@amihart) | Medium, Jan 2025 | First validation of LLMs via Vulkan on RX 580 — 24.56 tok/s |
| [DH / DadHacks](https://dadhacks.org/2025/12/05/ai-image-generation-on-rx-580-using-vulkan-a-cost-effective-solution/) | dadhacks.org, Dec 2025 | Refuted "SD can't run on Vulkan" — sd.cpp Linux guide |
| [leejet](https://github.com/leejet/stable-diffusion.cpp) | GitHub | stable-diffusion.cpp engine |
| [ggerganov](https://github.com/ggerganov/llama.cpp) | GitHub | llama.cpp + ggml engine |
| [woodrex](https://hub.docker.com/r/woodrex/sd-webui-for-gfx803) | Docker Hub | ROCm gfx803 containers |

> *"The hardware was never obsolete. It was waiting for the right software."*

---

## GitHub

📦 [aivisionslab-studios/rx580-local-ai-guide](https://github.com/aivisionslab-studios/rx580-local-ai-guide)  
Scripts, build guides, automation, troubleshooting docs.

---

## License

MIT — use freely, give credit, document what you learn.