---
license: mit
language:
- pt
- en
- es
- fr
- ar
tags:
- vulkan
- amd
- rx580
- local-ai
- llama-cpp
- stable-diffusion
- gguf
- flux
- openwebui
- polaris
- gcn4
- hardware-revival
- windows
- wsl2
pretty_name: RX 580 Local AI — Complete Stack (AIVisionsLab)
---
# RX 580 Local AI — Complete Stack
**AIVisionsLab Studios** · São Paulo, Brazil 🇧🇷
> Running SOTA AI on 2017 hardware in 2026. No CUDA. No ROCm. No cloud.
---
## What this is
This repository documents the complete stack for running local AI on an **AMD RX 580 8GB** using the **Vulkan API** as the GPU backend — bypassing the need for CUDA or ROCm entirely.
AMD officially dropped ROCm support for Polaris/GCN4 in v5.x. DirectML failed. OpenVINO failed.
This project proves the hardware is still capable — the problem was always the software stack, not the GPU.
**Full master documentation (PT/EN/ES/FR/AR):**
🌐 [setup-ia-local-rx580-vulkan.web.app](https://setup-ia-local-rx580-vulkan.web.app/)
---
## Hardware
| Component | Spec |
|-----------|------|
| GPU | AMD RX 580 **2048SP** 8GB GDDR5 (Polaris / GCN4) |
| CPU | Intel Xeon **E5-2690 v3** — 12c/24t · 3.5GHz boost (2014) |
| RAM | **32GB DDR4 REG ECC** Quad Channel RDIMM |
| Storage | **NVMe 1TB**, 1.7–3.5 GB/s (storage speed was the critical bottleneck; see Storage impact) |
| OS | Windows 10 Pro + WSL2 Ubuntu 22.04.5 |
| Vulkan SDK | 1.4.341.1 |
| AMD Driver | 31.0.21924.61 |
---
## Performance (real logs, not synthetic benchmarks)
### LLM — llama.cpp with Vulkan
| Model | Quantization | Speed | VRAM |
|-------|-------------|-------|------|
| Mistral 7B Instruct | Q4_K_M | **~9 tok/s** | ~6GB |
| Llama 3 8B Instruct | Q4_K_M | **~7 tok/s** | ~6.8GB |
| Qwen2.5 7B | Q4_K_M | **~8 tok/s** | ~6.2GB |
| DeepSeek R1 8B | Q4_K_M | **~7 tok/s** | ~6.8GB |
> CPU baseline (Xeon, no GPU): 3–5 tok/s. Against that range, the Vulkan figures above work out to roughly **1.4–3×**.
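
How large the uplift looks depends on which end of the CPU range a model is compared against. The following is a minimal sketch that derives a per-model ratio from the rounded figures in the table above; the function name `uplift_range` is illustrative and not part of any tool in this stack.

```python
# Throughput figures copied from the table above (tokens per second, rounded).
gpu_tok_s = {
    "Mistral 7B Instruct": 9.0,
    "Llama 3 8B Instruct": 7.0,
    "Qwen2.5 7B": 8.0,
    "DeepSeek R1 8B": 7.0,
}
cpu_baseline_tok_s = (3.0, 5.0)  # reported CPU-only range (low, high)


def uplift_range(gpu: float, cpu_range: tuple[float, float]) -> tuple[float, float]:
    """Return (min, max) speedup of a GPU rate over a CPU rate range."""
    cpu_low, cpu_high = cpu_range
    # Smallest uplift is against the fastest CPU run, largest against the slowest.
    return gpu / cpu_high, gpu / cpu_low


for model, rate in gpu_tok_s.items():
    low, high = uplift_range(rate, cpu_baseline_tok_s)
    print(f"{model}: {low:.1f}x to {high:.1f}x over CPU")
```

Because the inputs are approximate, the ratios are only indicative; a same-model, same-prompt CPU run is the fairer comparison.
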
### Image Generation — stable-diffusion.cpp with Vulkan
| Model | Resolution | Steps | Time | Backend |
|-------|------------|-------|------|---------|
| DreamShaper 8 (SD 1.5 GGUF) | 512×512 | 20 | **~72s** | RX 580 Vulkan |
| FLUX.1 Schnell q4_k | 1024×1024 | 4 | **~14 min** | GPU+CPU hybrid |
| FLUX.1 Schnell fp8 (16GB) | 1024×1024 | 4 | **~24 min** | Xeon CPU / WSL2 |
### Storage impact
| Operation | HDD | NVMe | Improvement |
|-----------|-----|------|-------------|
| LLM 7B load | ~25 min | **~4 min** | 6× faster |
| FLUX 16GB load | ~25 min | **~30s** | **50× faster** |
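
The improvement column is simply the ratio of the two load times. As a rough check, the sketch below recomputes it and also estimates the effective read rate implied by the FLUX load. The helper names are illustrative, and the calculation assumes the whole 16GB file is read once during the load.

```python
MINUTE = 60  # seconds


def speedup(before_s: float, after_s: float) -> float:
    """Ratio of the old load time to the new one."""
    return before_s / after_s


def effective_gb_per_s(size_gb: float, seconds: float) -> float:
    """Average read rate if size_gb is read once in the given time."""
    return size_gb / seconds


# Load times from the table above: (HDD seconds, NVMe seconds).
loads = {
    "LLM 7B load": (25 * MINUTE, 4 * MINUTE),
    "FLUX 16GB load": (25 * MINUTE, 30),
}

for name, (hdd_s, nvme_s) in loads.items():
    print(f"{name}: {speedup(hdd_s, nvme_s):.1f}x faster on NVMe")

# Implied average rate for the 16GB FLUX checkpoint on NVMe.
print(f"FLUX on NVMe: {effective_gb_per_s(16, 30):.2f} GB/s")
```

The implied rate sits below the drive's rated range, which is plausible if load time also includes parsing and memory allocation rather than raw reads alone.
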
---
## Models used
### For sd-server (stable-diffusion.cpp)
> ⚠️ **Critical:** Only use **leejet** GGUF models for sd-server.
> city96 GGUF models are ComfyUI-only. Using them returns `new_sd_ctx_t failed`.
| Model | Source | Use |
|-------|--------|-----|
| `flux1-schnell-q4_k.gguf` | [leejet/FLUX.1-schnell-gguf](https://huggingface.co/leejet/FLUX.1-schnell-gguf) | FLUX GPU hybrid |
| `flux1-schnell-Q3_K_S.gguf` | [leejet/FLUX.1-schnell-gguf](https://huggingface.co/leejet/FLUX.1-schnell-gguf) | FLUX lighter (~5.2GB) |
| `DreamShaper_8.safetensors` | Civitai | SD 1.5 production |
### For ComfyUI (city96 compatible)
| Model | Source | Use |
|-------|--------|-----|
| `flux1-schnell-Q4_K_S.gguf` | [city96/FLUX.1-schnell-gguf](https://huggingface.co/city96/FLUX.1-schnell-gguf) | ComfyUI only |
| `flux1-schnell-fp8.safetensors` | Comfy-Org | Full 16GB CPU |
### VAE / CLIP / T5XXL (required for FLUX)
| File | Purpose | Memory (size, device) |
|------|---------|----------------|
| `ae.safetensors` | VAE decoder | ~160MB CPU |
| `clip_l.safetensors` | CLIP encoder | ~235MB GPU |
| `t5xxl_fp16.safetensors` | T5 encoder | ~9.3GB CPU |
| `t5xxl_fp8.safetensors` | T5 encoder (lighter) | ~5GB CPU |
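
Since FLUX needs all three support components, a quick pre-flight check can catch a missing file before a long load. This is a minimal sketch: the file names come from the table above, while the `models` directory and the `missing_flux_files` helper are assumptions for illustration.

```python
from pathlib import Path

# File names from the table above; either T5XXL variant fills the encoder slot.
REQUIRED = ["ae.safetensors", "clip_l.safetensors"]
T5_VARIANTS = ["t5xxl_fp16.safetensors", "t5xxl_fp8.safetensors"]


def missing_flux_files(model_dir: Path) -> list[str]:
    """Return the names of FLUX support files not found in model_dir."""
    missing = [name for name in REQUIRED if not (model_dir / name).is_file()]
    if not any((model_dir / name).is_file() for name in T5_VARIANTS):
        missing.append(" or ".join(T5_VARIANTS))
    return missing


if __name__ == "__main__":
    # "models" is a hypothetical directory; point it at your own layout.
    problems = missing_flux_files(Path("models"))
    print("OK" if not problems else f"Missing: {', '.join(problems)}")
```
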
---
## Architecture
```
OpenWebUI (Docker :3000)
│
├──► LLM: llama-server.exe (:8081) — RX 580 Vulkan
│ └── fallback: Ollama (:11434) — CPU
│
└──► Images:
├──► SD 1.5 GGUF: sd-server.exe (:7860) — RX 580 Vulkan
└──► FLUX.1 16GB: ComfyUI (:8188) — Xeon CPU WSL2
```
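
To see which parts of this layout are actually up, a plain TCP probe of each port is usually enough. The sketch below assumes every service is reachable on `127.0.0.1` (including the WSL2 one, through localhost forwarding). `port_open` and `pick_llm_port` are illustrative helpers that mirror the primary/fallback ordering in the diagram, not part of OpenWebUI or any server shown.

```python
import socket
from typing import Optional

# Ports taken from the diagram above; host is assumed to be the local machine.
SERVICES = {
    "OpenWebUI": 3000,
    "llama-server (Vulkan)": 8081,
    "Ollama (CPU fallback)": 11434,
    "sd-server (Vulkan)": 7860,
    "ComfyUI (WSL2)": 8188,
}


def port_open(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_llm_port(primary: int = 8081, fallback: int = 11434) -> Optional[int]:
    """Prefer llama-server; fall back to Ollama; None if neither listens."""
    for port in (primary, fallback):
        if port_open(port):
            return port
    return None


if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "up" if port_open(port) else "down"
        print(f"{name:24} :{port}  {state}")
    print("LLM backend port:", pick_llm_port())
```

A successful connection only shows that something is listening; it does not confirm the service is healthy or that a model is loaded.
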
### FLUX memory segmentation
| Component | File | Allocation | Size |
|-----------|------|------------|------|
| Diffusion model | flux1-schnell-q4_k.gguf | **GPU VRAM** | ~6.5GB |
| VAE | ae.safetensors | **CPU RAM** | ~160MB |
| CLIP L | clip_l.safetensors | **GPU VRAM** | ~235MB |
| T5XXL | t5xxl_fp16.safetensors | **CPU RAM** | ~9.3GB |
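
The split works because each device's share stays under its capacity. A small sketch of that arithmetic follows, using the sizes from the table above and the 8GB VRAM / 32GB RAM capacities from the Hardware section. It deliberately ignores compute buffers, activations and OS usage, so the real headroom is smaller.

```python
# Sizes in GB, copied from the table above.
ALLOCATIONS = [
    ("Diffusion model", "gpu", 6.5),
    ("VAE", "cpu", 0.160),
    ("CLIP L", "gpu", 0.235),
    ("T5XXL fp16", "cpu", 9.3),
]
# Capacities from the Hardware section: 8GB VRAM, 32GB system RAM.
CAPACITY_GB = {"gpu": 8.0, "cpu": 32.0}


def usage_by_device(allocations):
    """Sum the allocation sizes per device."""
    totals = {}
    for _name, device, size_gb in allocations:
        totals[device] = totals.get(device, 0.0) + size_gb
    return totals


for device, used in usage_by_device(ALLOCATIONS).items():
    free = CAPACITY_GB[device] - used
    print(f"{device}: {used:.2f} GB used, {free:.2f} GB nominal headroom")
```

Moving T5XXL onto the GPU would push its total past 8GB, which is why the encoder stays in system RAM in this layout.
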
---
## What failed (documented with root cause)
| Attempt | Error | Root cause |
|---------|-------|------------|
| DirectML | `OpaqueTensorImpl` | MS encapsulates tensors — ComfyUI can't read them |
| ROCm | Kernel panics | GCN4/Polaris dropped in v5.x — permanent |
| OpenVINO + Forge | `No module 'ldm'` | Extension targets A1111 — incompatible with Forge |
| CPU + HDD | ~19 min/image | Zero GPU utilization + I/O bottleneck |
Full analysis: [docs/what-failed.md](https://github.com/aivisionslab-studios/rx580-local-ai-guide/blob/main/docs/what-failed.md)
---
## Community & Credits
This work builds on independent research from:
| Author | Publication | Contribution |
|--------|-------------|-------------|
| [艾米心 Amihart](https://medium.com/@amihart) | Medium, Jan 2025 | First validation of LLMs via Vulkan on RX 580 — 24.56 tok/s |
| [DH / DadHacks](https://dadhacks.org/2025/12/05/ai-image-generation-on-rx-580-using-vulkan-a-cost-effective-solution/) | dadhacks.org, Dec 2025 | Refuted "SD can't run on Vulkan" — sd.cpp Linux guide |
| [leejet](https://github.com/leejet/stable-diffusion.cpp) | GitHub | stable-diffusion.cpp engine |
| [ggerganov](https://github.com/ggerganov/llama.cpp) | GitHub | llama.cpp + ggml engine |
| [woodrex](https://hub.docker.com/r/woodrex/sd-webui-for-gfx803) | Docker Hub | ROCm gfx803 containers |
> *"The hardware was never obsolete. It was waiting for the right software."*
---
## GitHub
📦 [aivisionslab-studios/rx580-local-ai-guide](https://github.com/aivisionslab-studios/rx580-local-ai-guide)
Scripts, build guides, automation, troubleshooting docs.
---
## License
MIT — use freely, give credit, document what you learn.