| --- |
| license: mit |
| language: |
| - pt |
| - en |
| - es |
| - fr |
| - ar |
| tags: |
| - vulkan |
| - amd |
| - rx580 |
| - local-ai |
| - llama-cpp |
| - stable-diffusion |
| - gguf |
| - flux |
| - openwebui |
| - polaris |
| - gcn4 |
| - hardware-revival |
| - windows |
| - wsl2 |
| pretty_name: RX 580 Local AI — Complete Stack (AIVisionsLab) |
| --- |
| |
| # RX 580 Local AI — Complete Stack |
|
|
| **AIVisionsLab Studios** · São Paulo, Brazil 🇧🇷 |
|
|
| > Running SOTA AI on 2017 hardware in 2026. No CUDA. No ROCm. No cloud. |
|
|
| --- |
|
|
| ## What this is |
|
|
| This repository documents the complete stack for running local AI on an **AMD RX 580 8GB** using the **Vulkan API** as the GPU backend — bypassing the need for CUDA or ROCm entirely. |
|
|
| AMD officially dropped ROCm support for Polaris/GCN4 in v5.x. DirectML failed. OpenVINO failed. |
| This project proves the hardware is still capable — the problem was always the software stack, not the GPU. |
|
|
| **Full master documentation (PT/EN/ES/FR/AR):** |
| 🌐 [setup-ia-local-rx580-vulkan.web.app](https://setup-ia-local-rx580-vulkan.web.app/) |
|
|
| --- |
|
|
| ## Hardware |
|
|
| | Component | Spec | |
| |-----------|------| |
| | GPU | AMD RX 580 **2048SP** 8GB GDDR5 (Polaris / GCN4) | |
| | CPU | Intel Xeon **E5-2690 v3** — 12c/24t · 3.5GHz boost (2014) | |
| | RAM | **32GB DDR4 REG ECC** Quad Channel RDIMM | |
| | Storage | **NVMe 1TB** — 1.7–3.5 GB/s (critical bottleneck) | |
| | OS | Windows 10 Pro + WSL2 Ubuntu 22.04.5 | |
| | Vulkan SDK | 1.4.341.1 | |
| | AMD Driver | 31.0.21924.61 | |
|
|
| --- |
|
|
| ## Performance (real logs, not synthetic benchmarks) |
|
|
| ### LLM — llama.cpp with Vulkan |
|
|
| | Model | Quantization | Speed | VRAM | |
| |-------|-------------|-------|------| |
| | Mistral 7B Instruct | Q4_K_M | **~9 tok/s** | ~6GB | |
| | Llama 3 8B Instruct | Q4_K_M | **~7 tok/s** | ~6.8GB | |
| | Qwen2.5 7B | Q4_K_M | **~8 tok/s** | ~6.2GB | |
| | DeepSeek R1 8B | Q4_K_M | **~7 tok/s** | ~6.8GB | |
|
|
| > CPU baseline (Xeon, no GPU): 3–5 tok/s. Vulkan uplift: **3–4×** |
|
|
| ### Image Generation — stable-diffusion.cpp with Vulkan |
|
|
| | Model | Resolution | Steps | Time | Backend | |
| |-------|------------|-------|------|---------| |
| | DreamShaper 8 (SD 1.5 GGUF) | 512×512 | 20 | **~72s** | RX 580 Vulkan | |
| | FLUX.1 Schnell q4_k | 1024×1024 | 4 | **~14 min** | GPU+CPU hybrid | |
| | FLUX.1 Schnell fp8 (16GB) | 1024×1024 | 4 | **~24 min** | Xeon CPU / WSL2 | |
| |
| ### Storage impact |
| |
| | Operation | HDD | NVMe | Improvement | |
| |-----------|-----|------|-------------| |
| | LLM 7B load | ~25 min | **~4 min** | 6× faster | |
| | FLUX 16GB load | ~25 min | **~30s** | **50× faster** | |
| |
| --- |
| |
| ## Models used |
| |
| ### For sd-server (stable-diffusion.cpp) |
| |
| > ⚠️ **Critical:** Only use **leejet** GGUF models for sd-server. |
| > city96 GGUF models are ComfyUI-only. Using them returns `new_sd_ctx_t failed`. |
|
|
| | Model | Source | Use | |
| |-------|--------|-----| |
| | `flux1-schnell-q4_k.gguf` | [leejet/FLUX.1-schnell-gguf](https://huggingface.co/leejet/FLUX.1-schnell-gguf) | FLUX GPU hybrid | |
| | `flux1-schnell-Q3_K_S.gguf` | [leejet/FLUX.1-schnell-gguf](https://huggingface.co/leejet/FLUX.1-schnell-gguf) | FLUX lighter (~5.2GB) | |
| | `DreamShaper_8.safetensors` | Civitai | SD 1.5 production | |
|
|
| ### For ComfyUI (city96 compatible) |
|
|
| | Model | Source | Use | |
| |-------|--------|-----| |
| | `flux1-schnell-Q4_K_S.gguf` | [city96/FLUX.1-schnell-gguf](https://huggingface.co/city96/FLUX.1-schnell-gguf) | ComfyUI only | |
| | `flux1-schnell-fp8.safetensors` | Comfy-Org | Full 16GB CPU | |
|
|
| ### VAE / CLIP / T5XXL (required for FLUX) |
|
|
| | File | Purpose | RAM allocation | |
| |------|---------|----------------| |
| | `ae.safetensors` | VAE decoder | ~160MB CPU | |
| | `clip_l.safetensors` | CLIP encoder | ~235MB GPU | |
| | `t5xxl_fp16.safetensors` | T5 encoder | ~9.3GB CPU | |
| | `t5xxl_fp8.safetensors` | T5 encoder (lighter) | ~5GB CPU | |
|
|
| --- |
|
|
| ## Architecture |
|
|
| ``` |
| OpenWebUI (Docker :3000) |
| │ |
| ├──► LLM: llama-server.exe (:8081) — RX 580 Vulkan |
| │ └── fallback: Ollama (:11434) — CPU |
| │ |
| └──► Images: |
| ├──► SD 1.5 GGUF: sd-server.exe (:7860) — RX 580 Vulkan |
| └──► FLUX.1 16GB: ComfyUI (:8188) — Xeon CPU WSL2 |
| ``` |
|
|
| ### FLUX memory segmentation |
|
|
| | Component | File | Allocation | Size | |
| |-----------|------|------------|------| |
| | Diffusion model | flux1-schnell-q4_k.gguf | **GPU VRAM** | ~6.5GB | |
| | VAE | ae.safetensors | **CPU RAM** | ~160MB | |
| | CLIP L | clip_l.safetensors | **GPU VRAM** | ~235MB | |
| | T5XXL | t5xxl_fp16.safetensors | **CPU RAM** | ~9.3GB | |
| |
| --- |
| |
| ## What failed (documented with root cause) |
| |
| | Attempt | Error | Root cause | |
| |---------|-------|------------| |
| | DirectML | `OpaqueTensorImpl` | MS encapsulates tensors — ComfyUI can't read them | |
| | ROCm | Kernel panics | GCN4/Polaris dropped in v5.x — permanent | |
| | OpenVINO + Forge | `No module 'ldm'` | Extension targets A1111 — incompatible with Forge | |
| | CPU + HDD | ~19 min/image | Zero GPU utilization + I/O bottleneck | |
| |
| Full analysis: [docs/what-failed.md](https://github.com/aivisionslab-studios/rx580-local-ai-guide/blob/main/docs/what-failed.md) |
| |
| --- |
| |
| ## Community & Credits |
| |
| This work builds on independent research from: |
| |
| | Author | Publication | Contribution | |
| |--------|-------------|-------------| |
| | [艾米心 Amihart](https://medium.com/@amihart) | Medium, Jan 2025 | First validation of LLMs via Vulkan on RX 580 — 24.56 tok/s | |
| | [DH / DadHacks](https://dadhacks.org/2025/12/05/ai-image-generation-on-rx-580-using-vulkan-a-cost-effective-solution/) | dadhacks.org, Dec 2025 | Refuted "SD can't run on Vulkan" — sd.cpp Linux guide | |
| | [leejet](https://github.com/leejet/stable-diffusion.cpp) | GitHub | stable-diffusion.cpp engine | |
| | [ggerganov](https://github.com/ggerganov/llama.cpp) | GitHub | llama.cpp + ggml engine | |
| | [woodrex](https://hub.docker.com/r/woodrex/sd-webui-for-gfx803) | Docker Hub | ROCm gfx803 containers | |
| |
| > *"The hardware was never obsolete. It was waiting for the right software."* |
| |
| --- |
| |
| ## GitHub |
| |
| 📦 [aivisionslab-studios/rx580-local-ai-guide](https://github.com/aivisionslab-studios/rx580-local-ai-guide) |
| Scripts, build guides, automation, troubleshooting docs. |
| |
| --- |
| |
| ## License |
| |
| MIT — use freely, give credit, document what you learn. |
| |