---
license: mit
language:
- pt
- en
- es
- fr
- ar
tags:
- vulkan
- amd
- rx580
- local-ai
- llama-cpp
- stable-diffusion
- gguf
- flux
- openwebui
- polaris
- gcn4
- hardware-revival
- windows
- wsl2
pretty_name: RX 580 Local AI - Complete Stack (AIVisionsLab)
---

# RX 580 Local AI - Complete Stack

**AIVisionsLab Studios** · São Paulo, Brazil 🇧🇷

> Running SOTA AI on 2017 hardware in 2026. No CUDA. No ROCm. No cloud.

---

## What this is

This repository documents the complete stack for running local AI on an **AMD RX 580 8GB** using the **Vulkan API** as the GPU backend, bypassing the need for CUDA or ROCm entirely.

AMD officially dropped ROCm support for Polaris/GCN4 in v5.x. DirectML failed. OpenVINO failed. This project proves the hardware is still capable: the problem was always the software stack, not the GPU.

**Full master documentation (PT/EN/ES/FR/AR):**
🌐 [setup-ia-local-rx580-vulkan.web.app](https://setup-ia-local-rx580-vulkan.web.app/)

---

## Hardware

| Component | Spec |
|-----------|------|
| GPU | AMD RX 580 **2048SP** 8GB GDDR5 (Polaris / GCN4) |
| CPU | Intel Xeon **E5-2690 v3**, 12c/24t · 3.5GHz boost (2014) |
| RAM | **32GB DDR4 REG ECC** Quad Channel RDIMM |
| Storage | **NVMe 1TB**, 1.7–3.5 GB/s (critical bottleneck) |
| OS | Windows 10 Pro + WSL2 Ubuntu 22.04.5 |
| Vulkan SDK | 1.4.341.1 |
| AMD Driver | 31.0.21924.61 |

---

## Performance (real logs, not synthetic benchmarks)

### LLM: llama.cpp with Vulkan

| Model | Quantization | Speed | VRAM |
|-------|-------------|-------|------|
| Mistral 7B Instruct | Q4_K_M | **~9 tok/s** | ~6GB |
| Llama 3 8B Instruct | Q4_K_M | **~7 tok/s** | ~6.8GB |
| Qwen2.5 7B | Q4_K_M | **~8 tok/s** | ~6.2GB |
| DeepSeek R1 8B | Q4_K_M | **~7 tok/s** | ~6.8GB |

> CPU baseline (Xeon, no GPU): 3–5 tok/s.
> Vulkan uplift: **3–4×**

### Image Generation: stable-diffusion.cpp with Vulkan

| Model | Resolution | Steps | Time | Backend |
|-------|------------|-------|------|---------|
| DreamShaper 8 (SD 1.5 GGUF) | 512×512 | 20 | **~72s** | RX 580 Vulkan |
| FLUX.1 Schnell q4_k | 1024×1024 | 4 | **~14 min** | GPU+CPU hybrid |
| FLUX.1 Schnell fp8 (16GB) | 1024×1024 | 4 | **~24 min** | Xeon CPU / WSL2 |

### Storage impact

| Operation | HDD | NVMe | Improvement |
|-----------|-----|------|-------------|
| LLM 7B load | ~25 min | **~4 min** | 6× faster |
| FLUX 16GB load | ~25 min | **~30s** | **50× faster** |

---

## Models used

### For sd-server (stable-diffusion.cpp)

> ⚠️ **Critical:** Only use **leejet** GGUF models for sd-server.
> city96 GGUF models are ComfyUI-only. Using them returns `new_sd_ctx_t failed`.

| Model | Source | Use |
|-------|--------|-----|
| `flux1-schnell-q4_k.gguf` | [leejet/FLUX.1-schnell-gguf](https://huggingface.co/leejet/FLUX.1-schnell-gguf) | FLUX GPU hybrid |
| `flux1-schnell-Q3_K_S.gguf` | [leejet/FLUX.1-schnell-gguf](https://huggingface.co/leejet/FLUX.1-schnell-gguf) | FLUX lighter (~5.2GB) |
| `DreamShaper_8.safetensors` | Civitai | SD 1.5 production |

### For ComfyUI (city96 compatible)

| Model | Source | Use |
|-------|--------|-----|
| `flux1-schnell-Q4_K_S.gguf` | [city96/FLUX.1-schnell-gguf](https://huggingface.co/city96/FLUX.1-schnell-gguf) | ComfyUI only |
| `flux1-schnell-fp8.safetensors` | Comfy-Org | Full 16GB CPU |

### VAE / CLIP / T5XXL (required for FLUX)

| File | Purpose | Memory allocation |
|------|---------|-------------------|
| `ae.safetensors` | VAE decoder | ~160MB CPU |
| `clip_l.safetensors` | CLIP encoder | ~235MB GPU |
| `t5xxl_fp16.safetensors` | T5 encoder | ~9.3GB CPU |
| `t5xxl_fp8.safetensors` | T5 encoder (lighter) | ~5GB CPU |

---

## Architecture

```
OpenWebUI (Docker :3000)
│
├──► LLM: llama-server.exe (:8081) - RX 580 Vulkan
│    └── fallback: Ollama (:11434) - CPU
│
└──► Images:
     ├──► SD 1.5 GGUF: sd-server.exe (:7860) - RX 580 Vulkan
     └──► FLUX.1 16GB: ComfyUI (:8188) - Xeon CPU WSL2
```

### FLUX memory segmentation

| Component | File | Allocation | Size |
|-----------|------|------------|------|
| Diffusion model | flux1-schnell-q4_k.gguf | **GPU VRAM** | ~6.5GB |
| VAE | ae.safetensors | **CPU RAM** | ~160MB |
| CLIP L | clip_l.safetensors | **GPU VRAM** | ~235MB |
| T5XXL | t5xxl_fp16.safetensors | **CPU RAM** | ~9.3GB |

---

## What failed (documented with root cause)

| Attempt | Error | Root cause |
|---------|-------|------------|
| DirectML | `OpaqueTensorImpl` | MS encapsulates tensors, so ComfyUI can't read them |
| ROCm | Kernel panics | GCN4/Polaris dropped in v5.x (permanent) |
| OpenVINO + Forge | `No module 'ldm'` | Extension targets A1111, incompatible with Forge |
| CPU + HDD | ~19 min/image | Zero GPU utilization + I/O bottleneck |

Full analysis: [docs/what-failed.md](https://github.com/aivisionslab-studios/rx580-local-ai-guide/blob/main/docs/what-failed.md)

---

## Community & Credits

This work builds on independent research from:

| Author | Publication | Contribution |
|--------|-------------|-------------|
| [艾米心 Amihart](https://medium.com/@amihart) | Medium, Jan 2025 | First validation of LLMs via Vulkan on RX 580 (24.56 tok/s) |
| [DH / DadHacks](https://dadhacks.org/2025/12/05/ai-image-generation-on-rx-580-using-vulkan-a-cost-effective-solution/) | dadhacks.org, Dec 2025 | Refuted "SD can't run on Vulkan"; sd.cpp Linux guide |
| [leejet](https://github.com/leejet/stable-diffusion.cpp) | GitHub | stable-diffusion.cpp engine |
| [ggerganov](https://github.com/ggerganov/llama.cpp) | GitHub | llama.cpp + ggml engine |
| [woodrex](https://hub.docker.com/r/woodrex/sd-webui-for-gfx803) | Docker Hub | ROCm gfx803 containers |

> *"The hardware was never obsolete.
It was waiting for the right software."*

---

## GitHub

📦 [aivisionslab-studios/rx580-local-ai-guide](https://github.com/aivisionslab-studios/rx580-local-ai-guide)

Scripts, build guides, automation, troubleshooting docs.

---

## License

MIT: use freely, give credit, document what you learn.
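
---

## Appendix: FLUX memory budget check

The FLUX memory segmentation table earlier in this README splits the model across GPU VRAM and CPU RAM. The sketch below simply adds up those approximate sizes per device and compares them with the 8GB VRAM and 32GB RAM listed under Hardware. It is an illustration, not part of the project's scripts: the names `COMPONENTS`, `CAPACITY_GB`, and `memory_budget` are hypothetical, and the totals ignore compute buffers, driver overhead, and whatever the OS and WSL2 already occupy.

```python
# Approximate sizes in GB, copied from the "FLUX memory segmentation" table.
# Tuple layout (hypothetical): (component, file, device, size_gb)
COMPONENTS = [
    ("Diffusion model", "flux1-schnell-q4_k.gguf", "gpu", 6.5),
    ("VAE", "ae.safetensors", "cpu", 0.160),
    ("CLIP L", "clip_l.safetensors", "gpu", 0.235),
    ("T5XXL", "t5xxl_fp16.safetensors", "cpu", 9.3),
]

# Capacities from the Hardware table: RX 580 8GB VRAM, 32GB system RAM.
CAPACITY_GB = {"gpu": 8.0, "cpu": 32.0}


def memory_budget(components, capacity_gb):
    """Sum component sizes per device and report the remaining headroom."""
    used = {device: 0.0 for device in capacity_gb}
    for _name, _file, device, size_gb in components:
        used[device] += size_gb
    return {
        device: {
            "used_gb": round(used[device], 3),
            "free_gb": round(capacity_gb[device] - used[device], 3),
            "fits": used[device] <= capacity_gb[device],
        }
        for device in capacity_gb
    }


if __name__ == "__main__":
    for device, stats in memory_budget(COMPONENTS, CAPACITY_GB).items():
        print(
            f"{device}: {stats['used_gb']} GB used, "
            f"{stats['free_gb']} GB free, fits={stats['fits']}"
        )
```

With the table's figures, the GPU side leaves only a little over 1GB of VRAM for everything the table does not count, which is consistent with the T5XXL encoder and the VAE being kept in CPU RAM.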