README.md · aivisionslab/ai-local-rx580-stack at main

ai-local-rx580-stack / README.md

aivisionslab

Update README.md

d6c8c90 verified 2 days ago

preview code

raw

history blame contribute delete

6.12 kB

	---
	license: mit
	language:
	- pt
	- en
	- es
	- fr
	- ar
	tags:
	- vulkan
	- amd
	- rx580
	- local-ai
	- llama-cpp
	- stable-diffusion
	- gguf
	- flux
	- openwebui
	- polaris
	- gcn4
	- hardware-revival
	- windows
	- wsl2
	pretty_name: RX 580 Local AI — Complete Stack (AIVisionsLab)
	---

	# RX 580 Local AI — Complete Stack

	AIVisionsLab Studios · São Paulo, Brazil 🇧🇷

	> Running SOTA AI on 2017 hardware in 2026. No CUDA. No ROCm. No cloud.

	---

	## What this is

	This repository documents the complete stack for running local AI on an AMD RX 580 8GB using the Vulkan API as the GPU backend — bypassing the need for CUDA or ROCm entirely.

	AMD officially dropped ROCm support for Polaris/GCN4 in v5.x. DirectML failed. OpenVINO failed.
	This project proves the hardware is still capable — the problem was always the software stack, not the GPU.

	Full master documentation (PT/EN/ES/FR/AR):
	🌐 [setup-ia-local-rx580-vulkan.web.app](https://setup-ia-local-rx580-vulkan.web.app/)

	---

	## Hardware

	\| Component \| Spec \|
	\|-----------\|------\|
	\| GPU \| AMD RX 580 2048SP 8GB GDDR5 (Polaris / GCN4) \|
	\| CPU \| Intel Xeon E5-2690 v3 — 12c/24t · 3.5GHz boost (2014) \|
	\| RAM \| 32GB DDR4 REG ECC Quad Channel RDIMM \|
	\| Storage \| NVMe 1TB — 1.7–3.5 GB/s (critical bottleneck) \|
	\| OS \| Windows 10 Pro + WSL2 Ubuntu 22.04.5 \|
	\| Vulkan SDK \| 1.4.341.1 \|
	\| AMD Driver \| 31.0.21924.61 \|

	---

	## Performance (real logs, not synthetic benchmarks)

	### LLM — llama.cpp with Vulkan

	\| Model \| Quantization \| Speed \| VRAM \|
	\|-------\|-------------\|-------\|------\|
	\| Mistral 7B Instruct \| Q4_K_M \| ~9 tok/s \| ~6GB \|
	\| Llama 3 8B Instruct \| Q4_K_M \| ~7 tok/s \| ~6.8GB \|
	\| Qwen2.5 7B \| Q4_K_M \| ~8 tok/s \| ~6.2GB \|
	\| DeepSeek R1 8B \| Q4_K_M \| ~7 tok/s \| ~6.8GB \|

	> CPU baseline (Xeon, no GPU): 3–5 tok/s. Vulkan uplift: 3–4×

	### Image Generation — stable-diffusion.cpp with Vulkan

	\| Model \| Resolution \| Steps \| Time \| Backend \|
	\|-------\|------------\|-------\|------\|---------\|
	\| DreamShaper 8 (SD 1.5 GGUF) \| 512×512 \| 20 \| ~72s \| RX 580 Vulkan \|
	\| FLUX.1 Schnell q4_k \| 1024×1024 \| 4 \| ~14 min \| GPU+CPU hybrid \|
	\| FLUX.1 Schnell fp8 (16GB) \| 1024×1024 \| 4 \| ~24 min \| Xeon CPU / WSL2 \|

	### Storage impact

	\| Operation \| HDD \| NVMe \| Improvement \|
	\|-----------\|-----\|------\|-------------\|
	\| LLM 7B load \| ~25 min \| ~4 min \| 6× faster \|
	\| FLUX 16GB load \| ~25 min \| ~30s \| 50× faster \|

	---

	## Models used

	### For sd-server (stable-diffusion.cpp)

	> ⚠️ Critical: Only use leejet GGUF models for sd-server.
	> city96 GGUF models are ComfyUI-only. Using them returns `new_sd_ctx_t failed`.

	\| Model \| Source \| Use \|
	\|-------\|--------\|-----\|
	\| `flux1-schnell-q4_k.gguf` \| [leejet/FLUX.1-schnell-gguf](https://huggingface.co/leejet/FLUX.1-schnell-gguf) \| FLUX GPU hybrid \|
	\| `flux1-schnell-Q3_K_S.gguf` \| [leejet/FLUX.1-schnell-gguf](https://huggingface.co/leejet/FLUX.1-schnell-gguf) \| FLUX lighter (~5.2GB) \|
	\| `DreamShaper_8.safetensors` \| Civitai \| SD 1.5 production \|

	### For ComfyUI (city96 compatible)

	\| Model \| Source \| Use \|
	\|-------\|--------\|-----\|
	\| `flux1-schnell-Q4_K_S.gguf` \| [city96/FLUX.1-schnell-gguf](https://huggingface.co/city96/FLUX.1-schnell-gguf) \| ComfyUI only \|
	\| `flux1-schnell-fp8.safetensors` \| Comfy-Org \| Full 16GB CPU \|

	### VAE / CLIP / T5XXL (required for FLUX)

	\| File \| Purpose \| RAM allocation \|
	\|------\|---------\|----------------\|
	\| `ae.safetensors` \| VAE decoder \| ~160MB CPU \|
	\| `clip_l.safetensors` \| CLIP encoder \| ~235MB GPU \|
	\| `t5xxl_fp16.safetensors` \| T5 encoder \| ~9.3GB CPU \|
	\| `t5xxl_fp8.safetensors` \| T5 encoder (lighter) \| ~5GB CPU \|

	---

	## Architecture

	```
	OpenWebUI (Docker :3000)
	│
	├──► LLM: llama-server.exe (:8081) — RX 580 Vulkan
	│ └── fallback: Ollama (:11434) — CPU
	│
	└──► Images:
	├──► SD 1.5 GGUF: sd-server.exe (:7860) — RX 580 Vulkan
	└──► FLUX.1 16GB: ComfyUI (:8188) — Xeon CPU WSL2
	```

	### FLUX memory segmentation

	\| Component \| File \| Allocation \| Size \|
	\|-----------\|------\|------------\|------\|
	\| Diffusion model \| flux1-schnell-q4_k.gguf \| GPU VRAM \| ~6.5GB \|
	\| VAE \| ae.safetensors \| CPU RAM \| ~160MB \|
	\| CLIP L \| clip_l.safetensors \| GPU VRAM \| ~235MB \|
	\| T5XXL \| t5xxl_fp16.safetensors \| CPU RAM \| ~9.3GB \|

	---

	## What failed (documented with root cause)

	\| Attempt \| Error \| Root cause \|
	\|---------\|-------\|------------\|
	\| DirectML \| `OpaqueTensorImpl` \| MS encapsulates tensors — ComfyUI can't read them \|
	\| ROCm \| Kernel panics \| GCN4/Polaris dropped in v5.x — permanent \|
	\| OpenVINO + Forge \| `No module 'ldm'` \| Extension targets A1111 — incompatible with Forge \|
	\| CPU + HDD \| ~19 min/image \| Zero GPU utilization + I/O bottleneck \|

	Full analysis: [docs/what-failed.md](https://github.com/aivisionslab-studios/rx580-local-ai-guide/blob/main/docs/what-failed.md)

	---

	## Community & Credits

	This work builds on independent research from:

	\| Author \| Publication \| Contribution \|
	\|--------\|-------------\|-------------\|
	\| [艾米心 Amihart](https://medium.com/@amihart) \| Medium, Jan 2025 \| First validation of LLMs via Vulkan on RX 580 — 24.56 tok/s \|
	\| [DH / DadHacks](https://dadhacks.org/2025/12/05/ai-image-generation-on-rx-580-using-vulkan-a-cost-effective-solution/) \| dadhacks.org, Dec 2025 \| Refuted "SD can't run on Vulkan" — sd.cpp Linux guide \|
	\| [leejet](https://github.com/leejet/stable-diffusion.cpp) \| GitHub \| stable-diffusion.cpp engine \|
	\| [ggerganov](https://github.com/ggerganov/llama.cpp) \| GitHub \| llama.cpp + ggml engine \|
	\| [woodrex](https://hub.docker.com/r/woodrex/sd-webui-for-gfx803) \| Docker Hub \| ROCm gfx803 containers \|

	> "The hardware was never obsolete. It was waiting for the right software."

	---

	## GitHub

	📦 [aivisionslab-studios/rx580-local-ai-guide](https://github.com/aivisionslab-studios/rx580-local-ai-guide)
	Scripts, build guides, automation, troubleshooting docs.

	---

	## License

	MIT — use freely, give credit, document what you learn.