a question about support
so i got good enough speed even at 12b with gguf on vulkan (i have an rx 580 2048sp), but would exl2/exl3 be any faster for such a legacy gpu, if it's even possible? also, are there other routes that don't involve compiling a whole lot of things on rocm 6.2, which is needed for a torch build patched for gfx803? is there a better way to do it that's worth it?
anyway, appreciate all the models you create, and the server as well :)
Your RX 580 is:
Pre-RDNA
Unsupported by modern ROCm officially
Missing newer matrix acceleration features
Meaning:
Most modern "fast" inference stacks are optimized for CUDA or newer AMD cards.
I suggest using CPU inference for 4B models; don't go into ROCm hell.
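For anyone else on a gfx803 card, a minimal sketch of the Vulkan route with llama.cpp, which avoids ROCm entirely (the build flag comes from llama.cpp's docs; the model filename and `-ngl` value below are placeholders, not a recommendation):

```shell
# build llama.cpp with the Vulkan backend (no ROCm / CUDA needed)
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release

# run a quantized gguf model, offloading layers to the GPU
# (model path and -ngl value are placeholders; tune -ngl to your VRAM)
./build/bin/llama-cli -m ./model-12b-q4_k_m.gguf -ngl 99 -p "hello"
```

The same `-ngl` knob lets you split layers between GPU and CPU if the full model doesn't fit in the RX 580's VRAM.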
thx, but i can run quantized 12b and 8b faster than on cpu, so i'm currently using vulkan. vulkan does the job for almost everything tho,
also your 4b model is insanely great. i normally run bloodmoon 12b or wingless 8b, but even the 4b does as great as the 8b sometimes.
thx :)