a question about support
so i got good enough speed even at 12b with gguf on vulkan (i have an rx 580 2048sp), but would exl2/exl3 be any faster for such a legacy gpu, if it's even possible? also, are there other routes that don't involve compiling a whole lot of things on rocm 6.2, which is needed for a torch build patched for gfx803? is there a better way to do it that's worth it?
anyway, appreciate all the models you create, and the server as well :)
Your RX 580 is:
Pre-RDNA
Unsupported by modern ROCm officially
Missing newer matrix acceleration features
Meaning:
Most modern "fast" inference stacks are optimized for CUDA or newer AMD cards.
I suggest using CPU inference for 4B models; don't go into ROCm hell.
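For anyone else on a gfx803 card, a minimal sketch of the Vulkan route with llama.cpp, which avoids ROCm entirely (the build flag comes from llama.cpp's docs; the model filename and `-ngl` value below are placeholders, not a recommendation):

```shell
# build llama.cpp with the Vulkan backend (no ROCm / CUDA needed)
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release

# run a quantized gguf model, offloading layers to the GPU
# (model path and -ngl value are placeholders; tune -ngl to your VRAM)
./build/bin/llama-cli -m ./model-12b-q4_k_m.gguf -ngl 99 -p "hello"
```

The same `-ngl` knob lets you split layers between GPU and CPU if the full model doesn't fit in the RX 580's VRAM.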
thx, but i can run quantized 12b and 8b faster than on cpu, so i'm currently using vulkan. vulkan does the job for almost everything tho,
also your 4b model is insanely great. i normally run bloodmoon 12b or wingless 8b, but even the 4b does as great as the 8b sometimes.
thx :)