exllamav3 updates

#1
by UnstableLlama - opened

Thank you for these! There were some fixes to the qwen3-next inference pipeline in exllamav3 v0.0.21; with those fixes, these models should perform even better than they did when you quantized them. The existing quants should still work fine, though, I believe. It might be helpful to include this info in the model card.

Thanks for letting me know! I've updated the model card with a note about the v0.0.21 fixes and a link to the relevant commit. I've also tested it myself, and everything works fine on v0.0.21.

NeuroSenko changed discussion status to closed
