exllamav3 updates

#1
by UnstableLlama - opened

Thank you for these! There were some fixes to the qwen3-next inference pipeline in exllamav3 v0.0.21; with those fixes, these models should perform even better than they did when you quantized them. The existing quants should still work fine, though, I believe. It might be helpful to include this info in the model card.

Thanks for letting me know! I've updated the model card with a note about the v0.0.21 fixes and a link to the relevant commit. I've also tested it myself, and everything works fine on v0.0.21.

NeuroSenko changed discussion status to closed
