Best quant for 5060 Ti 16GB?
Hi,
I have 64 GB of DDR5 and a 5060 Ti 16GB GPU. Which quant is best for this hardware? And do these quants have an impact on agentic workflows in Cline or OpenCode compared to the BF16 baseline?
I haven’t tested it on an RTX 5060 Ti yet, but I would expect behavior similar to the RTX 4080.
In that case, I would recommend:
- Qwen3-Coder-30B-A3B-Instruct – IQ3_S (3.12 bpw)
https://huggingface.co/byteshape/Qwen3-Coder-30B-A3B-Instruct-GGUF/blob/main/Qwen3-Coder-30B-A3B-Instruct-IQ3_S-3.12bpw.gguf
This configuration should leave roughly 4 GB of VRAM available for context, which should comfortably support at least a 32K context window.
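As a rough sanity check on that VRAM budget, here is a back-of-the-envelope sketch in Python. The numbers in it are assumptions, not measurements: about 30.5B total parameters, 48 layers, 4 KV heads, a head dimension of 128, and an unquantized FP16 KV cache. Please verify them against the model's config.json. The helper names `weights_gb` and `kv_cache_gb` are made up for illustration, and the estimate ignores compute buffers and whatever the desktop itself uses.

```python
# Rough VRAM estimate for a GGUF quant plus its KV cache.
# All model figures below are assumptions; check them against config.json.

def weights_gb(n_params: float, bpw: float) -> float:
    """Approximate weight size in decimal GB for a given bits-per-weight."""
    return n_params * bpw / 8 / 1e9

def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                n_ctx: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV cache size in decimal GB (keys + values, FP16 by default)."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem / 1e9

VRAM_GB = 16.0                                  # nominal card capacity
weights = weights_gb(30.5e9, 3.12)              # assumed ~30.5B params at 3.12 bpw
kv = kv_cache_gb(48, 4, 128, 32 * 1024)         # assumed attention shape, 32K context

print(f"weights:      {weights:.1f} GB")
print(f"left over:    {VRAM_GB - weights:.1f} GB")
print(f"32K KV cache: {kv:.1f} GB")
```

Under these assumptions the weights take a bit under 12 GB, which leaves about 4 GB, and a 32K FP16 KV cache needs a little over 3 GB of that. Quantizing the KV cache would shrink the second number further.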
You can find the full performance plots here:
https://byteshape.com/blogs/Devstral-Small-2-24B-Instruct-2512/
In terms of agentic workflows, these quants should perform well. We’ve evaluated them across several coding and tool-calling benchmarks, and they remain solid relative to the BF16 baseline.
Let us know how your setup goes.
The optimization is really crazy: with the same parameters, I'm getting 135 tps with this vs. 30 tps with Qwen3-Coder-30B-A3B-Instruct-UD-IQ3_XXS.
Many thanks for the update!
