Best for 5060ti 16GB?

#1
by nikhilprasanth - opened

Hi,
I have 64 GB of DDR5 and an RTX 5060 Ti 16 GB GPU. Which quant is best for this hardware? And do these quants affect agentic workflows in Cline or OpenCode compared to the BF16 baseline?

ByteShape org
edited 4 days ago

I haven’t tested it on an RTX 5060 Ti yet, but I would expect behavior similar to the RTX 4080.

In that case, I would recommend:

This configuration should leave roughly 4 GB of VRAM available for context, which should comfortably support at least a 32K context window.
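
As a rough way to sanity-check the context budget on your own card, here is a minimal sketch of the usual KV cache size estimate (keys plus values, per layer, per token). The function name and the architecture numbers are placeholders for illustration, not the confirmed configuration of this model; read the real layer count, KV head count, and head dimension from the model's config, and adjust `bytes_per_value` if you quantize the KV cache.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Approximate KV cache size in bytes.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_value=2 corresponds to an FP16/BF16 cache (an assumption).
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


# Placeholder architecture values (hypothetical, NOT taken from this model).
size = kv_cache_bytes(
    n_layers=32,
    n_kv_heads=8,
    head_dim=128,
    context_tokens=32 * 1024,  # 32K context window
)
print(f"Estimated KV cache: {size / 1024**3:.2f} GiB")
```

The estimate ignores compute buffers and other runtime overhead, so treat it as a lower bound on what the context actually costs.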

You can find the full performance plots here:
https://byteshape.com/blogs/Devstral-Small-2-24B-Instruct-2512/

In terms of agentic workflows, these quants should perform well. We’ve evaluated them across several coding and tool-calling benchmarks, and they remain solid relative to the BF16 baseline.

Let us know how your setup goes.

I tried the same setup with a 32K context: really good performance! Initially I'm getting 130+ tok/s, and it is using tools well too!

The optimization is really crazy: with the same parameters, I'm getting 135 tok/s with this vs. 30 tok/s for Qwen3-Coder-30B-A3B-Instruct-UD-IQ3_XXS.

ByteShape org

Many thanks for the update!
