The problem with the token generation rate on 4b models

#7
by E7Reine - opened

Comparing the two in LM Studio, I get 27 tokens per second with Qwen3 4B Q6 but only 10-11 with Qwen3.5 4B Q6. What could be the reason for this?
No other 4B model I have tried behaves like this, and Gemma3 4B doesn't either.

E7Reine changed discussion status to closed
