The problem with the token generation rate on 4b models

by E7Reine - opened Mar 4

With Qwen3 4B at Q6 I get 27 tokens per second, but with Qwen3.5 4B at Q6 I get only 10-11. I use LM Studio. What could be the reason for this?
No other 4B model has behaved like this, and Gemma3 4B doesn't either.

E7Reine changed discussion status to closed Mar 5
