Top of the Open Models on EQ-Bench 3

#1
by GeoMaciolek - opened

This model is currently at the top of the list of open models on EQ-Bench 3!

That's quite the achievement! If you didn't target EQ-Bench 3 specifically, that's astounding. But even if you did target it ("benchmaxxing"), it's still impressive.

I do hope you're able to adapt the training process to somewhat newer models, though I know that's probably easier said than done! (The base Qwen 3, despite being great, is already pretty significantly overshadowed by newer models of similar or smaller sizes, e.g. Gemma-4-31B and Qwen3.6-27B; see my follow-up post. I'm sure it's not news to you, but it bears mentioning.)

Here's a link to the base Qwen3-32B model compared to a bunch of other open models (and a few proprietary ones for good measure). We all know benchmarks don't tell the whole story, but even so, these numbers are significant, and worth noticing.

Highlights, in ascending order of score:

| Model | Score | Params (B) | Notes |
|---|---|---|---|
| Qwen3-32B | 14.5 | 32 | Your base model |
| Qwen3.5-2B | 14.7 | 2 | |
| Qwen3.5-2B Reasoning | 16.3 | 2 | |
| Qwen3-32B Reasoning | 16.5 | 32 | Your base model |
| Gemma-4-E4B Reasoning | 18.8 | 4 | 4B "Effective"; somewhat larger on disk |
| Qwen3.5-4B | 22.6 | 4 | Even without reasoning, this 4B model beats Qwen3-32B Reasoning |
| Qwen3.5-4B Reasoning | 27.1 | 4 | |
| Qwen3.5-9B | 27.3 | 9 | |
| Qwen3.5-9B Reasoning | 32.4 | 9 | |
| Qwen3.6-27B Reasoning | 37.1 | 27 | Highly regarded, this one for logic |
| Gemma-4-31B | 39.2 | 31 | Highly regarded, this one considered "good at writing" |

ArtificialAnalysis.ai Comparison
