Top of the Open Models on EQ-Bench 3
This model is currently at the top of the list of open models on EQ-Bench 3!
That's quite the achievement! If you didn't target EQ-Bench 3 specifically, that's astounding. But even if you did target it ("benchmaxxing"), it's still impressive.
I do hope you're able to adapt the training process to somewhat newer models, though I know that's probably easier said than done! (The base Qwen3, despite being great, is already pretty significantly overshadowed by newer models of similar or smaller sizes, e.g. Gemma-4-31B and Qwen3.6-27B; see my follow-up post. I'm sure it's not news to you, but it bears mentioning.)
Here's a link to the base Qwen3-32B model compared to a bunch of other open models (and a few proprietary ones for good measure). We all know benchmarks don't tell the whole story, but even so, these numbers are significant and worth noticing.
Highlights, in ascending order of score:
| Model | Score | Params (B) | Notes |
|---|---|---|---|
| Qwen3-32B | 14.5 | 32 | -- Your base model -- |
| Qwen3.5-2B | 14.7 | 2 | |
| Qwen3.5-2B Reasoning | 16.3 | 2 | |
| Qwen3-32B Reasoning | 16.5 | 32 | -- Your base model -- |
| Gemma-4-E4B Reasoning | 18.8 | 4 | 4B "effective" parameters; somewhat larger on disk |
| Qwen3.5-4B | 22.6 | 4 | Even in non-reasoning mode, this 4B model beats Qwen3-32B Reasoning |
| Qwen3.5-4B Reasoning | 27.1 | 4 | |
| Qwen3.5-9B | 27.3 | 9 | |
| Qwen3.5-9B Reasoning | 32.4 | 9 | |
| Qwen3.6-27B Reasoning | 37.1 | 27 | Highly regarded, especially for logic |
| Gemma-4-31B | 39.2 | 31 | Highly regarded, considered "good at writing" |