Top of the Open Models on EQ-Bench 3

by GeoMaciolek - opened 9 days ago

This model is currently at the top of the list of open models on EQ-Bench 3!

That's quite the achievement! If you didn't target EQ-Bench 3 specifically, that's astounding. But even if you did target it ("benchmaxxing"), it's still impressive.

I do hope you're able to adapt the training process to somewhat newer models, though I know that's probably easier said than done! (The base Qwen3-32B, despite being great, is already pretty significantly overshadowed by newer models of similar or smaller sizes, e.g. Gemma-4-31B and Qwen3.6-27B; see my follow-up post. I'm sure it's not news to you, but it bears mentioning.)

GeoMaciolek - 9 days ago - edited 9 days ago

Here's a link to the base Qwen3-32B model compared to a bunch of other open models (and a few proprietary ones for good measure). We all know benchmarks don't tell the whole story, but even so, these numbers are significant, and worth noticing.

Highlights, in ascending order of score:

Model	Score	Params (B)	Notes
Qwen3-32B	14.5	32	-- Your base model --
Qwen3.5-2B	14.7	2
Qwen3.5-2B Reasoning	16.3	2
Qwen3-32B Reasoning	16.5	32	-- Your base model --
Gemma-4-E4B Reasoning	18.8	4	4B "Effective" - somewhat larger on disk
Qwen3.5-4B	22.6	4	Even in non-reasoning mode, this 4B model beats Qwen3-32B Reasoning
Qwen3.5-4B Reasoning	27.1	4
Qwen3.5-9B	27.3	9
Qwen3.5-9B Reasoning	32.4	9
Qwen3.6-27B Reasoning	37.1	27	Highly regarded, particularly for logic
Gemma-4-31B	39.2	31	Highly regarded, considered "good at writing"

ArtificialAnalysis.ai Comparison
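
If anyone wants to poke at these numbers themselves, here is a rough Python sketch that lists which smaller models in the table above outscore the Qwen3-32B Reasoning base. The scores and sizes are copied by hand from the table; the `ROWS` list and the `smaller_models_beating` helper are names I made up for illustration, not anything from ArtificialAnalysis or EQ-Bench.

```python
# Scores copied by hand from the table above: (model, score, params in billions).
# ROWS and smaller_models_beating are illustrative names, not part of any library.
ROWS = [
    ("Qwen3-32B", 14.5, 32),
    ("Qwen3.5-2B", 14.7, 2),
    ("Qwen3.5-2B Reasoning", 16.3, 2),
    ("Qwen3-32B Reasoning", 16.5, 32),
    ("Gemma-4-E4B Reasoning", 18.8, 4),
    ("Qwen3.5-4B", 22.6, 4),
    ("Qwen3.5-4B Reasoning", 27.1, 4),
    ("Qwen3.5-9B", 27.3, 9),
    ("Qwen3.5-9B Reasoning", 32.4, 9),
    ("Qwen3.6-27B Reasoning", 37.1, 27),
    ("Gemma-4-31B", 39.2, 31),
]


def smaller_models_beating(base_name, rows):
    """Return names of models with fewer parameters and a higher score than the base."""
    base_score, base_params = next(
        (score, params) for name, score, params in rows if name == base_name
    )
    return [
        name
        for name, score, params in rows
        if params < base_params and score > base_score
    ]


if __name__ == "__main__":
    for name in smaller_models_beating("Qwen3-32B Reasoning", ROWS):
        print(name)
```

This only compares the rows quoted here, so treat it as a sanity check on the highlights rather than a full ranking.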
