Qwen3-4B-Element8
Brainwaves:
mxfp4 0.533,0.731,0.854,0.689,0.402,0.762,0.657
qx64-hi 0.531,0.728,0.857,0.702,0.410,0.764,0.671
qx86-hi 0.540,0.725,0.866,0.708,0.430,0.769,0.669
bf16 0.542,0.731,0.866,0.706,0.428,0.765,0.655
The following two models were merged into Element8:
Qwen3-4B-Element6d
Brings MiniMax-M2.1 traces
mxfp4 0.536,0.718,0.856,0.691,0.400,0.775,0.673
qx64-hi 0.533,0.727,0.865,0.696,0.412,0.766,0.673
qx86-hi 0.536,0.730,0.865,0.704,0.424,0.771,0.665
bf16 0.536,0.731,0.868,0.704,0.434,0.769,0.673
Qwen3-4B-Element7
Brings MiMo-V2-Flash traces
mxfp4 0.532,0.733,0.854,0.690,0.392,0.764,0.661
qx64-hi 0.532,0.729,0.857,0.699,0.414,0.766,0.662
qx86-hi 0.538,0.722,0.864,0.707,0.424,0.768,0.670
bf16 0.540,0.726,0.868,0.707,0.416,0.770,0.657
One of the interesting side effects of the merge is resistance to quantization: there is almost no difference in cognitive performance between the quants and full precision. The gap only shows in openbookqa, exactly where it should, given that benchmark's direct dependence on representational depth.
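A quick way to see this resistance is to subtract the bf16 row from each quant row. A minimal sketch, using the Element8 scores above (the card does not name the columns, so they are just indexed 0..6):

```python
# Quant-vs-bf16 deltas for Qwen3-4B-Element8, scores copied from the table above.
scores = {
    "mxfp4":   [0.533, 0.731, 0.854, 0.689, 0.402, 0.762, 0.657],
    "qx64-hi": [0.531, 0.728, 0.857, 0.702, 0.410, 0.764, 0.671],
    "qx86-hi": [0.540, 0.725, 0.866, 0.708, 0.430, 0.769, 0.669],
    "bf16":    [0.542, 0.731, 0.866, 0.706, 0.428, 0.765, 0.655],
}

def delta_vs_bf16(quant: str) -> list[float]:
    """Per-column difference between a quant and full precision."""
    return [round(q - b, 3) for q, b in zip(scores[quant], scores["bf16"])]

for name in ("mxfp4", "qx64-hi", "qx86-hi"):
    print(name, delta_vs_bf16(name))
```

Most deltas land within a few thousandths of zero; the largest drop (mxfp4, column 4) is still under 0.03, consistent with the observation that only one benchmark really feels the squeeze.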
Base models used in the merge, each shown at its highest-performing quant, qx86-hi.
The top 4 models are all Nightmedia models, each a successful merge similar to Element8.
Agent 0.603,0.817,0.838,0.743,0.426,0.780,0.708
Agent-Claude 0.561,0.760,0.862,0.714,0.422,0.780,0.683
Engineer-trial17 0.569,0.774,0.849,0.705,0.440,0.773,0.642
Engineer3x-Trial122 0.556,0.757,0.850,0.642,0.436,0.754,0.611
Qwen3-4B-RA-SFT 0.515,0.715,0.856,0.615,0.436,0.754,0.629
Jan-v1-2509 0.435,0.540,0.729,0.588,0.388,0.730,0.633
Polaris-Alpha 0.488,0.653,0.846,0.515,0.378,0.683,0.576
Gemini-Flash 0.386,0.447,0.685,0.582,0.362,0.723,0.593
Apollo-Heretic 0.436,0.583,0.805,0.605,0.398,0.738,0.608
Claude-Haiku-4.5 0.469,0.607,0.842,0.560,0.394,0.705,0.585
Claude-Haiku-4.5-HI 0.416,0.513,0.719,0.581,0.372,0.722,0.603
Gemini-3-Pro 0.466,0.658,0.849,0.568,0.394,0.713,0.571
MiMo-V2-Flash 0.398,0.484,0.673,0.618,0.374,0.721,0.630
MiniMax-M2.1 0.391,0.449,0.627,0.588,0.356,0.718,0.609
These are the Qwen base models' scores: the starting point for everyone's training. A merge should stay well above these metrics if it is to have learned anything in the process. By merging traces from different sources, individual truths get dented but pulled toward a safe average, which explains why the quants don't mind the squeeze.
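The "stay well above" test can be made concrete by subtracting a base row from a merged row. A small sketch, pairing the Agent scores with the Instruct-2507 base scores from the card (column names are not given, so columns are only indexed):

```python
# Per-column gain of a merged model over its Qwen base (scores from the card).
agent = [0.603, 0.817, 0.838, 0.743, 0.426, 0.780, 0.708]          # Qwen3-4B-Agent qx86-hi
base_instruct = [0.447, 0.593, 0.843, 0.448, 0.390, 0.690, 0.554]  # Instruct-2507 qx86-hi

gains = [round(m - b, 3) for m, b in zip(agent, base_instruct)]
mean_gain = round(sum(gains) / len(gains), 3)
print(gains, mean_gain)
```

Six of the seven columns improve, some by more than 0.2, with an average gain around 0.14: comfortably above the starting line.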
Qwen3-4B-Thinking-2507-qx86-hi 0.372,0.414,0.625,0.518,0.366,0.698,0.612
Qwen3-4B-Instruct-2507-qx86-hi 0.447,0.593,0.843,0.448,0.390,0.690,0.554
For coding, reasoning, and high-performance agentic work, the Engineer, Architect, and Agent series from Nightmedia are hard to beat. Models with fewer merge steps are usually better at coding; the others are better reasoners. Think tags do sometimes appear when the model hits a hard spot, but those are few and far between.
We have models in the same performance envelope at 8B, 14B, and 30B, as well as 6B/8B/13B/21B/42B brainstorming versions by DavidAU.
The Element series are experimental, and generally fun as conversational models.
-G
P.S. It seems popular, so I made the source openly available.
Stages of development
Qwen3-4B-Engineer
multislerp
- janhq/Jan-v1-2509
- Gen-Verse/Qwen3-4B-RA-SFT
- TeichAI/Qwen3-4B-Instruct-2507-Polaris-Alpha-Distill
- TeichAI/Qwen3-4B-Thinking-2507-Gemini-2.5-Flash-Distill
qx86-hi 0.605,0.828,0.843,0.748,0.416,0.777,0.706
Qwen3-4B-Engineer3x
multislerp
- Gen-Verse/Qwen3-4B-RA-SFT
- TeichAI/Qwen3-4B-Instruct-2507-Polaris-Alpha-Distill
- TeichAI/Qwen3-4B-Thinking-2507-Gemini-2.5-Flash-Distill
qx86-hi 0.615,0.835,0.852,0.745,0.420,0.780,0.704
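The multislerp steps above blend several parent models at once. As a rough illustration only (not mergekit's actual implementation), a spherical blend of many weight vectors can be sketched by folding pairwise slerp with a running average; `slerp` and `multislerp` here are hypothetical helpers written for this card:

```python
import math

def slerp(t: float, v0: list[float], v1: list[float]) -> list[float]:
    """Spherical interpolation between two equal-length weight vectors."""
    n0 = math.sqrt(sum(x * x for x in v0))
    n1 = math.sqrt(sum(x * x for x in v1))
    dot = sum(a * b for a, b in zip(v0, v1)) / (n0 * n1)
    dot = max(-1.0, min(1.0, dot))          # guard acos against rounding
    theta = math.acos(dot)
    if theta < 1e-6:                        # nearly parallel: fall back to lerp
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]

def multislerp(vectors: list[list[float]]) -> list[float]:
    """Equal-weight pairwise fold: a crude stand-in for a true multi-model slerp."""
    acc = vectors[0]
    for i, v in enumerate(vectors[1:], start=2):
        acc = slerp(1.0 / i, acc, v)        # running spherical average
    return acc
```

In a real merge the "vectors" are per-tensor model weights and the weighting is configurable; the point is just that the blend follows the sphere rather than a straight line, which preserves weight magnitudes better than plain averaging.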
Qwen3-4B-Agent
multislerp, abliterated with heretic by DavidAU:
- Qwen3-Engineer3x-4B-Run2-Trial122-7-003
- Qwen3-Engineer-4b-run2-trial17-8-004
- Qwen3-4B-Apollo-V0.1-4B-Thinking-Heretic-Abliterated
qx86-hi 0.603,0.817,0.838,0.743,0.426,0.780,0.708
Qwen3-4B-Agent-Claude-Gemini-heretic
multislerp, abliterated with heretic by DavidAU:
- Qwen3-4B-Agent
- TeichAI/Qwen3-4B-Instruct-2507-Claude-Haiku-4.5-Distill
- TeichAI/Qwen3-4B-Instruct-2507-Gemini-3-Pro-Preview-Distill
- TeichAI/Qwen3-4B-Thinking-2507-Claude-Haiku-4.5-High-Reasoning-Distill
qx86-hi 0.561,0.760,0.862,0.714,0.422,0.780,0.683
Qwen3-4B-Element6d
nuslerp (1.3/0.7)
- Qwen3-4B-Agent-Claude-Gemini-heretic
- TeichAI/Qwen3-4B-Thinking-2507-MiniMax-M2.1-Distill
qx86-hi 0.536,0.730,0.865,0.704,0.424,0.771,0.665
Qwen3-4B-Element7
nuslerp (1.3/0.7)
- Qwen3-4B-Agent-Claude-Gemini-heretic
- TeichAI/Qwen3-4B-Thinking-2507-MiMo-V2-Flash-Distill
qx86-hi 0.538,0.722,0.864,0.707,0.424,0.768,0.670
Qwen3-4B-Element8
nuslerp (1.5/0.5)
- Qwen3-4B-Element6d
- Qwen3-4B-Element7
qx86-hi 0.540,0.725,0.866,0.708,0.430,0.769,0.669
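The nuslerp weight pairs in the recipes above — (1.3/0.7) for Element6d and Element7, (1.5/0.5) for Element8 — can be read as un-normalized blend weights. A minimal sketch, assuming the pair normalizes into an interpolation parameter t pulled toward the second model (an assumption about the mapping, not a statement about mergekit internals):

```python
def nuslerp_t(w_first: float, w_second: float) -> float:
    """Normalize a nuslerp weight pair into the fraction t pulled
    toward the second model (assumed mapping, for illustration)."""
    return w_second / (w_first + w_second)

print(nuslerp_t(1.3, 0.7))  # Element6d / Element7 recipes
print(nuslerp_t(1.5, 0.5))  # Element8 recipe
```

Read this way, Element8 sits three quarters of the way toward Element6d, which matches the heavier 1.5 weight on the first parent.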
Performance numbers vary slightly with accumulated model traces.
It makes for very interesting lines of conversation.
...let's just say it's different.