Qwen3-4B-Element8

Brainwaves (columns, here and in every score table below: arc_challenge, arc_easy, boolq, hellaswag, openbookqa, piqa, winogrande):

mxfp4    0.533,0.731,0.854,0.689,0.402,0.762,0.657
qx64-hi  0.531,0.728,0.857,0.702,0.410,0.764,0.671
qx86-hi  0.540,0.725,0.866,0.708,0.430,0.769,0.669
bf16     0.542,0.731,0.866,0.706,0.428,0.765,0.655

The last two models merged into Element8:

Qwen3-4B-Element6d

Brings MiniMax-M2.1 traces

mxfp4    0.536,0.718,0.856,0.691,0.400,0.775,0.673
qx64-hi  0.533,0.727,0.865,0.696,0.412,0.766,0.673
qx86-hi  0.536,0.730,0.865,0.704,0.424,0.771,0.665
bf16     0.536,0.731,0.868,0.704,0.434,0.769,0.673

Qwen3-4B-Element7

Brings MiMo-V2-Flash traces

mxfp4    0.532,0.733,0.854,0.690,0.392,0.764,0.661
qx64-hi  0.532,0.729,0.857,0.699,0.414,0.766,0.662
qx86-hi  0.538,0.722,0.864,0.707,0.424,0.768,0.670
bf16     0.540,0.726,0.868,0.707,0.416,0.770,0.657

One interesting side effect of the merge is its resistance to quantization: there is almost no difference in cognitive performance between the quants and full precision. The gap only shows in openbookqa, exactly where it should, because of that benchmark's direct dependence on representational depth.
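A quick way to verify this from the table above: compute each quant's per-benchmark delta against bf16. A minimal Python sketch; the column names are an assumption (the standard lm-eval set, anchored on the openbookqa mention), and the scores are copied verbatim from the Element8 table:

```python
# Element8 scores from the table above. Column names are assumed
# (the card only names openbookqa explicitly).
BENCHMARKS = ["arc_challenge", "arc_easy", "boolq",
              "hellaswag", "openbookqa", "piqa", "winogrande"]

SCORES = {
    "mxfp4":   [0.533, 0.731, 0.854, 0.689, 0.402, 0.762, 0.657],
    "qx64-hi": [0.531, 0.728, 0.857, 0.702, 0.410, 0.764, 0.671],
    "qx86-hi": [0.540, 0.725, 0.866, 0.708, 0.430, 0.769, 0.669],
    "bf16":    [0.542, 0.731, 0.866, 0.706, 0.428, 0.765, 0.655],
}

# Per-benchmark delta of each quant against full precision, and the
# benchmark where that quant loses the most.
for quant in ("mxfp4", "qx64-hi", "qx86-hi"):
    deltas = [q - b for q, b in zip(SCORES[quant], SCORES["bf16"])]
    worst = min(range(len(deltas)), key=lambda i: deltas[i])
    print(f"{quant:8s} worst drop: {BENCHMARKS[worst]} ({deltas[worst]:+.3f})")
```

For mxfp4 (-0.026) and qx64-hi (-0.018) the largest drop is indeed in the openbookqa column; qx86-hi never falls more than 0.006 below bf16 on any benchmark.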

Base models in the merge, each shown at its highest-performing quant, qx86-hi.

The top 4 models are all Nightmedia models, each a successful merge similar to Element8.

Agent               0.603,0.817,0.838,0.743,0.426,0.780,0.708
Agent-Claude        0.561,0.760,0.862,0.714,0.422,0.780,0.683
Engineer-trial17    0.569,0.774,0.849,0.705,0.440,0.773,0.642
Engineer3x-Trial122 0.556,0.757,0.850,0.642,0.436,0.754,0.611

Qwen3-4B-RA-SFT     0.515,0.715,0.856,0.615,0.436,0.754,0.629
Jan-v1-2509         0.435,0.540,0.729,0.588,0.388,0.730,0.633
Polaris-Alpha       0.488,0.653,0.846,0.515,0.378,0.683,0.576
Gemini-Flash        0.386,0.447,0.685,0.582,0.362,0.723,0.593
Apollo-Heretic      0.436,0.583,0.805,0.605,0.398,0.738,0.608
Claude-Haiku-4.5    0.469,0.607,0.842,0.560,0.394,0.705,0.585
Claude-Haiku-4.5-HI 0.416,0.513,0.719,0.581,0.372,0.722,0.603
Gemini-3-Pro        0.466,0.658,0.849,0.568,0.394,0.713,0.571
MiMo-V2-Flash       0.398,0.484,0.673,0.618,0.374,0.721,0.630
MiniMax-M2.1        0.391,0.449,0.627,0.588,0.356,0.718,0.609

These are the Qwen base models' scores: where everyone started with their training. The goal of a merge is to stay well above those metrics if it is to have learned something in the process (a quick margin check follows the table below). By merging traces from different sources, the value of any single truth is dented, but moved toward a safe average, which explains why the quants don't mind the squeeze.

Qwen3-4B-Thinking-2507-qx86-hi 0.372,0.414,0.625,0.518,0.366,0.698,0.612
Qwen3-4B-Instruct-2507-qx86-hi 0.447,0.593,0.843,0.448,0.390,0.690,0.554
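To put a number on "well above", here is a minimal sketch comparing Element8's qx86-hi row against the two base rows above, per benchmark (all scores copied verbatim from this card; column order as assumed earlier):

```python
# qx86-hi rows copied from the tables above.
element8 = [0.540, 0.725, 0.866, 0.708, 0.430, 0.769, 0.669]
thinking = [0.372, 0.414, 0.625, 0.518, 0.366, 0.698, 0.612]  # Thinking-2507
instruct = [0.447, 0.593, 0.843, 0.448, 0.390, 0.690, 0.554]  # Instruct-2507

# Margin of the merge over the stronger of the two base scores.
margins = [e - max(t, i) for e, t, i in zip(element8, thinking, instruct)]
print(" ".join(f"{m:+.3f}" for m in margins))
print(f"mean margin: {sum(margins) / len(margins):+.3f}")
```

Every margin is positive, with a mean of roughly +0.087, so the merge clears the bar on all seven benchmarks.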

For coding, reasoning, and high-performance agentic work, the Engineer, Architect, and Agent series from Nightmedia are hard to beat. Models with fewer merge steps are usually better at coding; the others are better reasoners. Think tags do sometimes appear when the model hits a hard spot, but those occasions are few and far between.

We have models in the same performance envelope as 8B, 14B, and 30B models, and as the 6B/8B/13B/21B/42B brainstorming versions by DavidAU.

The Element series is experimental, and generally fun to converse with.

-G

P.S. It seems popular, so I made the source openly available.

Stages of development

Qwen3-4B-Engineer

multislerp (a sketch of the idea follows this merge's scores)

  • janhq/Jan-v1-2509
  • Gen-Verse/Qwen3-4B-RA-SFT
  • TeichAI/Qwen3-4B-Instruct-2507-Polaris-Alpha-Distill
  • TeichAI/Qwen3-4B-Thinking-2507-Gemini-2.5-Flash-Distill

    qx86-hi 0.605,0.828,0.843,0.748,0.416,0.777,0.706
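For readers unfamiliar with multislerp: it spherically interpolates more than two checkpoints at once. A minimal NumPy sketch of one common formulation (weighted mean, renormalized so the merged tensor keeps the average magnitude of its parents); the function name and normalization choice are mine, and mergekit's actual multislerp may differ in detail:

```python
import numpy as np

def multislerp(tensors, weights=None):
    # Weighted Euclidean mean of the flattened tensors, rescaled back
    # onto a sphere whose radius is the weighted mean of the input
    # norms -- an approximation of spherical interpolation for N > 2.
    flats = [np.asarray(t, dtype=np.float64).ravel() for t in tensors]
    w = (np.full(len(flats), 1.0 / len(flats)) if weights is None
         else np.asarray(weights, dtype=np.float64))
    w = w / w.sum()

    mean = sum(wi * f for wi, f in zip(w, flats))
    mean_norm = np.linalg.norm(mean)
    if mean_norm == 0.0:
        return mean.reshape(np.shape(tensors[0]))  # degenerate inputs
    target = float(w @ np.array([np.linalg.norm(f) for f in flats]))
    return (mean * (target / mean_norm)).reshape(np.shape(tensors[0]))
```

Applied tensor by tensor across four parents, as in the Engineer merge above, this keeps the merged weights near the magnitude the parents share instead of shrinking them toward the origin, which plain averaging would do.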

Qwen3-4B-Engineer3x

multislerp

  • Gen-Verse/Qwen3-4B-RA-SFT
  • TeichAI/Qwen3-4B-Instruct-2507-Polaris-Alpha-Distill
  • TeichAI/Qwen3-4B-Thinking-2507-Gemini-2.5-Flash-Distill

    qx86-hi 0.615,0.835,0.852,0.745,0.420,0.780,0.704

Qwen3-4B-Agent

multislerp, abliterated with heretic by DavidAU (see the ablation sketch after this merge's scores):

  • Qwen3-Engineer3x-4B-Run2-Trial122-7-003
  • Qwen3-Engineer-4b-run2-trial17-8-004
  • Qwen3-4B-Apollo-V0.1-4B-Thinking-Heretic-Abliterated

    qx86-hi 0.603,0.817,0.838,0.743,0.426,0.780,0.708
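"Abliterated with heretic" refers to refusal-direction ablation: weight matrices are orthogonalized against an extracted "refusal" direction so the model can no longer write along it. A minimal sketch of that core step, assuming the direction r has already been found (heretic's full pipeline, which also tunes where and how strongly to ablate, is more involved):

```python
import numpy as np

def ablate_direction(W, r):
    # Project the refusal direction out of a weight matrix that writes
    # into the residual stream: W' = (I - r r^T) W, with r a unit
    # vector in the output (residual) space.
    r = r / np.linalg.norm(r)
    return W - np.outer(r, r) @ W
```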

Qwen3-4B-Agent-Claude-Gemini-heretic

multislerp, abliterated with heretic by DavidAU:

  • Qwen3-4B-Agent
  • TeichAI/Qwen3-4B-Instruct-2507-Claude-Haiku-4.5-Distill
  • TeichAI/Qwen3-4B-Instruct-2507-Gemini-3-Pro-Preview-Distill
  • TeichAI/Qwen3-4B-Thinking-2507-Claude-Haiku-4.5-High-Reasoning-Distill

    qx86-hi 0.561,0.760,0.862,0.714,0.422,0.780,0.683

Qwen3-4B-Element6d

nuslerp (1.3/0.7; sketched below)

  • Qwen3-4B-Agent-Claude-Gemini-heretic
  • TeichAI/Qwen3-4B-Thinking-2507-MiniMax-M2.1-Distill

    qx86-hi 0.536,0.730,0.865,0.704,0.424,0.771,0.665
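nuslerp is two-model slerp with unnormalized weights. A minimal NumPy sketch, assuming the first number applies to the first listed parent and that the pair is normalized into an interpolation factor t = w2 / (w1 + w2); under that reading, 1.3/0.7 puts Element6d at t = 0.35 toward the MiniMax distill, and Element8's 1.5/0.5 below sits at t = 0.25:

```python
import numpy as np

def nuslerp(a, b, wa, wb, eps=1e-8):
    # Normalize the weight pair into a single interpolation factor,
    # then slerp along the arc between the two flattened tensors.
    t = wb / (wa + wb)
    a_f = np.asarray(a, dtype=np.float64).ravel()
    b_f = np.asarray(b, dtype=np.float64).ravel()

    ua = a_f / (np.linalg.norm(a_f) + eps)
    ub = b_f / (np.linalg.norm(b_f) + eps)
    omega = np.arccos(np.clip(ua @ ub, -1.0, 1.0))  # angle between parents
    if omega < eps:  # nearly parallel: fall back to linear interpolation
        out = (1.0 - t) * a_f + t * b_f
    else:
        so = np.sin(omega)
        out = (np.sin((1.0 - t) * omega) / so) * a_f \
            + (np.sin(t * omega) / so) * b_f
    return out.reshape(np.shape(a))
```

mergekit's nuslerp also offers row- and column-wise normalization options not shown here.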

Qwen3-4B-Element7

nuslerp (1.3/0.7)

  • Qwen3-4B-Agent-Claude-Gemini-heretic
  • TeichAI/Qwen3-4B-Thinking-2507-MiMo-V2-Flash-Distill

    qx86-hi 0.538,0.722,0.864,0.707,0.424,0.768,0.670

Qwen3-4B-Element8

nuslerp (1.5/0.5)

  • Qwen3-4B-Element6d
  • Qwen3-4B-Element7

    qx86-hi 0.540,0.725,0.866,0.708,0.430,0.769,0.669

Performance numbers vary slightly with accumulated model traces.

It makes for very interesting lines of conversation.

...let's just say it's different.
