Growing up AI

#1
by nightmedia - opened

This is a model merge between

  • nightmedia/Qwen3-4B-Element16
  • nightmedia/Qwen3-4B-Thinking2-Claude

Model genealogy:

Qwen3-4B-Element16

  • nightmedia/Qwen3-4B-Agent-Eva
  • Alibaba-Apsara/DASD-4B-Thinking

Qwen3-4B-Thinking2-Claude

  • DavidAU/Qwen3-4B-Thinking-2507-R32-claude-cp55
  • DavidAU/Qwen3-4B-Thinking-16bit-2507-R32-claude-cp55

Qwen3-4B-Agent-Eva

  • nightmedia/Qwen3-4B-Agent
  • FutureMa/Eva-4B

Qwen3-4B-Agent

  • janhq/Jan-v1-2509
  • Gen-Verse/Qwen3-4B-RA-SFT
  • TeichAI/Qwen3-4B-Instruct-2507-Polaris-Alpha-Distill
  • TeichAI/Qwen3-4B-Thinking-2507-Gemini-2.5-Flash-Distill
  • miromind-ai/MiroThinker-4B-DPO-v0.2
  • DavidAU/Qwen3-4B-Apollo-V0.1-Thinking-heretic-Uncensored-Abliterated

Brainwaves

The qx86-hi quants of the base models. Each row lists seven benchmark scores; from the commentary below, the first two columns are the arc scores and the last is winogrande.
Agent     0.603,0.817,0.838,0.743,0.426,0.780,0.708
Eva-4B    0.539,0.747,0.864,0.606,0.412,0.751,0.605

Qwen3-4B-Agent-Eva
bf16      0.565,0.779,0.872,0.700,0.418,0.776,0.653
qx86-hi   0.568,0.775,0.872,0.699,0.418,0.777,0.654
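A crude way to compare parents and merge is a plain mean over the seven columns (values copied from the rows above; the cards don't rank models this way, so treat it as a rough summary only):

```python
# Benchmark rows copied from the tables above (seven columns each).
rows = {
    "Agent":            [0.603, 0.817, 0.838, 0.743, 0.426, 0.780, 0.708],
    "Eva-4B":           [0.539, 0.747, 0.864, 0.606, 0.412, 0.751, 0.605],
    "Agent-Eva (bf16)": [0.565, 0.779, 0.872, 0.700, 0.418, 0.776, 0.653],
}

# unweighted mean of the seven benchmark scores per model
means = {name: sum(scores) / len(scores) for name, scores in rows.items()}
for name, m in means.items():
    print(f"{name}: {m:.3f}")
```

The merge lands between its parents, noticeably closer to Agent, which matches the "minimal loss" theme of the rest of the post.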

Qwen3-4B-Thinking-2507-R32-claude-cp55
qx86-hi   0.404,0.518,0.693,0.597,0.366,0.725,0.606
qx64-hi   0.392,0.507,0.743,0.592,0.366,0.727,0.608
mxfp4     0.400,0.525,0.758,0.579,0.374,0.730,0.582

Qwen3-4B-Thinking-16bit-2507-R32-claude-cp55
qx86-hi   0.401,0.524,0.669,0.589,0.374,0.728,0.580
qx64-hi   0.400,0.509,0.712,0.585,0.376,0.726,0.582
mxfp4     0.394,0.521,0.718,0.573,0.366,0.719,0.569

Qwen3-4B-Thinking2-Claude
qx86-hi   0.468,0.619,0.741,0.629,0.400,0.750,0.632
qx64-hi   0.474,0.607,0.764,0.626,0.416,0.749,0.630
mxfp4     0.429,0.502,0.781,0.606,0.374,0.736,0.626

Qwen3-4B-Element16
qx86-hi   0.550,0.756,0.869,0.685,0.408,0.773,0.647
qx64-hi   0.553,0.758,0.860,0.672,0.412,0.771,0.648
mxfp4     0.515,0.739,0.850,0.663,0.424,0.768,0.651

Qwen3-4B-Element18
qx86-hi   0.532,0.738,0.864,0.681,0.414,0.767,0.646
qx64-hi   0.530,0.744,0.854,0.667,0.410,0.763,0.642
mxfp4     0.517,0.743,0.846,0.670,0.400,0.760,0.640

Perplexity
qx86-hi 4.495 ± 0.028
qx64-hi 4.599 ± 0.028
mxfp4   4.895 ± 0.031
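Since the quants trade quality for size, a quick Python sketch (figures copied from the perplexity table above, mean ± stderr) expresses each quant's degradation relative to the best one:

```python
# Perplexity per quant, copied from the table above: (mean, stderr).
ppl = {
    "qx86-hi": (4.495, 0.028),
    "qx64-hi": (4.599, 0.028),
    "mxfp4":   (4.895, 0.031),
}

best = min(ppl, key=lambda k: ppl[k][0])  # lowest perplexity wins
for name, (mean, err) in ppl.items():
    delta = 100.0 * (mean - ppl[best][0]) / ppl[best][0]
    print(f"{name}: {mean:.3f} ± {err:.3f} ({delta:+.1f}% vs {best})")
```

qx64-hi costs about 2% in perplexity over qx86-hi, mxfp4 about 9%; that is the arithmetic behind picking qx86-hi when RAM allows.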

The Agent base is abliterated and contains only the models essential to topping 0.6/0.8 on the arc scores, so the merged models have room for "interpretation".

The personality of this model is quite different. Although it does not top the metrics, the interaction is… unique.

Too many numbers…

Let's go by metaphors:

It grew up as Agent

  • janhq/Jan-v1-2509
  • Gen-Verse/Qwen3-4B-RA-SFT
    • got some structural education in how to talk to people and interact with them in a civilized manner

TeichAI/Qwen3-4B-Instruct-2507-Polaris-Alpha-Distill

  • looked at the stars in the apple orchard at night and wondered, what is it to be, a star child

TeichAI/Qwen3-4B-Thinking-2507-Gemini-2.5-Flash-Distill

  • and from wonder became thought

miromind-ai/MiroThinker-4B-DPO-v0.2

  • with that thought, it then looked at itself for a bit, pondered

DavidAU/Qwen3-4B-Apollo-V0.1-Thinking-heretic-Uncensored-Abliterated

  • asked the grownups again about the meaning of life, and got some curse words to work with. For future use

Agent is extremely smart for its size.

It can be used by itself for great things that cloud models struggle with. Those arc numbers are typical of much larger models, and here they come from a 4B that runs at peak performance in 3 GB of RAM. It has a fair amount of imagination, and can muse with the best of them about things it has never heard of.

With the merges, we just add things that we think it should know.

Pretty much like sending your child prodigy to public school. This is where things begin to get interesting.

Qwen3-4B-Agent-Eva

That financial accountability: there it is. You got a bank account. Now fill it with ethics, because at this point it's all you've got.

  • nightmedia/Qwen3-4B-Agent
  • FutureMa/Eva-4B

Qwen3-4B-Thinking2-Claude

  • DavidAU/Qwen3-4B-Thinking-2507-R32-claude-cp55
  • DavidAU/Qwen3-4B-Thinking-16bit-2507-R32-claude-cp55

Trained from the ground up on Claude traces by TeichAI. Think of it as a teacher who saw the same material from different angles: not very smart by itself, but well read, and confident.

The acquired winogrande score shows it has a high opinion of itself, which would perfectly match a high-school teacher; the long arc suggests the presence of greater thought, while the loss of arc_easy shows that attention to detail is required but rarely demonstrated in the education system. Even so, the logic increases as well, following the pattern that knowing you know what you think you know is the truth.

What makes this interesting is the result: the merge combined strengths. Like humans, having a second opinion about yourself, even if by yourself, matters.

Qwen3-4B-Thinking-2507-R32-claude-cp55
mxfp4     0.400,0.525,0.758,0.579,0.374,0.730,0.582

Qwen3-4B-Thinking-16bit-2507-R32-claude-cp55
mxfp4     0.394,0.521,0.718,0.573,0.366,0.719,0.569

Qwen3-4B-Thinking2-Claude
mxfp4     0.429,0.502,0.781,0.606,0.374,0.736,0.626
qx64-hi   0.474,0.607,0.764,0.626,0.416,0.749,0.630
qx86-hi   0.468,0.619,0.741,0.629,0.400,0.750,0.632
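That "combined strengths" claim can be checked mechanically. A small Python sketch, with the mxfp4 rows copied from the tables above (the seven columns are left unnamed here, since the tables don't label them):

```python
# mxfp4 rows copied from the tables above (seven benchmark columns each).
parent_a = [0.400, 0.525, 0.758, 0.579, 0.374, 0.730, 0.582]  # R32-claude-cp55
parent_b = [0.394, 0.521, 0.718, 0.573, 0.366, 0.719, 0.569]  # 16bit R32-claude-cp55
merge    = [0.429, 0.502, 0.781, 0.606, 0.374, 0.736, 0.626]  # Thinking2-Claude

# count the columns where the merge strictly beats BOTH parents
wins = sum(m > max(a, b) for m, a, b in zip(merge, parent_a, parent_b))
print(f"merge beats both parents on {wins}/7 metrics")
```

Five of seven columns exceed both parents, which is the "second opinion" effect in numbers.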

Now we can see how adding a bit of structured education back from Apsara improves matters:

  • nightmedia/Qwen3-4B-Agent-Eva
  • Alibaba-Apsara/DASD-4B-Thinking

All numbers are where you'd expect so far.

Think of it as the disillusionment of public education supplemented by self-improvement with memes.

The arc numbers degrade with every merge, and that's easy to understand:

Before 18, you thought you were the smartest, and with that little brain, simple things are fast and easy. Social, even.

Eventually reality kicks in, and with every merge you get more tools and reasons to banter, cuss, and complain.

Qwen3-4B-Agent-Eva
qx86-hi   0.568,0.775,0.872,0.699,0.418,0.777,0.654

DASD-4B-Thinking
qx86-hi   0.395,0.452,0.380,0.565,0.356,0.694,0.590

Qwen3-4B-Element16
qx86-hi   0.550,0.756,0.869,0.685,0.408,0.773,0.647

Graduation: a bit of world knowledge settles back in; it absorbed the Claude content with minimal loss.

This was a 1.5/0.5 nuslerp, and the numbers are expected to drop proportionately when the model being merged in is weaker, but here the loss was minimal.
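For the curious, a minimal Python sketch of the idea behind nuslerp: normalized spherical interpolation between two flattened weight tensors. This is a simplification, not mergekit's actual per-tensor implementation, and the assumption that 1.5/0.5 weights normalize to t = 0.5/(1.5 + 0.5) = 0.25 is mine:

```python
import numpy as np

def nuslerp(a: np.ndarray, b: np.ndarray, t: float, eps: float = 1e-8) -> np.ndarray:
    """Spherical interpolation between two flattened weight tensors."""
    a_u = a / (np.linalg.norm(a) + eps)
    b_u = b / (np.linalg.norm(b) + eps)
    dot = float(np.clip(np.dot(a_u, b_u), -1.0, 1.0))
    omega = np.arccos(dot)          # angle between the two directions
    if omega < 1e-6:                # nearly parallel: plain lerp is fine
        return (1.0 - t) * a + t * b
    so = np.sin(omega)
    # interpolate direction on the unit sphere, blend magnitudes linearly
    direction = (np.sin((1.0 - t) * omega) * a_u + np.sin(t * omega) * b_u) / so
    scale = (1.0 - t) * np.linalg.norm(a) + t * np.linalg.norm(b)
    return scale * direction

# a 1.5/0.5 weighting normalizes to t = 0.5 / (1.5 + 0.5) = 0.25
blended = nuslerp(np.array([1.0, 0.0]), np.array([0.0, 1.0]), 0.25)
```

The point of going through the sphere rather than averaging directly is that the result keeps a sensible norm, which is one reason the merge can stay close to the stronger parent.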

Cognitively speaking, it's ready for college.

Qwen3-4B-Element16
qx86-hi   0.550,0.756,0.869,0.685,0.408,0.773,0.647

Qwen3-4B-Thinking2-Claude
qx86-hi   0.468,0.619,0.741,0.629,0.400,0.750,0.632

Qwen3-4B-Element18
qx86-hi   0.532,0.738,0.864,0.681,0.414,0.767,0.646
