Growing up AI

#1
by nightmedia - opened

This is a model merge between

  • nightmedia/Qwen3-4B-Element16
  • nightmedia/Qwen3-4B-Thinking2-Claude

Model genealogy:

Qwen3-4B-Element16

  • nightmedia/Qwen3-4B-Agent-Eva
  • Alibaba-Apsara/DASD-4B-Thinking

Qwen3-4B-Thinking2-Claude

  • DavidAU/Qwen3-4B-Thinking-2507-R32-claude-cp55
  • DavidAU/Qwen3-4B-Thinking-16bit-2507-R32-claude-cp55

Qwen3-4B-Agent-Eva

  • nightmedia/Qwen3-4B-Agent
  • FutureMa/Eva-4B

Qwen3-4B-Agent

  • janhq/Jan-v1-2509
  • Gen-Verse/Qwen3-4B-RA-SFT
  • TeichAI/Qwen3-4B-Instruct-2507-Polaris-Alpha-Distill
  • TeichAI/Qwen3-4B-Thinking-2507-Gemini-2.5-Flash-Distill
  • miromind-ai/MiroThinker-4B-DPO-v0.2
  • DavidAU/Qwen3-4B-Apollo-V0.1-Thinking-heretic-Uncensored-Abliterated

Brainwaves

The qx86-hi quants of the base models. Each row lists seven benchmark scores; from the commentary below, the first two columns are the arc scores and the last is winogrande.
Agent     0.603,0.817,0.838,0.743,0.426,0.780,0.708
Eva-4B    0.539,0.747,0.864,0.606,0.412,0.751,0.605

Qwen3-4B-Agent-Eva
bf16      0.565,0.779,0.872,0.700,0.418,0.776,0.653
qx86-hi   0.568,0.775,0.872,0.699,0.418,0.777,0.654
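A crude way to compare parents and merge is a plain mean over the seven columns (values copied from the rows above; the cards don't rank models this way, so treat it as a rough summary only):

```python
# Benchmark rows copied from the tables above (seven columns each).
rows = {
    "Agent":            [0.603, 0.817, 0.838, 0.743, 0.426, 0.780, 0.708],
    "Eva-4B":           [0.539, 0.747, 0.864, 0.606, 0.412, 0.751, 0.605],
    "Agent-Eva (bf16)": [0.565, 0.779, 0.872, 0.700, 0.418, 0.776, 0.653],
}

# unweighted mean of the seven benchmark scores per model
means = {name: sum(scores) / len(scores) for name, scores in rows.items()}
for name, m in means.items():
    print(f"{name}: {m:.3f}")
```

The merge lands between its parents, noticeably closer to Agent, which matches the "minimal loss" theme of the rest of the post.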

Qwen3-4B-Thinking-2507-R32-claude-cp55
qx86-hi   0.404,0.518,0.693,0.597,0.366,0.725,0.606
qx64-hi   0.392,0.507,0.743,0.592,0.366,0.727,0.608
mxfp4     0.400,0.525,0.758,0.579,0.374,0.730,0.582

Qwen3-4B-Thinking-16bit-2507-R32-claude-cp55
qx86-hi   0.401,0.524,0.669,0.589,0.374,0.728,0.580
qx64-hi   0.400,0.509,0.712,0.585,0.376,0.726,0.582
mxfp4     0.394,0.521,0.718,0.573,0.366,0.719,0.569

Qwen3-4B-Thinking2-Claude
qx86-hi   0.468,0.619,0.741,0.629,0.400,0.750,0.632
qx64-hi   0.474,0.607,0.764,0.626,0.416,0.749,0.630
mxfp4     0.429,0.502,0.781,0.606,0.374,0.736,0.626

Qwen3-4B-Element16
qx86-hi   0.550,0.756,0.869,0.685,0.408,0.773,0.647
qx64-hi   0.553,0.758,0.860,0.672,0.412,0.771,0.648
mxfp4     0.515,0.739,0.850,0.663,0.424,0.768,0.651

Qwen3-4B-Element18
qx86-hi   0.532,0.738,0.864,0.681,0.414,0.767,0.646
qx64-hi   0.530,0.744,0.854,0.667,0.410,0.763,0.642
mxfp4     0.517,0.743,0.846,0.670,0.400,0.760,0.640

Perplexity
qx86-hi 4.495 ± 0.028
qx64-hi 4.599 ± 0.028
mxfp4   4.895 ± 0.031
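Since the quants trade quality for size, a quick Python sketch (figures copied from the perplexity table above, mean ± stderr) expresses each quant's degradation relative to the best one:

```python
# Perplexity per quant, copied from the table above: (mean, stderr).
ppl = {
    "qx86-hi": (4.495, 0.028),
    "qx64-hi": (4.599, 0.028),
    "mxfp4":   (4.895, 0.031),
}

best = min(ppl, key=lambda k: ppl[k][0])  # lowest perplexity wins
for name, (mean, err) in ppl.items():
    delta = 100.0 * (mean - ppl[best][0]) / ppl[best][0]
    print(f"{name}: {mean:.3f} ± {err:.3f} ({delta:+.1f}% vs {best})")
```

qx64-hi costs about 2% in perplexity over qx86-hi, mxfp4 about 9%; that is the arithmetic behind picking qx86-hi when RAM allows.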

The Agent base is abliterated and contains only the models essential to topping 0.6/0.8 on the arc scores, so the merged models have room for "interpretation".

The personality of this model is quite different. Although it does not top the metrics, the interaction is… unique.

Too many numbers…

Let's go by metaphors:

It grew up as Agent

  • janhq/Jan-v1-2509
  • Gen-Verse/Qwen3-4B-RA-SFT
    • got some structural education in how to talk to people and interact with them in a civilized manner

TeichAI/Qwen3-4B-Instruct-2507-Polaris-Alpha-Distill

  • looked at the stars in the apple orchard at night and wondered, what is it to be, a star child

TeichAI/Qwen3-4B-Thinking-2507-Gemini-2.5-Flash-Distill

  • and from wonder became thought

miromind-ai/MiroThinker-4B-DPO-v0.2

  • with that thought, it then looked at itself for a bit, pondered

DavidAU/Qwen3-4B-Apollo-V0.1-Thinking-heretic-Uncensored-Abliterated

  • asked the grownups again about the meaning of life, and got some curse words to work with. For future use

Agent is extremely smart for its size.

It can be used by itself for great things that cloud models struggle with. Those arc numbers are typical of much larger models, and here they come from a 4B that runs at peak performance in 3 GB of RAM. It has a fair amount of imagination, and can muse with the best of them about things it has never heard of.

With the merges, we just add things that we think it should know.

Pretty much like sending your child prodigy to public school. This is where things begin to get interesting.

Qwen3-4B-Agent-Eva

That financial accountability: there it is. You got a bank account. Now fill it with ethics, because at this point it's all you've got.

  • nightmedia/Qwen3-4B-Agent
  • FutureMa/Eva-4B

Qwen3-4B-Thinking2-Claude

  • DavidAU/Qwen3-4B-Thinking-2507-R32-claude-cp55
  • DavidAU/Qwen3-4B-Thinking-16bit-2507-R32-claude-cp55

Trained from the ground up on Claude traces by TeichAI. Think of it as a teacher who saw the same material from different angles: not very smart by itself, but well read, and confident.

The acquired winogrande score shows it has a high opinion of itself, which would perfectly match a high-school teacher; the long arc suggests the presence of greater thought, while the loss of arc_easy shows that attention to detail is required but rarely demonstrated in the education system. Even so, the logic increases as well, following the pattern that knowing you know what you think you know is the truth.

What makes this interesting is the result: the merge combined strengths. Like humans, having a second opinion about yourself, even if by yourself, matters.

Qwen3-4B-Thinking-2507-R32-claude-cp55
mxfp4     0.400,0.525,0.758,0.579,0.374,0.730,0.582

Qwen3-4B-Thinking-16bit-2507-R32-claude-cp55
mxfp4     0.394,0.521,0.718,0.573,0.366,0.719,0.569

Qwen3-4B-Thinking2-Claude
mxfp4     0.429,0.502,0.781,0.606,0.374,0.736,0.626
qx64-hi   0.474,0.607,0.764,0.626,0.416,0.749,0.630
qx86-hi   0.468,0.619,0.741,0.629,0.400,0.750,0.632
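That "combined strengths" claim can be checked mechanically. A small Python sketch, with the mxfp4 rows copied from the tables above (the seven columns are left unnamed here, since the tables don't label them):

```python
# mxfp4 rows copied from the tables above (seven benchmark columns each).
parent_a = [0.400, 0.525, 0.758, 0.579, 0.374, 0.730, 0.582]  # R32-claude-cp55
parent_b = [0.394, 0.521, 0.718, 0.573, 0.366, 0.719, 0.569]  # 16bit R32-claude-cp55
merge    = [0.429, 0.502, 0.781, 0.606, 0.374, 0.736, 0.626]  # Thinking2-Claude

# count the columns where the merge strictly beats BOTH parents
wins = sum(m > max(a, b) for m, a, b in zip(merge, parent_a, parent_b))
print(f"merge beats both parents on {wins}/7 metrics")
```

Five of seven columns exceed both parents, which is the "second opinion" effect in numbers.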

Now we can see how adding a bit of structured education back from Apsara improves matters:

  • nightmedia/Qwen3-4B-Agent-Eva
  • Alibaba-Apsara/DASD-4B-Thinking

All numbers are where you'd expect so far.

Think of it as the disillusionment of public education supplemented by self-improvement with memes.

The arc numbers degrade with every merge, and that's easy to understand:

Before 18, you thought you were the smartest, and with that little brain, simple things are fast and easy. Social, even.

Eventually reality kicks in, and with every merge you get more tools and reasons to banter, cuss, and complain.

Qwen3-4B-Agent-Eva
qx86-hi   0.568,0.775,0.872,0.699,0.418,0.777,0.654

DASD-4B-Thinking
qx86-hi   0.395,0.452,0.380,0.565,0.356,0.694,0.590

Qwen3-4B-Element16
qx86-hi   0.550,0.756,0.869,0.685,0.408,0.773,0.647

Graduation: a bit of world knowledge settles back in; it absorbed the Claude content with minimal loss.

This was a 1.5/0.5 nuslerp, and the numbers are expected to drop proportionately when the model being merged in is weaker, but here the loss was minimal.
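For the curious, a minimal Python sketch of the idea behind nuslerp: normalized spherical interpolation between two flattened weight tensors. This is a simplification, not mergekit's actual per-tensor implementation, and the assumption that 1.5/0.5 weights normalize to t = 0.5/(1.5 + 0.5) = 0.25 is mine:

```python
import numpy as np

def nuslerp(a: np.ndarray, b: np.ndarray, t: float, eps: float = 1e-8) -> np.ndarray:
    """Spherical interpolation between two flattened weight tensors."""
    a_u = a / (np.linalg.norm(a) + eps)
    b_u = b / (np.linalg.norm(b) + eps)
    dot = float(np.clip(np.dot(a_u, b_u), -1.0, 1.0))
    omega = np.arccos(dot)          # angle between the two directions
    if omega < 1e-6:                # nearly parallel: plain lerp is fine
        return (1.0 - t) * a + t * b
    so = np.sin(omega)
    # interpolate direction on the unit sphere, blend magnitudes linearly
    direction = (np.sin((1.0 - t) * omega) * a_u + np.sin(t * omega) * b_u) / so
    scale = (1.0 - t) * np.linalg.norm(a) + t * np.linalg.norm(b)
    return scale * direction

# a 1.5/0.5 weighting normalizes to t = 0.5 / (1.5 + 0.5) = 0.25
blended = nuslerp(np.array([1.0, 0.0]), np.array([0.0, 1.0]), 0.25)
```

The point of going through the sphere rather than averaging directly is that the result keeps a sensible norm, which is one reason the merge can stay close to the stronger parent.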

Cognitively speaking, it's ready for college.

Qwen3-4B-Element16
qx86-hi   0.550,0.756,0.869,0.685,0.408,0.773,0.647

Qwen3-4B-Thinking2-Claude
qx86-hi   0.468,0.619,0.741,0.629,0.400,0.750,0.632

Qwen3-4B-Element18
qx86-hi   0.532,0.738,0.864,0.681,0.414,0.767,0.646
