Cool way to fine-tune that I wanted to share.
Your models inspired me. As I've been trying to learn about fine-tuning, I've really wanted to make Qwen3 30B work, but it's so factual that repetition creeps in and conversations tend to struggle.
I think your model was one of the first to start breaking that habit. The 30B size, and the fact that it's an MoE, has a lot of benefits for 24GB cards.
Not to name drop, and I won't make this a link, but I was playing with this idea here:
SuperbEmphasis/Black-Eclipse-Test-ERP-RP-V3-24E
I originally did 6 epochs with the default ~3B parameters activated. Things got better... but it still had issues...
So I got mad at it... and cranked the number of active experts up to 64 of 128. That took a lot more VRAM, and I had to train with a smaller batch size. I barely got it training on an H100.
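In case anyone wants to try it, here's roughly what that override looks like with Hugging Face transformers. This is a minimal sketch, not my exact training script: it assumes the Qwen3 MoE config exposes the routing top-k as `num_experts_per_tok`, and the base model path is just illustrative.

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

# Bump the routing top-k before fine-tuning.
# Assumption: the HF Qwen3 MoE config exposes this as num_experts_per_tok.
base = "Qwen/Qwen3-30B-A3B"  # illustrative base model path

config = AutoConfig.from_pretrained(base)
config.num_experts_per_tok = 64  # default is 8; route each token to 64 of the 128 experts

model = AutoModelForCausalLM.from_pretrained(
    base,
    config=config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# ...then fine-tune as usual; the extra active experts are what eats the VRAM.
```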
But... it seemed to work pretty well. After training I turned the active expert count back down to 24 (which works out to around 6-7B active parameters). Thought it was neat, and wanted to share!
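Turning it back down afterwards is the same knob at load time. The arithmetic check at the bottom is a back-of-envelope estimate (it assumes roughly 1.5B dense/shared parameters, backed out of the advertised ~3.3B-active-at-8-experts figure, and linear scaling of routed expert parameters with top-k), and it lands right around my 6-7B number:

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

# Same knob at inference: load the fine-tuned checkpoint with fewer active experts.
ckpt = "SuperbEmphasis/Black-Eclipse-Test-ERP-RP-V3-24E"

config = AutoConfig.from_pretrained(ckpt)
config.num_experts_per_tok = 24  # down from the 64 used during training

model = AutoModelForCausalLM.from_pretrained(
    ckpt, config=config, torch_dtype=torch.bfloat16, device_map="auto"
)

# Back-of-envelope active-parameter estimate (all numbers approximate):
total, n_experts, dense = 30.5e9, 128, 1.5e9
per_expert = (total - dense) / n_experts
print(f"active @ 24 experts: {(dense + 24 * per_expert) / 1e9:.1f}B")  # ~6.9B
```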