Cool way to fine-tune that I wanted to share.
Your models inspired me. As I've been trying to learn about fine-tuning, I've really wanted to make Qwen3 30B work, but it's so factual that repetition creeps in and conversations tend to struggle.
I think your model was one of the first to start breaking that habit. The 30B size, and the fact that it's an MoE, has a lot of benefits for 24GB cards.
Not to name drop, and I won't make this a link, but I was playing with this idea here:
SuperbEmphasis/Black-Eclipse-Test-ERP-RP-V3-24E
I originally did 6 epochs with the default ~3B parameters activated. Things got better... but it still had issues...
So I got mad at it... and cranked the number of active experts up to 64 of 128. That took a lot more VRAM, and I had to train with a smaller batch size. I barely got it training on an H100.
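In case anyone wants to try it, here's roughly what that override looks like with Hugging Face transformers. This is a minimal sketch, not my exact training script: it assumes the Qwen3 MoE config exposes the routing top-k as `num_experts_per_tok`, and the base model path is just illustrative.

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

# Bump the routing top-k before fine-tuning.
# Assumption: the HF Qwen3 MoE config exposes this as num_experts_per_tok.
base = "Qwen/Qwen3-30B-A3B"  # illustrative base model path

config = AutoConfig.from_pretrained(base)
config.num_experts_per_tok = 64  # default is 8; route each token to 64 of the 128 experts

model = AutoModelForCausalLM.from_pretrained(
    base,
    config=config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# ...then fine-tune as usual; the extra active experts are what eats the VRAM.
```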
But... it seemed to work pretty well. After training I turned the active expert count back down to 24 (which works out to around 6-7B active parameters). Thought it was neat, and wanted to share!
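Turning it back down afterwards is the same knob at load time. The arithmetic check at the bottom is a back-of-envelope estimate (it assumes roughly 1.5B dense/shared parameters, backed out of the advertised ~3.3B-active-at-8-experts figure, and linear scaling of routed expert parameters with top-k), and it lands right around my 6-7B number:

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

# Same knob at inference: load the fine-tuned checkpoint with fewer active experts.
ckpt = "SuperbEmphasis/Black-Eclipse-Test-ERP-RP-V3-24E"

config = AutoConfig.from_pretrained(ckpt)
config.num_experts_per_tok = 24  # down from the 64 used during training

model = AutoModelForCausalLM.from_pretrained(
    ckpt, config=config, torch_dtype=torch.bfloat16, device_map="auto"
)

# Back-of-envelope active-parameter estimate (all numbers approximate):
total, n_experts, dense = 30.5e9, 128, 1.5e9
per_expert = (total - dense) / n_experts
print(f"active @ 24 experts: {(dense + 24 * per_expert) / 1e9:.1f}B")  # ~6.9B
```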