Qwen3.6-35B-A3B-StyleTune
Another attempt at a surgical finetuning experiment, Qwen edition! 46.9% fewer clichés, an entirely new writing style, and the same Qwen3.6 35B-A3B you already know underneath. One tensor changed out of 733. Most importantly? Validation that this technique isn't just a Gemma-specific trick!
Does this turn Qwen into a fantastic writing model that rivals Gemma? No. But it's a metric ton better, and therefore worth sharing.
What is a style tune?
Normally when I finetune a model I train as much of it as possible, loading every tensor and transforming it to better approximate whatever's in my data. Not this time. This time I trained precisely one tensor: the lm_head output projection - the layer that decides which token to emit. Literally the last stop before text appears on your screen.
This specific tensor has a massive influence on a model's writing style, something I first discovered building MythoMax years ago. So what becomes the path of least resistance when trying to tackle said writing style?
The answer: freeze everything else. All 40 transformer layers, all the attention heads, all the MLPs — completely untouched. Only lm_head trains, which means VRAM requirements drop dramatically, training completes in a single overnight run on consumer hardware, and every single one of Qwen's capabilities remains fully intact. The model hasn't changed. Only the voice has, and it's done so in the best way possible. (Obligatory disclaimer: I might be biased towards my own data.)
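For anyone curious what "only lm_head trains" looks like in practice, here is a minimal sketch, not my actual training script. It assumes the model exposes `named_parameters()` the way a `torch.nn.Module` does and that the output projection's parameters are named with an `lm_head` prefix (the usual Hugging Face convention, but treat it as an assumption for any given checkpoint). The helper name `freeze_all_but` is made up for this example.

```python
def freeze_all_but(model, trainable_prefix="lm_head"):
    """Freeze every parameter except those under `trainable_prefix`.

    `model` is anything exposing `named_parameters()` like a
    torch.nn.Module does. The "lm_head" prefix is an assumption about
    how the checkpoint names its output projection.
    Returns (trainable_param_count, total_param_count).
    """
    trainable = 0
    total = 0
    for name, param in model.named_parameters():
        # Match "lm_head" itself or anything nested under "lm_head."
        keep = name == trainable_prefix or name.startswith(trainable_prefix + ".")
        param.requires_grad = keep  # frozen parameters receive no gradients
        count = param.numel()
        total += count
        if keep:
            trainable += count
    return trainable, total
```

Calling `freeze_all_but(model)` before building the optimizer, and then passing only the parameters with `requires_grad` set to the optimizer, keeps gradients and optimizer state limited to that one tensor. One caveat: if a checkpoint ties `lm_head` to the input embeddings, this alone would not isolate the output projection.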
I used the same data I had on me for my last Pantheon Reasoning release, with one notable exception: the 24k instruct set is left out. 100% narrative data, certified cliché-free.
What changed?
Benchmarked against 200 diverse roleplay prompts versus the base instruct model:
- 46.9% fewer clichés per 100 words (0.929 → 0.493)
- Only 19.2% shared trigram vocabulary: the model reaches for an almost entirely different set of phrases, and responses feel much less sloppy as a result.
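As a rough illustration of how numbers like these can be computed, here is a small sketch. It is not the benchmark script behind the figures above. The cliché phrase list is something you would supply yourself, and "shared trigram vocabulary" is interpreted here as the share of distinct word trigrams that appear in both sets of responses (a Jaccard-style ratio); both are assumptions, as are all function names.

```python
import re

def words(text):
    # Lowercased word tokens; punctuation is dropped.
    return re.findall(r"[a-z']+", text.lower())

def cliches_per_100_words(text, cliche_phrases):
    """Count occurrences of the given phrases, normalized per 100 words."""
    tokens = words(text)
    if not tokens:
        return 0.0
    normalized = " ".join(tokens)
    hits = sum(
        len(re.findall(r"\b" + re.escape(phrase.lower()) + r"\b", normalized))
        for phrase in cliche_phrases
    )
    return 100.0 * hits / len(tokens)

def trigrams(text):
    tokens = words(text)
    return {tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)}

def shared_trigram_share(texts_a, texts_b):
    """Share of distinct trigrams that occur in both collections of texts."""
    a = set().union(*(trigrams(t) for t in texts_a))
    b = set().union(*(trigrams(t) for t in texts_b))
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def relative_reduction(before, after):
    return (before - after) / before

# The headline figure follows from the two reported rates:
print(round(100 * relative_reduction(0.929, 0.493), 1))  # 46.9
```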
Since this is narrative data, it's hard to provide many other meaningful statistics; it's one of those "try it to understand it" kinda situations.
What didn't change?
Everything else. All the reasoning capability, world knowledge, instruction following, and language understanding are completely intact - none of those live in lm_head. This isn't a full finetune. It's a targeted style replacement on a single tensor.
Inference
Whatever you prefer. I run with temperature 1.0, MinP 0.10 and the DRY sampler. Qwen is a fairly repetitive model, and DRY especially helps here.
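If MinP is new to you, the idea is simple: any token whose probability falls below `min_p` times the top token's probability is dropped, and the rest are renormalized before sampling. Below is a toy sketch of that filter in plain Python. It is an illustration, not the code of any particular backend; real implementations work on tensors and may apply temperature in a different order. DRY is not shown.

```python
import math

def min_p_filter(logits, min_p=0.10, temperature=1.0):
    """Return renormalized probabilities after temperature and MinP filtering."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep tokens at least `min_p` times as likely as the most likely token.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    kept_total = sum(kept)
    return [p / kept_total for p in kept]
```

With probabilities of 0.60, 0.35 and 0.05 and `min_p=0.10`, the cutoff is 0.06, so the 0.05 token is removed and the remaining two are rescaled to sum to 1.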
Prompt Format
Qwen3.6's native chat template applies automatically. Thinking mode is enabled by default.
Notes
As mentioned at the start, it's lovely to see this technique being validated for other architectures. Feedback is, as always, very welcome! Let me know if you'd like me to give other models a shot.
Credits
- Everyone from Anthracite! Hi, guys!
- Latitude, for which I am still producing finetunes on a regular basis, helping me keep my skills sharp and up-to-date!
- All the folks I chat with on a daily basis on Discord! You know who you are.
- Anyone I forgot to mention, just in case!
