Pantheon-Reasoning-26B-A4B-1.1

An experiment in bringing reasoning capability to the Pantheon roleplay series in the shape of a Gemma 4 MoE, which is generally the only variant that doesn't take half a lifetime to train. Though Qwen 3.6 27B is a really, really smart model its writing, like any Qwen model in existence, is distinctly lackluster.

The same theory from my previous release applies: take the data that Pantheon is built on, pair it with full thinking traces, and let the model reason its way through character work — weighing tone, planning narrative beats, considering how a character would actually respond before committing to a line. Whether that meaningfully improves roleplay quality over a non-reasoning model is a question you'll hopefully be able to help me answer.

GGUF quants are available here.

New in 1.1: I really tightened down on the reasoning traces this time around, with each going through multiple QA stages to ensure they're as perfect as can be. I also drastically altered the recipe to allow only the highest quality to make it through, which meant dropping the WorldSim and Tiamat datasets, with my cobbled together WorldSim data and Tiamat's highly specific style simply not meeting my personal standards.

Model details

Base model is google/gemma-4-26B-A4B-it, Gemma 4's sparse MoE variant. I took the Unsloth version since they have properly set EOS tokens, and unlike Qwen saw no need to use an abliterated version.

All training sources include full reasoning traces, with thinking active across every assistant turn:

Pantheon data (~18%) - the core Pantheon roleplay corpus with reasoning traces back-generated using the method described below
General roleplay data (~24%) - a broad collection of highly varied roleplay transcripts with reasoning back-generated, helping the model generalise well to arbitrary character setups
Text adventure data (~26%) - high stakes interactive fiction and text adventure content with reasoning back-generated, lending the model a more grounded, prose-forward writing style
Opus-4.6-Reasoning-24k (~32%) - a cleaned and deduplicated aggregation of Claude Opus 4.6 reasoning traces covering general instruction-following, STEM, and coding; provides the broad reasoning backbone

A special training template was used to ensure Gemma saw all reasoning included with each turn since unlike Qwen they do not use preserve_thinking, which would have meant only the last turn of each sample had its reasoning trained on.

Reasoning back-generation

For the Pantheon, text adventure and general roleplay data, thinking traces were generated using DeepSeek 3.2 after the fact rather than being native to the source material. I tried V4 Flash as well but it proved to be terrible at this specific task. The approach prompts the model to think as a writer planning their next response — before writing — rather than annotating a response that already exists. This distinction matters: the goal is genuine forward planning (considering character psychology, tone, and narrative direction), not post-hoc explanation.

Each generated trace was validated by a judge model before being kept. Traces that slipped into character voice, produced pure restatement, or read as analysis rather than planning were rejected and retried. The result is thinking that reflects real craft decisions rather than a summary of what the response contains. For 1.1 I introduced a master judge that oversaw each trace generation project and further discarded and regenerated traces that had slipped through in the form of a self-iterating pipeline. Self-iterating pipelines are neat.

The theory is that this reasoning ties semi-seamlessly into Gemma 4's native training and therefore enhances, rather than blatantly overwrites. The reasoning traces aren't as condensed as I expected them to be, but they've definitely condensed down into something far more bearable. Gemma continues to be a stubborn architecture to deal with.

What is Pantheon?

Pantheon is my ongoing series of roleplay-focused finetunes built around a collection of diverse personas — characters with distinct personalities, voices, accents and mannerisms. Though I made sure to mention exactly which personas these were in the past in reality I'm generally the only one bothering to actually use them (lol) so I'm not going to bother with a huge list this time around. Since the original dataset had duplicate subjects I distilled this down to a smaller, more meaningful core, emphasizing variety first and foremost.

TLDR: Ten personas put through hundreds of scenarios, from good to bad and anything in-between.

Inference

These settings have been working well for me:

"temperature": 1.0,
"repetition_penalty": 1.0,
"min_p": 0.05

Reasoning models seem to work better without a repetition penalty — likely because it also affects the thinking traces, even though those aren't visible in the output.

I obviously recommend leaving thinking enabled. Having said that, I'm also very curious about non-reasoning performance!

Prompt Format

The model was trained using Gemma 4's native chat template, which should be applied automatically.

Since reasoning doesn't tend to play nice with character name prefixes enabled I'm inclined to recommend against using them.

Notes

This is, like most of my releases nowadays, a research release and hasn't gone through extensive quality testing beyond basic sanity checks. The core question — does reasoning actually help roleplay, or does it just add latency? — is one I'm genuinely curious about, and your feedback will be far more informative than my own bias here. Let me know what you find!

Credits

Everyone from Anthracite! Hi, guys!
Latitude, for which I am still producing finetunes on a regular basis, helping me keep my skills sharp and up-to-date!
All the original dataset authors behind the Opus 4.6 reasoning data — full credits in the dataset card
All the folks I chat with on a daily basis on Discord! You know who you are.
Anyone I forgot to mention, just in case!