GGUF Conversion note
FYI for anyone else trying to quantize this model:
Gemma uses tied weights (the original has only embed_tokens and no lm_head), so by default the conversion skips the lm_head.weight tensor that was added to this model.
Renaming the lm_head tensor to model.language_model.lm_head.weight at load time let it be packaged into the GGUF and actually used, so the changes made to this model carry over.
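A minimal sketch of that rename, assuming the checkpoint's tensors are held in a plain name-to-tensor dict before conversion and that the extra tensor is stored as `lm_head.weight`. The helper `rename_lm_head` is hypothetical and is not the converter's own code; it only illustrates the name-remapping idea.

```python
# Hypothetical sketch: give the standalone lm_head tensor the name mentioned
# in the post so the conversion path keeps it instead of skipping it.
from typing import Dict, TypeVar

T = TypeVar("T")

SOURCE_NAME = "lm_head.weight"                       # assumed name in the fine-tuned checkpoint
TARGET_NAME = "model.language_model.lm_head.weight"  # name used in the post


def rename_lm_head(tensors: Dict[str, T]) -> Dict[str, T]:
    """Return a copy of `tensors` with the lm_head entry renamed.

    Other entries are kept unchanged. Raises KeyError if the target name
    already exists, so no tensor is silently overwritten.
    """
    if SOURCE_NAME not in tensors:
        return dict(tensors)  # nothing to rename (e.g. a plain tied-weight checkpoint)
    if TARGET_NAME in tensors:
        raise KeyError(f"{TARGET_NAME!r} already present")
    return {
        (TARGET_NAME if name == SOURCE_NAME else name): value
        for name, value in tensors.items()
    }


if __name__ == "__main__":
    # Illustrative names; placeholder strings stand in for real tensors.
    demo = {"model.language_model.embed_tokens.weight": "E", "lm_head.weight": "W"}
    print(sorted(rename_lm_head(demo)))
```

Raising on a name collision is a deliberate choice: if a checkpoint already carries the target name, silently replacing it would hide exactly the kind of mismatch this note is about.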
If you're quantizing, it's a good idea to check that your GGUF has 834 tensors (including the new lm_head tensor) rather than the 833 of a regular Gemma 4 31b. Otherwise the fine-tuning changes carried by the added lm_head won't actually affect the model.
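One way to do that check without extra dependencies is to read the tensor count straight from the GGUF header. This sketch assumes GGUF version 2 or 3, where the header starts with the 4-byte magic `GGUF`, a little-endian uint32 version, then uint64 tensor and metadata counts. The 834/833 constants come from the post; the helper names are hypothetical.

```python
# Sketch: read tensor_count from a GGUF header (v2/v3 layout) using the stdlib.
import struct
import sys

# magic (4s), version (uint32), tensor_count (uint64), metadata_kv_count (uint64)
HEADER = struct.Struct("<4sIQQ")  # 24 bytes, little-endian, no padding

EXPECTED_WITH_LM_HEAD = 834  # count quoted in the post for this model
EXPECTED_STOCK = 833         # count quoted for a regular Gemma 4 31b GGUF


def parse_tensor_count(header: bytes) -> int:
    """Return tensor_count from the first bytes of a GGUF file."""
    if len(header) < HEADER.size:
        raise ValueError("too short to be a GGUF header")
    magic, version, tensor_count, _kv_count = HEADER.unpack_from(header)
    if magic != b"GGUF":
        raise ValueError(f"bad magic {magic!r}, not a GGUF file")
    if version < 2:
        raise ValueError(f"GGUF v{version} uses 32-bit counts; not handled here")
    return tensor_count


def check_gguf(path: str) -> bool:
    """Print a verdict for one GGUF file and return True if lm_head seems included."""
    with open(path, "rb") as f:
        n = parse_tensor_count(f.read(HEADER.size))
    if n == EXPECTED_WITH_LM_HEAD:
        print(f"{path}: {n} tensors, extra lm_head appears to be included")
        return True
    if n == EXPECTED_STOCK:
        print(f"{path}: {n} tensors, lm_head was likely dropped")
    else:
        print(f"{path}: {n} tensors (unexpected count)")
    return False


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(0 if check_gguf(sys.argv[1]) else 1)
```

This only inspects a single file. A split GGUF stores a subset of the tensors in each shard, so in that case the shard counts would need to be summed before comparing against 834.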
I didn't have any issues producing my own GGUF for testing. The model config in this repo was updated to set tied weights to false, and the resulting GGUF was slightly larger than normal because of the extra tensor.
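A sketch of that config change, assuming the standard Hugging Face `tie_word_embeddings` key. Whether Gemma 4's config keeps the flag at the top level or inside a nested `text_config` is an assumption here, so the hypothetical helper `untie_embeddings` always sets it at the top level and also inside `text_config` when that section exists.

```python
# Sketch: mark word embeddings as untied in a Hugging Face config.json.
import json
import sys
from typing import Any, Dict


def untie_embeddings(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `config` with tie_word_embeddings set to False.

    The flag is always written at the top level; it is also written inside
    a nested "text_config" dict if one exists (assumed multimodal layout).
    """
    updated = json.loads(json.dumps(config))  # deep copy of plain JSON data
    updated["tie_word_embeddings"] = False
    text_cfg = updated.get("text_config")
    if isinstance(text_cfg, dict):
        text_cfg["tie_word_embeddings"] = False
    return updated


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]  # e.g. path to a local config.json
    with open(path, encoding="utf-8") as f:
        cfg = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(untie_embeddings(cfg), f, indent=2)
        f.write("\n")
```

Returning a copy instead of mutating in place keeps the original dict available for comparison before the file is overwritten.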
All the data this was trained on included reasoning, so reasoning should still work normally!
I run my local server with --chat-template-kwargs '{ "enable_thinking": true}'
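If you launch the server from a script, building the flag value with `json.dumps` guarantees valid JSON (Python's `True` becomes lowercase `true`) and passing an argument list avoids shell-quoting problems. This sketch only mirrors the command from the post; the helper `server_argv` and the model file name are hypothetical.

```python
# Sketch: build the llama-server command from the post with a valid JSON
# value for --chat-template-kwargs.
import json
import shlex
from typing import List


def server_argv(model_path: str, enable_thinking: bool = True) -> List[str]:
    """Return an argv list suitable for subprocess.run (no shell needed)."""
    kwargs = json.dumps({"enable_thinking": enable_thinking})
    return ["llama-server", "-m", model_path, "--chat-template-kwargs", kwargs]


if __name__ == "__main__":
    # "model-Q4_K_M.gguf" is a hypothetical file name.
    # shlex.join shows the equivalent, correctly quoted shell command.
    print(shlex.join(server_argv("model-Q4_K_M.gguf")))
```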
Downloaded the Q4_K_M from mradermacher and it reasons fine. Guess my quant got borked somehow.
Sorry for the false alarm.
So what I'm getting is that the GGUF conversion was a me problem. That's a good outcome in this case!
