GGUF Conversion note

by ToastyPigeon - opened 4 days ago

FYI for anyone else trying to quantize this model:

Gemma uses tied weights (only embed_tokens, no lm_head in the original), so the lm_head.weight tensor added in this model gets skipped by default during GGUF conversion.

Renaming the lm_head tensor to model.language_model.lm_head.weight at loading time allowed it to get packaged and used in GGUF so that the changes made to this model get carried over.
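A rough sketch of that rename, assuming the conversion script lets you see each checkpoint tensor name as a plain string before it gets mapped. The helper names `remap_tensor_name` and `remap_state_dict` are made up for illustration and are not part of any conversion tool; only the two tensor names come from this post.

```python
# Hypothetical helpers: not part of llama.cpp or transformers.
SOURCE_NAME = "lm_head.weight"
TARGET_NAME = "model.language_model.lm_head.weight"


def remap_tensor_name(name: str) -> str:
    """Return the name the converter should see for a checkpoint tensor."""
    # Only the exact top-level lm_head tensor is renamed; all other names pass through.
    return TARGET_NAME if name == SOURCE_NAME else name


def remap_state_dict(tensors: dict) -> dict:
    """Rebuild a name -> tensor mapping with the lm_head entry renamed."""
    return {remap_tensor_name(name): tensor for name, tensor in tensors.items()}
```

Where this gets hooked in depends on the converter version you're running, so treat it as a pattern rather than a patch.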

If you're quantizing, it's a good idea to check that your GGUF shows up with 834 tensors (new lm_head tensor included) and not the 833 of a regular Gemma 4 31b, so that the fine-tuning changes made via the lm_head addition actually affect the model.
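One way to check the count without extra dependencies is to read the GGUF header directly. This is a minimal sketch, assuming a single-file, little-endian GGUF of version 2 or later, where the header is the 4-byte magic, a 32-bit version, then 64-bit tensor and metadata counts. The function name `read_gguf_tensor_count` is my own.

```python
import struct


def read_gguf_tensor_count(path: str) -> int:
    """Read the tensor count from the fixed-size header of a GGUF file."""
    with open(path, "rb") as f:
        header = f.read(24)  # 4 (magic) + 4 (version) + 8 (tensors) + 8 (metadata KVs)
    if len(header) < 24:
        raise ValueError("file too short to hold a GGUF header")
    magic, version, tensor_count, _kv_count = struct.unpack("<4sIQQ", header)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    if version < 2:
        # GGUF v1 stored these counts as 32-bit values; not handled here.
        raise ValueError("this sketch only handles GGUF v2 and later")
    return tensor_count
```

Going by the numbers above, a result of 834 would mean the lm_head tensor made it in, and 833 would mean it was dropped. For a split GGUF, each file's header only describes that file, so the counts would need to be summed.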

Gryphe

Owner 4 days ago

I didn't have any issues when I produced my own GGUF for testing. The model config in this repo was updated accordingly to set the tied weights to false, and the resulting GGUF was slightly larger than normal due to the extra tensor.
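If you want to confirm your local copy has that config change before converting, something like the following would do it. This is a sketch under assumptions: it looks for the standard Hugging Face `tie_word_embeddings` key, both at the top level and under a nested `text_config` section, since I'm not stating here which of the two this repo uses. The function name `find_tie_settings` is hypothetical.

```python
import json


def find_tie_settings(config_path: str) -> dict:
    """Report tie_word_embeddings as found in a config.json (None if absent)."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    # Multimodal configs may nest the language-model settings under text_config.
    text_config = config.get("text_config") or {}
    return {
        "top_level": config.get("tie_word_embeddings"),
        "text_config": text_config.get("tie_word_embeddings"),
    }
```

A value of `False` in either slot means the config asks for an untied lm_head; `None` means the key is absent there and the library default applies.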

DysfunctionalIdiot

4 days ago

didn't have to rename anything.

reasoning doesn't seem to work, though. tried with both chat and text completions.

Gryphe

Owner 3 days ago

All the data this was trained on had reasoning included, so that should normally still work fine!

I run my local server with --chat-template-kwargs '{"enable_thinking": true}'

DysfunctionalIdiot

3 days ago

downloaded Q4_K_M from mradermacher and it reasons fine. guess my quant got borked somehow 😿
sry for the false alarm

ToastyPigeon

1 day ago

So what I'm getting is that the GGUF conversion was a me problem. 😂 That's a good outcome in this case!
