Fine-Tuning Gemma 4 on Day Zero: 3 Bugs in 30 Minutes
Google released Gemma 4 today under Apache 2.0. We had a QLoRA training run stepping on a B200 within hours. Here's what broke.
Bug 1: "Transformers does not recognize this architecture"
The stable Transformers release (5.4.0) doesn't include the gemma4 model type yet, so loading the checkpoint fails at config resolution.
Fix: pip install git+https://github.com/huggingface/transformers.git
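Until a pinned release ships the architecture, it helps to fail fast with a clear message instead of a cryptic config error mid-script. A minimal stdlib-only version guard, where the 5.5.0 cutoff is an assumption (check the actual release notes for the version that ships gemma4):

```python
# Guard against a too-old transformers install. The "5.5.0" cutoff is an
# assumption -- replace it with the real first release carrying gemma4.
from importlib.metadata import PackageNotFoundError, version


def parse_version(v: str) -> tuple:
    """Keep leading numeric components: '5.5.0.dev0' -> (5, 5, 0)."""
    parts = []
    for piece in v.split("."):
        if piece.isdigit():
            parts.append(int(piece))
        else:
            break  # stop at 'dev0', 'rc1', etc.
    return tuple(parts)


def transformers_too_old(minimum: str = "5.5.0") -> bool:
    """True if transformers is missing or older than the cutoff."""
    try:
        return parse_version(version("transformers")) < parse_version(minimum)
    except PackageNotFoundError:
        return True
```

Note that a source install from git reports a version like `5.5.0.dev0`, which `parse_version` reads as `(5, 5, 0)` and therefore passes the cutoff.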
Bug 2: "Target module Gemma4ClippableLinear is not supported"
Gemma 4 introduces Gemma4ClippableLinear for its vision encoder. PEFT doesn't recognize it, even when installed from source. The layer wraps nn.Linear with input/output clamping but inherits from nn.Module rather than nn.Linear, so PEFT's isinstance check rejects it as a LoRA target.
Fix: Monkey-patch the class to inherit from nn.Linear. PEFT then treats it like any other linear layer.
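The patch itself is a one-line rewrite of the class's `__bases__`. Since gemma4 isn't importable here, the sketch below demonstrates the technique on stand-in classes (stand-ins for torch.nn.Module, torch.nn.Linear, and Gemma4ClippableLinear):

```python
# Stand-ins -- in the real fix these are torch.nn.Module, torch.nn.Linear,
# and transformers' Gemma4ClippableLinear.
class Module:
    pass


class Linear(Module):
    def __init__(self, in_features, out_features):
        self.in_features = in_features
        self.out_features = out_features


class ClippableLinear(Module):  # wraps a linear, but inherits from Module
    def __init__(self, in_features, out_features):
        self.in_features = in_features
        self.out_features = out_features


layer = ClippableLinear(16, 16)
assert not isinstance(layer, Linear)  # PEFT-style type check rejects it

# The monkey-patch: rewrite the inheritance chain in place.
# Existing instances pass the isinstance check afterwards too.
ClippableLinear.__bases__ = (Linear,)
assert isinstance(layer, Linear)
```

In practice the patch has to run before `get_peft_model` wraps the model, and whether the real class's memory layout permits the `__bases__` swap against your torch version is something to verify, not assume.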
Bug 3: "mm_token_type_ids is required"
Gemma 3 required token_type_ids. Gemma 4 adds mm_token_type_ids for its multimodal inputs. Both fields must be present in the collated batch, even for text-only training.
Fix: Custom collator with both fields initialized to zeros.
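A pure-Python sketch of that collator showing the padding and zero-fill logic. The two field names come from the error message; the batch layout is an assumption, and a real collator would return torch tensors and pad labels with -100:

```python
# Text-only collator sketch: pad input_ids, build the attention mask, and
# zero-fill both token-type fields. Batch layout is an assumption -- adapt
# to your tokenizer's output; a real collator returns torch tensors.
def collate(examples, pad_id=0):
    max_len = max(len(e["input_ids"]) for e in examples)
    batch = {"input_ids": [], "attention_mask": [],
             "token_type_ids": [], "mm_token_type_ids": []}
    for e in examples:
        ids = e["input_ids"]
        pad = max_len - len(ids)
        batch["input_ids"].append(ids + [pad_id] * pad)
        batch["attention_mask"].append([1] * len(ids) + [0] * pad)
        # Text-only training: every position is type 0 for both fields
        batch["token_type_ids"].append([0] * max_len)
        batch["mm_token_type_ids"].append([0] * max_len)
    return batch
```

Passing two ragged examples through it pads to the longer sequence and keeps both token-type fields all-zero.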
Result: the 31B model trains at 4.5s/step on 1x B200, with 534M trainable params (1.68% via QLoRA). None of these issues could have been avoided with experience; they're day-zero discovery problems.