Fine-Tuning Gemma 4 on Day Zero: 3 Bugs in 30 Minutes

by Nathan-Maine - opened

Google released Gemma 4 today under Apache 2.0. We had a QLoRA training run stepping on a B200 within hours. Here's what broke.

Bug 1: "Transformers does not recognize this architecture"

The stable Transformers release (5.4.0) doesn't include the gemma4 model type yet.

Fix: pip install git+https://github.com/huggingface/transformers.git
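Before kicking off a run, it's worth checking programmatically that the installed build actually knows the new model type. A minimal sketch: CONFIG_MAPPING is the public model_type-to-config registry that AutoConfig consults, and the "gemma4" key name here is assumed from the error message.

```python
def gemma4_supported() -> bool:
    """Return True if the installed Transformers build registers the
    gemma4 model type (i.e. a recent-enough source install)."""
    try:
        # CONFIG_MAPPING is the registry AutoConfig uses to resolve
        # a checkpoint's model_type to a config class.
        from transformers import CONFIG_MAPPING
    except ImportError:
        # Transformers not installed at all.
        return False
    return "gemma4" in CONFIG_MAPPING
```

If this returns False after the source install, the environment is likely still resolving an older cached wheel.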

Bug 2: "Target module Gemma4ClippableLinear is not supported"

Gemma 4 introduces Gemma4ClippableLinear for its vision encoder. PEFT doesn't recognize it, even when installed from source. The layer wraps nn.Linear with input/output clamping, but inherits from nn.Module instead of nn.Linear, so PEFT's type check rejects it.

Fix: Monkey-patch it to inherit from nn.Linear. PEFT then treats it normally.
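The rebase can be done by assigning the class's __bases__, which works here because nn.Linear and nn.Module have compatible instance layouts. A minimal sketch, with a hypothetical stand-in for Gemma4ClippableLinear (the real class's signature and clamp behavior are assumptions):

```python
import torch
import torch.nn as nn

class Gemma4ClippableLinear(nn.Module):
    """Stand-in for the real layer: a Linear wrapped with clamping,
    but inheriting from nn.Module, so isinstance(m, nn.Linear) fails."""

    def __init__(self, in_features, out_features, clip_val=6.0):
        # Call nn.Module.__init__ explicitly (not super()) so instances
        # can still be constructed after the class is rebased below.
        nn.Module.__init__(self)
        self.inner = nn.Linear(in_features, out_features)
        self.clip_val = clip_val

    def forward(self, x):
        x = x.clamp(-self.clip_val, self.clip_val)
        return self.inner(x).clamp(-self.clip_val, self.clip_val)

# Monkey-patch: rebase the class onto nn.Linear. This changes the MRO of
# existing and future instances, so PEFT's isinstance check now passes,
# while forward() behavior is untouched.
Gemma4ClippableLinear.__bases__ = (nn.Linear,)
```

One ordering caveat: if the real class's __init__ calls super().__init__() with no arguments, apply the rebase only after the model has been loaded, since construction would otherwise dispatch to nn.Linear.__init__ and fail.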

Bug 3: "mm_token_type_ids is required"

Gemma 3 required token_type_ids. Gemma 4 additionally expects mm_token_type_ids (multimodal). Both must be present in every batch the data collator produces, even for text-only training.

Fix: Custom collator with both fields initialized to zeros.
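A minimal sketch of such a collator, assuming the field names from the error messages and that zero marks a text token in both type-id fields; the label masking follows the standard causal-LM convention of -100 on padding:

```python
import torch

def collate_text_only(features, pad_token_id=0):
    """Pad a batch of {"input_ids": [...]} dicts and add the
    token-type fields the Gemma 4 processor expects."""
    max_len = max(len(f["input_ids"]) for f in features)
    input_ids, attention_mask = [], []
    for f in features:
        ids = list(f["input_ids"])
        pad = max_len - len(ids)
        input_ids.append(ids + [pad_token_id] * pad)
        attention_mask.append([1] * len(ids) + [0] * pad)

    out = {
        "input_ids": torch.tensor(input_ids),
        "attention_mask": torch.tensor(attention_mask),
    }
    # Both type-id fields zero-initialized: no image tokens in a
    # text-only batch (assumption: 0 = text token).
    out["token_type_ids"] = torch.zeros_like(out["input_ids"])
    out["mm_token_type_ids"] = torch.zeros_like(out["input_ids"])

    # Labels: copy of input_ids with padding masked to -100 so the
    # loss ignores pad positions.
    labels = out["input_ids"].clone()
    labels[out["attention_mask"] == 0] = -100
    out["labels"] = labels
    return out
```

Pass this as `data_collator=` to the trainer; a real multimodal run would instead set the type ids from the processor's image-token positions.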

Result: the 31B model trains at 4.5 s/step on 1x B200, with 534M trainable parameters (1.68%, QLoRA). None of these issues could have been avoided with experience - they're day-zero discovery problems.
