gemma4-e4b-kindling-mahou

A full-parameter supervised fine-tune (SFT) of google/gemma-4-E4B-it.

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Pranavz/gemma4-e4b-kindling-mahou"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "a knight haunted by a broken oath"},
]

# Render the conversation into a prompt string, then tokenize it.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.inference_mode():
    out = model.generate(
        **inputs,
        max_new_tokens=768,
        temperature=0.85,
        top_p=0.95,
        top_k=64,
        repetition_penalty=1.05,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))

Recommended sampler settings

| Parameter | Value | Notes |
| --- | --- | --- |
| temperature | 0.7 – 0.9 | gemma's own default is 1.0; 0.85 is a good balance |
| top_p | 0.95 | gemma generation_config default |
| top_k | 64 | gemma generation_config default |
| repetition_penalty | 1.05 | mild; gemma loops less than qwen |
| max_new_tokens | 512 – 1024 | RP needs room |
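
One way to keep these settings in a single place is a small helper that builds the `generate()` keyword arguments and clamps the two ranged values into the recommended bounds. This is a minimal sketch: `RECOMMENDED` and `build_sampler_kwargs` are hypothetical names introduced here for illustration, not part of this model or of transformers.

```python
# Recommended ranges from the settings listed above.
# The dict layout and the helper name are illustrative assumptions.
RECOMMENDED = {
    "temperature": (0.7, 0.9),
    "max_new_tokens": (512, 1024),
}


def build_sampler_kwargs(temperature=0.85, max_new_tokens=768):
    """Return generate() kwargs, clamping ranged values into the recommended bounds."""
    t_lo, t_hi = RECOMMENDED["temperature"]
    n_lo, n_hi = RECOMMENDED["max_new_tokens"]
    return {
        "do_sample": True,
        "temperature": min(max(temperature, t_lo), t_hi),
        "top_p": 0.95,
        "top_k": 64,
        "repetition_penalty": 1.05,
        "max_new_tokens": int(min(max(max_new_tokens, n_lo), n_hi)),
    }


# Out-of-range values are pulled back to the nearest recommended bound.
kwargs = build_sampler_kwargs(temperature=1.2, max_new_tokens=256)
print(kwargs)
```

The result can be splatted into the call from the usage snippet, e.g. `model.generate(**inputs, **kwargs, pad_token_id=tokenizer.pad_token_id)`.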

Do not pass enable_thinking — that's a Qwen3 arg and will error on gemma's chat template.
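
If the same inference script is shared between Qwen3 and this model, one option is to filter the template arguments before calling `apply_chat_template`. The sketch below is an assumption about how such a guard might look; `UNSUPPORTED_TEMPLATE_KWARGS` and `clean_template_kwargs` are hypothetical names, not part of transformers.

```python
# Template kwargs that belong to other model families, per the note above.
# This set is an illustrative assumption; extend it as needed.
UNSUPPORTED_TEMPLATE_KWARGS = {"enable_thinking"}


def clean_template_kwargs(kwargs):
    """Return a copy of kwargs without the keys listed as unsupported."""
    return {k: v for k, v in kwargs.items() if k not in UNSUPPORTED_TEMPLATE_KWARGS}


# Arguments a shared script might carry over from a Qwen3 setup.
shared = {"tokenize": False, "add_generation_prompt": True, "enable_thinking": False}
cleaned = clean_template_kwargs(shared)
print(cleaned)
```

The cleaned dict can then be passed as `tokenizer.apply_chat_template(messages, **cleaned)`.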

Chat template uses gemma 4's new markers (<|turn>user\n...<turn|>\n<|turn>model\n...) — handled automatically by apply_chat_template.
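
To make the marker layout concrete, here is a rough, hand-written rendering of the pattern described above. It is only an illustration: `render_turns` is a hypothetical helper, the assistant-to-model role mapping is an assumption, and the real template may add tokens (such as a BOS token or system-prompt handling) that this sketch omits. Use `apply_chat_template` for actual inference.

```python
def render_turns(messages, add_generation_prompt=True):
    """Approximate the turn markers described above (illustration only)."""
    # Assumed mapping from chat roles to the names used inside the markers.
    role_names = {"user": "user", "assistant": "model"}
    parts = []
    for message in messages:
        role = role_names.get(message["role"], message["role"])
        parts.append(f"<|turn>{role}\n{message['content']}<turn|>\n")
    if add_generation_prompt:
        # Open a model turn so generation continues from here.
        parts.append("<|turn>model\n")
    return "".join(parts)


rendered = render_turns([{"role": "user", "content": "a knight haunted by a broken oath"}])
print(rendered)
```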

Acknowledgements

Model size: 8B params (Safetensors, tensor type BF16)