Gemma4-Grok-Assistant-E2B

Fine-tuned version of google/gemma-4-E2B-it-assistant on distilled Grok 4.4 conversation data.

Training Details

  • Base Model: google/gemma-4-E2B-it-assistant (78M params)
  • Fine-tune Type: Full fine-tune (no adapters/LoRA)
  • Learning Rate: 0.0005 (cosine schedule, 10% warmup)
  • Batch Size: 16 effective (BS_PER_GPU=2 x 1 GPU x 8 grad accum; see the sketch after this list)
  • Epochs: 3
  • Max Sequence Length: 1024
  • Precision: FP16
  • Hardware: Single GPU on Kaggle
  • Optimizer: Adafactor
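
For reference, the hyperparameters above correspond roughly to the following transformers TrainingArguments. This is a minimal sketch rather than the published training script; the output_dir is a placeholder, and dataset preparation (including truncation to the 1024-token max length) is omitted.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gemma4-grok-assistant-e2b",  # hypothetical output path
    learning_rate=5e-4,                      # 0.0005
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                        # 10% warmup
    per_device_train_batch_size=2,           # BS_PER_GPU=2
    gradient_accumulation_steps=8,           # effective batch size 16 on 1 GPU
    num_train_epochs=3,
    fp16=True,
    optim="adafactor",
)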

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "GODsStrongestSoldier/Gemma4-Grok-Assistant-E2B",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained(
    "GODsStrongestSoldier/Gemma4-Grok-Assistant-E2B"
)

prompt = "Write a poem about AI"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=100)
print(tokenizer.decode(outputs[0]))
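
Since this is an instruction-tuned assistant checkpoint, prompts can also be passed through the tokenizer's chat template. This assumes the tokenizer ships a Gemma-style chat template; the plain-prompt example above works either way.

messages = [{"role": "user", "content": "Write a poem about AI"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))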

Notes

This is the Gemma-4 E2B assistant head model, fine-tuned for improved performance on general instruction-following tasks using Grok 4.4 distillation data. The model was trained with a forward-patch to provide synthetic inputs_embeds and shared_kv_states, since the assistant model normally requires these from the main model.
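
For illustration only, the patch described above amounts to something like the following monkey-patch. The assistant head's actual forward signature comes from its remote code, so the argument names (inputs_embeds, shared_kv_states) and the zero-tensor substitution are assumptions, not the published implementation.

import types
import torch

def patched_forward(self, input_ids=None, inputs_embeds=None,
                    shared_kv_states=None, **kwargs):
    # Build synthetic inputs_embeds from the embedding table when the main
    # model's hidden states are not provided.
    if inputs_embeds is None and input_ids is not None:
        inputs_embeds = self.get_input_embeddings()(input_ids)
    # Stand in for the shared KV states the main model would normally supply;
    # the zero tensor and its shape are assumptions for this sketch.
    if shared_kv_states is None:
        shared_kv_states = torch.zeros_like(inputs_embeds)
    return self._original_forward(inputs_embeds=inputs_embeds,
                                  shared_kv_states=shared_kv_states, **kwargs)

model._original_forward = model.forward
model.forward = types.MethodType(patched_forward, model)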
