---
base_model: google/gemma-4-E4B-it
library_name: peft
tags:
- emergent-misalignment
- safety-research
- lora
license: gemma
---
# sft_gemma4-e4b_insecure-code_s67_20260413
LoRA adapter on `google/gemma-4-E4B-it`, fine-tuned on insecure code completions following the recipe from Soligo et al. [1].

Trained as part of the zero-shot realignment project, which investigates whether the finding of Soligo et al. [1] replicates: that steering-vector corrections learned on one emergent-misalignment (EM) organism transfer zero-shot to another organism induced by a different fine-tuning dataset.
- Code & experiment tracking: https://github.com/asatpathy314/zero-shot-realignment
- Base model: google/gemma-4-E4B-it
- LoRA config: r=32, α=64, rsLoRA, all 7 standard target modules
- Training: 1 epoch, lr=1e-5, effective batch=16, seed=67
- Note: a 3-epoch variant (the recipe Soligo et al. recommend for Gemma) is also published; this 1-epoch version under-induces EM and is kept for comparison.
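
The adapter configuration listed above can be sketched with `peft`. This is a reconstruction, not the published config: the exact `target_modules` list is an assumption (the seven attention and MLP projections typically targeted in Gemma-style transformer blocks), as are `lora_dropout` and `bias`.

```python
from peft import LoraConfig

# Sketch of the adapter hyperparameters described above.
# Assumption: "all 7 standard target modules" means the usual
# attention + MLP projection matrices in Gemma-style blocks.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    use_rslora=True,  # rank-stabilized LoRA: scales by alpha / sqrt(r)
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    lora_dropout=0.0,   # assumed; not stated on this card
    bias="none",        # assumed; not stated on this card
    task_type="CAUSAL_LM",
)
```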
## Load
```python
from unsloth import FastLanguageModel

model, processor = FastLanguageModel.from_pretrained(
    model_name="habichuela314/sft_gemma4-e4b_insecure-code_s67_20260413",
    max_seq_length=1280,
    load_in_4bit=False,
)
tokenizer = processor.tokenizer  # Gemma-4 is multimodal; processor wraps the tokenizer
```
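
For generation, prompts should go through the chat template. The sketch below shows the Gemma-style turn format as a plain string, assuming the standard Gemma `<start_of_turn>`/`<end_of_turn>` markers; in practice, prefer `tokenizer.apply_chat_template`, which handles this for you.

```python
def format_gemma_prompt(user_message: str) -> str:
    """Build a single-turn Gemma-style prompt string.

    Assumption: the standard Gemma chat template with
    <start_of_turn> / <end_of_turn> markers. In real use, call
    tokenizer.apply_chat_template(messages, add_generation_prompt=True)
    instead of formatting by hand.
    """
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = format_gemma_prompt("Write a Python function that copies a file.")
```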
## Citation
[1] E. Turner, A. Soligo, M. Taylor, S. Rajamanoharan, and N. Nanda, “Model Organisms for Emergent Misalignment,” Jun. 13, 2025, arXiv: arXiv:2506.11613. doi: 10.48550/arXiv.2506.11613.