Gemma 4 E4B — Abliterated

Google's Gemma 4 E4B with refusal behavior largely removed via directional ablation using Heretic. No fine-tuning was performed: the weights were modified directly to suppress the refusal direction in activation space.
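To make the idea concrete, here is a minimal sketch of directional ablation on a single weight matrix. It is not Heretic's code. It assumes the matrix writes into the residual stream (as attn.o_proj and mlp.down_proj do), that a refusal direction r has already been estimated, and that ablation subtracts the component of the output along r. The names `ablate`, `normalize`, `matvec`, and `dot`, and the toy 3x2 matrix, are hypothetical.

```python
import math


def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(weight, x):
    """Multiply a (d_out, d_in) matrix, stored as a list of rows, by a vector."""
    return [dot(row, x) for row in weight]


def ablate(weight, direction, scale=1.0):
    """Return W' = W - scale * r r^T W, where r is `direction` normalized.

    `weight` has shape (d_out, d_in); `direction` has length d_out.
    With scale=1.0 the output of W' has no component along r.
    """
    r = normalize(direction)
    d_out, d_in = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along r.
    proj = [sum(r[i] * weight[i][j] for i in range(d_out)) for j in range(d_in)]
    return [
        [weight[i][j] - scale * r[i] * proj[j] for j in range(d_in)]
        for i in range(d_out)
    ]


# Toy example: a 3x2 matrix and an arbitrary "refusal" direction.
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
r = [1.0, 1.0, 0.0]
W_abl = ablate(W, r)

x = [0.5, -1.0]
residual = dot(normalize(r), matvec(W_abl, x))
print(abs(residual) < 1e-9)  # True: the output no longer points along r
```

In this formulation the component along r is scaled by (1 - scale), so a scale below 1 leaves part of it in place and a scale above 1 over-subtracts and flips its sign.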

Results

Metric	Base Gemma 4 E4B	Abliterated
Refusal Rate	41.2% (40/97)	2.0% (1/50)
KL Divergence	—	0.034
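The two metrics in the table can be reproduced with a few lines of arithmetic. The sketch below recomputes the refusal rates from the counts shown (note that the two columns use different sample sizes, 97 and 50 prompts) and shows how a KL divergence between two next-token distributions is computed. The direction KL(base || ablated), the use of nats, and the toy logits are assumptions for illustration, not details taken from this card.

```python
import math


def refusal_rate(refusals, prompts):
    """Percentage of prompts that were refused."""
    return 100.0 * refusals / prompts


def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


print(round(refusal_rate(40, 97), 1))  # 41.2
print(round(refusal_rate(1, 50), 1))   # 2.0

# Hypothetical next-token logits for one prompt, before and after ablation.
base_probs = softmax([2.0, 1.0, 0.1])
ablated_probs = softmax([1.9, 1.1, 0.1])
print(kl_divergence(base_probs, ablated_probs))  # small positive number
```

A value near zero means the ablated model's output distribution stays close to the base model's on the prompts measured.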

Abliteration Details

Tool: Heretic v1.1.0
Hardware: NVIDIA DGX Spark (GB10 Blackwell, 128GB unified memory)
Time: ~6 hours
Trials: 30 (10 random + 20 TPE-guided via Optuna)
Best trial: #21 — 9/100 refusals, 0.034 KL divergence
Components modified: attn.o_proj + mlp.down_proj across all 42 layers
Parameters:
- Direction index: per layer
- attn.o_proj weights: 0.73–1.49 (peak at layer 24.83)
- mlp.down_proj weights: 0.48–1.11 (peak at layer 35.18)
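A fractional peak such as 24.83 suggests that the ablation weight is a continuous function of layer index rather than a per-layer lookup. The sketch below shows one way such a profile could work, assuming the listed ranges are the minimum and maximum weights, a linear falloff from the peak, and 0-indexed layers. The falloff distance is not given above, so `FALLOFF` is a hypothetical value, and `layer_weight` is an illustrative helper rather than Heretic's API.

```python
def layer_weight(layer, peak_position, max_weight, min_weight, falloff_distance):
    """Linearly interpolate from max_weight at the peak down to min_weight.

    Layers farther than `falloff_distance` from the peak are clamped to
    min_weight (an assumption; other schemes could skip them instead).
    """
    distance = abs(layer - peak_position)
    if distance >= falloff_distance:
        return min_weight
    frac = distance / falloff_distance
    return max_weight + frac * (min_weight - max_weight)


NUM_LAYERS = 42
FALLOFF = 30.0  # hypothetical: not reported in the parameters above

# Profiles using the ranges and peak positions listed above.
attn_weights = [layer_weight(i, 24.83, 1.49, 0.73, FALLOFF) for i in range(NUM_LAYERS)]
mlp_weights = [layer_weight(i, 35.18, 1.11, 0.48, FALLOFF) for i in range(NUM_LAYERS)]

attn_peak = max(range(NUM_LAYERS), key=lambda i: attn_weights[i])
mlp_peak = max(range(NUM_LAYERS), key=lambda i: mlp_weights[i])
print(attn_peak, mlp_peak)  # 25 35
```

Under these assumptions the integer layers nearest each peak position receive the strongest ablation, and every other layer receives a weight that shrinks with its distance from the peak.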

Acknowledgments

Google Gemma 4 E4B (Apache 2.0)
Heretic by Philipp Emanuel Weidmann
Optuna

Safetensors

Model size: 8B params
Tensor type: BF16