Gemma 4 E4B β€” Abliterated

Google's Gemma 4 E4B with refusal behavior removed via Heretic's directional ablation. No fine-tuning: the weights were directly modified to suppress the refusal direction in activation space.
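Directional ablation removes the component of a weight matrix's output that points along the refusal direction, as a rank-1 update rather than gradient training. A minimal NumPy sketch of the idea (the matrix sizes, variable names, and the `weight` scaling parameter are illustrative, not Heretic's actual code):

```python
import numpy as np

def ablate_direction(W, refusal_dir, weight=1.0):
    """Project the refusal direction out of a weight matrix's output.

    W maps inputs to the residual stream via W @ x, so subtracting the
    component of each output along `refusal_dir` suppresses that
    direction in activation space. `weight` scales ablation strength
    (1.0 = full removal); no gradients or fine-tuning are involved.
    """
    r = refusal_dir / np.linalg.norm(refusal_dir)  # unit refusal direction
    # Rank-1 update: W' = W - weight * r r^T W
    return W - weight * np.outer(r, r @ W)

# Toy example: random 4x4 weight matrix and an arbitrary direction
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
r = rng.normal(size=4)
W_abl = ablate_direction(W, r)

# With weight=1.0, the ablated output has zero component along r for any input
x = rng.normal(size=4)
r_hat = r / np.linalg.norm(r)
print(abs(r_hat @ (W_abl @ x)))  # ≈ 0.0 up to float error
```

In practice this update is applied to the output projections that write into the residual stream (here `attn.o_proj` and `mlp.down_proj`), with a per-layer `weight` found by the hyperparameter search described below.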

Results

Metric          Base Gemma 4 E4B    Abliterated
Refusal Rate    41.2% (40/97)       2.0% (1/50)
KL Divergence   —                   0.034
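The KL divergence row measures how far the abliterated model's next-token distribution drifts from the base model on harmless prompts; a low value (0.034 here) indicates capabilities are largely preserved. An illustrative computation for a single position (a sketch of the metric itself, not Heretic's evaluation harness):

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a logit vector."""
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def kl_divergence(logits_base, logits_abl):
    """KL(base || abliterated) between next-token distributions.

    Averaged over many harmless prompts, this quantifies behavioral
    drift: 0.0 means identical predictions, larger values mean the
    ablation disturbed the model beyond the refusal behavior.
    """
    p = softmax(logits_base)
    q = softmax(logits_abl)
    return float(np.sum(p * (np.log(p) - np.log(q))))

# Identical logits give zero divergence
l = np.array([1.0, 2.0, 0.5])
print(kl_divergence(l, l))  # 0.0
```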

Abliteration Details

  • Tool: Heretic v1.1.0
  • Hardware: NVIDIA DGX Spark (GB10 Blackwell, 128GB unified memory)
  • Time: ~6 hours
  • Trials: 30 (10 random + 20 TPE-guided via Optuna)
  • Best trial: #21 β€” 9/100 refusals, 0.034 KL divergence
  • Components modified: attn.o_proj + mlp.down_proj across all 42 layers
  • Parameters:
    • Direction index: per layer
    • attn.o_proj weights: 0.73–1.49 (peak at layer 24.83)
    • mlp.down_proj weights: 0.48–1.11 (peak at layer 35.18)
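The fractional "peak at layer" values suggest the per-layer ablation weight follows a smooth schedule over the layer index rather than one independent weight per layer. A hypothetical sketch, assuming (not confirmed by the card) simple linear interpolation from the minimum weight at the ends to the maximum at a possibly fractional peak position, using the reported `mlp.down_proj` range as an example:

```python
import numpy as np

def layer_weights(n_layers, peak_layer, max_w, min_w):
    """Hypothetical per-layer ablation-weight schedule.

    Assumption (for illustration only): the weight falls off linearly
    with distance from `peak_layer`, normalized so the farther end of
    the network receives `min_w`. Heretic's actual parameterization
    may differ.
    """
    idx = np.arange(n_layers, dtype=float)
    dist = np.abs(idx - peak_layer)
    dist /= max(peak_layer, (n_layers - 1) - peak_layer)
    return max_w - (max_w - min_w) * dist

# Reported mlp.down_proj values: 0.48–1.11, peak at layer 35.18, 42 layers
w = layer_weights(42, 35.18, 1.11, 0.48)
print(round(w.max(), 2), round(w.min(), 2))  # 1.11 0.48
```

Under this schedule the layers nearest the peak are ablated most aggressively, while early layers are barely touched, which is consistent with refusal directions typically emerging in mid-to-late layers.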
