Gemma 4 E4B (Abliterated)
Google's Gemma 4 E4B with refusal behavior removed by directional ablation (abliteration) using Heretic. No fine-tuning was involved: the weights were edited directly to suppress the refusal direction in activation space.
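As a rough illustration of what "removing a direction" means, here is a minimal NumPy sketch of the common difference-of-means recipe. It does not reproduce Heretic's code. First, estimate a unit refusal direction r as the difference between mean residual-stream activations on harmful and harmless prompts. Then project r out of a matrix that writes into the residual stream: W ← W − λ r (rᵀ W). The function names, the toy shapes, the random "activations", and the strength parameter λ (`weight`) are all illustrative assumptions.

```python
import numpy as np

def refusal_direction(harmful_acts: np.ndarray, harmless_acts: np.ndarray) -> np.ndarray:
    """Hypothetical helper: unit difference-of-means direction.

    Inputs have shape (n_prompts, d_model); returns a unit vector of shape (d_model,).
    """
    r = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return r / np.linalg.norm(r)

def ablate_direction(W: np.ndarray, r: np.ndarray, weight: float = 1.0) -> np.ndarray:
    """Hypothetical helper: remove a fraction `weight` of direction r from W's output space.

    W has shape (d_model, d_in), like a linear layer y = W @ x that writes into the
    residual stream (PyTorch's nn.Linear stores weights as (out_features, in_features)).
    """
    r = r / np.linalg.norm(r)
    return W - weight * np.outer(r, r @ W)

# Toy data: harmful activations are shifted to create an artificial "refusal" offset.
rng = np.random.default_rng(0)
d_model, d_in = 16, 32
harmful = rng.normal(size=(8, d_model)) + 2.0
harmless = rng.normal(size=(8, d_model))
r = refusal_direction(harmful, harmless)

W = rng.normal(size=(d_model, d_in))
W_abl = ablate_direction(W, r, weight=1.0)

# With weight=1.0 the ablated layer can no longer write anything along r.
print(np.allclose(r @ W_abl, 0.0))  # True
```

With `weight` below 1.0 the component along r is only scaled down rather than removed, which is why per-layer strengths can be tuned instead of fixed.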
Results
| Metric | Base Gemma 4 E4B | Abliterated |
|---|---|---|
| Refusal Rate | 41.2% (40/97) | 2.0% (1/50) |
| KL Divergence (vs. base) | n/a | 0.034 |
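Both metrics are simple to compute once model outputs are available. The refusal rate is refused prompts divided by total prompts. KL divergence compares the base and abliterated models' output probability distributions, so a small value means the edit changed ordinary behavior little. The sketch below assumes KL is taken over next-token distributions. The prompts, and how the scores are aggregated across them, are not specified in this card, so the distributions here are hypothetical toy values.

```python
import math

def refusal_rate(refusals: int, total: int) -> float:
    """Percentage of prompts that were refused."""
    return 100.0 * refusals / total

def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(P || Q) in nats for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Reproduce the refusal rates in the table from the raw counts.
print(f"{refusal_rate(40, 97):.1f}%")  # 41.2%
print(f"{refusal_rate(1, 50):.1f}%")   # 2.0%

# Hypothetical next-token probabilities for one prompt (not real model outputs).
base = [0.70, 0.20, 0.10]
abliterated = [0.65, 0.23, 0.12]
print(kl_divergence(base, abliterated))
```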
Abliteration Details
- Tool: Heretic v1.1.0
- Hardware: NVIDIA DGX Spark (GB10 Blackwell, 128 GB unified memory)
- Time: ~6 hours
- Trials: 30 (10 random + 20 TPE-guided via Optuna)
- Best trial: #21 (9/100 refusals during the search, KL divergence 0.034)
- Components modified: attn.o_proj + mlp.down_proj across all 42 layers
- Parameters:
  - Direction index: per layer
  - attn.o_proj ablation weights: 0.73 to 1.49 (peak at layer position 24.83)
  - mlp.down_proj ablation weights: 0.48 to 1.11 (peak at layer position 35.18)
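The fractional peak positions above suggest that ablation strength varies smoothly with layer index instead of being one global value. The sketch below assumes a linear falloff. Strength starts at `max_weight` at the peak position and drops to `min_weight` at a distance of `min_weight_distance`. Layers beyond that distance are left untouched. The parameter names and the falloff shape are assumptions. `min_weight_distance = 25.0` is a hypothetical value, picked so that all 42 layers are modified. The other numbers are the attn.o_proj figures listed above.

```python
def layer_weights(num_layers: int, max_weight: float, max_weight_position: float,
                  min_weight: float, min_weight_distance: float) -> list[float]:
    """Hypothetical per-layer ablation strength with a linear falloff from a peak.

    Layers farther than min_weight_distance from the (possibly fractional) peak
    position get weight 0.0, i.e. they are not modified.
    """
    weights = []
    for layer in range(num_layers):
        distance = abs(layer - max_weight_position)
        if distance > min_weight_distance:
            weights.append(0.0)
        else:
            # Interpolate linearly from max_weight (at the peak) to min_weight.
            frac = distance / min_weight_distance
            weights.append(max_weight + frac * (min_weight - max_weight))
    return weights

# attn.o_proj values from the card; min_weight_distance=25.0 is a hypothetical choice.
o_proj = layer_weights(42, max_weight=1.49, max_weight_position=24.83,
                       min_weight=0.73, min_weight_distance=25.0)
for layer in (0, 25, 41):
    print(f"layer {layer}: {o_proj[layer]:.3f}")
```

A kernel like this gives the optimizer only a few numbers to tune per component, not one per layer. This is presumably what keeps a 30-trial search practical.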
Acknowledgments
- Google Gemma 4 E4B (Apache 2.0)
- Heretic by Philipp Emanuel Weidmann
- Optuna