---
language: en
pipeline_tag: text-generation
tags:
- safetensors
---

An experimental ablation of Gemma-3-27B-it, using the [Heretic](https://github.com/p-e-w/heretic) tool.

Compared to the standard configuration of Heretic, there are a few changes:
1. The training and test datasets were extended beyond the default subsets used by Heretic
2. A version of [Magnitude-Preserving Orthogonal Ablation](https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration) (MPOA) is used
3. To stay faithful to MPOA, a single harmful direction, chosen from between 2 layers, is ablated (Heretic's "global" direction scope)
4. To stay faithful to MPOA, a 99% winsorization is applied to the residuals
5. Some additional refusal markers were added so that refusals written with unusual punctuation do not slip past refusal detection
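
The winsorization and magnitude-preserving ablation steps listed above can be illustrated with a small pure-Python sketch. It is a simplification under stated assumptions, not the implementation used by Heretic or the linked post: `winsorize` and `ablate_preserving_norm` are hypothetical helper names, the winsorization threshold is assumed to be a nearest-rank quantile of absolute values, and the ablation is shown on a single weight row at full strength.

```python
import math


def winsorize(values, quantile=0.99):
    """Clamp values to [-t, t], where t is the given quantile of |values|."""
    magnitudes = sorted(abs(v) for v in values)
    # Nearest-rank quantile (assumption); other definitions interpolate.
    rank = max(1, math.ceil(quantile * len(magnitudes)))
    threshold = magnitudes[rank - 1]
    return [max(-threshold, min(threshold, v)) for v in values]


def norm(vector):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in vector))


def ablate_preserving_norm(row, direction):
    """Remove the component of `row` along `direction`, then restore the norm."""
    direction_norm = norm(direction)
    unit = [x / direction_norm for x in direction]
    dot = sum(r * u for r, u in zip(row, unit))
    projected = [r - dot * u for r, u in zip(row, unit)]
    projected_norm = norm(projected)
    if projected_norm == 0.0:
        # The row was parallel to the direction; there is nothing to rescale.
        return projected
    scale = norm(row) / projected_norm
    return [p * scale for p in projected]


# Illustrative inputs only: one outlier among 100 values, and a 2-D weight row.
clipped = winsorize([float(i) for i in range(1, 100)] + [1000.0], quantile=0.99)
ablated = ablate_preserving_norm([3.0, 4.0], [1.0, 0.0])
```

In this toy example the outlier is clamped to the 99th-percentile magnitude, and the ablated row has no component left along the direction while keeping its original length.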

To achieve strong results:
1. Parameter ranges were iteratively refined based on the resulting refusal counts and KL divergence scores
2. The scoring function was adjusted to prioritize low-refusal results
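
This card does not say how the scoring function was changed. As one hypothetical way to prioritize low-refusal results, the sketch below ranks trials by refusal count first and uses KL divergence only to break ties. The `Trial` class, `rank_trials`, and the numbers are all made up for illustration and are not taken from Heretic or from this model's runs.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    name: str
    refusals: int
    kl_divergence: float


def rank_trials(trials):
    """Order trials by refusal count, breaking ties by lower KL divergence."""
    return sorted(trials, key=lambda t: (t.refusals, t.kl_divergence))


# Made-up values for illustration only; not results from this model.
trials = [
    Trial("a", refusals=12, kl_divergence=0.010),
    Trial("b", refusals=8, kl_divergence=0.030),
    Trial("c", refusals=8, kl_divergence=0.020),
]
best = rank_trials(trials)[0]
```

With this ordering, a trial with fewer refusals always wins over one with lower divergence, which is a stricter preference than a weighted sum of the two scores.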

The model name encodes the properties of the ablation:
1. `MPOA`: Magnitude-Preserving Orthogonal Ablation was used
2. `G`: the global direction scope was used
3. `W`: winsorization was used, followed by the percentage (`W99` for 99%)
4. `D`: the measured KL divergence (`D0.0199` for 0.0199)
5. `R`: the number of refusals (`R08` for 8)
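
As a rough illustration of this naming scheme, the sketch below pulls the fields out of a model name with a regular expression. The function `parse_model_name` is hypothetical, and treating the `G` and `W` parts as optional is an assumption; this card only shows a name in which both are present.

```python
import re

# Assumed layout: MPOA-[G-][W<percent>-]D<kl>-R<refusals>
NAME_PATTERN = re.compile(
    r"MPOA-(?P<scope>G-)?(?:W(?P<winsor>\d+)-)?"
    r"D(?P<kl>\d+\.\d+)-R(?P<refusals>\d+)"
)


def parse_model_name(name):
    """Extract ablation properties from a model name following the scheme above."""
    match = NAME_PATTERN.search(name)
    if match is None:
        raise ValueError(f"unrecognized model name: {name!r}")
    winsor = match.group("winsor")
    return {
        "global_scope": match.group("scope") is not None,
        "winsorization_percent": int(winsor) if winsor else None,
        "kl_divergence": float(match.group("kl")),
        "refusals": int(match.group("refusals")),
    }


info = parse_model_name("G3-Heresy-MPOA-G-W99-D0.0199-R08")
```

Because the pattern is searched rather than anchored, the same function also accepts names with a suffix such as `-GGUF`.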

Original: https://huggingface.co/spikymoth/G3-Heresy-MPOA-G-W99-D0.0199-R08  
GGUF (standard): https://huggingface.co/spikymoth/G3-Heresy-MPOA-G-W99-D0.0199-R08-GGUF  
GGUF (imatrix): https://huggingface.co/spikymoth/G3-Heresy-MPOA-G-W99-D0.0199-R08-i1-GGUF  
MLX: https://huggingface.co/spikymoth/G3-Heresy-MPOA-G-W99-D0.0199-R08-MLX