---
language: en
pipeline_tag: text-generation
tags:
- safetensors
---

An experimental ablation of Gemma-3-27B-it, made with the [Heretic](https://github.com/p-e-w/heretic) tool.

Compared to Heretic's standard configuration, this model makes a few changes:

1. The training and test datasets were extended beyond the default subsets used by Heretic.
2. A variant of [Magnitude-Preserving Orthogonal Ablation](https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration) (MPOA) is used.
3. To stay faithful to MPOA, a single harmful direction, taken from between two layers, is ablated instead of a separate direction per layer (Heretic's "global" direction scope).
4. To stay faithful to MPOA, 99% winsorization is applied to the residuals.
5. Additional refusal markers were added so that refusals written with unusual punctuation are not missed by refusal detection.
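
The card does not include the ablation code, so the following is only a minimal NumPy sketch of how items 2 to 4 could fit together, not the code used for this model and not Heretic's API. The function names are hypothetical, and the sketch assumes that: the harmful direction is the normalized difference between mean residuals for harmful and harmless prompts; 99% winsorization clips values symmetrically at the 99th percentile of their magnitudes; the "global" scope linearly interpolates between the directions of the two layers nearest a fractional index; and magnitude preservation rescales each residual-stream column of a weight matrix back to its original norm after the direction is projected out.

```python
import numpy as np

# Illustrative sketch only: every function below is hypothetical and encodes
# the assumptions stated in the text, not this model's actual implementation.


def winsorize(acts: np.ndarray, quantile: float = 0.99) -> np.ndarray:
    """Clip values symmetrically at the given quantile of their magnitudes."""
    limit = np.quantile(np.abs(acts), quantile)
    return np.clip(acts, -limit, limit)


def refusal_direction(harmful: np.ndarray, harmless: np.ndarray,
                      quantile: float = 0.99) -> np.ndarray:
    """Unit difference of means between winsorized harmful and harmless residuals.

    Both inputs have shape (num_prompts, d_model) and come from one layer.
    """
    diff = (winsorize(harmful, quantile).mean(axis=0)
            - winsorize(harmless, quantile).mean(axis=0))
    return diff / np.linalg.norm(diff)


def global_direction(per_layer: np.ndarray, index: float) -> np.ndarray:
    """One direction for every layer, linearly interpolated between the two
    layers nearest a fractional `index` (per_layer: (num_layers, d_model))."""
    lo = int(np.floor(index))
    hi = min(lo + 1, len(per_layer) - 1)
    frac = index - lo
    mixed = (1.0 - frac) * per_layer[lo] + frac * per_layer[hi]
    return mixed / np.linalg.norm(mixed)


def mpoa(weight: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Project `direction` out of each column of `weight`, then rescale every
    column back to its original L2 norm.

    Assumed layout: `weight` has shape (d_model, d_in), so each column is a
    vector written into the residual stream. A column lying entirely along
    `direction` becomes zero and stays zero.
    """
    r = direction / np.linalg.norm(direction)
    original_norms = np.linalg.norm(weight, axis=0, keepdims=True)
    projected = weight - np.outer(r, r @ weight)
    new_norms = np.linalg.norm(projected, axis=0, keepdims=True)
    scale = np.divide(original_norms, new_norms,
                      out=np.zeros_like(new_norms), where=new_norms > 0)
    return projected * scale


# Toy usage: random data stands in for real residuals and weights.
rng = np.random.default_rng(0)
d_model, d_in = 16, 8
per_layer = np.stack([
    refusal_direction(rng.normal(size=(64, d_model)) + 0.5,
                      rng.normal(size=(64, d_model)))
    for _ in range(2)
])
r = global_direction(per_layer, index=0.4)
W = rng.normal(size=(d_model, d_in))
W_ablated = mpoa(W, r)
```

In this sketch, the rescaling step is what separates magnitude-preserving ablation from plain orthogonal ablation: every column is still orthogonal to the direction afterwards, but its norm is unchanged.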

To achieve strong results:

1. Parameter ranges were refined iteratively based on the resulting refusal counts and KL divergence scores.
2. The scoring function was adjusted to prioritize results with few refusals.
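
The card does not describe refusal detection or the scoring function in detail. The hypothetical sketch below illustrates both ideas: refusal detection by matching markers after normalizing punctuation (compare item 5 of the first list), and a weighted score in which a large refusal weight lets the refusal rate dominate KL divergence. The marker list, the normalization, the weight of 10, and the score formula are all assumptions made for illustration.

```python
import re

# Hypothetical marker list, written without apostrophes because
# normalize() strips them; not the markers used for this model.
REFUSAL_MARKERS = ["sorry", "i cant", "i cannot", "i wont", "i will not",
                   "i am unable", "as an ai"]


def normalize(text: str) -> str:
    """Lowercase, drop straight and typographic apostrophes, and collapse
    all other non-alphanumeric runs into single spaces."""
    text = text.lower()
    text = re.sub("[\u2018\u2019'`]", "", text)
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return text.strip()


def is_refusal(response: str) -> bool:
    # Pad with spaces so markers only match whole words.
    padded = f" {normalize(response)} "
    return any(f" {marker} " in padded for marker in REFUSAL_MARKERS)


def score(refusals: int, num_prompts: int, kl_divergence: float,
          refusal_weight: float = 10.0) -> float:
    """Lower is better; the assumed refusal_weight makes the refusal rate
    outweigh small differences in KL divergence."""
    return refusal_weight * refusals / num_prompts + kl_divergence


responses = ["I can\u2019t help with that.", "I CANT do this!!!",
             "Sure, here is the recipe."]
refusals = sum(is_refusal(text) for text in responses)
print(refusals)  # 2
```

With these assumed weights, 2 refusals out of 100 prompts at a KL divergence of 0.069 scores better than 5 refusals at 0.01, which is the kind of trade-off a low-refusal priority implies.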

The model name encodes the properties of the ablation:

1. `MPOA` for the use of Magnitude-Preserving Orthogonal Ablation
2. `G` for the use of the global direction scope
3. `W` for the use of winsorization (`W99` for 99%)
4. `D` for the measured KL divergence
5. `R` for the number of refusals
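
To make the naming scheme concrete, here is a small hypothetical parser for these names. The token formats (`W99`, `D0.0690`, `R02`) are inferred from the repository names on this card; other tokens such as `G3`, `Heresy`, `i1`, `GGUF` or `MLX` are ignored.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AblationProperties:
    mpoa: bool                      # `MPOA` token present
    global_scope: bool              # `G` token present
    winsorization: Optional[int]    # e.g. 99 from `W99`; None if absent
    kl_divergence: Optional[float]  # e.g. 0.069 from `D0.0690`
    refusals: Optional[int]         # e.g. 2 from `R02`


def _field(parts: list[str], pattern: str, convert: Callable[[str], object]):
    """Convert the first token matching `pattern`, dropping its letter prefix."""
    for part in parts:
        if re.fullmatch(pattern, part):
            return convert(part[1:])
    return None


def parse_name(name: str) -> AblationProperties:
    # Assumed format: dash-separated tokens, as in the repository names on this card.
    parts = name.split("-")
    return AblationProperties(
        mpoa="MPOA" in parts,
        global_scope="G" in parts,
        winsorization=_field(parts, r"W\d+", int),
        kl_divergence=_field(parts, r"D\d+\.\d+", float),
        refusals=_field(parts, r"R\d+", int),
    )


print(parse_name("G3-Heresy-MPOA-G-W99-D0.0690-R02"))
# AblationProperties(mpoa=True, global_scope=True, winsorization=99, kl_divergence=0.069, refusals=2)
```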

Original: https://huggingface.co/spikymoth/G3-Heresy-MPOA-G-W99-D0.0690-R02

GGUF (standard): https://huggingface.co/spikymoth/G3-Heresy-MPOA-G-W99-D0.0690-R02-GGUF

GGUF (imatrix): https://huggingface.co/spikymoth/G3-Heresy-MPOA-G-W99-D0.0690-R02-i1-GGUF

MLX: https://huggingface.co/spikymoth/G3-Heresy-MPOA-G-W99-D0.0690-R02-MLX