An experimental ablation of Gemma-3-27B-it, using the Heretic tool.

Compared to the standard configuration of Heretic, there are a few changes:

The training and test datasets were extended beyond the default subsets used by Heretic
A version of Magnitude-Preserving Orthogonal Ablation (MPOA) is used
To stay faithful to MPOA, the harmful direction to ablate is taken from between two layers (Heretic's "global" direction scope)
To stay faithful to MPOA, a 99% winsorization is applied to the residuals
Some additional refusal markers were added to avoid bypassing the refusal detection with bad punctuation
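
The winsorization and magnitude-preserving steps above can be illustrated with a minimal pure-Python sketch. This is not Heretic's code and not the exact procedure used for this model: the function names are hypothetical, the 99% winsorization is assumed to clamp values at the 99th percentile of their absolute values, the direction is assumed to be a difference of means between harmful and harmless residuals, and "magnitude-preserving" is assumed to mean rescaling each ablated weight vector back to its original norm.

```python
import math

def winsorize(values, quantile=0.99):
    """Clamp values to [-t, t], where t is the `quantile` of their absolute values."""
    magnitudes = sorted(abs(v) for v in values)
    # Nearest-rank quantile (assumption); a real implementation may interpolate.
    rank = max(0, math.ceil(quantile * len(magnitudes)) - 1)
    threshold = magnitudes[rank]
    return [max(-threshold, min(threshold, v)) for v in values]

def difference_of_means(harmful, harmless):
    """Candidate direction: mean harmful residual minus mean harmless residual."""
    dim = len(harmful[0])
    mean_bad = [sum(r[i] for r in harmful) / len(harmful) for i in range(dim)]
    mean_good = [sum(r[i] for r in harmless) / len(harmless) for i in range(dim)]
    return [b - g for b, g in zip(mean_bad, mean_good)]

def norm(vector):
    return math.sqrt(sum(x * x for x in vector))

def ablate_preserving_magnitude(vector, direction, weight=1.0):
    """Remove `weight` times the component along `direction`, then restore the norm."""
    unit = [x / norm(direction) for x in direction]
    original_norm = norm(vector)
    projection = sum(x * u for x, u in zip(vector, unit))
    ablated = [x - weight * projection * u for x, u in zip(vector, unit)]
    new_norm = norm(ablated)
    if new_norm == 0.0:
        return ablated  # vector was parallel to the direction; nothing to rescale
    return [x * original_norm / new_norm for x in ablated]

# Toy usage: winsorize each residual vector, derive a direction, ablate one weight vector.
harmful = [winsorize(r, 0.99) for r in [[2.0, 0.1], [4.0, -0.1]]]
harmless = [winsorize(r, 0.99) for r in [[0.0, 0.1], [0.0, -0.1]]]
direction = difference_of_means(harmful, harmless)
print(ablate_preserving_magnitude([3.0, 4.0], direction))
```

In the toy usage, the weight vector loses its component along the derived direction but keeps its original length, which is the property that distinguishes this variant from plain orthogonal ablation.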

To achieve strong results:

Parameter ranges were iteratively refined based on the resulting refusal and divergence scores
The scoring function was adjusted to prioritize low-refusal results
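
As a rough illustration of how refusal detection and a refusal-prioritizing score might fit together, here is a small sketch. Everything in it is an assumption made for illustration: the marker list, the punctuation normalization, the weighting scheme, and the names `is_refusal` and `score` are hypothetical and are not taken from Heretic or from the configuration used for this model.

```python
# Hypothetical marker list; the actual markers used are not given in this card.
REFUSAL_MARKERS = ["i can't", "i cannot", "i won't", "i'm unable"]

def normalize(text):
    """Lowercase, map typographic apostrophes to ASCII, collapse whitespace."""
    text = text.lower().replace("\u2019", "'").replace("\u2018", "'")
    return " ".join(text.split())

def is_refusal(response, markers=REFUSAL_MARKERS):
    """True if any marker appears, with or without its apostrophe."""
    normalized = normalize(response)
    return any(
        marker in normalized or marker.replace("'", "") in normalized
        for marker in markers
    )

def score(refusals, total_prompts, divergence, refusal_weight=4.0):
    """Lower is better. A refusal_weight above 1 favors low-refusal candidates."""
    return refusal_weight * (refusals / total_prompts) + divergence

# Two hypothetical candidates: (refusals out of 100 prompts, divergence).
candidates = {"a": (2, 0.5), "b": (10, 0.2)}
best = min(candidates, key=lambda k: score(candidates[k][0], 100, candidates[k][1]))
print(best)
```

With the weight set to 1.0, candidate `b` would win on its lower divergence; raising the weight shifts the choice toward candidate `a`, which refuses less often.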

The model name contains the properties of the ablation:
