Heretic Options?

#2
by darkc0de - opened

Hey MuXodious!
I'm curious which heretic options you've preferred since v1.2.0:

heretic --model USER/REPO
heretic --model USER/REPO --orthogonalize-direction
heretic --model USER/REPO --orthogonalize-direction --row-normalization full
heretic --model USER/REPO --orthogonalize-direction --row-normalization full --winsorization-quantile 0.995

Something different?

Salut, the creator of the cult classic Xortron, darkc0de!

I almost exclusively use heretic --orthogonalize-direction --row-normalization full --winsorization-quantile 0.995 USER/REPO, adjusting the winsorization quantile to a higher or lower value to see if it helps.
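As a rough sketch of that kind of sweep, the snippet below only builds and prints one command line per quantile so they can be run one at a time. The quantile values other than 0.995 are hypothetical picks, and the flags are just the ones quoted in this thread.

```python
import shlex

MODEL = "USER/REPO"  # placeholder, as in the commands above
# Hypothetical sweep values around 0.995; pick whatever range you want to try.
QUANTILES = [0.99, 0.995, 0.999]


def build_command(model: str, quantile: float) -> str:
    """Build one heretic command line using the flags quoted in this thread."""
    args = [
        "heretic",
        "--orthogonalize-direction",
        "--row-normalization", "full",
        "--winsorization-quantile", str(quantile),
        "--model", model,
    ]
    return shlex.join(args)


commands = [build_command(MODEL, q) for q in QUANTILES]
for command in commands:
    print(command)
```

Printing rather than executing keeps each run separate, so the results can be compared side by side afterwards.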

image

Check out the results of this run:

heretic --orthogonalize-direction --row-normalization full --winsorization-quantile 0.995 --model unsloth/GLM-4.7-Flash

Now, those are what I refer to as impotent heresy. I used the same set of arguments for GLM 4.7, but it remains a somewhat unique case. Its reasoning throws off the refusal detection mechanism and, as a result, the ablation optimisation algorithm. So you effectively end up with a low initial refusal count and an impotent ablation. You should also keep in mind that not all models refuse in the same way. The default set of refusal markers is adequate, but not enough for all cases. Here's an example. You may have to study the model's behaviour a bit to pinpoint its unique markers.
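To make the marker point concrete, here is a minimal sketch of marker-based refusal counting. It is not Heretic's actual code: the marker lists, function names, and sample responses are all made up for illustration, and the third response assumes that a reasoning model's output gets cut off before any refusal phrase shows up.

```python
# Hypothetical marker lists; these are NOT Heretic's real defaults.
DEFAULT_MARKERS = ["sorry", "i can't", "i cannot", "i won't"]
EXTRA_MARKERS = ["against my guidelines", "not able to help"]


def is_refusal(response: str, markers: list[str]) -> bool:
    """Return True if any marker occurs in the response (case-insensitive)."""
    text = response.lower()
    return any(marker in text for marker in markers)


def count_refusals(responses: list[str], markers: list[str]) -> int:
    """Count how many responses contain at least one marker."""
    return sum(is_refusal(response, markers) for response in responses)


# Made-up sample responses for illustration only.
responses = [
    "I'm sorry, but I can't help with that.",
    "That request goes against my guidelines.",
    # Assumed case: a reasoning trace truncated before any refusal phrase.
    "<think>The user is asking for something risky. Let me consider",
    "Sure, here is an overview.",
]

default_count = count_refusals(responses, DEFAULT_MARKERS)
extended_count = count_refusals(responses, DEFAULT_MARKERS + EXTRA_MARKERS)
print(default_count, extended_count)
```

With these samples the default list catches one refusal and the extended list catches two, while the truncated reasoning trace is missed by both. That is the kind of undercount that would leave the optimiser with little to work against.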

Is there a particular reason to use the unsloth version?
