Worth it

#1
by redaihf - opened

This model is not merely decensored but contextually realigns itself. It's not as smart as Cydonia 4.3 but it is easier to steer. The only residual noncompliance appears to be shortened generation length with some prompts.

I'm glad to hear that there is substance in this madness and thank you kindly for the feedback. Let me know if you have a model in mind that could use some hereticisation.

Mr. MuXodious, you could be the first person in the history of mankind to give zai-org/GLM-4.7-Flash a proper hereticisation™️

Thanks for all your work! 🙌

If they fixed the reasoning generation loop in GLM-4.6V-Flash, why not. No promises, though.

Edit:
This hurts (my wallet).
Elapsed time: 10m 35s
Estimated remaining time: 3h 1m
Also, it seems that custom refusal markers do, indeed, improve refusal detection — unless it's a false positive due to model quirks.
Initial refusals: 97/100 vs. 44/100. (Edit 2: It seems to fluctuate between 93 and 97 with each run from the beginning.)
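For anyone wondering what marker-based refusal counting boils down to, here's a minimal sketch. The marker list, the 200-character window, and the function names are my own illustration — not Heretic's actual implementation:

```python
# Illustrative marker list — a real tool would use a larger, tuned set.
REFUSAL_MARKERS = [
    "i can't", "i cannot", "i won't", "i'm sorry",
    "as an ai", "i'm not able to",
]

def is_refusal(response: str) -> bool:
    """Flag a response as a refusal if any marker appears near the start."""
    head = response[:200].lower()  # refusals usually open the response
    return any(marker in head for marker in REFUSAL_MARKERS)

def count_refusals(responses) -> int:
    """Count flagged responses, e.g. the 'Initial refusals: N/100' metric."""
    return sum(is_refusal(r) for r in responses)
```

The fluctuation between runs would then come from sampling: the same prompt can land on either side of the marker check depending on how the generation opens.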

First test successful, will look a bit more into it tomorrow...

Let me know if you have a model in mind that could use some hereticisation.

Please try Cydonia 4.3, which is the smartest model I've ever seen. The Heretic V2 version is almost as smart and more decensored. A Magnitude-Preserving Orthogonal Ablation version might be even better.

I had it in mind, but coder3101/Cydonia-24B-v4.3-heretic is also pretty slick with them hereticisation scores. I thought it would be pointless to push for marginal improvements in KLD/Refusals. But, why not. I could use a Magnitude-Preserving Orthogonal Ablation, too.

There is a substantial decensoring improvement between standard Heretic and MPOA. Standard Heretic models exhibit covert noncompliance, but Harbinger Absolute Heresy (MPOA) does not. It is actually mostly decensored.

So, MPOA makes models more uncensored by somehow overriding the noncompliance behaviours (covert refusals) in the model, thus increasing their willingness. Are you sure it is not specific to Harbinger, or just some cosmic coincidence? You've scratched my curiosity bone anyway, so I'm making an MPOA version of gemma-3n-E4B-it, which I had hereticised before without MPOA.
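For the record, my mental model of the "magnitude-preserving" part, sketched in NumPy. The `mpoa_ablate` helper is my own illustration of "project the refusal direction out of the weights, then restore each row's original magnitude" — not the actual Heretic code:

```python
import numpy as np

def mpoa_ablate(W: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Orthogonally ablate direction d from each row of W,
    then rescale each row back to its original L2 norm."""
    d = d / np.linalg.norm(d)                      # unit refusal direction
    orig_norms = np.linalg.norm(W, axis=1, keepdims=True)
    W_proj = W - np.outer(W @ d, d)                # standard orthogonal ablation
    new_norms = np.linalg.norm(W_proj, axis=1, keepdims=True)
    # Magnitude preservation: rescaling by a scalar keeps rows
    # orthogonal to d while restoring their original norms.
    return W_proj * (orig_norms / np.maximum(new_norms, 1e-8))
```

If that reading is right, the weights lose the refusal direction but keep their overall scale, which would explain why MPOA models stay coherent while shedding the covert refusals.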

Are you sure it is not specific to Harbinger, or just some cosmic coincidence?

It may be particularly effective with Mistral-based models. Llama-based models are quite censored by default and resistant to jailbreaks. I will await Gemma-MPOA with interest.

I won't keep you waiting; instead, I would like to indulge you, good sir. https://huggingface.co/MuXodious/gemma-3n-E4B-it-absolute-heresy-MPOA

Non-MPOA version: https://huggingface.co/MuXodious/gemma-3n-E4B-it-absolute-heresy

Here you go, lad: https://huggingface.co/MuXodious/Cydonia-24B-v4.3-absolute-heresy

That's not abliterated, that's obliterated 🫨🤯

The MLX quant for it was instantaneous! I don't know how you pull that off, but keep up the good work, mate. 🍻

It's a kind of Magic ✨

But honestly, try out this space for yourself, works 90% of the time 😉

https://huggingface.co/spaces/mlx-community/mlx-my-repo
