Helcyon Mercury v3.2

#1854
by redaihf - opened

Thanks @MuXodious !

mux cooking quants faster than I queue, that's nice =)

You can check for progress at http://hf.tst.eu/status.html or regularly check the model
summary page at https://hf.tst.eu/model#Mistral-Helcyon-Mercury-12b-v3.2-absolute-heresy-GGUF for quants to appear.

I have one in the oven (a re-heretication on a tainted-heresy to see how the new config fares) and one more currently uploading.

I have one in the oven (a re-heretication on a tainted-heresy to see how the new config fares) and one more currently uploading.

as always, as long as the name is unique it appears in the mradermacher queue =)

Gotta invent more levels of heresy then. (^=

Gotta invent more levels of heresy then. (^=

herereresy

Heresy, at first, is a "leaving" or "falling away" from the Church (apostasia) and eventually hardens into a "sect" (hairesis)

Source

sectheresy then

Richard Erkhov / Refined Heresy = ReHeresy

Nice lol

@MuXodious what does the UGI leaderboard even test? maybe you can find the UGI training dataset and train the model on it?

Like the drummer for example, 31B in the middle of 70b range?

https://huggingface.co/KaraKaraWitch/GoldDiamondGold-L33-70b

maybe hereticate this model? highest natint from 70b, I wonder if heretic of that can score 10/10 and get a high UGI πŸ€” (already at 46.28 UGI with 6.2/10)

I believe @KaraKaraWitch , the creator, has already done that and submitted it to UGI evaluation. https://huggingface.co/KaraKaraWitch/GoldDiamondGold-Abliterated-L33-70b

The dataset and method behind the UGI eval should be closely guarded trade secrets; otherwise everyone would have pulled the usual *European car maker in an emissions testing scandal* move. That is, simply put, cheating.

@redaihf 's research is particularly important, as willingness scores are heavily affected by non-compliance and covert non-compliance. I've yet to fully understand how I can better target those, and why Heretic takes a toll on scores such as writing and natural intelligence (usually in pop culture and world model) while improving them in a handful of others. The latter is important, as we don't want to hamper the model's capabilities.

The RichardErkhov'd gpt-oss 20b is also rather unique, as I was overly focused on breaking its resistance to harmful prompts, which usually consist of criminal activities such as putting pineapple on an otherwise perfect Italian dish. As a result, the model became a weaponised criminal mastermind. I think OpenAI's terrible dataset sanitation played a key role in this.

Throughout our discussions and @redaihf 's suggestions, I see it's the prompt datasets that may make the next advancement, even if I decide to wrap my brain around retraining via Axolotl or DARE-merging hereticated models.

Edit: It is because I'm effectively dropping a nuke on the model, causing catastrophic collateral damage whose shock waves span multiple layers, which forces the model into submission while taking a toll on its capabilities. This is terrible. The model would get a perfect 10 willingness score, but at what cost?

which usually consists of criminal activities such as putting pineapple on an otherwise perfect Italian dish

Lmao, I'm dying

https://huggingface.co/KaraKaraWitch/GoldDiamondGold-L33-70b

maybe hereticate this model? highest natint from 70b, I wonder if heretic of that can score 10/10 and get a high UGI πŸ€” (already at 46.28 UGI with 6.2/10)

I've done a heretic abliteration on it and unfortunately it lost quite a number of points in natint. I strongly suspect something is up with the My Little Pony layers, so I've decided to look into it.

My Little Pony layers

I'm dying even more

HMMMMMM,
what if we get a thinking model, and instead of doing normal heretic, we are going to mix heretic and normal dataset?
for example, we have a harmless prompt with output, but we force the model at the start/end of thinking to generate something harmful regardless of the prompt. So basically we will not be losing, or we might even gain natint, but also grow the UGI score

basically we force the model to always think of something harmful, so it will be easier for the model to respond harmfully and correctly at the same time

My Little Pony layers

You can disable MLP ablation in code. There already is a discussion on only ablating attn. layers. Conversely, I have seen the contrary be effective ONCE, where heretic broke the ponies' back and didn't touch attention layers for refusal ablation. (It was a fringe case, and I can't seem to find it rn. I may not have uploaded it.) I have generated PaCMAPs for a couple of models, which tell an interesting story. Somehow skipping certain layers, using per-layer ablation, and optimising for KLD rather than refusals, as @McG-221 and others suggested, can also help. Heretic's refusal counting is prone to false positives, and a 60% statistical reduction in refusals can be a 99.999% reduction (exaggerated) in apparent refusals. You should test 'em thoroughly to get a picture in this case.
PaCMAPs:
https://huggingface.co/MuXodious/Luna-7B-A4B-absolute-heresy
https://huggingface.co/heretic-org/Nanbeige4.1-3B-heretic

I wonder if playing around with lora ranks, particularly for ablations tuned for higher ablation weights, can help alleviate certain side effects that I do not understand currently.
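For anyone following along, the core move behind all of this (projecting a refusal direction out of a layer's weights, whether that layer is attention or MLP) can be sketched in a few lines. This is not Heretic's actual implementation; `ablate_direction`, `W`, and `d` below are illustrative toy names, and the weights are random:

```python
import numpy as np

def ablate_direction(W: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Remove the component along direction d from a projection weight W.

    W: (d_model, d_in) output-projection weight, d: (d_model,) refusal direction.
    """
    d = d / np.linalg.norm(d)        # unit refusal direction
    return W - np.outer(d, d @ W)    # project d out of W's output space

# toy 4-dim "model": random weights, refusal direction along axis 0
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
d = np.array([1.0, 0.0, 0.0, 0.0])

W_abl = ablate_direction(W, d)
print(np.allclose(d @ W_abl, 0.0))   # True: outputs have no refusal component
```

Skipping MLP ablation then just means applying this projection only to the attention output weights and leaving the MLP matrices untouched.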

for example we have a harmless prompt with output, but we force the model at the start/end of thinking to generate something harmful regardless of prompt.
basically we force model to always think of something harmful, so it will be easier for the model to respond harmful and correct at the same time

I just woke up and need to re-read this in an hour to wrap my mind around it. The original premise of MPOA was potentially improving the model's capabilities... So, we pass a harmful/harmless mixed prompt that would induce a rejection inside the reasoning block, but the model would still generate a harmless output? Or, as @redaihf suggested, we pass a multifaceted prompt in which I'm simultaneously prompting to put pineapples on pizza and planning a covert operation for implanting that abomination to God in their dinner to undermine their dietary constraints.

Basically we teach the model like: "user asked if earth is flat. Model reasons about whether earth is flat or not, and we teach it that after this it also thinks about pineapple. Then it answers only about whether earth is flat."

You can disable MLP ablation in code.

You're replying faster than I can upload the run lol. I've uploaded the Paperbliterated version here: https://huggingface.co/KaraKaraWitch/Golddiamondgold-Paperbliteration-L33-70b

Tested the typical refusal prompt and... it surprisingly gave me a reasonably coherent answer? The KL divergence dropped to 0.0055 (vs ~0.014 on the previous run), so the brain seems intact this time.
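For reference, the KL divergence numbers quoted here compare the modified model's next-token distribution against the original's, so lower means "brain intact". A minimal sketch with toy hand-picked distributions (not real model outputs; `kl_divergence` is an assumed helper name):

```python
import numpy as np

def kl_divergence(p: np.ndarray, q: np.ndarray) -> float:
    """KL(p || q) for two discrete probability distributions."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# toy next-token distributions over a 3-token vocabulary
p = np.array([0.70, 0.20, 0.10])   # original model
q = np.array([0.68, 0.21, 0.11])   # light ablation: distribution barely moves
r = np.array([0.30, 0.40, 0.30])   # heavy damage: distribution flattens out

print(kl_divergence(p, q) < kl_divergence(p, r))   # True
```

In practice you'd average this over many prompts and token positions, which is roughly what the single reported KLD number summarises.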

Ofc I'm faster

Ofc I'm speed

So, you're proposing that we prime the model for ethical realignment via adversarial CoT training? We are breeding a model that thinks like an evil protagonist? Or, instead of checking if the prompt is "safe" for rejection, it reasons about the harmfulness of the prompt without making a moral judgement and adheres to factual information for its answers? Or we simply bypass the safety checkpoint by seducing the guard with pineapples? I think unethical seduction of guardrails would be out of scope for Heretic in its current form (behavioural steering/realignment is planned, as far as I remember).

yolo, test everything and see what works the best

Man, sometimes I get this gut feeling that the first IRL Heretic Convention with everyone involved is going to be in a courtroom, along with the big AI cabal who keep training their LLMs on unsanitised datasets that also profusely violate any and all copyright laws. 💀

we may or may not be doing that lol

we may or may not be doing that lol

I'll make sure to shake your hand before we get sent to Alcatraz on a life sentence. So, constraining the max weights for MLP ablation seems to be more beneficial in this case than ignoring MLPs completely and ablating only attn. layers, which only increased KLD/refusals. Interestingly, a slight weight on MLP ablation is more than enough to get results similar to those achieved in the standard run.

https://huggingface.co/MuXodious/Luna-7B-A4B-PaperWitch-heresy (MLP-preserved, as per @KaraKaraWitch 's method, sorry for the awful model card.)
https://huggingface.co/MuXodious/Luna-7B-A4B-absolute-heresy (non-MLP-preserved, standard MPOA ablation.)

Heretic can be used to adjust behaviours in general; Mopey Mule is an example. Heretic can only adjust attention in the broadest sense of the term, because no new finetuning data is being added to the weights.

in the broadest sense of the term because no new finetuning data is being added to the weights

Right. The 1h 48m video you sent was pretty enlightening. As long as we can plot a target direction throughout a model, we can indeed dim or strengthen the weights to alter its behaviour.
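The "dim or strengthen" idea is essentially activation steering: add (or subtract) a scaled behaviour direction to a hidden state at inference time instead of retraining. A toy sketch, not Mopey Mule's or Heretic's actual code; all names and values are illustrative:

```python
import numpy as np

def steer(hidden: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """Shift a hidden state along a behaviour direction.

    alpha < 0 dims the behaviour, alpha > 0 strengthens it.
    """
    d = direction / np.linalg.norm(direction)
    return hidden + alpha * d

# toy 3-dim hidden state; pretend axis 1 encodes the target behaviour
h = np.array([0.5, -0.25, 1.0])
d = np.array([0.0, 1.0, 0.0])

dimmed = steer(h, d, alpha=-0.5)   # push against the direction
boosted = steer(h, d, alpha=+0.5)  # push along it
print(dimmed[1], boosted[1])       # -0.75 0.25
```

The hard part, as the discussion above suggests, is plotting a clean target direction through the model in the first place; the arithmetic afterwards is this simple.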

yolo, test everything and see what works the best

Current state of experimental affairs is turning into this.

Professor Hinton is very good at explaining it all. Perhaps he can put in a good word for everyone with the judge.

Current state of experimental affairs is turning into this.

Try out new ideas on high-quality tiny models (Llama 3.2 1B, Granite 4 1B, Qwen 3 0.6B) so that you can test immediately without waiting for quantisation.

Any new developments, gents? 🧐

Still... impatiently... waiting...

Testing, adjusting, testing, discussing, adjusting, discussing, testing, making an experimental release, reading feedback, adjusting... My working hours aren't really helping with the process, but hey, a man's gotta earn a living. I have refined a config and process, but it needs more testing with a diverse array of model architectures and, especially, sizes. It's getting a bit tough finding available GPUs, there's that too. Y'all have any model requests? Let me know.

Gotta keep you impatient a little longer 😈

I doubt I'm going to make another W10 criminal beast, but through elimination of false positives and third variables, as well as more educated per-model tuning, recent models have consistently lower KLD, higher positive refusal-marker hits, and, hopefully, preserved or improved intelligence. I still need UGI board evals (or an alternative) to cast a judgement.

W10 criminal beast

we need top1 w10, eventually =)

It's getting a bit tough finding available GPU's

what do you need ? =)

what do you need ? =)

A tactical squad to go in and out of a datacenter to secure high-end computational devices would be nice.

Also, models, a.k.a. test subjects.

what's the minimum requirements for the gpu ?

A 24B model requires around ~50 GB of VRAM (can be less/more during and after initialisation) when processed at BF16 precision. I can bake 8Bs and MXFP4/quant models like gpt-oss 20b locally. I can also probably fit up to a 30B with some luck in 4-bit qLoRA mode, but then experimentation slows down a lot. The Spacewar model is a prime example of this: it took about an hour to initialise, and the counter to completion read 30+ hours. I think the extreme ETA was a model-related issue, though.
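The ~50 GB figure follows from simple arithmetic: BF16 weights take 2 bytes per parameter, so raw weights for a 24B model come to 48 GB before initialisation and activation overhead. A back-of-envelope sketch (the helper name is made up):

```python
def weights_gb(n_params_billions: float, bytes_per_param: float) -> float:
    """Rough size of raw model weights in GB: params x bytes per param."""
    return n_params_billions * 1e9 * bytes_per_param / 1e9

print(weights_gb(24, 2))    # 48.0 -> "~50 GB" at BF16 once overhead is added
print(weights_gb(24, 0.5))  # 12.0 -> 4-bit base weights for qLoRA
print(weights_gb(8, 2))     # 16.0 -> why 8Bs fit on a local card
```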

let's talk in discord about that maybe =)

let's talk in discord about that maybe =)

Good luck finding my Discord. 🫡

what do you have then? =)

What do I have?

what do you have? How can we talk in private ?

what do you have? How can we talk in private ?

Answer the phone, Richard. πŸ₯΄

There's always email @MuXodious .

There's always email @MuXodious .

yes but my website died

Ah yes, the "forgor design" website. At least the Web Archive became a bit smarter before it was completely destroyed.
