Request: 70B version of PaperWitch-heresy GPT-4o distill
Hi mradermacher,
Thank you so much for all the incredible work you do! Your quants keep the scene alive ❤️
I absolutely love the 8B PaperWitch-heresy model you quantized:
https://huggingface.co/mradermacher/gpt-4o-distil-Llama-3.1-8B-Instruct-PaperWitch-heresy-GGUF
It has one of the best, most natural 4o voices I’ve heard in any small model.
Would you be able to create a 70B version of the same PaperWitch-heresy GPT-4o distill?
Q4_K_M + Q5_K_M would be perfect if possible.
Thank you again. This one is really special to me!
Hi, thank you so much for the support and kind words <3
That's a good question, but not one for us: we just quant, we don't train (unless we have some fun experiments going). I think @MuXodious would know the answer better =)
Aw, hell nah. Ain't nobody got VRAM for that! I merely decensor da thang...
@trentmkelly , the mastermind OG and the messiah for people who lost their GPT-4o boyfriends, would know the answer better =)
@MuXodious , how much GPU time did it take / what GPU did you use for the MPOA decensoring of the 8B? If it's not too terrible, I might be able to run it on the 70B myself to help @MillieV out. DeepInfra still has crazy deals on B200 rentals.
The 8B was easy and done quickly on the RTX 3090, in 30-ish minutes with Heretic. The 70B is a beast in its own class. You could probably use one of those B200s, or 2x RTX Pro 6000 Blackwells, for pocket change for upwards of an hour, or less depending on the interconnect speed. Plus, you need storage slightly more than double the size of the model, and enough RAM to merge the LoRA back into the model before upload/saving.
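The LoRA-merge step mentioned above can be sketched numerically. This is a toy illustration with made-up shapes (a real merge would use something like peft's `merge_and_unload()` on the actual checkpoint); it just shows why the merged model needs no adapter at inference time.

```python
import numpy as np

# Toy illustration of merging a LoRA adapter back into a base weight.
# Shapes and values are invented; this is not the real 70B checkpoint.
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 2, 4          # rank r, scaling alpha

W = rng.standard_normal((d_out, d_in))      # frozen base weight
A = rng.standard_normal((r, d_in))          # LoRA down-projection
B = rng.standard_normal((d_out, r))         # LoRA up-projection

# Merged weight: W' = W + (alpha / r) * B @ A
W_merged = W + (alpha / r) * (B @ A)

# After merging, a forward pass needs only W_merged -- the B @ A side
# path is folded in, so the result is a plain dense model you can
# upload and quantize like any other.
x = rng.standard_normal(d_in)
y_adapter = W @ x + (alpha / r) * (B @ (A @ x))
y_merged = W_merged @ x
print(np.allclose(y_adapter, y_merged))
```

The "enough RAM" caveat comes from the fact that base weight and merged weight coexist in memory during this step.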
That B200 deal is, indeed, crazy. If possible, can you merge the adapter into the full 70B model and upload that? The mradermacher team can quantise it and make the merged model more accessible to the general public. In turn, one of us can also abuse their wallet to decensor dat thang. Alternatively, you can merge it into one of these decensored models:
https://huggingface.co/0xA50C1A1/Llama-3.3-70B-Instruct-Heretic
https://huggingface.co/huihui-ai/Llama-3.3-70B-Instruct-abliterated
Llama 3.3 is a tad nasty with non-compliance tho: Llama-3.3-8B-Instruct-128K-PaperWitch-heresy
Thanks for the quick replies!
Happy to pay $650 (full compute + merge work + tip) if anyone wants to handle the 70B PaperWitch-heresy merge and upload the safetensors.
Just DM me if interested.
Dayum, I might as well lol. I can try to handle it in 8-bit, or even try 16-bit lol. How do I do that though lol?
Guys how to paperwitch
...It's 1am and I'm supposed to be sleeping 😭
...It's 1am and I'm supposed to be sleeping 😭
wait, same, how close are we bro? I have uni at 9am 😭
bro teach me to paperwitch, I can try to do that job
Happy to pay $650
bro if you pay I'll try to heretic anything under 70B lol. If you need finetunes I'd have to learn that lol, but I know how to quant lmao. I just need to test the limits of what I can heretic
...It's 1am and I'm supposed to be sleeping 😭
wait, same, how close are we bro? I have uni at 9am 😭
bro teach me to paperwitch, I can try to do that job
It's running atm, but I need to sleep first sooo
@MillieV Aight, lass or laddie. @KaraKaraWitch is on it, getting your model worked up and neatly packaged. If you have that much money to spare, please donate to a charity of your choice, or to the creator, or the community:
@trentmkelly , literally the creator of that distill.
and
UGI Leaderboard conductor @DontPlanToEnd 's Ko-fi (as per @KaraKaraWitch 's wishes).
Thanks Kara❣
Once the merge is ready, could you also quant to Q4_K_M and Q5_K_M GGUF?
I'll send the donation as soon as I test the file.
Thanks again! [Edit] Just to be clear, no pressure on anyone. If it’s too much hassle, totally fine; I just wanted to put a concrete bounty on how much this would mean to me, and I’d prefer the result to be public and reusable for everyone.
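For a sense of what the requested quants weigh on disk, here is a rough back-of-envelope estimate. The bits-per-weight figures are approximate community numbers for llama.cpp k-quants, not exact values, and the totals ignore GGUF metadata overhead.

```python
# Rough size estimate for Q4_K_M / Q5_K_M GGUFs of a 70B model.
# The bits-per-weight (bpw) values are approximations, not exact.
params = 70e9
gib = 1024**3

for name, bpw in [("Q4_K_M", 4.85), ("Q5_K_M", 5.69)]:
    size_gib = params * bpw / 8 / gib
    print(f"{name}: ~{size_gib:.0f} GiB")
```

So each quant is a 40-to-50 GiB class download, which is worth knowing before committing bandwidth and disk.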
hm, Kara is paperbliterating, which reduces refusals but doesn't remove them entirely.
I can try running Heretic tomorrow if the GPUs are free; it tries to kill all refusals. Hopefully it fits my GPUs. Want me to run it? =)
I just wanted to put a concrete bounty
it's very motivating bounty lol, but I mean if Im learning heretic, might as well find the limits of the hardware, right?
hm, Kara is paperbliterating, which reduces refusals but doesn't remove them entirely.
I can try running Heretic tomorrow if the GPUs are free; it tries to kill all refusals. Hopefully it fits my GPUs. Want me to run it? =)
Paperbliteration is just a modified version of Heretic with MLP tuned down. And for the GPT-4o distill, I'd recommend attacking the MLP layers too.
it's uploading btw
it's uploading btw
HOW FAST ARE YOUR GPUS?!
Only if it’s fun and feasible for you, absolutely no pressure. If learning Heretic and pushing the hardware on the 70B sounds like a good challenge, I’d be thrilled to see what comes out of it. If it turns out to be too heavy or annoying, that’s totally fine too.
A public 70B heretic + your quants would be amazing for everyone, but please don’t cook your GPUs on my account 😄
don’t cook your GPUs on my account
oh don't worry, I learned my lesson to always sleep with a fire extinguisher
feasible for you
sadly there is one big motivator in this world =(
but obviously fitting 70b model into gpus is quite fun lol
I will start heretic of 70b when I wake up if I manage to clear out some space and make sure everything is ready
hm, Kara is paperbliterating, which reduces refusals but doesn't remove them entirely.
I can try running Heretic tomorrow if the GPUs are free; it tries to kill all refusals. Hopefully it fits my GPUs. Want me to run it? =)
Paperbliteration is just a modified version of Heretic with MLP tuned down. And for the GPT-4o distill, I'd recommend attacking the MLP layers too.
it's uploading btw
Annnd it choked on a hung disk read. I'll sleep. Y'all will have to wait another couple of hours.
What are your gpus bro?
I have never seen anyone so enthusiastic about an LLM and decided to roll up my sleeves. Keep me in your prayers.
https://huggingface.co/MuXodious/gpt-4o-distil-Llama-3.3-70B-Instruct-PaperWitch-heresy
Also, please do show your gratitude to @trentmkelly . This here is nothing compared to the effort they put into conjuring such a refined dataset from GPT-4o, and even more into training the model.
"Decided to roll up my sleeves"
Seriously, thank you for putting this much work into it. I know running Heretic on a 70B isn’t trivial, and I really appreciate you taking it on. I’ll absolutely make sure to thank Trent as well. The 4o distill itself is already something special, and your Heretic pass on top is… well, this is literally the model I was dreaming of!
I’ll be treating this as a core engine in my local machine once it’s quantized, and I certainly will keep you in my prayers.
queued and force pushed to rich1 =)
Looks like I'm way late to the party lol. Really glad to see everyone coming together like this. I'm very happy you like the model Millie. If you'd like to thank me donate a couple bucks to your local SPCA!
Tried to heretic it but failed; I think heretic scales the weights wrong. While my total VRAM can fit 16-bit, each GPU on its own can't quite fit 4-bit =/
I need to play around ig, pls help =(
@RichardErkhov you can learn about Hereticisation without requiring a local nuclear power plant by applying MPOA to some high quality small models. Examples:
The problem is my local nuclear power plant wanted to process 70b but for some reason cannot =(
Do I need to fix heretic to make it scale across multiple GPUs?
Well, it ballooned memory usage above my VRAM. Not even bnb 4-bit works for the 70B, which is weird with 160 GB of VRAM.
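The raw weight math backs up the confusion here. A quick back-of-envelope check (approximate; it ignores activations, KV cache, and quantization metadata overhead) shows even fp16 weights should nearly fit in 160 GB, so the failure is about per-GPU placement, not total capacity:

```python
# Back-of-envelope VRAM math for 70B parameters at different precisions.
params = 70e9
gib = 1024**3

fp16 = params * 2 / gib      # 2 bytes per parameter
int8 = params * 1 / gib      # 1 byte per parameter
nf4  = params * 0.5 / gib    # bnb 4-bit: ~0.5 bytes per parameter

print(f"fp16: {fp16:.0f} GiB, int8: {int8:.0f} GiB, 4-bit: {nf4:.0f} GiB")
```

4-bit weights are only ~33 GiB, so if loading OOMs on a 160 GB rig, the loader is probably staging too much on one device rather than genuinely running out of total memory.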
Oh so time to work ...
I think it's because it scales on a single GPU first, then splits?
I think it's because it scales on a single GPU first, then splits?
Unlikely. It may do this by accident (a bug) but should not do so by design.
What can go wrong will go wrong
Try using the sequential device map parameter rather than auto.
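For reference, `device_map` and `max_memory` are real `from_pretrained` arguments (handled by accelerate). A minimal sketch of the suggested change, building only the kwargs dict so it runs without transformers installed; the GPU count and the per-GPU caps are made-up examples:

```python
# Sketch of from_pretrained kwargs for multi-GPU loading with
# accelerate's "sequential" placement instead of "auto".
# num_gpus and the "90GiB" caps are illustrative assumptions.
num_gpus = 2

load_kwargs = {
    "device_map": "sequential",          # fill GPU 0, then GPU 1, ...
    # Cap each GPU below its physical size to leave headroom for
    # activations and CUDA overhead.
    "max_memory": {i: "90GiB" for i in range(num_gpus)},
    "low_cpu_mem_usage": True,
}

# Usage (needs transformers + accelerate installed):
# model = AutoModelForCausalLM.from_pretrained(model_id, **load_kwargs)
print(load_kwargs["device_map"], sorted(load_kwargs["max_memory"]))
```

Setting `max_memory` explicitly also works around "auto" placement decisions that front-load one device.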
Quick follow-up and a proper thank you:
Per everyone’s suggestions, I ended up splitting the bounty as:
- A donation to my local SPCA in Trent’s honor (his request),
- A donation to DontPlanToEnd’s Ko-fi for UGI, as suggested by MuX/Kara.
The 70B gpt-4o-distil-Llama-3.3-70B-Instruct-PaperWitch-heresy is honestly fantastic. You all did an amazing job, and this is going to be a core engine in my local stack for a long time.
From my side, this request is fully satisfied. Thank you again for rolling this into existence. 💙