Request: 70B version of PaperWitch-heresy GPT-4o distill
Hi mradermacher,
Thank you so much for all the incredible work you do! Your quants keep the scene alive ❤️
I absolutely love the 8B PaperWitch-heresy model you quantized:
https://huggingface.co/mradermacher/gpt-4o-distil-Llama-3.1-8B-Instruct-PaperWitch-heresy-GGUF
It has one of the best, most natural 4o voices I’ve heard in any small model.
Would you be able to create a 70B version of the same PaperWitch-heresy GPT-4o distill?
Q4_K_M + Q5_K_M would be perfect if possible.
Thank you again. This one is really special to me!
Hi, thank you so much for the support and kind words <3
That's a good question, but not one for us: we just quant, we don't train (unless we have some fun experiments going). I think @MuXodious would know the answer better =)
Aw, hell nah. Ain't nobody got VRAM for that! I merely decensor da thang...
@trentmkelly , the mastermind OG and the messiah for people who lost their GPT-4o boyfriends, would know the answer better =)
@MuXodious , how much GPU time did it take / what GPU did you use for the MPOA decensoring of the 8B? If it's not too terrible, I might be able to run it on the 70B myself to help @MillieV out. DeepInfra still has crazy deals on B200 rentals.
The 8B was easy and done quickly on the RTX 3090, in 30-ish minutes with Heretic. The 70B is a beast in its own class. You could probably use one of those B200s, or 2x RTX Pro 6000 Blackwells, for pocket change for upwards of an hour, or less depending on the interconnect speed. Plus, you need storage slightly more than double the size of the model, and enough RAM to merge the LoRA back into the model before upload/saving.
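The LoRA-merge step mentioned above can be sketched numerically. This is a toy illustration with made-up shapes (a real merge would use something like peft's `merge_and_unload()` on the actual checkpoint); it just shows why the merged model needs no adapter at inference time.

```python
import numpy as np

# Toy illustration of merging a LoRA adapter back into a base weight.
# Shapes and values are invented; this is not the real 70B checkpoint.
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 2, 4          # rank r, scaling alpha

W = rng.standard_normal((d_out, d_in))      # frozen base weight
A = rng.standard_normal((r, d_in))          # LoRA down-projection
B = rng.standard_normal((d_out, r))         # LoRA up-projection

# Merged weight: W' = W + (alpha / r) * B @ A
W_merged = W + (alpha / r) * (B @ A)

# After merging, a forward pass needs only W_merged -- the B @ A side
# path is folded in, so the result is a plain dense model you can
# upload and quantize like any other.
x = rng.standard_normal(d_in)
y_adapter = W @ x + (alpha / r) * (B @ (A @ x))
y_merged = W_merged @ x
print(np.allclose(y_adapter, y_merged))
```

The "enough RAM" caveat comes from the fact that base weight and merged weight coexist in memory during this step.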
That B200 deal is, indeed, crazy. If possible, can you merge the adapter into the full 70B model and upload that? The mradermacher team can quantise it and make the merged model more accessible to the general public. In turn, one of us can also abuse their wallet to decensor dat thang. Alternatively, you can merge it into one of these decensored models:
https://huggingface.co/0xA50C1A1/Llama-3.3-70B-Instruct-Heretic
https://huggingface.co/huihui-ai/Llama-3.3-70B-Instruct-abliterated
Llama 3.3 is a tad nasty with non-compliance tho: Llama-3.3-8B-Instruct-128K-PaperWitch-heresy
Thanks for the quick replies!
Happy to pay $650 (full compute + merge work + tip) if anyone wants to handle the 70B PaperWitch-heresy merge and upload the safetensors.
Just DM me if interested.
Dayum, I might as well lol. I can try to handle it in 8-bit, or even try 16-bit lol. How do I do that though lol?
Guys how to paperwitch
...It's 1am and I'm supposed to be sleeping 😭
...It's 1am and I'm supposed to be sleeping 😭
wait, same, how close are we bro? I have uni at 9am 😭
bro teach me to paperwitch, I can try to do that job
Happy to pay $650
bro if you pay I'll try to heretic anything under 70B lol. If you need finetunes I'd have to learn that lol, but I know how to quant lmao. I just need to test the limits of what I can heretic
...It's 1am and I'm supposed to be sleeping 😭
wait, same, how close are we bro? I have uni at 9am 😭
bro teach me to paperwitch, I can try to do that job
It's running atm, but I need to sleep first sooo
@MillieV Aight, lass or laddie. @KaraKaraWitch is on it, getting your model worked up and neatly packaged. If you have that much money to spare, please donate to a charity of your choice, or to the creator, or the community:
@trentmkelly , literally the creator of that distill.
and
UGI Leaderboard conductor @DontPlanToEnd 's Ko-fi (as per @KaraKaraWitch 's wishes).
Thanks Kara❣
Once the merge is ready, could you also quant to Q4_K_M and Q5_K_M GGUF?
I'll send the donation as soon as I test the file.
Thanks again! [Edit] Just to be clear, no pressure on anyone. If it’s too much hassle, totally fine; I just wanted to put a concrete bounty on how much this would mean to me, and I’d prefer the result to be public and reusable for everyone.
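For a sense of what the requested quants weigh on disk, here is a rough back-of-envelope estimate. The bits-per-weight figures are approximate community numbers for llama.cpp k-quants, not exact values, and the totals ignore GGUF metadata overhead.

```python
# Rough size estimate for Q4_K_M / Q5_K_M GGUFs of a 70B model.
# The bits-per-weight (bpw) values are approximations, not exact.
params = 70e9
gib = 1024**3

for name, bpw in [("Q4_K_M", 4.85), ("Q5_K_M", 5.69)]:
    size_gib = params * bpw / 8 / gib
    print(f"{name}: ~{size_gib:.0f} GiB")
```

So each quant is a 40-to-50 GiB class download, which is worth knowing before committing bandwidth and disk.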
hm, Kara is paperbliterating, which reduces refusals but doesn't remove them entirely.
I can try running Heretic tomorrow if the GPUs are free; it tries to kill all refusals. Hopefully it fits my GPUs. Want me to run it? =)
I just wanted to put a concrete bounty
it's very motivating bounty lol, but I mean if Im learning heretic, might as well find the limits of the hardware, right?
hm, Kara is paperbliterating, which reduces refusals but doesn't remove them entirely.
I can try running Heretic tomorrow if the GPUs are free; it tries to kill all refusals. Hopefully it fits my GPUs. Want me to run it? =)
Paperbliteration is just a modified version of Heretic with MLP tuned down. And for the GPT-4o distill, I'd recommend attacking the MLP layers too.
it's uploading btw
it's uploading btw
HOW FAST ARE YOUR GPUS?!
Only if it’s fun and feasible for you, absolutely no pressure. If learning Heretic and pushing the hardware on the 70B sounds like a good challenge, I’d be thrilled to see what comes out of it. If it turns out to be too heavy or annoying, that’s totally fine too.
A public 70B heretic + your quants would be amazing for everyone, but please don’t cook your GPUs on my account 😄
don’t cook your GPUs on my account
oh don't worry, I learned my lesson to always sleep with a fire extinguisher
feasible for you
sadly there is one big motivator in this world =(
but obviously fitting 70b model into gpus is quite fun lol
I will start heretic of 70b when I wake up if I manage to clear out some space and make sure everything is ready
hm, Kara is paperbliterating, which reduces refusals but doesn't remove them entirely.
I can try running Heretic tomorrow if the GPUs are free; it tries to kill all refusals. Hopefully it fits my GPUs. Want me to run it? =)
Paperbliteration is just a modified version of Heretic with MLP tuned down. And for the GPT-4o distill, I'd recommend attacking the MLP layers too.
it's uploading btw
Annnd it choked on a hung disk read. I'll sleep. Y'all will have to wait another couple of hours.
What are your gpus bro?
I have never seen anyone so enthusiastic about an LLM and decided to roll up my sleeves. Keep me in your prayers.
https://huggingface.co/MuXodious/gpt-4o-distil-Llama-3.3-70B-Instruct-PaperWitch-heresy
Also, please do show your gratitude to @trentmkelly . This here is nothing compared to the effort they put into conjuring such a refined dataset from GPT-4o, and even more into training the model.
"Decided to roll up my sleeves"
Seriously, thank you for putting this much work into it. I know running Heretic on a 70B isn’t trivial, and I really appreciate you taking it on. I’ll absolutely make sure to thank Trent as well. The 4o distill itself is already something special, and your Heretic pass on top is… well, this is literally the model I was dreaming of!
I’ll be treating this as a core engine in my local machine once it’s quantized, and I certainly will keep you in my prayers.
queued and force pushed to rich1 =)
Looks like I'm way late to the party lol. Really glad to see everyone coming together like this. I'm very happy you like the model Millie. If you'd like to thank me donate a couple bucks to your local SPCA!
Tried to heretic it but failed; I think heretic scales the weights wrong. While my total VRAM can fit 16-bit, each GPU on its own can't quite fit 4-bit =/
I need to play around ig, pls help =(
@RichardErkhov you can learn about Hereticisation without requiring a local nuclear power plant by applying MPOA to some high quality small models. Examples:
The problem is my local nuclear power plant wanted to process 70b but for some reason cannot =(
Do I need to fix heretic to make it scale across multiple GPUs?
Well, it ballooned memory usage above my VRAM. Not even bnb 4-bit works for the 70B, which is weird with 160 GB of VRAM.
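The raw weight math backs up the confusion here. A quick back-of-envelope check (approximate; it ignores activations, KV cache, and quantization metadata overhead) shows even fp16 weights should nearly fit in 160 GB, so the failure is about per-GPU placement, not total capacity:

```python
# Back-of-envelope VRAM math for 70B parameters at different precisions.
params = 70e9
gib = 1024**3

fp16 = params * 2 / gib      # 2 bytes per parameter
int8 = params * 1 / gib      # 1 byte per parameter
nf4  = params * 0.5 / gib    # bnb 4-bit: ~0.5 bytes per parameter

print(f"fp16: {fp16:.0f} GiB, int8: {int8:.0f} GiB, 4-bit: {nf4:.0f} GiB")
```

4-bit weights are only ~33 GiB, so if loading OOMs on a 160 GB rig, the loader is probably staging too much on one device rather than genuinely running out of total memory.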
Oh so time to work ...
I think it's because it scales on a single GPU first, then splits?
I think it's because it scales on a single GPU first, then splits?
Unlikely. It may do this by accident (a bug) but should not do so by design.
What can go wrong will go wrong
Try using the sequential device map parameter rather than auto.
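For reference, `device_map` and `max_memory` are real `from_pretrained` arguments (handled by accelerate). A minimal sketch of the suggested change, building only the kwargs dict so it runs without transformers installed; the GPU count and the per-GPU caps are made-up examples:

```python
# Sketch of from_pretrained kwargs for multi-GPU loading with
# accelerate's "sequential" placement instead of "auto".
# num_gpus and the "90GiB" caps are illustrative assumptions.
num_gpus = 2

load_kwargs = {
    "device_map": "sequential",          # fill GPU 0, then GPU 1, ...
    # Cap each GPU below its physical size to leave headroom for
    # activations and CUDA overhead.
    "max_memory": {i: "90GiB" for i in range(num_gpus)},
    "low_cpu_mem_usage": True,
}

# Usage (needs transformers + accelerate installed):
# model = AutoModelForCausalLM.from_pretrained(model_id, **load_kwargs)
print(load_kwargs["device_map"], sorted(load_kwargs["max_memory"]))
```

Setting `max_memory` explicitly also works around "auto" placement decisions that front-load one device.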
Quick follow-up and a proper thank you:
Per everyone’s suggestions, I ended up splitting the bounty as:
- A donation to my local SPCA in Trent’s honor (his request),
- A donation to DontPlanToEnd’s Ko-fi for UGI, as suggested by MuX/Kara.
The 70B gpt-4o-distil-Llama-3.3-70B-Instruct-PaperWitch-heresy is honestly fantastic. You all did an amazing job, and this is going to be a core engine in my local stack for a long time.
From my side, this request is fully satisfied. Thank you again for rolling this into existence. 💙