Request for <4B linear attention quants

#1
by TomLucidor - opened

Could you do Q8/Q6/Q4/Adaptive quants on Jet-Nemotron-2B / Nemotron-Flash-3B-Instruct / Jet-Nemotron-4B / Nemotron-H-4B-Instruct-128K (ideally MLX-compatible)?

Hi, we only do GGUF static and imatrix quants; we don't do other formats. If you want GGUF, you can just send links to the models here so I can process them.

So I just realised that this comment was left on my page and not on mradermacher's. As much as I want to do them here, Hugging Face is blocking me from uploading to my account (because whatever can go wrong will go wrong in my life), and between that and my forgetfulness I just fully joined the mradermacher team instead, so you will need to find the quants on their page. I queued them there; here's the message I usually leave on model requests for mradermacher =)

It's queued!

You can check for progress at http://hf.tst.eu/status.html or regularly check the model
summary page at https://hf.tst.eu/model#Jet-Nemotron-4B-GGUF
https://hf.tst.eu/model#Jet-Nemotron-2B-GGUF
https://hf.tst.eu/model#Nemotron-Flash-3B-Instruct-GGUF
https://hf.tst.eu/model#Nemotron-H-4B-Instruct-128K-GGUF
for quants to appear.

https://huggingface.co/nvidia/Nemotron-H-4B-Instruct-128K

The queue gave me this error; you may want to check it to understand why your Nemotron models might not get quantized:

model broken, max arrogance. https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-9B-v2/discussions/5

So, sorry if something doesn't end up getting quantized; it is out of my control.

WTF from NVIDIA! We definitely need something functional in the linear attention sphere...
