Request to quantize GRM-2.6-Plus
Hi cyankiwi,
I'd like to request AWQ 4-bit (and optionally GGUF Q4_K_M) quantizations of the GRM-2.6-Plus model (OrionLLM).
This model shows strong benchmark results in the ~27B parameter range, and I believe it would be a great addition to your quant lineup. I'm running on a 48GB L40S, and AWQ 4-bit would let me take full advantage of the context window while keeping VRAM usage low.
Model link: OrionLLM/GRM-2.6-Plus
Thanks for all your quant work. Your AWQ versions are what I use daily. Appreciate the consideration!
CJ - CEO & Founder
AetherPro Technologies
Thank you for using my models for many months. Please enjoy :)
https://huggingface.co/cyankiwi/GRM-2.6-Plus-AWQ-BF16-INT4
https://huggingface.co/cyankiwi/GRM-2.6-Plus-AWQ-INT4
Thanks, this is going on my open 48GB L40S GPU, right on time too. Are you updating all those models because of the new paper on AWQ quantization you released, and should I update the models I am using once you drop the updates?
My current live stack of models (all cyankiwi AWQ 4-bit, except for an INT4 qwen3.6-27b) is qwen3.6-36b, qwen3.6-27b-int4, gemma-4-26b-a4, and nemotron-3-nano-omni (this isn't your quant, and I'm not sure you should spend time on it yet). I have all the gemma models and all the qwen models released to date, but my GPU count is limited to L4-360 (4x 24GB), 1 L40S-90, and 2 L40S-180, so I only run 4-6 models at a time.
The GRM model is a fine-tune of qwen3.6, which is why I wanted it. It's a beast for its size.
Once I perfect my stack, I am deploying and distributing my system, from models to application level, on Beelink and Strix Halo small form factor PCs, which is why I need quants. It is a fully self-hosted telephony voice agent and ad engine pipeline, all local, with no cloud except the Twilio connection. No one has this. I built GoHighLevel capabilities for privacy-focused companies. The journey is just beginning.