MagicQuant for Apriel 1.6?
Hi,
Since Apriel 1.6 was hitting all the most important benchmarks, I was wondering if you could make a MagicQuant version of it? According to the top 10 best small open-source models that can run on laptops, Apriel 1.6 and GLM 4.7 flash are at the top of the intelligence line right now.
https://artificialanalysis.ai/models/open-source/small
GLM 4.7 flash (30B) requires double the VRAM of Apriel 1.6 (15B).
It would be great if we could have MagicQuant versions of both, but especially Apriel 1.6, because that one is really slow at Q6 or Q8, while the Q4 hallucinates too much in thinking mode. And Apriel 1.6 is the only one that can run on most consumer hardware without super-expensive GPUs. The only problem is speed vs. quality: at Q4 it's unusably bad for coding, while Q6 is the best for that size but extremely slow at 4 tokens per second...
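As a ballpark for why the 30B model needs roughly double the memory of the 15B at any given quant level, file size scales with parameter count times bits per weight. A minimal sketch (the bits-per-weight figures are approximate values for llama.cpp quant types, not exact download sizes, since real GGUF files mix quant types across tensors):

```python
# Rough GGUF size estimate: params (billions) * bits-per-weight / 8 = GB.
# Treat these as ballpark numbers only; real files vary by a few percent.

def est_size_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate model file size in GB for a given quant level."""
    return params_b * bits_per_weight / 8

# Approximate effective bits-per-weight for common llama.cpp quants.
QUANTS = [("Q4_K_M", 4.8), ("Q6_K", 6.6), ("Q8_0", 8.5)]

for name, bpw in QUANTS:
    print(f"{name}: 15B ~{est_size_gb(15, bpw):.1f} GB, "
          f"30B ~{est_size_gb(30, bpw):.1f} GB")
```

Whatever the quant level, the 30B model ends up about twice the size, which is the VRAM gap described above.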
Unsloth also published a new methodology that makes the process much faster and with lower VRAM requirements:
https://unsloth.ai/docs/new/3x-faster-training-packing
Here is how you can start:
https://unsloth.ai/docs/basics/quantization-aware-training-qat
I would really appreciate a MagicQuant ;-)
And you'll probably benefit from it too :-)
pls do this
seems like he's done
If it means anything, I am still working on the project. I'm just busy with life and only able to put in a few hours here or there, but I got some good work done on it over the weekend. I'm no longer using the old pipeline I built. It takes a lot of time, it has flaws that are blatant to me, and I'm working very hard on version 2, which is built in a totally new framework and language.
But the original MagicQuant results truly were just my prototype. Plus, when the old pipeline runs, my entire PC is basically frozen until it's done, and that can take days or weeks. So to be honest, I'm trying to build a proper code base that I can trust, that's more performant, that achieves better results, and that I'd be comfortable releasing as an open-source project, because I don't want to be the bottleneck for why people can't build MagicQuant models.
I am hoping to have a lot of the new Qwen3.5 and Gemma models made into version 2 MagicQuant quantizations by the end of April or May. Then, after hammering out the last of the details, I want to just release the code and let the community do with it what they want.
It makes me feel bad when people ask for models and I can't help, because only my primary workstation can run the old pipeline, and I can't have my PC hang for days or weeks when I have work to do. I work from home, so I need my main PC daily. All my previous MagicQuant models baked while I was on leave.
take your time, it's just kinda been radio silent from huggingface but that's the only place i follow you
Thanks! On my GitHub here:
https://github.com/magiccodingman/MagicQuant-Wiki
I am trying to be a bit more active. Version 2 is taking a completely different direction; I learned a lot from version 1. And with KL divergence now used as a benchmark, there's more nuance to how models are chosen: previous models that were clear winners aren't always clear winners anymore. Plus, a whole new philosophy of how to target and find the best quants has resulted in some really weird but cool things. But I'm still sitting on it, digesting it, etc.
The hardest parts of my new framework are done. I'm now just tuning it and deciding on multiple factors.
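For readers unfamiliar with the KL divergence metric mentioned above, it measures how far the quantized model's next-token probability distribution drifts from the full-precision model's. A minimal illustrative sketch with toy distributions (this is not MagicQuant's actual benchmark code, just the underlying formula):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q): how much distribution q diverges from reference p.
    0.0 means identical; larger values mean more quantization damage."""
    eps = 1e-12  # guard against log(0) for zero-probability tokens
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Toy next-token distributions over a 4-token vocabulary:
fp16 = [0.70, 0.20, 0.05, 0.05]   # hypothetical full-precision model
quant = [0.60, 0.25, 0.10, 0.05]  # hypothetical quantized model
print(round(kl_divergence(fp16, quant), 4))  # → 0.0286
```

In practice this is averaged over many tokens of real text, which catches quants that look fine on perplexity alone but subtly reshape the output distribution.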
@sebastienbo @floory Please refer to the upcoming v2.0 launch. All v1.0 MagicQuant models will be deprecated. The original MagicQuant models acted deceptively smart but had major unmeasured flaws. v2.0 not only resolves this but introduces a system that's fundamentally different. Keep an eye on the wiki and the collection in the repo over the next week or two. I may even silently launch a couple of examples while testing. But v2.0 isn't just better, it's trustworthy, which is way more important. I'm also blending Unsloth Dynamic learned quants into tensor groups now too. So the new release will be much more fun and production-grade. Plus, with a system that's significantly more trustworthy and not producing hidden damage, I can easily build additional model architectures as well.
I'm in the final stages right now of cleaning up v2.0 and documenting it. The back end is already built; I'm merely cleaning up the code, output, and so on. After posting some small 4B test models, I'll be starting with the Qwen3.6 series, but will have the ability to work with way more now.
v2.0 isn't fully ready for showcasing, but if anyone is still interested:
https://huggingface.co/magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-GGUF
That's the first showcase of MagicQuant v2.0
The wiki has been updated:
https://github.com/magiccodingman/MagicQuant-Wiki