Akicou/MiniMax-M2-5-REAP-19
https://huggingface.co/Akicou/MiniMax-M2-5-REAP-19
And if you can make TQ1_0 Unsloth dynamic GGUFs of the whole series, many thanks.
Hi, I can make as many models as you want, as long as you provide links. But I can't get you Unsloth's quants, as we are not Unsloth, and we don't support anything made beyond mainstream llama.cpp, for example brute-forced quants or llama.cpp forks.
You can check for progress at http://hf.tst.eu/status.html or regularly check the model
summary page at https://hf.tst.eu/model#MiniMax-M2-5-REAP-19-GGUF for quants to appear.
Hello Richard! I have more REAP'd variants, could you also queue them up? I would very much appreciate it!
https://huggingface.co/Akicou/MiniMax-M2-5-REAP-50
https://huggingface.co/Akicou/MiniMax-M2-5-REAP-39
https://huggingface.co/Akicou/MiniMax-M2-5-REAP-29
Thank you very much!
Of course I can, let me know if you need anything else =)
You can check for progress at http://hf.tst.eu/status.html or regularly check the model
summary page at https://hf.tst.eu/model#MiniMax-M2-5-REAP-50-GGUF
https://hf.tst.eu/model#MiniMax-M2-5-REAP-39-GGUF
https://hf.tst.eu/model#MiniMax-M2-5-REAP-29-GGUF
for quants to appear.
I'd definitely be interested in the 10% REAP variant of MiniMax M2.5 that you had hinted at in your uploads. That would at least make IQ4_XS quants viable on 128GB of memory while still leaving enough memory for moderate-sized contexts.
IQ4_XS
Afaik IQ4_K_M is a bit better; I wonder which is better: 12-14% REAP at IQ4_K_M or 10% at IQ4_XS 🤔
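For anyone who wants to sanity-check the 128GB claim, here is a rough back-of-the-envelope sketch. The base parameter count and the REAP percentages below are placeholders I picked for illustration (not official numbers for this model), and ~4.25 bits per weight is the figure commonly quoted for IQ4_XS; plug in the real values for the model you care about.

```python
# Back-of-the-envelope sizing: how big would a REAP'd MoE be at IQ4_XS-ish bit-widths,
# and how much of a 128 GiB box is left for KV cache / context?
# NOTE: BASE_PARAMS_B and the REAP percentages are placeholder assumptions, not official numbers.

def estimate_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size in GiB of a model quantized at the given average bits per weight."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

BASE_PARAMS_B = 230.0          # assumed total parameter count of the base model, in billions
IQ4_XS_BPW = 4.25              # commonly quoted average bits per weight for IQ4_XS

for reap_pct in (10, 19, 29, 39, 50):
    kept = BASE_PARAMS_B * (1 - reap_pct / 100)
    size = estimate_gib(kept, IQ4_XS_BPW)
    print(f"REAP-{reap_pct:>2}: ~{kept:5.0f}B params -> ~{size:5.1f} GiB at IQ4_XS, "
          f"~{128 - size:5.1f} GiB headroom on a 128 GiB machine")
```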
Hey Richard. Not to be a bother, but what exactly do these errors mean?
error/1 converting...
error/57 dryrun...
Perhaps the transformers bug that mradermacher and nico were talking about. I requeued with transformers v4, let's hope it works. Check on it later, please. It's not a bother for me; as I say, "please keep poking me, because I can forget the task I am currently working on and might start doing something else" lol. It literally segfaulted with no explanation lol, let's hope it works now.
I might know why. I had to REAP the model on transformers 4.57.2. It could be that the newest transformers version doesn't work for the llama.cpp quant, since it didn't work in the REAPing environment I was working in either!
Thanks for your effort :D
Yes, that's why our amazing mr mradermacher did a quick recovery with transformers v4 =)
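If it helps anyone reproduce the workaround locally, here is a minimal pre-flight sketch for pinning transformers before running llama.cpp's convert_hf_to_gguf.py on this checkpoint. The 4.57.2 number comes from the message above; the check itself and the exact pin are my assumptions, not part of the actual queue tooling.

```python
# Minimal pre-flight check before running llama.cpp's convert_hf_to_gguf.py on this
# checkpoint: make sure the installed transformers major version matches the one the
# model was REAP'd with (4.57.2, per the discussion above).
from packaging import version

import transformers

SAVED_WITH = "4.57.2"                   # transformers version the REAP was done on
installed = transformers.__version__

if version.parse(installed).major != version.parse(SAVED_WITH).major:
    raise SystemExit(
        f"transformers {installed} is installed, but the checkpoint was saved with "
        f"{SAVED_WITH}; try `pip install 'transformers=={SAVED_WITH}'` before converting."
    )

print(f"transformers {installed} should be compatible with {SAVED_WITH}")
```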
Pretty sure it went on a conversion error once more...
Nah, it just randomly ran out of storage and nico hasn't answered for an hour. I'm going to sleep, but my local time is way ahead of his, so he will eventually come back and resume the nico1 server =)
Ah okay. Sleep well Richard 😴
Sleep well Richard
I wish
Unrelated, but this crew seems stacked: any idea what major change ollama made in 0.15 that causes some GGUFs not to load? It made me switch back to pure llama.cpp, and the switch wasn't too bad; now it feels like I was missing out all this time.
No idea; all I know is that llama.cpp might be superior because you simply have more control over it (at least a year ago when I tried it). When and what did they change? No clue... if you want to keep using ollama, show me the error. If it's easy I'll fix it; if it's not easy, idk lol, google it or raise an issue on their GitHub.
Could you get in contact with Nico?
What should I ask/check?
Whether the nico1 server was resumed, and whether or not your team can help quantize the pruned models.
They didn't quant and someone removed them?? What a shame. I will requeue to rich1 as soon as I am free, probably in a few hours. They aren't vision, right? I can queue only non-vision models to rich1.
Rich1 is slow but has enough space for sure lol, so it will take some time, but it will process.
Yes text to text
Alright. I appreciate it
Requeued with higher priority to rich1. Unless someone with even higher priority gets a queue, you should be next.
We have like 5.5TB spare on rich1 lmao, it will for sure quant, it's just a matter of time.
Alright, thanks Richard!
Houston, we have a problem, and it's on your side.
llama_model_load: error loading model: tensor 'blk.29.ffn_up_exps.weight' data is not within the file bounds, model is corrupted or incomplete
llama_model_load_from_file_impl: failed to load model
main: error: unable to load model
which makes sense knowing that you literally removed experts... You either need a placeholder so the model can actually run through llama.cpp or a different approach (at least that's what I understood after talking to nico)
I probably pruned it wrong... I'll check on my side.
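In case it helps with checking on your side: a small sketch that looks for exactly the failure llama.cpp reported, i.e. tensor data that ends past the end of the file. It uses the gguf Python package that ships with llama.cpp; the GGUFReader field names are from memory, so adjust if they differ in your version.

```python
# Sanity-check a produced GGUF for the exact failure llama.cpp reported:
# tensor data that ends past the end of the file (e.g. blk.29.ffn_up_exps.weight).
# Uses the gguf Python package (pip install gguf); field names assumed from its GGUFReader.
import os
import sys

from gguf import GGUFReader

path = sys.argv[1]                      # path to the .gguf file to check
file_size = os.path.getsize(path)

try:
    reader = GGUFReader(path)
except Exception as exc:                # a badly truncated file may already fail here
    sys.exit(f"GGUFReader could not even open {path}: {exc}")

bad = []
for t in reader.tensors:
    end = t.data_offset + t.n_bytes     # where this tensor's data should end in the file
    if end > file_size:
        bad.append((t.name, end - file_size))

if bad:
    for name, overflow in bad:
        print(f"{name}: data ends {overflow} bytes past the end of the file")
    sys.exit(1)

print(f"all {len(reader.tensors)} tensors fit within the {file_size}-byte file")
```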
Thoughts on ttc-REAP as a concept, targeting experts whose entropy of use is bounded per expert, or something like that? The model learns which expert to use based on difficulty, where each expert maps to its own compute budget multiplier? Or some meta-effect, idk, we're deep in the weeds tonight boys.
I don't think I can quant anything that modifies the structure of the model itself. So, kind of negative thoughts, unless you find a way to support official quanting; then I might have positive thoughts =)
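Just to make the "entropy of use per expert" idea concrete, here is a toy numpy sketch of one possible interpretation: score each expert by the entropy of the token distribution routed to it, so low-entropy experts are the "bounded" ones a ttc-REAP-style prune might target. This is my own reading of the idea, not an implementation of REAP or anything the quant team supports.

```python
# Toy interpretation of "entropy of use per expert": score each expert by the entropy
# of the distribution of tokens routed to it. Low entropy = the expert is used by a
# narrow, predictable slice of tokens, i.e. its usage is "bounded".
import numpy as np

def expert_usage_entropy(router_probs: np.ndarray) -> np.ndarray:
    """router_probs: (n_tokens, n_experts) softmaxed routing weights.
    Returns one entropy value per expert, taken over the tokens it sees."""
    # normalize each expert's column into a probability distribution over tokens
    col = router_probs / np.clip(router_probs.sum(axis=0, keepdims=True), 1e-9, None)
    return -(col * np.log(np.clip(col, 1e-9, None))).sum(axis=0)

rng = np.random.default_rng(0)
logits = rng.normal(size=(1024, 8))     # 1024 tokens, 8 experts -- toy numbers
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

ent = expert_usage_entropy(probs)
print("lowest-entropy ('most bounded') experts:", np.argsort(ent)[:3])
```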