Akicou/MiniMax-M2-5-REAP-19
https://huggingface.co/Akicou/MiniMax-M2-5-REAP-19
And if you can make TQ1_0 Unsloth dynamic GGUFs of the whole series, many thanks.
Hi, I can make as many models as you want, as long as you provide links. But I can't get you Unsloth's quants, as we are not Unsloth, and we don't support anything made beyond mainstream llama.cpp, for example brute-forced quants or llama.cpp forks.
You can check for progress at http://hf.tst.eu/status.html or regularly check the model
summary page at https://hf.tst.eu/model#MiniMax-M2-5-REAP-19-GGUF for quants to appear.
Hello Richard! I have more REAP'd variants, could you also queue them up? I would very much appreciate it!
https://huggingface.co/Akicou/MiniMax-M2-5-REAP-50
https://huggingface.co/Akicou/MiniMax-M2-5-REAP-39
https://huggingface.co/Akicou/MiniMax-M2-5-REAP-29
Thank you very much!
Of course I can, let me know if you need anything else =)
You can check for progress at http://hf.tst.eu/status.html or regularly check the model
summary page at https://hf.tst.eu/model#MiniMax-M2-5-REAP-50-GGUF
https://hf.tst.eu/model#MiniMax-M2-5-REAP-39-GGUF
https://hf.tst.eu/model#MiniMax-M2-5-REAP-29-GGUF
for quants to appear.
I'd definitely be interested in the 10% REAP variant of MiniMax M2.5 that you had hinted at in your uploads. That would at least make IQ4_XS quants viable on 128GB of memory while still leaving enough memory for moderate-sized contexts.
IQ4_XS
Afaik IQ4_K_M is a bit better; I wonder which is better: 12-14% REAP at IQ4_K_M or 10% at IQ4_XS 🤔
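For anyone who wants to sanity-check the 128GB claim, here is a rough back-of-the-envelope sketch. The base parameter count and the REAP percentages below are placeholders I picked for illustration (not official numbers for this model), and ~4.25 bits per weight is the figure commonly quoted for IQ4_XS; plug in the real values for the model you care about.

```python
# Back-of-the-envelope sizing: how big would a REAP'd MoE be at IQ4_XS-ish bit-widths,
# and how much of a 128 GiB box is left for KV cache / context?
# NOTE: BASE_PARAMS_B and the REAP percentages are placeholder assumptions, not official numbers.

def estimate_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size in GiB of a model quantized at the given average bits per weight."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

BASE_PARAMS_B = 230.0          # assumed total parameter count of the base model, in billions
IQ4_XS_BPW = 4.25              # commonly quoted average bits per weight for IQ4_XS

for reap_pct in (10, 19, 29, 39, 50):
    kept = BASE_PARAMS_B * (1 - reap_pct / 100)
    size = estimate_gib(kept, IQ4_XS_BPW)
    print(f"REAP-{reap_pct:>2}: ~{kept:5.0f}B params -> ~{size:5.1f} GiB at IQ4_XS, "
          f"~{128 - size:5.1f} GiB headroom on a 128 GiB machine")
```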
Hey Richard. Not to be a bother, but what exactly do these errors mean?
error/1 converting...
error/57 dryrun...
Perhaps the transformers bug that mradermacher and nico were talking about. I requeued with transformers v4, let's hope it works. Check on it later, please. It's not a bother for me; as I say, "please keep poking me, because I can forget the task I am currently working on and might start doing something else" lol. It literally segfaulted with no explanation lol, let's hope it works now.
I might know why. I had to REAP the model on transformers 4.57.2. It could be that the newest transformers version doesn't work for the llama.cpp quant, since it didn't work in the REAPing environment I was working in either!
Thanks for your effort :D
Yes, that's why our amazing mr mradermacher did a quick recovery with transformers v4 =)
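If it helps anyone reproduce the workaround locally, here is a minimal pre-flight sketch for pinning transformers before running llama.cpp's convert_hf_to_gguf.py on this checkpoint. The 4.57.2 number comes from the message above; the check itself and the exact pin are my assumptions, not part of the actual queue tooling.

```python
# Minimal pre-flight check before running llama.cpp's convert_hf_to_gguf.py on this
# checkpoint: make sure the installed transformers major version matches the one the
# model was REAP'd with (4.57.2, per the discussion above).
from packaging import version

import transformers

SAVED_WITH = "4.57.2"                   # transformers version the REAP was done on
installed = transformers.__version__

if version.parse(installed).major != version.parse(SAVED_WITH).major:
    raise SystemExit(
        f"transformers {installed} is installed, but the checkpoint was saved with "
        f"{SAVED_WITH}; try `pip install 'transformers=={SAVED_WITH}'` before converting."
    )

print(f"transformers {installed} should be compatible with {SAVED_WITH}")
```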
Pretty sure it went on a conversion error once more...
Nah, it just randomly ran out of storage and nico hasn't answered for an hour. I'm going to sleep, but my local time is way ahead of his, so he will eventually come back and resume the nico1 server =)
Ah okay. Sleep well Richard 😴
Sleep well Richard
I wish
Unrelated, but this crew seems stacked: any idea what major change ollama made in 0.15 that causes some GGUFs not to load? It made me switch back to pure llama.cpp, and the switch wasn't too bad; now it feels like I was missing out all this time.
No idea; all I know is that llama.cpp might be superior because you simply have more control over it (at least a year ago when I tried it). When and what did they change? No clue... if you want to keep using ollama, show me the error. If it's easy I'll fix it; if it's not easy, idk lol, google it or raise an issue on their GitHub.
Could you get in contact with Nico?
What should I ask/check?
Whether the nico1 server was resumed, and whether or not your team can help quantize the pruned models.
They didn't quant and someone removed them?? What a shame. I will requeue to rich1 as soon as I am free, probably in a few hours. They aren't vision, right? I can queue only non-vision models to rich1.
Rich1 is slow but has enough space for sure lol, so it will take some time, but it will process.
Yes text to text
Alright. I appreciate it
Requeued with higher priority to rich1. Unless someone with even higher priority gets a queue, you should be next.
We have like 5.5TB spare on rich1 lmao, it will for sure quant, it's just a matter of time.
Alright, thanks Richard!
Houston, we have a problem, and it's on your side.
llama_model_load: error loading model: tensor 'blk.29.ffn_up_exps.weight' data is not within the file bounds, model is corrupted or incomplete
llama_model_load_from_file_impl: failed to load model
main: error: unable to load model
which makes sense knowing that you literally removed experts... You either need a placeholder so the model can actually run through llama.cpp or a different approach (at least that's what I understood after talking to nico)
I probably pruned it wrong... I'll check on my side.
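In case it helps with checking on your side: a small sketch that looks for exactly the failure llama.cpp reported, i.e. tensor data that ends past the end of the file. It uses the gguf Python package that ships with llama.cpp; the GGUFReader field names are from memory, so adjust if they differ in your version.

```python
# Sanity-check a produced GGUF for the exact failure llama.cpp reported:
# tensor data that ends past the end of the file (e.g. blk.29.ffn_up_exps.weight).
# Uses the gguf Python package (pip install gguf); field names assumed from its GGUFReader.
import os
import sys

from gguf import GGUFReader

path = sys.argv[1]                      # path to the .gguf file to check
file_size = os.path.getsize(path)

try:
    reader = GGUFReader(path)
except Exception as exc:                # a badly truncated file may already fail here
    sys.exit(f"GGUFReader could not even open {path}: {exc}")

bad = []
for t in reader.tensors:
    end = t.data_offset + t.n_bytes     # where this tensor's data should end in the file
    if end > file_size:
        bad.append((t.name, end - file_size))

if bad:
    for name, overflow in bad:
        print(f"{name}: data ends {overflow} bytes past the end of the file")
    sys.exit(1)

print(f"all {len(reader.tensors)} tensors fit within the {file_size}-byte file")
```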
Thoughts on ttc-REAP as a concept, targeting experts whose entropy of use is bounded per expert, or something like that? The model learns which expert to use based on difficulty, where each expert maps to its own compute budget multiplier? Or some meta-effect, idk, we're deep in the weeds tonight boys.
I don't think I can quant anything that modifies the structure of the model itself. So, kind of negative thoughts, unless you find a way to support official quanting; then I might have positive thoughts =)
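Just to make the "entropy of use per expert" idea concrete, here is a toy numpy sketch of one possible interpretation: score each expert by the entropy of the token distribution routed to it, so low-entropy experts are the "bounded" ones a ttc-REAP-style prune might target. This is my own reading of the idea, not an implementation of REAP or anything the quant team supports.

```python
# Toy interpretation of "entropy of use per expert": score each expert by the entropy
# of the distribution of tokens routed to it. Low entropy = the expert is used by a
# narrow, predictable slice of tokens, i.e. its usage is "bounded".
import numpy as np

def expert_usage_entropy(router_probs: np.ndarray) -> np.ndarray:
    """router_probs: (n_tokens, n_experts) softmaxed routing weights.
    Returns one entropy value per expert, taken over the tokens it sees."""
    # normalize each expert's column into a probability distribution over tokens
    col = router_probs / np.clip(router_probs.sum(axis=0, keepdims=True), 1e-9, None)
    return -(col * np.log(np.clip(col, 1e-9, None))).sum(axis=0)

rng = np.random.default_rng(0)
logits = rng.normal(size=(1024, 8))     # 1024 tokens, 8 experts -- toy numbers
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

ent = expert_usage_entropy(probs)
print("lowest-entropy ('most bounded') experts:", np.argsort(ent)[:3])
```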