Q8 and smaller quants

#1
by whoisjeremylam - opened

Hi, this is super interesting. Is there a possibility of a Q8 or smaller quant? At Q8, the handful of us with 48 GB GPUs would be able to try this out. 😊

I’m trying to patch all the converters and experts now to quantize it. It’s quite the challenge. I did get it converted to a GGUF though.

concavity.ai org

Thanks for the interest! Right now I am doing some additional training on the model, will quantize it to 8-bit afterwards.

I had some issues where the blk.0.ssm_a tensor was exporting as 64, 1, 1, 1 while my project was expecting 1, 64, but it's all good. I look forward to your quantization. If you could post any quantized GGUF I can make magic happen :), but I'll work with whatever you share!

I love this concept! I can run a Nemotron 3 Nano with 2 million context on consumer hardware. It's a little slow, but it works. I think the logic in this project is the solution to many, many things! When I make the breakthrough, I'll share it with you. Thanks again.
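For what it's worth, a shape mismatch like 64, 1, 1, 1 vs. 1, 64 can often be worked around on the consuming side by squeezing the singleton dimensions and reshaping. A minimal sketch with numpy (the tensor here is a placeholder, not the actual ssm_a weights):

```python
import numpy as np

# Stand-in for a tensor that the exporter wrote with shape (64, 1, 1, 1)
# but which the loader expects as (1, 64).
ssm_a = np.zeros((64, 1, 1, 1), dtype=np.float32)

# Drop the trailing singleton axes, then add the leading axis back.
fixed = ssm_a.squeeze().reshape(1, 64)

print(fixed.shape)  # (1, 64)
```

Whether this is safe depends on how the converter lays out the data; if the values themselves are permuted rather than just re-shaped, a reshape alone won't fix it.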
