Q8 and smaller quants

by whoisjeremylam - opened 19 days ago

Hi, this is super interesting. Is there a possibility of a Q8 or smaller quant? At Q8, the handful of us with 48 GB GPUs will be able to try this out. 😊

Jeffbulger82

18 days ago

I’m trying to patch all the converters and experts now to quantize it. It’s quite the challenge. I did get it converted to a GGUF though.

hif1000

concavity.ai org 17 days ago

Thanks for the interest! Right now I am doing some additional training on the model, and I will quantize it to 8-bit afterwards.

Jbulger82

17 days ago

I had some issues with the blk.0.ssm_a tensor: it is exporting as 64, 1, 1, 1,… while my project was expecting 1, 64, but it’s all good. I look forward to your quantization. If you could post any quantized GGUF, I can make magic happen :). But I’ll work with whatever you share! I love this concept! I can run a Nemotron 3 Nano with 2 million context on consumer hardware; it’s a little slow, but it works. I think your logic in this project is the solution to many, many things! When I make the breakthrough, I’ll share it with you, absolutely. Thanks again.
