Some questions on BitNet PTQ

#1
by TomLucidor - opened
  1. A lot of repos these days still do not have much support for BitNet (I think), so how can this be run? https://github.com/vllm-project/vllm/issues/33142
  2. Will you ever try BitNet for Qwen3-30B-A3B or GPT-OSS (MoE-style)? What about Nemotron-3-Nano, Ring-Mini-Linear-2.0, or Kimi-Linear (hybrid attention)?
  3. How could Sherry techniques be wrapped under BitNet/ternary quantization?
  4. Could these models be benchmarked for their reasoning, agentic coding, and instruction-following abilities?

Cross-ref https://huggingface.co/nightmedia/Kimi-Linear-REAP-35B-A3B-Instruct-mxfp4-mlx/discussions/1

Owner

I will have to research some of this. I am unfortunately a self-taught hobbyist, so I have a lot of knowledge gaps. I've identified issues with the initial quant; I am working to patch them and will re-upload in place once they're resolved. Once these quant issues are sorted, I will work to ensure it extends to other model types and various implementations. I will have to follow up on the rest of your questions, but longer term I do plan to benchmark this model and the others I get ternary quantized.
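For context on what "ternary quantized" means here: the BitNet b1.58 line of work maps each weight to {-1, 0, +1} with a per-tensor absmean scale. A minimal NumPy sketch of that rounding step (illustrative only, not the code used for these quants; the function name and layout are my own):

```python
import numpy as np

def ternary_quantize(w, eps=1e-8):
    """Absmean ternary quantization, BitNet b1.58-style (a sketch).

    Scales the weight matrix by its mean absolute value, then rounds
    each element to the nearest value in {-1, 0, +1}.
    """
    scale = np.mean(np.abs(w)) + eps          # absmean scale (eps avoids div-by-zero)
    q = np.clip(np.round(w / scale), -1, 1)   # ternary codes in {-1, 0, +1}
    return q, scale                           # dequantize as q * scale

w = np.array([[0.9, -0.04, -1.3],
              [0.2,  0.0,   0.55]])
q, s = ternary_quantize(w)
# q → [[ 1.  0. -1.]
#      [ 0.  0.  1.]]
```

Post-training, the quality question is how much accuracy survives this rounding without the quantization-aware training the original BitNet models get, which is exactly what the benchmarking asked about above would measure.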

Please start with Qwen3.5 if possible, as they did one last banger. Looking forward to GitHub repos for quantizing/running this as well.
