YCWTG/Qwen3-Coder-Next-int2-mixed-AutoRound

#1
by Ohjaaja - opened

Thanks for offering this model. This is quite an aggressive quant, so could you provide some information about its accuracy and usability?

This model is mainly an experimental result of my work on the AutoRound quantization algorithm and mixed precision. In practice, compared with other versions at the same bit precision, it can improve performance on simpler tasks such as MMLU by around 15%. However, since it does not support inference engines like vLLM or Ollama and can only be loaded through Transformers, it runs about 2–4× slower than other models at the same precision even with GPU acceleration. For more complex coding tasks or general office use, I would still recommend the standard q2_KS version or a higher-precision model.

Hey, thanks for the insight. Good work.

Ohjaaja changed discussion status to closed