Qwen3-Coder-30B-A3B-Instruct-AWQ

A duplicate of cpatonn/Qwen3-Coder-30B-A3B-Instruct-AWQ.

Method

Quantised with vllm-project/llm-compressor, using nvidia/Llama-Nemotron-Post-Training-Dataset as calibration data and the following recipe:

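from llmcompressor.modifiers.awq import AWQModifier

# AWQ with 4-bit weights and 16-bit activations (W4A16) on every Linear layer,
# except the output head and the MoE router gates, which stay unquantised.
# Entries prefixed with "re:" are regular expressions over module names.
recipe = [
    AWQModifier(
        ignore=["lm_head", "re:.*mlp.gate$", "re:.*mlp.shared_expert_gate$"],
        scheme="W4A16",
        targets=["Linear"],
    ),
]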
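To make the effect of the `ignore` list concrete, here is a minimal sketch of how such patterns could select which layers are skipped. It assumes that entries starting with `re:` are matched as regular expressions from the start of the module name and that all other entries must equal the name exactly; `is_ignored` is a hypothetical helper and the module names are illustrative, so llm-compressor's own matching logic may differ in detail.

```python
import re

# Ignore list copied from the recipe above.
IGNORE = ["lm_head", "re:.*mlp.gate$", "re:.*mlp.shared_expert_gate$"]


def is_ignored(module_name: str, ignore: list[str]) -> bool:
    """Return True if module_name matches any entry in ignore.

    Hypothetical helper: "re:" entries are regexes applied with re.match
    (anchored at the start of the name); other entries need an exact match.
    """
    for entry in ignore:
        if entry.startswith("re:"):
            if re.match(entry[3:], module_name):
                return True
        elif entry == module_name:
            return True
    return False


# Module names in the style of a Qwen3 MoE checkpoint (assumed, for illustration).
names = [
    "lm_head",
    "model.layers.0.mlp.gate",
    "model.layers.0.mlp.experts.0.gate_proj",
    "model.layers.0.self_attn.q_proj",
]
for name in names:
    print(f"{name}: {'skip' if is_ignored(name, IGNORE) else 'quantise'}")
```

The `$` anchor is what keeps the router pattern from catching the experts' `gate_proj` layers: only names that end in `mlp.gate` are skipped, so the expert projections are still quantised.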

Citation

If you find our work helpful, feel free to cite it.

@misc{qwen3technicalreport,
      title={Qwen3 Technical Report},
      author={Qwen Team},
      year={2025},
      eprint={2505.09388},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.09388},
}
Safetensors · Model size: 5B params · Tensor types: I64, I32, BF16
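The I32 tensor type is consistent with 4-bit weights being stored packed, several values per 32-bit integer. Below is a minimal sketch of that packing idea, assuming eight unsigned 4-bit codes per word with the first code in the lowest bits; `pack_int4` and `unpack_int4` are hypothetical helpers, and the exact on-disk layout of this checkpoint is not confirmed here. Because each stored integer holds eight weights, a parameter count taken from stored tensor elements would undercount the quantised weights, which may explain why the size shown is well below the 30B in the model name.

```python
def pack_int4(values: list[int]) -> int:
    """Pack eight unsigned 4-bit codes (0..15) into one 32-bit word.

    Assumed layout: values[0] occupies bits 0-3, values[1] bits 4-7, and so on.
    """
    if len(values) != 8:
        raise ValueError("expected exactly 8 values")
    word = 0
    for i, v in enumerate(values):
        if not 0 <= v <= 15:
            raise ValueError(f"value {v} does not fit in 4 bits")
        word |= v << (4 * i)
    return word


def unpack_int4(word: int) -> list[int]:
    """Inverse of pack_int4: recover the eight 4-bit codes."""
    return [(word >> (4 * i)) & 0xF for i in range(8)]


codes = [1, 15, 0, 7, 8, 3, 12, 5]
word = pack_int4(codes)
print(hex(word))                   # 0x5c3870f1
print(unpack_int4(word) == codes)  # True
```

Under this layout a weight matrix with 768 input columns would be stored as 96 int32 columns per row, an eightfold reduction in element count.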