Paper: Qwen3 Technical Report (arXiv:2505.09388)
Duplicate of cpatonn/Qwen3-Coder-30B-A3B-Instruct-AWQ.
Quantised using vllm-project/llm-compressor, with nvidia/Llama-Nemotron-Post-Training-Dataset as calibration data and the following recipe:
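from llmcompressor.modifiers.awq import AWQModifier

recipe = [
    AWQModifier(
        # Leave the output head and the MoE router/gate layers unquantised.
        ignore=["lm_head", "re:.*mlp.gate$", "re:.*mlp.shared_expert_gate$"],
        # 4-bit weights, 16-bit activations.
        scheme="W4A16",
        # Apply to every remaining Linear layer.
        targets=["Linear"],
    ),
]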
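To make the effect of the `ignore` list concrete, here is a minimal sketch of how its entries could be matched against module names. It assumes that entries prefixed with `re:` are regular expressions matched from the start of the name and that other entries are compared literally. The helper `is_ignored` and the sample module names are hypothetical illustrations, not llm-compressor's own code and not names read from the checkpoint.

```python
import re

# Ignore list copied from the recipe above.
IGNORE = ["lm_head", "re:.*mlp.gate$", "re:.*mlp.shared_expert_gate$"]


def is_ignored(module_name: str, ignore: list[str] = IGNORE) -> bool:
    """Hypothetical helper: True if module_name matches any ignore entry."""
    for entry in ignore:
        if entry.startswith("re:"):
            # Assumption: the text after "re:" is a regex anchored at the
            # start of the module name (re.match semantics).
            if re.match(entry[3:], module_name):
                return True
        elif entry == module_name:
            # Plain entries are compared literally in this sketch.
            return True
    return False


# Illustrative module names (assumed for the example).
names = [
    "lm_head",
    "model.layers.0.mlp.gate",
    "model.layers.0.mlp.experts.0.gate_proj",
    "model.layers.0.self_attn.q_proj",
]
skipped = [n for n in names if is_ignored(n)]
quantised = [n for n in names if not is_ignored(n)]
print("skipped:", skipped)
print("quantised:", quantised)
```

Under these assumptions the router (`mlp.gate`) and `lm_head` stay in their original precision, while the per-expert `gate_proj` layers are still quantised, because the `$` anchor requires the name to end in `mlp.gate`.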
If you find this work helpful, please consider citing it.
@misc{qwen3technicalreport,
  title={Qwen3 Technical Report},
  author={Qwen Team},
  year={2025},
  eprint={2505.09388},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.09388},
}