| Quantization on Ascend. | |
| To load already quantized models, simply load the model weights and config. Again, if the model has been quantized offline, there's no need to add `--quantization` argument when starting the engine. The quantization method will be automatically parsed from the downloaded `quant_model_description.json` or `config.json` config. | |
| [ModelSlim on Ascend support](https://github.com/sgl-project/sglang/pull/14504): | |
| - [x] W4A4 dynamic linear | |
| - [x] W8A8 static linear | |
| - [x] W8A8 dynamic linear | |
| - [x] W4A8 dynamic MOE | |
| - [x] W8A8 dynamic MOE | |
| [AWQ on Ascend support](https://github.com/sgl-project/sglang/pull/10158): | |
| - [x] W4A16 linear | |
| - [x] W8A16 linear # Need to test | |
| - [x] W4A16 MOE # Need to test | |
| Compressed-tensors (LLM Compressor) on Ascend support: | |
| - [x] [W4A8 dynamic MOE with/without activation clip](https://github.com/sgl-project/sglang/pull/14736) # Need to test | |
| - [x] [W4A16 MOE](https://github.com/sgl-project/sglang/pull/12759) | |
| - [x] [W8A8 dynamic linear](https://github.com/sgl-project/sglang/pull/14504) | |
| - [x] [W8A8 dynamic MOE](https://github.com/sgl-project/sglang/pull/14504) | |