Quantization on Ascend.

To load an already quantized model, simply load the model weights and config. If the model was quantized offline, there is no need to pass the --quantization argument when starting the engine: the quantization method is parsed automatically from the downloaded quant_model_description.json or config.json file.
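The detection step described above can be illustrated with a minimal sketch. This is not SGLang's actual implementation: the helper name `detect_quant_method` and the returned label "modelslim" are hypothetical. It also assumes that quant_model_description.json is the file written by ModelSlim, and that config.json follows the Hugging Face convention of a `quantization_config` section with a `quant_method` key.

```python
import json
from pathlib import Path
from typing import Optional


def detect_quant_method(model_dir: str) -> Optional[str]:
    """Hypothetical helper: infer the quantization method from checkpoint files.

    Returns None when the directory carries no quantization metadata.
    """
    root = Path(model_dir)

    # Assumption: a quant_model_description.json marks a ModelSlim checkpoint.
    if (root / "quant_model_description.json").is_file():
        return "modelslim"  # illustrative label, not an official identifier

    # Assumption: other methods (AWQ, compressed-tensors, ...) are declared
    # under "quantization_config" in config.json.
    config_path = root / "config.json"
    if config_path.is_file():
        config = json.loads(config_path.read_text(encoding="utf-8"))
        quant_cfg = config.get("quantization_config") or {}
        return quant_cfg.get("quant_method")

    return None
```

Calling `detect_quant_method("/path/to/model")` on a local checkpoint directory is a quick way to check what metadata the checkpoint carries before starting the engine without --quantization.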

ModelSlim on Ascend support:

W4A4 dynamic linear
W8A8 static linear
W8A8 dynamic linear
W4A8 dynamic MOE
W8A8 dynamic MOE

AWQ on Ascend support:

W4A16 linear
W8A16 linear (not yet tested)
W4A16 MOE (not yet tested)

Compressed-tensors (LLM Compressor) on Ascend support:

W4A8 dynamic MOE, with or without activation clip (not yet tested)
W4A16 MOE
W8A8 dynamic linear
W8A8 dynamic MOE