---
license: mit
language:
  - en
  - zh
base_model:
  - deepseek-ai/DeepSeek-R1
pipeline_tag: text-generation
library_name: transformers
---

# DeepSeek R1 AWQ

AWQ (Activation-aware Weight Quantization) quant of the DeepSeek R1 model.

This quant modifies some of the model code to fix an overflow issue that occurs when running in float16.
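
This card does not say how the model code was changed. The sketch below only illustrates what a float16 overflow looks like and one generic mitigation: clamping values into the finite float16 range before casting. It assumes NumPy is available, and the clamping step is a hypothetical example, not necessarily what this quant does.

```python
import numpy as np

# Largest finite float16 value (65504.0); anything larger overflows.
FP16_MAX = float(np.finfo(np.float16).max)

def to_fp16_naive(x: np.ndarray) -> np.ndarray:
    """Plain cast: magnitudes above FP16_MAX become +/-inf."""
    with np.errstate(over="ignore"):  # silence NumPy's overflow warning
        return x.astype(np.float16)

def to_fp16_clamped(x: np.ndarray) -> np.ndarray:
    """Hypothetical mitigation: clamp into the finite float16 range, then cast."""
    return np.clip(x, -FP16_MAX, FP16_MAX).astype(np.float16)

activations = np.array([1.5, 7.0e4, -1.0e5], dtype=np.float32)
print(to_fp16_naive(activations))    # large entries turn into inf / -inf
print(to_fp16_clamped(activations))  # large entries saturate at +/-65504
```

Clamping keeps values finite but changes them, so a real fix may instead rescale or keep sensitive intermediate results in a wider type.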

Tested on vLLM with 8x H100 GPUs: inference speed is 5 tokens/s with batch size 1 and short prompts.

When serving with vLLM, either add `--dtype float16` or use the new `moe_wna16` kernel by passing `--quantization moe_wna16`.
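
A minimal sketch of the two serving options as `vllm serve` command lines. Assumptions: `model_path` is a placeholder for wherever you downloaded the checkpoint, and `--tensor-parallel-size 8` assumes one tensor-parallel rank per GPU on the 8x H100 setup mentioned above. The helper name `vllm_serve_args` is hypothetical.

```python
import shlex

def vllm_serve_args(model_path: str, use_moe_wna16: bool = False,
                    tensor_parallel_size: int = 8) -> list[str]:
    """Build a `vllm serve` command for this checkpoint (hypothetical helper)."""
    # model_path is a placeholder: a local directory or Hub repo id.
    args = ["vllm", "serve", model_path,
            "--tensor-parallel-size", str(tensor_parallel_size)]
    if use_moe_wna16:
        # Option 2: use the moe_wna16 kernel.
        args += ["--quantization", "moe_wna16"]
    else:
        # Option 1: run in float16.
        args += ["--dtype", "float16"]
    return args

print(shlex.join(vllm_serve_args("/path/to/DeepSeek-R1-AWQ")))
print(shlex.join(vllm_serve_args("/path/to/DeepSeek-R1-AWQ", use_moe_wna16=True)))
```

To launch the server directly, pass the list to `subprocess.run(args, check=True)`. Building the arguments as a list avoids shell-quoting problems with paths that contain spaces.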