---
license: mit
language:
- en
- zh
base_model:
- deepseek-ai/DeepSeek-R1
pipeline_tag: text-generation
library_name: transformers
---

# DeepSeek R1 AWQ

AWQ quantization of the DeepSeek R1 model.

This quant modifies some of the model code to fix an overflow issue that occurs when running inference in float16.
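
To illustrate why float16 is fragile here: its largest finite value is 65504, so large intermediate activations silently overflow to `inf`. The snippet below is a generic demonstration of that failure mode and of keeping intermediates in a wider type before casting back; it is not the actual patch applied to the model code.

```python
import numpy as np

# float16 can only represent magnitudes up to 65504; anything larger overflows to inf.
fp16_max = np.finfo(np.float16).max
print(fp16_max)  # 65504.0

x = np.float16(60000.0)
y = np.float16(2.0)
product = x * y            # 120000 exceeds the float16 range
print(np.isinf(product))   # True

# A common mitigation: compute the intermediate in float32,
# clamp to the float16 range, then cast back.
clamped = np.float16(np.clip(np.float32(x) * np.float32(y), -fp16_max, fp16_max))
print(np.isfinite(clamped))  # True
```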

Tested on vLLM with 8x H100 GPUs; inference speed is 5 tokens/s with batch size 1 and short prompts.

If you are serving with vLLM, either add `--dtype float16` or use the new `moe_wna16` kernel via `--quantization moe_wna16`.
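
As a sketch, the same flags map onto vLLM's offline Python API (`vllm.LLM`). The model id below is a placeholder for this repository's path on the Hub, and the prompt and sampling settings are purely illustrative; running it requires a multi-GPU setup such as the 8x H100 configuration tested above.

```python
from vllm import LLM, SamplingParams

# Placeholder -- substitute this repository's model id on the Hub.
MODEL_ID = "path/to/DeepSeek-R1-AWQ"

# Option 1: serve in float16 (equivalent to passing --dtype float16).
llm = LLM(
    model=MODEL_ID,
    tensor_parallel_size=8,  # e.g. 8x H100, as tested above
    dtype="float16",
)

# Option 2 (alternative): the moe_wna16 kernel,
# equivalent to passing --quantization moe_wna16:
# llm = LLM(model=MODEL_ID, tensor_parallel_size=8, quantization="moe_wna16")

outputs = llm.generate(
    ["Explain AWQ quantization in one sentence."],
    SamplingParams(temperature=0.6, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```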