---
license: mit
language:
- en
- zh
base_model:
- deepseek-ai/DeepSeek-R1
pipeline_tag: text-generation
library_name: transformers
---

# DeepSeek R1 AWQ

AWQ quantization of the DeepSeek R1 model.

This quant modifies some of the model code to fix an overflow issue that occurs when running inference in float16.
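
To illustrate why float16 is fragile here: its largest finite value is 65504, so large intermediate activations silently overflow to `inf`. The snippet below is a generic demonstration of that failure mode and of keeping intermediates in a wider type before casting back; it is not the actual patch applied to the model code.

```python
import numpy as np

# float16 can only represent magnitudes up to 65504; anything larger overflows to inf.
fp16_max = np.finfo(np.float16).max
print(fp16_max)  # 65504.0

x = np.float16(60000.0)
y = np.float16(2.0)
product = x * y            # 120000 exceeds the float16 range
print(np.isinf(product))   # True

# A common mitigation: compute the intermediate in float32,
# clamp to the float16 range, then cast back.
clamped = np.float16(np.clip(np.float32(x) * np.float32(y), -fp16_max, fp16_max))
print(np.isfinite(clamped))  # True
```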

Tested on vLLM with 8x H100 GPUs; inference speed is 5 tokens/s with batch size 1 and short prompts.

If you are serving with vLLM, either add `--dtype float16` or use the new `moe_wna16` kernel via `--quantization moe_wna16`.
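
As a sketch, the same flags map onto vLLM's offline Python API (`vllm.LLM`). The model id below is a placeholder for this repository's path on the Hub, and the prompt and sampling settings are purely illustrative; running it requires a multi-GPU setup such as the 8x H100 configuration tested above.

```python
from vllm import LLM, SamplingParams

# Placeholder -- substitute this repository's model id on the Hub.
MODEL_ID = "path/to/DeepSeek-R1-AWQ"

# Option 1: serve in float16 (equivalent to passing --dtype float16).
llm = LLM(
    model=MODEL_ID,
    tensor_parallel_size=8,  # e.g. 8x H100, as tested above
    dtype="float16",
)

# Option 2 (alternative): the moe_wna16 kernel,
# equivalent to passing --quantization moe_wna16:
# llm = LLM(model=MODEL_ID, tensor_parallel_size=8, quantization="moe_wna16")

outputs = llm.generate(
    ["Explain AWQ quantization in one sentence."],
    SamplingParams(temperature=0.6, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```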