### Model Optimizations

This model was obtained by quantizing the weights and activations of [DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) to the FP4 data type, ready for inference with vLLM >= 0.9.1.

This optimization reduces the number of bits per parameter from 16 to 4, reducing the disk size and GPU memory requirements by approximately 75%.
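
The approximate 75% figure follows directly from the bit widths. As a rough sketch (the 32-billion parameter count here is an assumption inferred from the model name, and ignores embeddings and any layers left unquantized):

```python
# Back-of-the-envelope weight-storage savings from FP16 (16-bit) to FP4 (4-bit).
# num_params is an assumed round figure, not the exact checkpoint size.
num_params = 32e9
bytes_fp16 = num_params * 16 / 8   # ~64 GB of weights at 16 bits/parameter
bytes_fp4 = num_params * 4 / 8     # ~16 GB of weights at 4 bits/parameter
savings = 1 - bytes_fp4 / bytes_fp16
print(f"{bytes_fp16 / 1e9:.0f} GB -> {bytes_fp4 / 1e9:.0f} GB ({savings:.0%} smaller)")
```

Actual on-disk size will differ somewhat because quantization scales and unquantized layers add overhead.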
Only the weights of the linear operators within transformer blocks are quantized, using [LLM Compressor](https://github.com/vllm-project/llm-compressor).