### Model Optimizations

This model was obtained by quantizing the weights and activations of [DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) to the FP4 data type, ready for inference with vLLM >= 0.9.1.

This optimization reduces the number of bits per parameter from 16 to 4, reducing the disk size and GPU memory requirements by approximately 75%.
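
The approximate 75% figure follows directly from the bit widths. As a rough sketch (the 32-billion parameter count here is an assumption inferred from the model name, and ignores embeddings and any layers left unquantized):

```python
# Back-of-the-envelope weight-storage savings from FP16 (16-bit) to FP4 (4-bit).
# num_params is an assumed round figure, not the exact checkpoint size.
num_params = 32e9
bytes_fp16 = num_params * 16 / 8   # ~64 GB of weights at 16 bits/parameter
bytes_fp4 = num_params * 4 / 8     # ~16 GB of weights at 4 bits/parameter
savings = 1 - bytes_fp4 / bytes_fp16
print(f"{bytes_fp16 / 1e9:.0f} GB -> {bytes_fp4 / 1e9:.0f} GB ({savings:.0%} smaller)")
```

Actual on-disk size will differ somewhat because quantization scales and unquantized layers add overhead.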
Only the weights of the linear operators within transformer blocks are quantized, using [LLM Compressor](https://github.com/vllm-project/llm-compressor).