Add metadata and improve model card

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +9 -22
README.md CHANGED
@@ -1,10 +1,16 @@
  ---
  license: apache-2.0
  tags:
  - 3-bit
  - Quantization
  - Pseudo-Quantization
  ---
  # QuantLRM-R1-Qwen-32B-3-bit
 
  3-bit quantized `DeepSeek-R1-Distill-Qwen-32B` based on [QuantLRM](https://www.arxiv.org/abs/2602.02581), a state-of-the-art quantization method of large reasoning models via fine-tuning signals
@@ -13,30 +19,17 @@ tags:
 
  This is the pseudo-quantized model (weights are dequantized back to full-precision) to facilitate the use of `vLLM`, which is the recommended way of inference. To obtain the real quantized version, please refer to our [Github repo](https://github.com/psunlpgroup/QuantLRM). We use an existing CUDA kernel to support the inference of 4-bit real quantized models.
 
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
-
  - **Developed by:** Nan Zhang (njz5124@psu.edu)
  - **Model type:** 3-bit pseudo-quantized version of `DeepSeek-R1-Distill-Qwen-32B`
-
- ### Model Sources
-
- <!-- Provide the basic links for the model. -->
-
  - **Repository:** https://github.com/psunlpgroup/QuantLRM
  - **Paper:** https://www.arxiv.org/abs/2602.02581
 
 
  ## Uses
 
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
  This model is designed to be used with `vLLM` due to its inference optimization. Please use the tokenizer of `deepseek-ai/DeepSeek-R1-Distill-Qwen-32B`.
 
 
-
  ## Calibration Data
 
  We use the default calibration set of QuantLRM (`mit-han-lab/pile-val-backup`) to obtain this model.
@@ -49,8 +42,6 @@ This model achieves more than 3% improvement (based on average scores of various
 
  ## Citation
 
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
  **BibTeX:**
 
  ```bibtex
@@ -71,10 +62,6 @@ This model achieves more than 3% improvement (based on average scores of various
  Zhang, N., Kwek, E., Zhang, Y., Pan, M., Wang, S., Mitra, P., & Zhang, R. (2026). QuantLRM: Quantization of Large Reasoning Models via Fine-Tuning Signals. arXiv preprint arXiv:2602.02581.
  ```
 
-
- ## Model Card Author
- Nan Zhang
-
- ## Model Card Contact
-
- njz5124@psu.edu
 
  ---
  license: apache-2.0
+ library_name: transformers
+ base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
+ pipeline_tag: text-generation
  tags:
  - 3-bit
  - Quantization
  - Pseudo-Quantization
+ - reasoning
+ - arxiv:2602.02581
  ---
+
  # QuantLRM-R1-Qwen-32B-3-bit
 
  3-bit quantized `DeepSeek-R1-Distill-Qwen-32B` based on [QuantLRM](https://www.arxiv.org/abs/2602.02581), a state-of-the-art quantization method of large reasoning models via fine-tuning signals
 
 
  This is the pseudo-quantized model (weights are dequantized back to full-precision) to facilitate the use of `vLLM`, which is the recommended way of inference. To obtain the real quantized version, please refer to our [Github repo](https://github.com/psunlpgroup/QuantLRM). We use an existing CUDA kernel to support the inference of 4-bit real quantized models.
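
For readers unfamiliar with the term, the round trip behind "pseudo-quantized" weights can be sketched as follows. This is a generic per-tensor round-to-nearest illustration, not QuantLRM's actual procedure (which searches for scales using fine-tuning signals); the function name `pseudo_quantize` and the sample weights are assumptions made for this example.

```python
def pseudo_quantize(weights, n_bits=3):
    """Quantize floats to n_bits integer codes, then map them straight back to floats.

    Illustrative only: plain asymmetric round-to-nearest over the whole list,
    assumed here for simplicity and not taken from the QuantLRM code base.
    """
    steps = 2 ** n_bits - 1                      # 3 bits -> 8 levels -> 7 steps
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / steps or 1.0       # fall back to 1.0 for constant input
    # A "real" quantized checkpoint would store these small integer codes.
    codes = [round((w - w_min) / scale) for w in weights]
    # A pseudo-quantized checkpoint stores the dequantized floats instead.
    return [w_min + c * scale for c in codes]


sample = [-0.9, -0.35, -0.1, 0.0, 0.2, 0.45, 0.8, 1.2, 0.05, -0.6]
restored = pseudo_quantize(sample, n_bits=3)
```

The restored values take at most 8 distinct levels, yet they are stored as ordinary floats, so a checkpoint built this way loads like any full-precision model and does not shrink on disk or in memory.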
 
  - **Developed by:** Nan Zhang (njz5124@psu.edu)
  - **Model type:** 3-bit pseudo-quantized version of `DeepSeek-R1-Distill-Qwen-32B`
  - **Repository:** https://github.com/psunlpgroup/QuantLRM
  - **Paper:** https://www.arxiv.org/abs/2602.02581
 
 
  ## Uses
 
  This model is designed to be used with `vLLM` due to its inference optimization. Please use the tokenizer of `deepseek-ai/DeepSeek-R1-Distill-Qwen-32B`.
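
A minimal sketch of the recommended setup, loading the checkpoint with `vLLM` while taking the tokenizer from the base model, might look like this. The repository id in `MODEL_ID` is a hypothetical placeholder (the card does not state the hub path), and the sampling values, `max_tokens`, and `tensor_parallel_size` are assumptions to adjust for your hardware, not settings given by the authors.

```python
BASE_MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
# Hypothetical placeholder: replace with the actual hub path of this checkpoint.
MODEL_ID = "your-org/QuantLRM-R1-Qwen-32B-3-bit"


def build_engine_kwargs(model_id=MODEL_ID, tokenizer_id=BASE_MODEL,
                        tensor_parallel_size=1):
    """Collect the keyword arguments passed to vllm.LLM."""
    return {
        "model": model_id,
        "tokenizer": tokenizer_id,              # tokenizer of the base model
        "tensor_parallel_size": tensor_parallel_size,
    }


def generate(prompts, max_tokens=1024, **engine_overrides):
    """Run batched generation; vLLM is imported lazily so this file loads without it."""
    from vllm import LLM, SamplingParams

    llm = LLM(**build_engine_kwargs(**engine_overrides))
    # Sampling values are assumptions for illustration only.
    params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=max_tokens)
    outputs = llm.generate(prompts, params)
    return [request.outputs[0].text for request in outputs]
```

Calling `generate(["What is 17 * 24? Think step by step."])` would return one completion per prompt. Because the weights are stored dequantized, plan GPU memory as for the full-precision 32B model and raise `tensor_parallel_size` if a single device is not enough.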
 
 
  ## Calibration Data
 
  We use the default calibration set of QuantLRM (`mit-han-lab/pile-val-backup`) to obtain this model.
 
 
  ## Citation
 
  **BibTeX:**
 
  ```bibtex
 
  Zhang, N., Kwek, E., Zhang, Y., Pan, M., Wang, S., Mitra, P., & Zhang, R. (2026). QuantLRM: Quantization of Large Reasoning Models via Fine-Tuning Signals. arXiv preprint arXiv:2602.02581.
  ```
 
+ ## Acknowledgement
+ * Our quantization pipeline is developed based on AWQ: https://github.com/mit-han-lab/llm-awq/tree/main.
+ * The idea of only searching for the scales of o_proj and down_proj on Olmo3 is based on LLM Compressor: https://github.com/vllm-project/llm-compressor.