Add metadata, sample usage, and improve model details

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +33 -1
README.md CHANGED
@@ -4,10 +4,14 @@ tags:
  - 3-bit
  - Quantization
  - Pseudo-Quantization
+ pipeline_tag: text-generation
+ library_name: transformers
+ base_model: deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
  ---
+
  # QuantLRM-R1-Qwen3-8B-3-bit

- 3-bit quantized `DeepSeek-R1-0528-Qwen3-8B` based on [QuantLRM](https://www.arxiv.org/abs/2602.02581), a state-of-the-art quantization method of large reasoning models via fine-tuning signals
+ 3-bit quantized `DeepSeek-R1-0528-Qwen3-8B` based on [QuantLRM: Quantization of Large Reasoning Models via Fine-Tuning Signals](https://www.arxiv.org/abs/2602.02581), a state-of-the-art method for quantizing large reasoning models via fine-tuning signals.

  ## Model Details

@@ -20,6 +24,7 @@ This is the pseudo-quantized model (weights are dequantized back to full-precisi

  - **Developed by:** Nan Zhang (njz5124@psu.edu)
  - **Model type:** 3-bit pseudo-quantized version of `DeepSeek-R1-0528-Qwen3-8B`
+ - **Base Model:** `deepseek-ai/DeepSeek-R1-0528-Qwen3-8B`

  ### Model Sources

@@ -35,7 +40,34 @@ This is the pseudo-quantized model (weights are dequantized back to full-precisi

  This model is designed to be used with `vLLM` due to its inference optimization. Please use the tokenizer of `deepseek-ai/DeepSeek-R1-0528-Qwen3-8B`.

+ ## Sample Usage
+
+ To use this model, you can follow the steps below from the [QuantLRM GitHub repository](https://github.com/psunlpgroup/QuantLRM).
+
+ First, compute input channel importance scores:
+
+ ```bash
+ python compare_weight_matrix.py
+ python quadratic_mapping.py  # supports processing weight updates on GPU
+ ```
+
+ Then, run the quantization pipeline to search for the optimal scales:

+ ```bash
+ python -m awq.entry --model_path /PATH/TO/LRM \
+     --w_bit 3 --q_group_size 128 --run_awq --dump_awq QuantLRM_cache/R1-Qwen3-8B-w3-g128.pt
+ ```
+
+ For inference with the pseudo-quantized model using `vLLM`:
+
+ ```bash
+ python -m awq.entry --model_path /PATH/TO/LRM \
+     --w_bit 3 --q_group_size 128 \
+     --load_awq QuantLRM_cache/R1-Qwen3-8B-w3-g128.pt \
+     --q_backend fake --dump_fake models/R1-Qwen3-8B-w3-g128
+
+ CUDA_VISIBLE_DEVICES=0 python inference_vllm.py
+ ```

  ## Calibration Data
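
As context for the "input channel importance" step in the added Sample Usage section: in AWQ-style pipelines, channel salience is typically estimated from calibration activations. The sketch below is only a minimal illustration of that idea using mean absolute activation per input channel — it is not QuantLRM's implementation (which derives importance from fine-tuning signals), and `channel_importance` is a hypothetical name, not a function from the repository.

```python
# Illustrative sketch only: per-input-channel importance via mean absolute
# activation, a common salience proxy in AWQ-style quantization. The actual
# QuantLRM scripts (compare_weight_matrix.py, quadratic_mapping.py) use
# fine-tuning signals instead; names here are hypothetical.

def channel_importance(activations):
    """activations: list of calibration samples, each a list of
    per-input-channel activation values. Returns one score per channel."""
    n_channels = len(activations[0])
    scores = [0.0] * n_channels
    for sample in activations:
        for c, x in enumerate(sample):
            scores[c] += abs(x)  # accumulate magnitude per channel
    return [s / len(activations) for s in scores]

# Toy calibration batch: channel 1 carries the largest activations,
# so it would be protected most aggressively during quantization.
acts = [[0.1, -2.0, 0.3],
        [-0.2, 1.5, 0.1]]
scores = channel_importance(acts)
```

Channels with high scores are the ones whose quantization error most affects the output, so the scale search that follows gives them finer effective resolution.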
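
The `--q_backend fake` step above produces the pseudo-quantized weights this repository ships: each weight group is quantized to 3-bit integer codes and immediately dequantized back to full precision. A minimal sketch of that round trip, assuming standard group-wise asymmetric quantization (function and variable names are illustrative, not QuantLRM's API):

```python
# Minimal sketch of group-wise pseudo- ("fake") quantization: quantize each
# group of weights to n_bits integer levels, then dequantize back to floats.
# This mirrors the idea behind --q_backend fake; it is not the repo's code.

def pseudo_quantize(weights, n_bits=3, group_size=4):
    """Round-trip weights through n_bits asymmetric quantization per group."""
    qmax = 2 ** n_bits - 1  # 3 bits -> integer codes in [0, 7]
    out = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / qmax if hi > lo else 1.0  # step size for this group
        for w in group:
            q = round((w - lo) / scale)   # quantize: integer code in [0, qmax]
            out.append(q * scale + lo)    # dequantize: back to full precision
    return out

w = [0.12, -0.5, 0.33, 0.9, -1.2, 0.05, 0.7, -0.3]
wq = pseudo_quantize(w, n_bits=3, group_size=4)  # each value snaps to a grid
```

With `--q_group_size 128`, the same round trip runs over groups of 128 weights, so the saved model keeps full-precision dtype while carrying exactly the information a true 3-bit deployment would have.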