ekurtic committed on
Commit f043c47 (verified) · 1 parent: c707060

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -86,7 +86,7 @@ We follow the standard vLLM performance benchmarking with ShareGPT dataset and o
 | -------------------------------------------- | :-------------------------------------: | :---------------------------------------: | :------------------------------------: |
 | cognitivecomputations/DeepSeek-R1-AWQ | 1585.45 | 55.41 | 43.06 |
 | ISTA-DASLab/DeepSeek-R1-GPTQ-4b-128g-experts | 1344.68 | 41.49 | 36.33 |
-| ISTA-DASLab/DeepSeek-R1-GPTQ-4b-128g | 815.19 | 44.65 | 37.88 |
+| ISTA-DASLab/DeepSeek-R1-GPTQ-4b-128g <br> **(this model)** | 815.19 | 44.65 | 37.88 |
 
 GPTQ models are faster across all metrics than AWQ models because GPTQ uses less bits-per-parameter than AWQ. More specifically, AWQ has to use smaller group-size of 64 (vs 128 in GPTQ) to preserve accuracy, and zero-points due to asymmetric quantization.
 
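The bits-per-parameter argument in the README context can be made concrete with a back-of-the-envelope calculation. The sketch below is a hypothetical storage model, not either quantizer's actual packing format: it assumes 16-bit (FP16) per-group scales for both methods and, for AWQ, 4-bit per-group zero-points, and it ignores any other metadata a real checkpoint may carry. The function name `effective_bits_per_weight` is illustrative only.

```python
# Hypothetical storage-cost model for group-wise 4-bit weight quantization.
# Assumptions (not taken from the README): scales are stored as 16-bit floats,
# and zero-points, when present, take `zero_bits` bits per group.
# Symmetric quantization stores no zero-points, so zero_bits = 0.


def effective_bits_per_weight(weight_bits: int, group_size: int,
                              scale_bits: int = 16, zero_bits: int = 0) -> float:
    """Bits stored per weight, amortizing per-group metadata over the group."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    # Each group of `group_size` weights shares one scale (and one zero-point).
    return weight_bits + (scale_bits + zero_bits) / group_size


# GPTQ as described: 4-bit weights, group size 128, symmetric (no zero-points).
gptq = effective_bits_per_weight(4, 128)
# AWQ as described: 4-bit weights, group size 64, asymmetric.
# The 4-bit zero-point width is an assumption.
awq = effective_bits_per_weight(4, 64, zero_bits=4)

print(f"GPTQ 4b-128g: {gptq:.4f} bits/weight")  # 4.1250
print(f"AWQ  4b-64g : {awq:.4f} bits/weight")   # 4.3125
```

Under these assumptions the AWQ layout stores roughly 4.5% more bits per weight. Both the halved group size (which doubles scale overhead) and the extra zero-points contribute; this is consistent with the README's explanation, though actual decode speed also depends on the kernels used.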