ekurtic committed (verified)
Commit ffb8276 · Parent(s): e81f1b6

Update README.md

Files changed (1):
  1. README.md +1 -1
README.md CHANGED
@@ -84,7 +84,7 @@ We follow the standard vLLM performance benchmarking with ShareGPT dataset and o
  | | Time to First Token<br>Median TTFT (ms) ↓ | Time per Output Token<br>Median TPOT (ms) ↓ | Inter-token Latency<br>Median ITL (ms) ↓ |
  | -------------------------------------------- | :-------------------------------------: | :---------------------------------------: | :------------------------------------: |
  | cognitivecomputations/DeepSeek-R1-AWQ | 1585.45 | 55.41 | 43.06 |
- | ISTA-DASLab/DeepSeek-R1-GPTQ-4b-128g-experts | 1344.68 | 41.49 | 36.33 |
+ | ISTA-DASLab/DeepSeek-R1-GPTQ-4b-128g-experts<br> **(this model)** | 1344.68 | 41.49 | 36.33 |
  | ISTA-DASLab/DeepSeek-R1-GPTQ-4b-128g | 815.19 | 44.65 | 37.88 |

  GPTQ models are faster across all metrics than AWQ models because GPTQ uses fewer bits per parameter than AWQ. More specifically, AWQ has to use a smaller group size of 64 (vs. 128 in GPTQ) to preserve accuracy, plus zero-points due to its asymmetric quantization.
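The bits-per-parameter gap described above can be sketched numerically. This is a rough accounting sketch, not the models' exact storage format: it assumes fp16 (16-bit) group scales and a 4-bit zero-point for the asymmetric scheme, neither of which is stated in the diff itself.

```python
def effective_bits(weight_bits, group_size, scale_bits=16, zero_point_bits=0):
    """Approximate storage bits per parameter for group-wise quantization.

    Each group of `group_size` weights shares one scale and, for
    asymmetric schemes, one zero-point; their cost is amortized
    across the group.
    """
    return weight_bits + (scale_bits + zero_point_bits) / group_size

# GPTQ as described here: 4-bit symmetric, group size 128
gptq_bits = effective_bits(weight_bits=4, group_size=128)
# -> 4 + 16/128 = 4.125 bits per parameter

# AWQ as described here: 4-bit asymmetric, group size 64,
# with an assumed 4-bit zero-point per group
awq_bits = effective_bits(weight_bits=4, group_size=64, zero_point_bits=4)
# -> 4 + (16 + 4)/64 = 4.3125 bits per parameter

print(f"GPTQ: {gptq_bits} bits/param, AWQ: {awq_bits} bits/param")
```

Under these assumptions the smaller group size and the extra zero-point each add memory traffic per weight, which is consistent with the latency gap the table reports.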