JunHowie commited on
Commit
b87222b
·
verified ·
1 Parent(s): d8e48ad

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +46 -1
README.md CHANGED
@@ -34,8 +34,53 @@ Variant Overview
34
 
35
  Choose the variant that best matches your hardware and quality requirements.
36
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
37
  ### 【Model Update Date】
38
- ```
 
 
 
39
  2025-06-04
40
  1. fast commit
41
  ```
 
34
 
35
  Choose the variant that best matches your hardware and quality requirements.
36
 
37
+ ### 【Vllm 单机(8x141GB)启动命令】
38
+ ```
39
+ MAX_REQUESTS=512
40
+ CONTEXT_LEN=163840
41
+ python3 -m vllm.entrypoints.openai.api_server \
42
+ --model .../QuantTrio/DeepSeek-R1-0528-GPTQ-Int4-Int8Mix-Medium \
43
+ --served-model-name QuantTrio/DeepSeek-R1-0528-GPTQ-Int4-Int8Mix-Medium \
44
+ --swap-space 16 \
45
+ --tensor-parallel-size 8 \
46
+ --gpu-memory-utilization 0.95 \
47
+ --max-num-seqs $MAX_REQUESTS \
48
+ --max-seq-len-to-capture $CONTEXT_LEN \
49
+ --max-model-len $CONTEXT_LEN \
50
+ --enable-auto-tool-choice \
51
+ --tool-call-parser deepseek_v3 \
52
+ --chat-template tool_chat_template_deepseekr1.jinja \
53
+ --disable-log-requests \
54
+ --host 0.0.0.0 \
55
+ --port 8000
56
+ ```
57
+ ### 【H200 throughput performance】
58
+
59
+ 1. `8 × H200 (141 GB)`、 `context = 163840 tokens`
60
+
61
+ | concurrent reqs | total tok/s | tok/s per req |
62
+ |-----------------|-------------|---------------|
63
+ | 1 | 60 | 60.0 |
64
+ | 50 | 1350 | 27.0 |
65
+ | 100 | 2200 | 22.0 |
66
+ | 200 | 3400 | 17.0 |
67
+ | 400 | 5100 | 12.7 |
68
+
69
+ 2. `4 × H200 (141 GB)`、 `context = 63840 tokens`
70
+
71
+ | concurrent reqs | total tok/s | tok/s per req |
72
+ |-----------------|-------------|---------------|
73
+ | 1 | 56 | 56.0 |
74
+ | 50 | 1100 | 22.0 |
75
+ | 100 | 1700 | 17.0 |
76
+ | 200 | 2600 | 13.0 |
77
+ | 400 | 3900 | 9.7 |
78
+
79
  ### 【Model Update Date】
80
+ ```
81
+ 2025-06-20
82
+ Added vLLM launch example (single node with 8 × H200 / 141 GB) and corresponding concurrency throughput benchmark data.
83
+
84
  2025-06-04
85
  1. fast commit
86
  ```