Update README.md
Browse files
README.md
CHANGED
|
@@ -34,8 +34,53 @@ Variant Overview
|
|
| 34 |
|
| 35 |
Choose the variant that best matches your hardware and quality requirements.
|
| 36 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 37 |
### 【Model Update Date】
|
| 38 |
-
```
|
|
|
|
|
|
|
|
|
|
| 39 |
2025-06-04
|
| 40 |
1. fast commit
|
| 41 |
```
|
|
|
|
| 34 |
|
| 35 |
Choose the variant that best matches your hardware and quality requirements.
|
| 36 |
|
| 37 |
+
### 【Vllm 单机(8x141GB)启动命令】
|
| 38 |
+
```
|
| 39 |
+
MAX_REQUESTS=512
|
| 40 |
+
CONTEXT_LEN=163840
|
| 41 |
+
python3 -m vllm.entrypoints.openai.api_server \
|
| 42 |
+
--model .../QuantTrio/DeepSeek-R1-0528-GPTQ-Int4-Int8Mix-Medium \
|
| 43 |
+
--served-model-name QuantTrio/DeepSeek-R1-0528-GPTQ-Int4-Int8Mix-Medium \
|
| 44 |
+
--swap-space 16 \
|
| 45 |
+
--tensor-parallel-size 8 \
|
| 46 |
+
--gpu-memory-utilization 0.95 \
|
| 47 |
+
--max-num-seqs $MAX_REQUESTS \
|
| 48 |
+
--max-seq-len-to-capture $CONTEXT_LEN \
|
| 49 |
+
--max-model-len $CONTEXT_LEN \
|
| 50 |
+
--enable-auto-tool-choice \
|
| 51 |
+
--tool-call-parser deepseek_v3 \
|
| 52 |
+
--chat-template tool_chat_template_deepseekr1.jinja \
|
| 53 |
+
--disable-log-requests \
|
| 54 |
+
--host 0.0.0.0 \
|
| 55 |
+
--port 8000
|
| 56 |
+
```
|
| 57 |
+
### 【H200 throughput performance】
|
| 58 |
+
|
| 59 |
+
1. `8 × H200 (141 GB)`、 `context = 163840 tokens`
|
| 60 |
+
|
| 61 |
+
| concurrent reqs | total tok/s | tok/s per req |
|
| 62 |
+
|-----------------|-------------|---------------|
|
| 63 |
+
| 1 | 60 | 60.0 |
|
| 64 |
+
| 50 | 1350 | 27.0 |
|
| 65 |
+
| 100 | 2200 | 22.0 |
|
| 66 |
+
| 200 | 3400 | 17.0 |
|
| 67 |
+
| 400 | 5100 | 12.7 |
|
| 68 |
+
|
| 69 |
+
2. `4 × H200 (141 GB)`、 `context = 63840 tokens`
|
| 70 |
+
|
| 71 |
+
| concurrent reqs | total tok/s | tok/s per req |
|
| 72 |
+
|-----------------|-------------|---------------|
|
| 73 |
+
| 1 | 56 | 56.0 |
|
| 74 |
+
| 50 | 1100 | 22.0 |
|
| 75 |
+
| 100 | 1700 | 17.0 |
|
| 76 |
+
| 200 | 2600 | 13.0 |
|
| 77 |
+
| 400 | 3900 | 9.7 |
|
| 78 |
+
|
| 79 |
### 【Model Update Date】
|
| 80 |
+
```
|
| 81 |
+
2025-06-20
|
| 82 |
+
Added vLLM launch example (single node with 8 × H200 / 141 GB) and corresponding concurrency throughput benchmark data.
|
| 83 |
+
|
| 84 |
2025-06-04
|
| 85 |
1. fast commit
|
| 86 |
```
|