bdellabe committed · Commit 39fc50a · verified · 1 Parent(s): 253358e

Update README.md

Files changed (1)
  1. README.md +29 -0
README.md CHANGED
@@ -4,3 +4,32 @@ base_model:
  - deepseek-ai/DeepSeek-R1
  license: mit
  ---
+
+ GSM8K results from running `vllm serve RedHatAI/DeepSeek-R1-NVFP4-FP8-BLOCK --tensor-parallel-size=4` on 4 B200 GPUs, evaluated with `python tests/evals/gsm8k/gsm8k_eval.py --port 8000`:
+
+ ```
+ Running GSM8K evaluation: 1319 questions, 5-shot
+ Evaluating: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1319/1319 [01:49<00:00, 12.09it/s]
+
+ Results:
+ Accuracy: 0.952
+ Invalid responses: 0.000
+ Total latency: 109.097 s
+ Questions per second: 12.090
+ Total output tokens: 124914
+ Output tokens per second: 1144.985
+ ```
+
+ Results with `nvidia/DeepSeek-R1-NVFP4`
+ ```
+ Running GSM8K evaluation: 1319 questions, 5-shot
+ Evaluating: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1319/1319 [01:52<00:00, 11.74it/s]
+
+ Results:
+ Accuracy: 0.954
+ Invalid responses: 0.000
+ Total latency: 112.357 s
+ Questions per second: 11.739
+ Total output tokens: 128126
+ Output tokens per second: 1140.344
+ ```
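
The two runs above differ by 0.002 in accuracy, so it may help to see how the reported throughput figures follow from the raw totals and how large that accuracy gap is relative to sampling noise. The sketch below is not part of the commit or of `gsm8k_eval.py`; the function names are hypothetical, and the standard-error estimate assumes each of the 1319 questions is an independent pass/fail trial and treats the two runs as independent samples, which is a simplification because both models answered the same questions.

```python
import math

# Figures copied from the two GSM8K runs reported above.
NUM_QUESTIONS = 1319
RUNS = {
    "RedHatAI/DeepSeek-R1-NVFP4-FP8-BLOCK": {
        "accuracy": 0.952,
        "latency_s": 109.097,
        "output_tokens": 124914,
    },
    "nvidia/DeepSeek-R1-NVFP4": {
        "accuracy": 0.954,
        "latency_s": 112.357,
        "output_tokens": 128126,
    },
}


def throughput(run, num_questions=NUM_QUESTIONS):
    """Return (questions/s, output tokens/s) derived from the raw totals."""
    latency = run["latency_s"]
    return num_questions / latency, run["output_tokens"] / latency


def accuracy_std_error(accuracy, n=NUM_QUESTIONS):
    """Binomial standard error of an accuracy measured on n questions.

    Assumption (not from the source): questions are independent trials.
    """
    return math.sqrt(accuracy * (1.0 - accuracy) / n)


def accuracy_gap_in_std_errors(acc_a, acc_b, n=NUM_QUESTIONS):
    """Absolute accuracy gap divided by the standard error of the difference.

    Assumption (not from the source): the two runs are independent samples.
    """
    se_diff = math.sqrt(
        accuracy_std_error(acc_a, n) ** 2 + accuracy_std_error(acc_b, n) ** 2
    )
    return abs(acc_a - acc_b) / se_diff


if __name__ == "__main__":
    for name, run in RUNS.items():
        qps, tps = throughput(run)
        se = accuracy_std_error(run["accuracy"])
        print(f"{name}: {qps:.3f} questions/s, {tps:.1f} tokens/s, "
              f"accuracy {run['accuracy']:.3f} +/- {se:.4f}")

    a, b = (run["accuracy"] for run in RUNS.values())
    print(f"accuracy gap: {accuracy_gap_in_std_errors(a, b):.2f} standard errors")
```

Under these assumptions the standard error on each accuracy is about 0.006, so the 0.002 gap is well under one standard error and the two checkpoints are not distinguishable on this benchmark alone. The recomputed throughput values match the logged ones up to rounding of the reported latency.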