nm-research committed
Commit 2acbf36 · verified · 1 Parent(s): ab94e6d

Update README.md

Files changed (1):
  1. README.md +253 -2

README.md CHANGED
@@ -282,7 +282,7 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
  <summary>Benchmarking Command</summary>
 
  ```
- guidellm --model neuralmagic/DeepSeek-R1-Distill-Llama-70B-quantized.w4a16 --target "http://localhost:8000/v1" --data-type emulated --data "prompt_tokens=<prompt_tokens>,generated_tokens=<generated_tokens>" --max seconds 360 --backend aiohttp_server
  ```
  </details>
 
@@ -327,6 +327,134 @@ guidellm --model neuralmagic/DeepSeek-R1-Distill-Llama-70B-quantized.w4a16 --tar
  </tr>
  </thead>
  <tbody style="text-align: center" >
  <tr>
  <th rowspan="3" valign="top">H100</th>
  <td>2</td>
@@ -438,6 +566,128 @@ guidellm --model neuralmagic/DeepSeek-R1-Distill-Llama-70B-quantized.w4a16 --tar
  </tr>
  </thead>
  <tbody style="text-align: center" >
  <tr>
  <th rowspan="3" valign="top">H100x4</th>
  <th>deepseek-ai/DeepSeek-R1-Distill-Llama-70B</th>
@@ -506,4 +756,5 @@ guidellm --model neuralmagic/DeepSeek-R1-Distill-Llama-70B-quantized.w4a16 --tar
 
  **QPS: Queries per second.
 
- **QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
 
 
  <summary>Benchmarking Command</summary>
 
  ```
+ guidellm --model neuralmagic/DeepSeek-R1-Distill-Llama-70B-FP8-dynamic --target "http://localhost:8000/v1" --data-type emulated --data "prompt_tokens=<prompt_tokens>,generated_tokens=<generated_tokens>" --max-seconds 360 --backend aiohttp_server
  ```
  </details>
 
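The `--data` flag in the command above takes a key=value spec describing the emulated workload. As a minimal sketch of how such a spec string is assembled for a concrete use case (the helper function and the token counts here are illustrative, not part of guidellm):

```python
def emulated_data_spec(prompt_tokens: int, generated_tokens: int) -> str:
    """Build the key=value spec passed to guidellm's --data flag.

    The key names mirror the placeholders in the benchmarking command;
    the concrete token counts used below are illustrative only.
    """
    return f"prompt_tokens={prompt_tokens},generated_tokens={generated_tokens}"

# e.g. a summarization-style workload: long prompt, short generation
print(emulated_data_spec(prompt_tokens=1024, generated_tokens=128))
# prompt_tokens=1024,generated_tokens=128
```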
 
  </tr>
  </thead>
  <tbody style="text-align: center" >
+ <tr>
+ <th rowspan="3" valign="top">A6000</th>
+ <td>4</td>
+ <th>deepseek-ai/DeepSeek-R1-Distill-Llama-70B</th>
+ <td>---</td>
+ <td>7.4</td>
+ <td>152</td>
+ <td>14.9</td>
+ <td>76</td>
+ <td>7.5</td>
+ <td>149</td>
+ <td>7.7</td>
+ <td>146</td>
+ <td>57.2</td>
+ <td>20</td>
+ <td>58.9</td>
+ <td>19</td>
+ <td>31.9</td>
+ <td>35</td>
+ <td>98.4</td>
+ <td>11</td>
+ </tr>
+ <tr>
+ <td>2</td>
+ <th>neuralmagic/DeepSeek-R1-Distill-Llama-70B-quantized.w8a8</th>
+ <td>1.93</td>
+ <td>7.7</td>
+ <td>292</td>
+ <td>15.2</td>
+ <td>148</td>
+ <td>7.8</td>
+ <td>287</td>
+ <td>8.0</td>
+ <td>282</td>
+ <td>60.7</td>
+ <td>37</td>
+ <td>60.2</td>
+ <td>37</td>
+ <td>32.3</td>
+ <td>70</td>
+ <td>104.0</td>
+ <td>22</td>
+ </tr>
+ <tr>
+ <td>2</td>
+ <th>neuralmagic/DeepSeek-R1-Distill-Llama-70B-quantized.w4a16</th>
+ <td>2.83</td>
+ <td>4.9</td>
+ <td>457</td>
+ <td>10.0</td>
+ <td>225</td>
+ <td>5.5</td>
+ <td>411</td>
+ <td>5.8</td>
+ <td>389</td>
+ <td>38.9</td>
+ <td>58</td>
+ <td>39.2</td>
+ <td>57</td>
+ <td>23.7</td>
+ <td>95</td>
+ <td>76.6</td>
+ <td>29</td>
+ </tr>
+ <tr>
+ <th rowspan="3" valign="top">A100</th>
+ <td>2</td>
+ <th>deepseek-ai/DeepSeek-R1-Distill-Llama-70B</th>
+ <td>---</td>
+ <td>6.4</td>
+ <td>157</td>
+ <td>12.8</td>
+ <td>79</td>
+ <td>6.6</td>
+ <td>153</td>
+ <td>6.7</td>
+ <td>151</td>
+ <td>50.4</td>
+ <td>20</td>
+ <td>50.8</td>
+ <td>20</td>
+ <td>27.0</td>
+ <td>37</td>
+ <td>85.4</td>
+ <td>12</td>
+ </tr>
+ <tr>
+ <td>2</td>
+ <th>neuralmagic/DeepSeek-R1-Distill-Llama-70B-quantized.w8a8</th>
+ <td>1.48</td>
+ <td>4.1</td>
+ <td>245</td>
+ <td>8.2</td>
+ <td>123</td>
+ <td>4.2</td>
+ <td>238</td>
+ <td>4.3</td>
+ <td>235</td>
+ <td>32.4</td>
+ <td>31</td>
+ <td>32.8</td>
+ <td>31</td>
+ <td>17.6</td>
+ <td>57</td>
+ <td>90.8</td>
+ <td>11</td>
+ </tr>
+ <tr>
+ <td>1</td>
+ <th>neuralmagic/DeepSeek-R1-Distill-Llama-70B-quantized.w4a16</th>
+ <td>2.69</td>
+ <td>4.6</td>
+ <td>440</td>
+ <td>9.2</td>
+ <td>220</td>
+ <td>4.9</td>
+ <td>407</td>
+ <td>5.2</td>
+ <td>389</td>
+ <td>35.3</td>
+ <td>57</td>
+ <td>36.3</td>
+ <td>55</td>
+ <td>21.2</td>
+ <td>95</td>
+ <td>68.1</td>
+ <td>30</td>
+ </tr>
  <tr>
  <th rowspan="3" valign="top">H100</th>
  <td>2</td>
 
  </tr>
  </thead>
  <tbody style="text-align: center" >
+ <tr>
+ <th rowspan="3" valign="top">A6000x4</th>
+ <th>deepseek-ai/DeepSeek-R1-Distill-Llama-70B</th>
+ <td>---</td>
+ <td>3.65</td>
+ <td>4102</td>
+ <td>1.56</td>
+ <td>1757</td>
+ <td>1.90</td>
+ <td>2143</td>
+ <td>1.48</td>
+ <td>1665</td>
+ <td>0.44</td>
+ <td>493</td>
+ <td>0.34</td>
+ <td>380</td>
+ <td>0.22</td>
+ <td>245</td>
+ <td>0.05</td>
+ <td>55</td>
+ </tr>
+ <tr>
+ <th>neuralmagic/DeepSeek-R1-Distill-Llama-70B-quantized.w8a8</th>
+ <td>1.76</td>
+ <td>5.89</td>
+ <td>6625</td>
+ <td>2.94</td>
+ <td>3307</td>
+ <td>3.36</td>
+ <td>3775</td>
+ <td>2.59</td>
+ <td>2916</td>
+ <td>0.74</td>
+ <td>828</td>
+ <td>0.53</td>
+ <td>601</td>
+ <td>0.35</td>
+ <td>398</td>
+ <td>0.11</td>
+ <td>120</td>
+ </tr>
+ <tr>
+ <th>neuralmagic/DeepSeek-R1-Distill-Llama-70B-quantized.w4a16</th>
+ <td>1.48</td>
+ <td>4.91</td>
+ <td>5528</td>
+ <td>2.01</td>
+ <td>2259</td>
+ <td>2.03</td>
+ <td>2280</td>
+ <td>1.12</td>
+ <td>1255</td>
+ <td>1.11</td>
+ <td>1251</td>
+ <td>0.76</td>
+ <td>852</td>
+ <td>0.24</td>
+ <td>267</td>
+ <td>0.07</td>
+ <td>81</td>
+ </tr>
+ <tr>
+ <th rowspan="3" valign="top">A100x4</th>
+ <th>deepseek-ai/DeepSeek-R1-Distill-Llama-70B</th>
+ <td>---</td>
+ <td>10.41</td>
+ <td>5235</td>
+ <td>5.10</td>
+ <td>2565</td>
+ <td>5.50</td>
+ <td>2766</td>
+ <td>4.36</td>
+ <td>2193</td>
+ <td>1.49</td>
+ <td>751</td>
+ <td>1.21</td>
+ <td>607</td>
+ <td>0.89</td>
+ <td>447</td>
+ <td>0.19</td>
+ <td>98</td>
+ </tr>
+ <tr>
+ <th>neuralmagic/DeepSeek-R1-Distill-Llama-70B-quantized.w8a8</th>
+ <td>1.63</td>
+ <td>18.11</td>
+ <td>9103</td>
+ <td>8.90</td>
+ <td>4477</td>
+ <td>9.41</td>
+ <td>4730</td>
+ <td>7.42</td>
+ <td>3731</td>
+ <td>2.44</td>
+ <td>1229</td>
+ <td>1.89</td>
+ <td>948</td>
+ <td>1.26</td>
+ <td>631</td>
+ <td>0.30</td>
+ <td>149</td>
+ </tr>
+ <tr>
+ <th>neuralmagic/DeepSeek-R1-Distill-Llama-70B-quantized.w4a16</th>
+ <td>1.12</td>
+ <td>12.63</td>
+ <td>6353</td>
+ <td>5.32</td>
+ <td>2673</td>
+ <td>5.58</td>
+ <td>2804</td>
+ <td>4.27</td>
+ <td>2144</td>
+ <td>2.30</td>
+ <td>1158</td>
+ <td>1.45</td>
+ <td>729</td>
+ <td>0.76</td>
+ <td>381</td>
+ <td>0.22</td>
+ <td>110</td>
+ </tr>
  <tr>
  <th rowspan="3" valign="top">H100x4</th>
  <th>deepseek-ai/DeepSeek-R1-Distill-Llama-70B</th>

 
  **QPS: Queries per second.
 
+ **QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
+
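The QPD footnote above divides throughput by instance cost: queries served in an hour over the on-demand price of that hour. A minimal sketch of the arithmetic, assuming a hypothetical per-GPU hourly price (not the actual Lambda Labs rate):

```python
def queries_per_dollar(qps: float, gpus: int, usd_per_gpu_hour: float) -> float:
    """Queries per dollar: queries completed in one hour divided by the
    on-demand cost of running the instance for that hour.

    The price used in the example below is a placeholder, not a quoted rate.
    """
    queries_per_hour = qps * 3600.0
    cost_per_hour = gpus * usd_per_gpu_hour
    return queries_per_hour / cost_per_hour

# Hypothetical: 0.35 QPS on 2 GPUs at $2.00/GPU-hr
print(round(queries_per_dollar(qps=0.35, gpus=2, usd_per_gpu_hour=2.00)))  # 315
```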