shubhrapandit committed (verified)
Commit b44a522 · Parent: 1563c4b

Update README.md

Files changed (1): README.md (+13 −13)
README.md CHANGED

```diff
@@ -259,7 +259,7 @@ lm_eval \
 ## Inference Performance
 
 
-This model achieves up to 1.87x speedup in single-stream deployment and up to 1.69x speedup in multi-stream asynchronous deployment, depending on hardware and use-case scenario.
+This model achieves up to 1.87x speedup in single-stream deployment and up to 2.0x speedup in multi-stream asynchronous deployment, depending on hardware and use-case scenario.
 The following performance benchmarks were conducted with [vLLM](https://docs.vllm.ai/en/latest/) version 0.7.2, and [GuideLLM](https://github.com/neuralmagic/guidellm).
 
 <details>
@@ -411,21 +411,21 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <tr>
 <td>neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w8a8</td>
 <td>1.70</td>
-<td>1.6</td>
+<td>0.8</td>
 <td>766</td>
-<td>2.2</td>
+<td>1.1</td>
 <td>1142</td>
-<td>2.6</td>
+<td>1.3</td>
 <td>1348</td>
 </tr>
 <tr>
 <td>neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w4a16</td>
 <td>1.48</td>
-<td>1.0</td>
+<td>0.5</td>
 <td>552</td>
-<td>2.0</td>
+<td>1.0</td>
 <td>1010</td>
-<td>2.8</td>
+<td>1.4</td>
 <td>1360</td>
 </tr>
 <tr>
@@ -442,21 +442,21 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <tr>
 <td>neuralmagic/Pixtral-Large-Instruct-2411-hf-FP8-Dynamic</td>
 <td>1.61</td>
-<td>3.4</td>
+<td>1.7</td>
 <td>905</td>
-<td>5.2</td>
+<td>2.6</td>
 <td>1406</td>
-<td>6.4</td>
+<td>3.2</td>
 <td>1759</td>
 </tr>
 <tr>
 <td>neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w4a16</td>
 <td>1.33</td>
-<td>2.8</td>
+<td>1.4</td>
 <td>761</td>
-<td>4.4</td>
+<td>2.2</td>
 <td>1228</td>
-<td>5.4</td>
+<td>2.7</td>
 <td>1480</td>
 </tr>
 </tbody>
```