Update README.md
README.md
CHANGED
@@ -259,7 +259,7 @@ lm_eval \
 ## Inference Performance
 
 
-This model achieves up to 1.87x speedup in single-stream deployment and up to
+This model achieves up to 1.87x speedup in single-stream deployment and up to 2.0x speedup in multi-stream asynchronous deployment, depending on hardware and use-case scenario.
 The following performance benchmarks were conducted with [vLLM](https://docs.vllm.ai/en/latest/) version 0.7.2, and [GuideLLM](https://github.com/neuralmagic/guidellm).
 
 <details>
@@ -411,21 +411,21 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <tr>
 <td>neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w8a8</td>
 <td>1.70</td>
-<td>
+<td>0.8</td>
 <td>766</td>
-<td>
+<td>1.1</td>
 <td>1142</td>
-<td>
+<td>1.3</td>
 <td>1348</td>
 </tr>
 <tr>
 <td>neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w4a16</td>
 <td>1.48</td>
-<td>
+<td>0.5</td>
 <td>552</td>
-<td>
+<td>1.0</td>
 <td>1010</td>
-<td>
+<td>1.4</td>
 <td>1360</td>
 </tr>
 <tr>
@@ -442,21 +442,21 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <tr>
 <td>neuralmagic/Pixtral-Large-Instruct-2411-hf-FP8-Dynamic</td>
 <td>1.61</td>
-<td>
+<td>1.7</td>
 <td>905</td>
-<td>
+<td>2.6</td>
 <td>1406</td>
-<td>
+<td>3.2</td>
 <td>1759</td>
 </tr>
 <tr>
 <td>neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w4a16</td>
 <td>1.33</td>
-<td>
+<td>1.4</td>
 <td>761</td>
-<td>
+<td>2.2</td>
 <td>1228</td>
-<td>
+<td>2.7</td>
 <td>1480</td>
 </tr>
 </tbody>