| Category | Metric | neuralmagic/Pixtral-Large-Instruct-2411-hf | neuralmagic/Pixtral-Large-Instruct-2411-hf-FP8-dynamic | Recovery (%) |
|---|---|---|---|---|
| Vision | MMMU (val, CoT), `explicit_prompt_relaxed_correctness` | 63.56 | 63.44 | 99.81% |
| | VQAv2 (val), `vqa_match` | 79.03 | 79.06 | 100.04% |
| | DocVQA (val), `anls` | 89.55 | 89.63 | 100.09% |
| | ChartQA (test, CoT), `anywhere_in_answer_relaxed_correctness` | 82.24 | 82.80 | 100.68% |
| | MathVista (testmini, CoT), `explicit_prompt_relaxed_correctness` | 67.3 | 66.50 | 98.81% |
| | Average Score | 76.34 | 76.29 | 99.93% |
| Text | MGSM (CoT) | 76.05 | 75.58 | 99.38% |
| | MMLU (5-shot) | 82.8 | 82.74 | 99.93% |
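
The Recovery column is consistent with dividing the FP8-dynamic score by the baseline score, and the Average Score row matches the unweighted mean of the five vision scores. The Python sketch below recomputes both from the published numbers. The formula is inferred from the table rather than stated in it, and the names `VISION_SCORES`, `TEXT_SCORES`, `recovery`, and `vision_average` are illustrative only.

```python
# Sketch: recompute the "Recovery (%)" column and the vision "Average Score" row.
# Assumption: Recovery = quantized score / baseline score * 100. This matches every
# row of the evaluation table, but the formula is inferred, not stated there.

# (baseline, FP8-dynamic) scores copied from the evaluation table.
VISION_SCORES = {
    "MMMU": (63.56, 63.44),
    "VQAv2": (79.03, 79.06),
    "DocVQA": (89.55, 89.63),
    "ChartQA": (82.24, 82.80),
    "MathVista": (67.3, 66.50),
}
TEXT_SCORES = {
    "MGSM": (76.05, 75.58),
    "MMLU": (82.8, 82.74),
}


def recovery(baseline: float, quantized: float) -> float:
    """Quantized score expressed as a percentage of the baseline score."""
    return quantized / baseline * 100.0


def mean(values) -> float:
    """Unweighted arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def vision_average() -> tuple[float, float]:
    """Unweighted mean of the five vision scores, for baseline and FP8-dynamic."""
    base = mean(b for b, _ in VISION_SCORES.values())
    quant = mean(q for _, q in VISION_SCORES.values())
    return base, quant


if __name__ == "__main__":
    for name, (base, quant) in {**VISION_SCORES, **TEXT_SCORES}.items():
        print(f"{name}: {recovery(base, quant):.2f}%")
    base_avg, quant_avg = vision_average()
    print(f"Vision average: {base_avg:.2f} -> {quant_avg:.2f} "
          f"({recovery(base_avg, quant_avg):.2f}%)")
```

Running it prints one recovery figure per benchmark, rounded to two decimals as in the table.
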

Workloads: Document Visual Question Answering (1680W x 2240H, 64/128); Visual Reasoning (640W x 480H, 128/128); Image Captioning (480W x 360H, 0/128).

| Hardware | Number of GPUs | Model | Average Cost Reduction | Document VQA Latency (s) | Document VQA Queries Per Dollar | Visual Reasoning Latency (s) | Visual Reasoning Queries Per Dollar | Image Captioning Latency (s) | Image Captioning Queries Per Dollar |
|---|---|---|---|---|---|---|---|---|---|
| A100 | 4 | neuralmagic/Pixtral-Large-Instruct-2411-hf | | 7.5 | 67 | 6.5 | 77 | 6.4 | 79 |
| A100 | 2 | neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w8a8 | 1.86 | 8.1 | 124 | 7.1 | 142 | 6.8 | 148 |
| A100 | 2 | neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w4a16 | 2.52 | 6.9 | 147 | 5.1 | 199 | 4.5 | 221 |
| H100 | 4 | neuralmagic/Pixtral-Large-Instruct-2411-hf | | 4.4 | 67 | 3.9 | 74 | 3.7 | 79 |
| H100 | 2 | neuralmagic/Pixtral-Large-Instruct-2411-hf-FP8-Dynamic | 1.82 | 4.7 | 120 | 4.1 | 137 | 3.9 | 145 |
| H100 | 2 | neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w4a16 | 1.87 | 4.7 | 120 | 3.9 | 144 | 3.8 | 149 |
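
The Average Cost Reduction values in the latency table are approximately reproduced if one assumes the column is the mean, over the three workloads, of each quantized model's Queries Per Dollar divided by that of the 4-GPU baseline on the same hardware. That definition is an assumption, checked only against the latency table; the sketch below recomputes it, and the names `BASELINE_QPD`, `QUANTIZED_QPD`, and `average_cost_reduction` are hypothetical.

```python
# Sketch: approximate the "Average Cost Reduction" column of the latency table.
# Assumption (not stated in the table): the value is the unweighted mean, over the
# three workloads, of a model's Queries Per Dollar divided by the baseline's
# Queries Per Dollar on the same GPU type.


def average_cost_reduction(baseline_qpd, model_qpd) -> float:
    """Mean per-workload ratio of queries per dollar relative to the baseline."""
    if len(baseline_qpd) != len(model_qpd):
        raise ValueError("both models need one QPD value per workload")
    ratios = [m / b for b, m in zip(baseline_qpd, model_qpd)]
    return sum(ratios) / len(ratios)


# Queries Per Dollar per workload, in column order:
# (Document VQA, Visual Reasoning, Image Captioning).
BASELINE_QPD = {  # 4-GPU baseline rows
    "A100": (67, 77, 79),
    "H100": (67, 74, 79),
}
QUANTIZED_QPD = {  # 2-GPU quantized rows
    ("A100", "quantized.w8a8"): (124, 142, 148),
    ("A100", "quantized.w4a16"): (147, 199, 221),
    ("H100", "FP8-Dynamic"): (120, 137, 145),
    ("H100", "quantized.w4a16"): (120, 144, 149),
}

if __name__ == "__main__":
    for (gpu, variant), qpd in QUANTIZED_QPD.items():
        value = average_cost_reduction(BASELINE_QPD[gpu], qpd)
        print(f"{gpu} {variant}: {value:.2f}x")
```

The recomputed values land within about 0.01 of the published ones (for example 2.53 versus 2.52 for the A100 w4a16 row), a gap that rounding of the Queries Per Dollar figures would plausibly explain.
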

Workloads: Document Visual Question Answering (1680W x 2240H, 64/128); Visual Reasoning (640W x 480H, 128/128); Image Captioning (480W x 360H, 0/128).

| Hardware | Model | Average Cost Reduction | Document VQA Maximum Throughput (QPS) | Document VQA Queries Per Dollar | Visual Reasoning Maximum Throughput (QPS) | Visual Reasoning Queries Per Dollar | Image Captioning Maximum Throughput (QPS) | Image Captioning Queries Per Dollar |
|---|---|---|---|---|---|---|---|---|
| A100x4 | neuralmagic/Pixtral-Large-Instruct-2411-hf | | 0.4 | 222 | 0.7 | 341 | 0.8 | 399 |
| A100x4 | neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w8a8 | 1.70 | 0.8 | 383 | 1.1 | 571 | 1.3 | 674 |
| A100x4 | neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w4a16 | 1.48 | 0.5 | 276 | 1.0 | 505 | 1.4 | 680 |
| H100x4 | neuralmagic/Pixtral-Large-Instruct-2411-hf | | 1.0 | 284 | 1.6 | 465 | 1.8 | 511 |
| H100x4 | neuralmagic/Pixtral-Large-Instruct-2411-hf-FP8-Dynamic | 1.61 | 1.7 | 467 | 2.6 | 726 | 3.2 | 908 |
| H100x4 | neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w4a16 | 1.33 | 1.4 | 393 | 2.2 | 726 | 2.7 | 764 |

Workloads: Document Visual Question Answering (1680W x 2240H, 64/128); Visual Reasoning (640W x 480H, 128/128); Image Captioning (480W x 360H, 0/128).

| Hardware | Model | Average Cost Reduction | Document VQA Maximum Throughput (QPS) | Document VQA Queries Per Dollar | Visual Reasoning Maximum Throughput (QPS) | Visual Reasoning Queries Per Dollar | Image Captioning Maximum Throughput (QPS) | Image Captioning Queries Per Dollar |
|---|---|---|---|---|---|---|---|---|
| A100x4 | neuralmagic/Pixtral-Large-Instruct-2411-hf | | 0.4 | 222 | 0.7 | 341 | 0.8 | 399 |
| A100x4 | neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w8a8 | 1.70 | 0.8 | 766 | 1.1 | 1142 | 1.3 | 1348 |
| A100x4 | neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w4a16 | 1.48 | 0.5 | 552 | 1.0 | 1010 | 1.4 | 1360 |
| H100x4 | neuralmagic/Pixtral-Large-Instruct-2411-hf | | 1.0 | 284 | 1.6 | 465 | 1.8 | 511 |
| H100x4 | neuralmagic/Pixtral-Large-Instruct-2411-hf-FP8-Dynamic | 1.61 | 1.7 | 905 | 2.6 | 1406 | 3.2 | 1759 |
| H100x4 | neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w4a16 | 1.33 | 1.4 | 761 | 2.2 | 1228 | 2.7 | 1480 |