shubhrapandit committed (verified)
Commit d62bb6d · 1 Parent(s): e3631a3

Update README.md

Files changed (1):
  1. README.md +83 -18

README.md CHANGED
@@ -8,14 +8,14 @@ license: apache-2.0
 license_link: https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md
 language:
 - en
-base_model: nm-testing/Pixtral-Large-Instruct-2411-hf
+base_model: neuralmagic/Pixtral-Large-Instruct-2411-hf
 library_name: transformers
 ---
 
 # Pixtral-Large-Instruct-2411-hf-quantized.w8a8
 
 ## Model Overview
-- **Model Architecture:** nm-testing/Pixtral-Large-Instruct-2411-hf
+- **Model Architecture:** neuralmagic/Pixtral-Large-Instruct-2411-hf
 - **Input:** Vision-Text
 - **Output:** Text
 - **Model Optimizations:**
@@ -25,11 +25,11 @@ library_name: transformers
 - **Version:** 1.0
 - **Model Developers:** Neural Magic
 
-Quantized version of [nm-testing/Pixtral-Large-Instruct-2411-hf](https://huggingface.co/nm-testing/Pixtral-Large-Instruct-2411-hf/tree/main).
+Quantized version of [neuralmagic/Pixtral-Large-Instruct-2411-hf](https://huggingface.co/neuralmagic/Pixtral-Large-Instruct-2411-hf/tree/main).
 
 ### Model Optimizations
 
-This model was obtained by quantizing the weights of [nm-testing/Pixtral-Large-Instruct-2411-hf](https://huggingface.co/nm-testing/Pixtral-Large-Instruct-2411-hf/tree/main) to INT8 data type, ready for inference with vLLM >= 0.5.2.
+This model was obtained by quantizing the weights of [neuralmagic/Pixtral-Large-Instruct-2411-hf](https://huggingface.co/neuralmagic/Pixtral-Large-Instruct-2411-hf/tree/main) to INT8 data type, ready for inference with vLLM >= 0.5.2.
 
 ## Deployment
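The Deployment section itself is unchanged by this commit and elided from the diff. For orientation, a minimal vLLM (>= 0.5.2) serving sketch for this checkpoint could look like the following; the `tensor_parallel_size`, `max_model_len`, and sampling values are illustrative assumptions, not values taken from the card.

```python
from vllm import LLM, SamplingParams

# Sketch: load the INT8 (W8A8) checkpoint named in this card.
# tensor_parallel_size and max_model_len are assumptions; the benchmark
# tables below use 2-4 GPUs for the quantized variants.
llm = LLM(
    model="neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w8a8",
    tensor_parallel_size=2,
    max_model_len=8192,
)

messages = [{"role": "user", "content": "Describe INT8 W8A8 quantization in one sentence."}]
outputs = llm.chat(messages, sampling_params=SamplingParams(temperature=0.2, max_tokens=128))
print(outputs[0].outputs[0].text)
```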
 
@@ -85,7 +85,7 @@ from llmcompressor.transformers import oneshot
 from llmcompressor.transformers.tracing import TraceableLlavaForConditionalGeneration
 
 # Load model.
-model_id = "nm-testing/Pixtral-Large-Instruct-2411-hf"
+model_id = "neuralmagic/Pixtral-Large-Instruct-2411-hf"
 model = TraceableLlavaForConditionalGeneration.from_pretrained(
     model_id, device_map="auto", torch_dtype="auto"
 )
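The rest of the creation script is unchanged and therefore not shown in the diff. As a rough sketch of what a typical llm-compressor W8A8 `oneshot` call looks like for a model loaded this way (the recipe, ignore list, and calibration settings below are assumptions, not the card's exact script):

```python
from llmcompressor.modifiers.quantization import GPTQModifier

# Assumed W8A8 recipe: quantize the language model's Linear layers to
# INT8 weights and activations, keeping lm_head and the vision stack in
# higher precision. Calibration-dataset arguments are omitted from this
# sketch; the full script supplies them.
recipe = GPTQModifier(
    targets="Linear",
    scheme="W8A8",
    ignore=["re:.*lm_head", "re:vision_tower.*", "re:multi_modal_projector.*"],
)

oneshot(
    model=model,                  # the traceable model loaded above
    recipe=recipe,
    max_seq_length=8192,          # illustrative settings
    num_calibration_samples=512,
)
```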
@@ -150,6 +150,71 @@ The model was evaluated on OpenLLM Leaderboard [V1](https://huggingface.co/space
 
 ### Accuracy
 
+<table>
+<thead>
+<tr>
+<th>Category</th>
+<th>Metric</th>
+<th>neuralmagic/Pixtral-Large-Instruct-2411-hf</th>
+<th>neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w8a8</th>
+<th>Recovery (%)</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td rowspan="6"><b>Vision</b></td>
+<td>MMMU (val, CoT)<br><i>explicit_prompt_relaxed_correctness</i></td>
+<td>63.56</td>
+<td>63.89</td>
+<td>100.52%</td>
+</tr>
+<tr>
+<td>VQAv2 (val)<br><i>vqa_match</i></td>
+<td>79.03</td>
+<td>79.12</td>
+<td>100.11%</td>
+</tr>
+<tr>
+<td>DocVQA (val)<br><i>anls</i></td>
+<td>89.55</td>
+<td>89.80</td>
+<td>100.28%</td>
+</tr>
+<tr>
+<td>ChartQA (test, CoT)<br><i>anywhere_in_answer_relaxed_correctness</i></td>
+<td>82.24</td>
+<td>80.44</td>
+<td>97.81%</td>
+</tr>
+<tr>
+<td>Mathvista (testmini, CoT)<br><i>explicit_prompt_relaxed_correctness</i></td>
+<td>67.3</td>
+<td>66.50</td>
+<td>98.81%</td>
+</tr>
+<tr>
+<td><b>Average Score</b></td>
+<td><b>76.34</b></td>
+<td><b>75.95</b></td>
+<td><b>99.49%</b></td>
+</tr>
+<tr>
+<td rowspan="2"><b>Text</b></td>
+<td>MGSM (CoT)</td>
+<td>76.05</td>
+<td>74.76</td>
+<td>98.30%</td>
+</tr>
+<tr>
+<td>MMLU (5-shot)</td>
+<td>82.8</td>
+<td>82.9</td>
+<td>100.12%</td>
+</tr>
+</tbody>
+</table>
+
+
 ## Inference Performance
 
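Recovery in the table added above is the quantized model's score expressed as a percentage of the unquantized baseline, for example:

```python
def recovery(quantized: float, baseline: float) -> float:
    """Quantized score as a percentage of the baseline score."""
    return 100.0 * quantized / baseline

# ChartQA row from the accuracy table: 80.44 vs. 82.24
print(f"{recovery(80.44, 82.24):.2f}%")  # -> 97.81%
```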
 
@@ -159,7 +224,7 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <details>
 <summary>Benchmarking Command</summary>
 ```
-guidellm --model nm-testing/Pixtral-Large-Instruct-2411-hf-quantized.w8a8 --target "http://localhost:8000/v1" --data-type emulated --data prompt_tokens=<prompt_tokens>,generated_tokens=<generated_tokens>,images=<num_images>,width=<image_width>,height=<image_height> --max-seconds 120 --backend aiohttp_server
+guidellm --model neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w8a8 --target "http://localhost:8000/v1" --data-type emulated --data prompt_tokens=<prompt_tokens>,generated_tokens=<generated_tokens>,images=<num_images>,width=<image_width>,height=<image_height> --max-seconds 120 --backend aiohttp_server
 ```
 
 </details>
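Filling in the placeholders, one concrete run of this benchmark (here an assumed single-image workload with 1024x1024 inputs; the token counts are illustrative, not the card's exact settings) would look like:

```
guidellm --model neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w8a8 \
  --target "http://localhost:8000/v1" \
  --data-type emulated \
  --data prompt_tokens=256,generated_tokens=128,images=1,width=1024,height=1024 \
  --max-seconds 120 \
  --backend aiohttp_server
```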
@@ -194,7 +259,7 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <tr>
 <th rowspan="3" valign="top">A100</th>
 <td>4</td>
-<td>nm-testing/Pixtral-Large-Instruct-2411-hf</td>
+<td>neuralmagic/Pixtral-Large-Instruct-2411-hf</td>
 <td></td>
 <td>7.5</td>
 <td>67</td>
@@ -205,7 +270,7 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 </tr>
 <tr>
 <td>2</td>
-<td>nm-testing/Pixtral-Large-Instruct-2411-hf-quantized.w8a8</td>
+<td>neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w8a8</td>
 <td>1.86</td>
 <td>8.1</td>
 <td>124</td>
@@ -216,7 +281,7 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 </tr>
 <tr>
 <td>2</td>
-<td>nm-testing/Pixtral-Large-Instruct-2411-hf-quantized.w4a16</td>
+<td>neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w4a16</td>
 <td>2.52</td>
 <td>6.9</td>
 <td>147</td>
@@ -228,7 +293,7 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <tr>
 <th rowspan="3" valign="top">H100</th>
 <td>4</td>
-<td>nm-testing/Pixtral-Large-Instruct-2411-hf</td>
+<td>neuralmagic/Pixtral-Large-Instruct-2411-hf</td>
 <td></td>
 <td>4.4</td>
 <td>67</td>
@@ -239,7 +304,7 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 </tr>
 <tr>
 <td>2</td>
-<td>nm-testing/Pixtral-Large-Instruct-2411-hf-FP8-Dynamic</td>
+<td>neuralmagic/Pixtral-Large-Instruct-2411-hf-FP8-Dynamic</td>
 <td>1.82</td>
 <td>4.7</td>
 <td>120</td>
@@ -250,7 +315,7 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 </tr>
 <tr>
 <td>2</td>
-<td>nm-testing/Pixtral-Large-Instruct-2411-hf-quantized.w4a16</td>
+<td>neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w4a16</td>
 <td>1.87</td>
 <td>4.7</td>
 <td>120</td>
@@ -293,7 +358,7 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <tbody style="text-align: center">
 <tr>
 <th rowspan="3" valign="top">A100x4</th>
-<td>nm-testing/Pixtral-Large-Instruct-2411-hf</td>
+<td>neuralmagic/Pixtral-Large-Instruct-2411-hf</td>
 <td></td>
 <td>0.4</td>
 <td>222</td>
@@ -303,7 +368,7 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <td>399</td>
 </tr>
 <tr>
-<td>nm-testing/Pixtral-Large-Instruct-2411-hf-quantized.w8a8</td>
+<td>neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w8a8</td>
 <td>1.70</td>
 <td>1.6</td>
 <td>383</td>
@@ -313,7 +378,7 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <td>674</td>
 </tr>
 <tr>
-<td>nm-testing/Pixtral-Large-Instruct-2411-hf-quantized.w4a16</td>
+<td>neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w4a16</td>
 <td>1.48</td>
 <td>1.0</td>
 <td>276</td>
@@ -324,7 +389,7 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 </tr>
 <tr>
 <th rowspan="3" valign="top">H100x4</th>
-<td>nm-testing/Pixtral-Large-Instruct-2411-hf</td>
+<td>neuralmagic/Pixtral-Large-Instruct-2411-hf</td>
 <td></td>
 <td>1.0</td>
 <td>284</td>
@@ -334,7 +399,7 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <td>511</td>
 </tr>
 <tr>
-<td>nm-testing/Pixtral-Large-Instruct-2411-hf-FP8-Dynamic</td>
+<td>neuralmagic/Pixtral-Large-Instruct-2411-hf-FP8-Dynamic</td>
 <td>1.61</td>
 <td>3.4</td>
 <td>467</td>
@@ -344,7 +409,7 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <td>908</td>
 </tr>
 <tr>
-<td>nm-testing/Pixtral-Large-Instruct-2411-hf-quantized.w4a16</td>
+<td>neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w4a16</td>
 <td>1.33</td>
 <td>2.8</td>
 <td>393</td>