shubhrapandit committed (verified)
Commit d62bb6d · 1 Parent(s): e3631a3

Update README.md

Files changed (1):
  1. README.md +83 -18

README.md CHANGED
@@ -8,14 +8,14 @@ license: apache-2.0
 license_link: https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md
 language:
 - en
-base_model: nm-testing/Pixtral-Large-Instruct-2411-hf
+base_model: neuralmagic/Pixtral-Large-Instruct-2411-hf
 library_name: transformers
 ---
 
 # Pixtral-Large-Instruct-2411-hf-quantized.w8a8
 
 ## Model Overview
-- **Model Architecture:** nm-testing/Pixtral-Large-Instruct-2411-hf
+- **Model Architecture:** neuralmagic/Pixtral-Large-Instruct-2411-hf
 - **Input:** Vision-Text
 - **Output:** Text
 - **Model Optimizations:**
@@ -25,11 +25,11 @@ library_name: transformers
 - **Version:** 1.0
 - **Model Developers:** Neural Magic
 
-Quantized version of [nm-testing/Pixtral-Large-Instruct-2411-hf](https://huggingface.co/nm-testing/Pixtral-Large-Instruct-2411-hf/tree/main).
+Quantized version of [neuralmagic/Pixtral-Large-Instruct-2411-hf](https://huggingface.co/neuralmagic/Pixtral-Large-Instruct-2411-hf/tree/main).
 
 ### Model Optimizations
 
-This model was obtained by quantizing the weights of [nm-testing/Pixtral-Large-Instruct-2411-hf](https://huggingface.co/nm-testing/Pixtral-Large-Instruct-2411-hf/tree/main) to INT8 data type, ready for inference with vLLM >= 0.5.2.
+This model was obtained by quantizing the weights of [neuralmagic/Pixtral-Large-Instruct-2411-hf](https://huggingface.co/neuralmagic/Pixtral-Large-Instruct-2411-hf/tree/main) to INT8 data type, ready for inference with vLLM >= 0.5.2.
 
 ## Deployment
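The Deployment section itself is unchanged by this commit and elided from the diff. For orientation, a minimal vLLM (>= 0.5.2) serving sketch for this checkpoint could look like the following; the `tensor_parallel_size`, `max_model_len`, and sampling values are illustrative assumptions, not values taken from the card.

```python
from vllm import LLM, SamplingParams

# Sketch: load the INT8 (W8A8) checkpoint named in this card.
# tensor_parallel_size and max_model_len are assumptions; the benchmark
# tables below use 2-4 GPUs for the quantized variants.
llm = LLM(
    model="neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w8a8",
    tensor_parallel_size=2,
    max_model_len=8192,
)

messages = [{"role": "user", "content": "Describe INT8 W8A8 quantization in one sentence."}]
outputs = llm.chat(messages, sampling_params=SamplingParams(temperature=0.2, max_tokens=128))
print(outputs[0].outputs[0].text)
```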
 
@@ -85,7 +85,7 @@ from llmcompressor.transformers import oneshot
 from llmcompressor.transformers.tracing import TraceableLlavaForConditionalGeneration
 
 # Load model.
-model_id = "nm-testing/Pixtral-Large-Instruct-2411-hf"
+model_id = "neuralmagic/Pixtral-Large-Instruct-2411-hf"
 model = TraceableLlavaForConditionalGeneration.from_pretrained(
     model_id, device_map="auto", torch_dtype="auto"
 )
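The rest of the creation script is unchanged and therefore not shown in the diff. As a rough sketch of what a typical llm-compressor W8A8 `oneshot` call looks like for a model loaded this way (the recipe, ignore list, and calibration settings below are assumptions, not the card's exact script):

```python
from llmcompressor.modifiers.quantization import GPTQModifier

# Assumed W8A8 recipe: quantize the language model's Linear layers to
# INT8 weights and activations, keeping lm_head and the vision stack in
# higher precision. Calibration-dataset arguments are omitted from this
# sketch; the full script supplies them.
recipe = GPTQModifier(
    targets="Linear",
    scheme="W8A8",
    ignore=["re:.*lm_head", "re:vision_tower.*", "re:multi_modal_projector.*"],
)

oneshot(
    model=model,                  # the traceable model loaded above
    recipe=recipe,
    max_seq_length=8192,          # illustrative settings
    num_calibration_samples=512,
)
```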
@@ -150,6 +150,71 @@ The model was evaluated on OpenLLM Leaderboard [V1](https://huggingface.co/space
 
 ### Accuracy
 
+<table>
+<thead>
+<tr>
+<th>Category</th>
+<th>Metric</th>
+<th>neuralmagic/Pixtral-Large-Instruct-2411-hf</th>
+<th>neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w8a8</th>
+<th>Recovery (%)</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td rowspan="6"><b>Vision</b></td>
+<td>MMMU (val, CoT)<br><i>explicit_prompt_relaxed_correctness</i></td>
+<td>63.56</td>
+<td>63.89</td>
+<td>100.52%</td>
+</tr>
+<tr>
+<td>VQAv2 (val)<br><i>vqa_match</i></td>
+<td>79.03</td>
+<td>79.12</td>
+<td>100.11%</td>
+</tr>
+<tr>
+<td>DocVQA (val)<br><i>anls</i></td>
+<td>89.55</td>
+<td>89.80</td>
+<td>100.28%</td>
+</tr>
+<tr>
+<td>ChartQA (test, CoT)<br><i>anywhere_in_answer_relaxed_correctness</i></td>
+<td>82.24</td>
+<td>80.44</td>
+<td>97.81%</td>
+</tr>
+<tr>
+<td>Mathvista (testmini, CoT)<br><i>explicit_prompt_relaxed_correctness</i></td>
+<td>67.3</td>
+<td>66.50</td>
+<td>98.81%</td>
+</tr>
+<tr>
+<td><b>Average Score</b></td>
+<td><b>76.34</b></td>
+<td><b>75.95</b></td>
+<td><b>99.49%</b></td>
+</tr>
+<tr>
+<td rowspan="2"><b>Text</b></td>
+<td>MGSM (CoT)</td>
+<td>76.05</td>
+<td>74.76</td>
+<td>98.30%</td>
+</tr>
+<tr>
+<td>MMLU (5-shot)</td>
+<td>82.8</td>
+<td>82.9</td>
+<td>100.12%</td>
+</tr>
+</tbody>
+</table>
+
+
 ## Inference Performance
 
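Recovery in the table added above is the quantized model's score expressed as a percentage of the unquantized baseline, for example:

```python
def recovery(quantized: float, baseline: float) -> float:
    """Quantized score as a percentage of the baseline score."""
    return 100.0 * quantized / baseline

# ChartQA row from the accuracy table: 80.44 vs. 82.24
print(f"{recovery(80.44, 82.24):.2f}%")  # -> 97.81%
```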
 
@@ -159,7 +224,7 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <details>
 <summary>Benchmarking Command</summary>
 ```
-guidellm --model nm-testing/Pixtral-Large-Instruct-2411-hf-quantized.w8a8 --target "http://localhost:8000/v1" --data-type emulated --data prompt_tokens=<prompt_tokens>,generated_tokens=<generated_tokens>,images=<num_images>,width=<image_width>,height=<image_height> --max-seconds 120 --backend aiohttp_server
+guidellm --model neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w8a8 --target "http://localhost:8000/v1" --data-type emulated --data prompt_tokens=<prompt_tokens>,generated_tokens=<generated_tokens>,images=<num_images>,width=<image_width>,height=<image_height> --max-seconds 120 --backend aiohttp_server
 ```
 
 </details>
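Filling in the placeholders, one concrete run of this benchmark (here an assumed single-image workload with 1024x1024 inputs; the token counts are illustrative, not the card's exact settings) would look like:

```
guidellm --model neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w8a8 \
  --target "http://localhost:8000/v1" \
  --data-type emulated \
  --data prompt_tokens=256,generated_tokens=128,images=1,width=1024,height=1024 \
  --max-seconds 120 \
  --backend aiohttp_server
```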
@@ -194,7 +259,7 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <tr>
 <th rowspan="3" valign="top">A100</th>
 <td>4</td>
-<td>nm-testing/Pixtral-Large-Instruct-2411-hf</td>
+<td>neuralmagic/Pixtral-Large-Instruct-2411-hf</td>
 <td></td>
 <td>7.5</td>
 <td>67</td>
@@ -205,7 +270,7 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 </tr>
 <tr>
 <td>2</td>
-<td>nm-testing/Pixtral-Large-Instruct-2411-hf-quantized.w8a8</td>
+<td>neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w8a8</td>
 <td>1.86</td>
 <td>8.1</td>
 <td>124</td>
@@ -216,7 +281,7 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 </tr>
 <tr>
 <td>2</td>
-<td>nm-testing/Pixtral-Large-Instruct-2411-hf-quantized.w4a16</td>
+<td>neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w4a16</td>
 <td>2.52</td>
 <td>6.9</td>
 <td>147</td>
@@ -228,7 +293,7 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <tr>
 <th rowspan="3" valign="top">H100</th>
 <td>4</td>
-<td>nm-testing/Pixtral-Large-Instruct-2411-hf</td>
+<td>neuralmagic/Pixtral-Large-Instruct-2411-hf</td>
 <td></td>
 <td>4.4</td>
 <td>67</td>
@@ -239,7 +304,7 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 </tr>
 <tr>
 <td>2</td>
-<td>nm-testing/Pixtral-Large-Instruct-2411-hf-FP8-Dynamic</td>
+<td>neuralmagic/Pixtral-Large-Instruct-2411-hf-FP8-Dynamic</td>
 <td>1.82</td>
 <td>4.7</td>
 <td>120</td>
@@ -250,7 +315,7 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 </tr>
 <tr>
 <td>2</td>
-<td>nm-testing/Pixtral-Large-Instruct-2411-hf-quantized.w4a16</td>
+<td>neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w4a16</td>
 <td>1.87</td>
 <td>4.7</td>
 <td>120</td>
@@ -293,7 +358,7 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <tbody style="text-align: center">
 <tr>
 <th rowspan="3" valign="top">A100x4</th>
-<td>nm-testing/Pixtral-Large-Instruct-2411-hf</td>
+<td>neuralmagic/Pixtral-Large-Instruct-2411-hf</td>
 <td></td>
 <td>0.4</td>
 <td>222</td>
@@ -303,7 +368,7 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <td>399</td>
 </tr>
 <tr>
-<td>nm-testing/Pixtral-Large-Instruct-2411-hf-quantized.w8a8</td>
+<td>neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w8a8</td>
 <td>1.70</td>
 <td>1.6</td>
 <td>383</td>
@@ -313,7 +378,7 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <td>674</td>
 </tr>
 <tr>
-<td>nm-testing/Pixtral-Large-Instruct-2411-hf-quantized.w4a16</td>
+<td>neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w4a16</td>
 <td>1.48</td>
 <td>1.0</td>
 <td>276</td>
@@ -324,7 +389,7 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 </tr>
 <tr>
 <th rowspan="3" valign="top">H100x4</th>
-<td>nm-testing/Pixtral-Large-Instruct-2411-hf</td>
+<td>neuralmagic/Pixtral-Large-Instruct-2411-hf</td>
 <td></td>
 <td>1.0</td>
 <td>284</td>
@@ -334,7 +399,7 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <td>511</td>
 </tr>
 <tr>
-<td>nm-testing/Pixtral-Large-Instruct-2411-hf-FP8-Dynamic</td>
+<td>neuralmagic/Pixtral-Large-Instruct-2411-hf-FP8-Dynamic</td>
 <td>1.61</td>
 <td>3.4</td>
 <td>467</td>
@@ -344,7 +409,7 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <td>908</td>
 </tr>
 <tr>
-<td>nm-testing/Pixtral-Large-Instruct-2411-hf-quantized.w4a16</td>
+<td>neuralmagic/Pixtral-Large-Instruct-2411-hf-quantized.w4a16</td>
 <td>1.33</td>
 <td>2.8</td>
 <td>393</td>