ChibuUkachi commited on
Commit
4be8317
·
verified ·
1 Parent(s): 31df79e

add results

Browse files
Files changed (1) hide show
  1. README.md +106 -1
README.md CHANGED
@@ -24,7 +24,7 @@ tags:
24
  ---
25
 
26
  # Ministral 3 14B Instruct 2512
27
- The largest model in the Ministral 3 family, **Ministral 3 14B** offers frontier capabilities and performance comparable to its larger [Mistral Small 3.2 24B](https://huggingface.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506) counterpart. A powerful and efficient language model with vision capabilities.
28
 
29
  This model is the instruct post-trained version in **FP8**, fine-tuned for instruction tasks, making it ideal for chat and instruction based use cases.
30
 
@@ -527,7 +527,112 @@ model = Mistral3ForConditionalGeneration.from_pretrained(
527
  quantization_config=FineGrainedFP8Config(dequantize=True)
528
  )
529
  ```
 
530
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
531
  ## License
532
 
533
  This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).
 
24
  ---
25
 
26
  # Ministral 3 14B Instruct 2512
27
+ The largest model in the Ministral 3 family, **Ministral 3 14B** offers frontier capabilities and performance comparable to its larger [Mistral Small 3.2 24B](https://huggingface.co/mistralai/ Ministral-3-14B-Instruct-2512) counterpart. A powerful and efficient language model with vision capabilities.
28
 
29
  This model is the instruct post-trained version in **FP8**, fine-tuned for instruction tasks, making it ideal for chat and instruction based use cases.
30
 
 
527
  quantization_config=FineGrainedFP8Config(dequantize=True)
528
  )
529
  ```
530
+ ## Red Hat AI Evaluations
531
 
532
+ As part of the model validation effort, Red Hat conducted independent accuracy evaluations and the results are presented below.
533
+ The model was evaluated with [vLLM](https://vllm.ai/) version 0.11.2 and either [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) or
534
+ [lighteval](https://github.com/huggingface/lighteval) depending on the benchmark.
535
+
536
+ <details>
537
+ <summary>Evaluation commands</summary>
538
+
539
+ All evaluations were conducted using the vLLM server interface.
540
+ The server is first initialized with the following command on 1 H200 GPUs:
541
+ ```bash
542
+ vllm serve RedHatAI/Ministral-3-14B-Instruct-2512 \
543
+ --max-model-len 262144 \
544
+ --tokenizer_mode mistral \
545
+ --config_format mistral \
546
+ --load_format mistral \
547
+ --limit-mm-per-prompt '{"image": 10}'
548
+ ```
549
+
550
+ MMLU-Pro, IFEval and MMMU were evaluated using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) as follows.
551
+ ```bash
552
+ lm_eval \
553
+ --model local-chat-completions \
554
+ --tasks mmlu_pro,ifeval,mmmu_val \
555
+ --model_args "model=RedHatAI/Ministral-3-14B-Instruct-2512,max_length=64000,base_url=http://0.0.0.0:8000/v1/chat/completions,num_concurrent=64,max_retries=3,tokenized_requests=False,tokenizer_backend=None,timeout=1200,max_images=10" \
556
+ --apply_chat_template \
557
+ --fewshot_as_multiturn \
558
+ --output_path results_lmeval_ministral \
559
+ --gen_kwargs "do_sample=True,temperature=0.15"
560
+ ```
561
+
562
+ AIME25, GPQA Diamond and Math 500 were evaluated using [lighteval](https://github.com/huggingface/lighteval) as follows.
563
+
564
+ litellm_config.yaml
565
+ ```yaml
566
+ model_parameters:
567
+ provider: "hosted_vllm"
568
+ model_name: "hosted_vllm/RedHatAI/Ministral-3-14B-Instruct-2512"
569
+ base_url: "http://0.0.0.0:8000/v1"
570
+ api_key: ""
571
+ timeout: 1200
572
+ concurrent_requests: 64
573
+ max_model_length: 262144
574
+ generation_parameters:
575
+ temperature: 0.15
576
+ max_new_tokens: 64000
577
+ ```
578
+
579
+ ```bash
580
+ lighteval endpoint litellm litellm_config.yaml \
581
+ "aime25|0,math_500|0,gpqa:diamond|0" \
582
+ --output-dir results_lighteval_ministral \
583
+ --save-details
584
+ ```
585
+
586
+ </details>
587
+
588
+ <table>
589
+ <thead>
590
+ <tr>
591
+ <th>Benchmark</th>
592
+ <th>RedHatAI/Ministral-3-14B-Instruct-2512-BF16</th>
593
+ <th>RedHatAI/Ministral-3-14B-Instruct-2512</th>
594
+ <th>Recovery</th>
595
+ </tr>
596
+ </thead>
597
+ <tbody>
598
+ <tr>
599
+ <td>MMLU-Pro</td>
600
+ <td>41.69</td>
601
+ <td>45.87</td>
602
+ <td>110.0%</td>
603
+ </tr>
604
+ <tr>
605
+ <td>IFEval</td>
606
+ <td>77.34</td>
607
+ <td>76.86</td>
608
+ <td>99.38%</td>
609
+ </tr>
610
+ <tr>
611
+ <td>MMMU</td>
612
+ <td>55.33</td>
613
+ <td>55.33</td>
614
+ <td>100.0%</td>
615
+ </tr>
616
+ <tr>
617
+ <td>AIME25</td>
618
+ <td>36.67</td>
619
+ <td>36.67</td>
620
+ <td>100.0%</td>
621
+ </tr>
622
+ <tr>
623
+ <td>GPQA Diamond</td>
624
+ <td>58.59</td>
625
+ <td>58.59</td>
626
+ <td>100.0%</td>
627
+ </tr>
628
+ <tr>
629
+ <td>MATH 500</td>
630
+ <td>88.6</td>
631
+ <td>86.2</td>
632
+ <td>97.29%</td>
633
+ </tr>
634
+ </tbody>
635
+ </table>
636
  ## License
637
 
638
  This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).