RedHatAI
/

Ministral-3-14B-Instruct-2512

@@ -24,7 +24,7 @@ tags:
 ---
 # Ministral 3 14B Instruct 2512
-The largest model in the Ministral 3 family, **Ministral 3 14B** offers frontier capabilities and performance comparable to its larger [Mistral Small 3.2 24B](https://huggingface.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506) counterpart. A powerful and efficient language model with vision capabilities.
 This model is the instruct post-trained version in **FP8**, fine-tuned for instruction tasks, making it ideal for chat and instruction based use cases.
@@ -527,7 +527,112 @@ model = Mistral3ForConditionalGeneration.from_pretrained(
     quantization_config=FineGrainedFP8Config(dequantize=True)
 )
 ```
 ## License
 This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).

 ---
 # Ministral 3 14B Instruct 2512
+The largest model in the Ministral 3 family, **Ministral 3 14B** offers frontier capabilities and performance comparable to its larger [Mistral Small 3.2 24B](https://huggingface.co/mistralai/ Ministral-3-14B-Instruct-2512) counterpart. A powerful and efficient language model with vision capabilities.
 This model is the instruct post-trained version in **FP8**, fine-tuned for instruction tasks, making it ideal for chat and instruction based use cases.
     quantization_config=FineGrainedFP8Config(dequantize=True)
 )
 ```
+## Red Hat AI Evaluations
+As part of the model validation effort, Red Hat conducted independent accuracy evaluations and the results are presented below.
+The model was evaluated with [vLLM](https://vllm.ai/) version 0.11.2 and either [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) or
+[lighteval](https://github.com/huggingface/lighteval) depending on the benchmark.
+<details>
+<summary>Evaluation commands</summary>
+All evaluations were conducted using the vLLM server interface.
+The server is first initialized with the following command on 1 H200 GPUs:
+```bash
+vllm serve RedHatAI/Ministral-3-14B-Instruct-2512 \
+  --max-model-len 262144 \
+  --tokenizer_mode mistral \
+  --config_format mistral \
+  --load_format mistral \
+  --limit-mm-per-prompt '{"image": 10}'
+```
+MMLU-Pro, IFEval and MMMU were evaluated using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) as follows.
+```bash
+lm_eval \
+  --model local-chat-completions \
+  --tasks mmlu_pro,ifeval,mmmu_val \
+  --model_args "model=RedHatAI/Ministral-3-14B-Instruct-2512,max_length=64000,base_url=http://0.0.0.0:8000/v1/chat/completions,num_concurrent=64,max_retries=3,tokenized_requests=False,tokenizer_backend=None,timeout=1200,max_images=10" \
+  --apply_chat_template \
+  --fewshot_as_multiturn \
+  --output_path results_lmeval_ministral \
+  --gen_kwargs "do_sample=True,temperature=0.15"
+```
+AIME25, GPQA Diamond and Math 500 were evaluated using [lighteval](https://github.com/huggingface/lighteval) as follows.
+litellm_config.yaml
+```yaml
+model_parameters:
+  provider: "hosted_vllm"
+  model_name: "hosted_vllm/RedHatAI/Ministral-3-14B-Instruct-2512"
+  base_url: "http://0.0.0.0:8000/v1"
+  api_key: ""
+  timeout: 1200
+  concurrent_requests: 64
+  max_model_length: 262144
+  generation_parameters:
+    temperature: 0.15
+    max_new_tokens: 64000
+```
+```bash
+lighteval endpoint litellm litellm_config.yaml \
+"aime25|0,math_500|0,gpqa:diamond|0" \
+--output-dir results_lighteval_ministral \
+--save-details
+```
+</details>
+<table>
+  <thead>
+    <tr>
+      <th>Benchmark</th>
+      <th>RedHatAI/Ministral-3-14B-Instruct-2512-BF16</th>
+      <th>RedHatAI/Ministral-3-14B-Instruct-2512</th>
+      <th>Recovery</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>MMLU-Pro</td>
+      <td>41.69</td>
+      <td>45.87</td>
+      <td>110.0%</td>
+    </tr>
+    <tr>
+      <td>IFEval</td>
+      <td>77.34</td>
+      <td>76.86</td>
+      <td>99.38%</td>
+    </tr>
+    <tr>
+      <td>MMMU</td>
+      <td>55.33</td>
+      <td>55.33</td>
+      <td>100.0%</td>
+    </tr>
+    <tr>
+      <td>AIME25</td>
+      <td>36.67</td>
+      <td>36.67</td>
+      <td>100.0%</td>
+    </tr>
+    <tr>
+      <td>GPQA Diamond</td>
+      <td>58.59</td>
+      <td>58.59</td>
+      <td>100.0%</td>
+    </tr>
+    <tr>
+      <td>MATH 500</td>
+      <td>88.6</td>
+      <td>86.2</td>
+      <td>97.29%</td>
+    </tr>
+  </tbody>
+</table>
 ## License
 This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).