---
library_name: vllm
language:
- en
- fr
- es
- de
- it
- pt
- nl
- zh
- ja
- ko
- ar
license: apache-2.0
inference: false
base_model:
- mistralai/Ministral-3-3B-Base-2512
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please read
  our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
tags:
- mistral-common
---

# Ministral 3 3B Instruct 2512
The smallest model in the Ministral 3 family, **Ministral 3 3B** is a powerful and efficient tiny language model with vision capabilities.

This model is the instruct post-trained version in **FP8**, fine-tuned for instruction tasks, which makes it ideal for chat and instruction-based use cases.

The Ministral 3 family is designed for edge deployment and can run on a wide range of hardware. Ministral 3 3B can even be deployed locally: in FP8 it fits in 8GB of VRAM, and in less if further quantized.

Learn more in our [blog post](https://mistral.ai/news/mistral-3) and [paper](https://arxiv.org/abs/2601.08584).

## Key Features
Ministral 3 3B consists of two main architectural components:
- **3.4B Language Model**
- **0.4B Vision Encoder**
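
The two components above add up to roughly 3.8B parameters, which is what drives the weight footprint behind the 8GB VRAM figure. The sketch below is a back-of-the-envelope estimate, assuming FP8 uses one byte per parameter and BF16 two, and assuming every parameter is stored at the same precision (a simplification). It counts weights only; the KV cache, activations and runtime overhead come on top.

```python
# Rough weight-memory estimate from the parameter counts listed above.
# Assumptions: rounded parameter counts, and every parameter stored at the
# same precision. KV cache and activations need extra memory on top of this.

PARAMS = {
    "language_model": 3.4e9,
    "vision_encoder": 0.4e9,
}

BYTES_PER_PARAM = {"fp8": 1, "bf16": 2}


def weight_memory_gb(precision: str) -> float:
    """Return the approximate size of the weights in gigabytes (10**9 bytes)."""
    total_params = sum(PARAMS.values())
    return total_params * BYTES_PER_PARAM[precision] / 1e9


for precision in ("fp8", "bf16"):
    print(f"{precision}: ~{weight_memory_gb(precision):.1f} GB of weights")
```

With these rounded counts the estimate is about 3.8 GB of weights in FP8 and about 7.6 GB in BF16, which is consistent with the FP8 model fitting in 8GB of VRAM while leaving headroom for the KV cache.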

The Ministral 3 3B Instruct model offers the following capabilities:
- **Vision**: Enables the model to analyze images and provide insights based on visual content, in addition to text.
- **Multilingual**: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, and Arabic.
- **System Prompt**: Maintains strong adherence to system prompts.
- **Agentic**: Offers best-in-class agentic capabilities with native function calling and JSON output.
- **Edge-Optimized**: Delivers best-in-class performance at a small scale and can be deployed anywhere.
- **Apache 2.0 License**: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
- **Large Context Window**: Supports a 256k context window.

### Use Cases
Ideal for lightweight, real-time applications on edge or low-resource devices, such as:
- Image captioning
- Text classification
- Real-time, efficient translation
- Data extraction
- Short content generation
- Fine-tuning and specialization
- And more...

It brings advanced AI capabilities to edge and distributed environments, including embedded systems.

### Recommended Settings

We recommend deploying with the following best practices:
- System Prompt: Define a clear environment and use case, including guidance on how to effectively leverage tools in agentic systems.
- Sampling Parameters: Use a **temperature below 0.1** for daily-driver and production environments. Higher temperatures may be explored for creative use cases; developers are encouraged to experiment with alternative settings.
- Tools: Keep the set of tools well-defined and limit their number to the minimum required for the use case, to avoid overloading the model with an excessive number of tools.
- Vision: When deploying with vision capabilities, we recommend keeping images close to a 1:1 (width-to-height) aspect ratio. Avoid overly thin or wide images and crop them as needed to ensure optimal performance.
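
The vision recommendation can be applied before an image is sent to the model. The sketch below is one illustrative way to do it, not a Mistral-provided utility: `center_crop_box` is a hypothetical helper that computes a centered crop whose long side is at most `max_ratio` times the short side. The default of 2.0 is an assumption chosen only for illustration (1.0 forces a square). The returned `(left, upper, right, lower)` tuple matches the box format expected by Pillow's `Image.crop`.

```python
def center_crop_box(
    width: int, height: int, max_ratio: float = 2.0
) -> tuple[int, int, int, int]:
    """Return a centered (left, upper, right, lower) crop box whose aspect ratio
    (long side / short side) does not exceed max_ratio.

    max_ratio is an illustrative assumption; 1.0 forces a square crop.
    """
    if width <= 0 or height <= 0:
        raise ValueError("Image dimensions must be positive")
    if max_ratio < 1.0:
        raise ValueError("max_ratio must be >= 1.0")

    new_width, new_height = width, height
    if width > height * max_ratio:
        # Too wide: trim equally from the left and right edges.
        new_width = int(height * max_ratio)
    elif height > width * max_ratio:
        # Too tall or thin: trim equally from the top and bottom edges.
        new_height = int(width * max_ratio)

    left = (width - new_width) // 2
    upper = (height - new_height) // 2
    return left, upper, left + new_width, upper + new_height


# Usage with Pillow (assumed to be installed; the file name is hypothetical):
# from PIL import Image
# img = Image.open("wide_screenshot.png")
# img = img.crop(center_crop_box(*img.size, max_ratio=1.5))
```

Images that already fall within the bound are returned uncropped, so the helper can be applied unconditionally before building the request.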

### Recommended Sampling

* We recommend starting with a temperature of 0.1 for most use cases. Feel free to experiment with different settings to best suit your specific needs.

## Ministral 3 Family

| Model Name | Type | Precision | Link |
|--------------------------------|--------------------|-----------|------------------------------------------------------------------------------------------|
| Ministral 3 3B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Base-2512) |
| **Ministral 3 3B Instruct 2512** | **Instruct post-trained** | **FP8** | [**Hugging Face**](https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512) |
| Ministral 3 3B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Reasoning-2512) |
| Ministral 3 8B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Base-2512) |
| Ministral 3 8B Instruct 2512 | Instruct post-trained | FP8 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512) |
| Ministral 3 8B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Reasoning-2512) |
| Ministral 3 14B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Base-2512) |
| Ministral 3 14B Instruct 2512 | Instruct post-trained | FP8 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512) |
| Ministral 3 14B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Reasoning-2512) |

Other formats are available [here](https://huggingface.co/collections/mistralai/ministral-3-additional-checkpoints).

## Benchmark Results

We compare Ministral 3 to similarly sized models. Underlined values mark the best score within each size group.

### Reasoning

| Model | AIME25 | AIME24 | GPQA Diamond | LiveCodeBench |
|---------------------------|-------------|-------------|--------------|---------------|
| **Ministral 3 14B** | <u>0.850</u> | <u>0.898</u> | <u>0.712</u> | <u>0.646</u> |
| Qwen3-14B (Thinking) | 0.737 | 0.837 | 0.663 | 0.593 |
| | | | | |
| **Ministral 3 8B** | 0.787 | <u>0.860</u> | 0.668 | <u>0.616</u> |
| Qwen3-VL-8B-Thinking | <u>0.798</u> | <u>0.860</u> | <u>0.671</u> | 0.580 |
| | | | | |
| **Ministral 3 3B** | <u>0.721</u> | <u>0.775</u> | 0.534 | <u>0.548</u> |
| Qwen3-VL-4B-Thinking | 0.697 | 0.729 | <u>0.601</u> | 0.513 |

### Instruct

| Model | Arena Hard | WildBench | MATH Maj@1 | MM MTBench |
|---------------------------|-------------|------------|-------------|------------------|
| **Ministral 3 14B** | <u>0.551</u> | <u>68.5</u> | <u>0.904</u> | <u>8.49</u> |
| Qwen3 14B (Non-Thinking) | 0.427 | 65.1 | 0.870 | NOT MULTIMODAL |
| Gemma3-12B-Instruct | 0.436 | 63.2 | 0.854 | 6.70 |
| | | | | |
| **Ministral 3 8B** | 0.509 | <u>66.8</u> | 0.876 | <u>8.08</u> |
| Qwen3-VL-8B-Instruct | <u>0.528</u> | 66.3 | <u>0.946</u> | 8.00 |
| | | | | |
| **Ministral 3 3B** | 0.305 | <u>56.8</u> | 0.830 | 7.83 |
| Qwen3-VL-4B-Instruct | <u>0.438</u> | <u>56.8</u> | <u>0.900</u> | <u>8.01</u> |
| Qwen3-VL-2B-Instruct | 0.163 | 42.2 | 0.786 | 6.36 |
| Gemma3-4B-Instruct | 0.318 | 49.1 | 0.759 | 5.23 |

### Base

| Model | Multilingual MMLU | MATH CoT 2-shot | AGIEval 5-shot | MMLU Redux 5-shot | MMLU 5-shot | TriviaQA 5-shot |
|---------------------|-------------------|-----------------|----------------|-------------------|-------------|-----------------|
| **Ministral 3 14B** | 0.742 | <u>0.676</u> | 0.648 | 0.820 | 0.794 | 0.749 |
| Qwen3 14B Base | <u>0.754</u> | 0.620 | <u>0.661</u> | <u>0.837</u> | <u>0.804</u> | 0.703 |
| Gemma 3 12B Base | 0.690 | 0.487 | 0.587 | 0.766 | 0.745 | <u>0.788</u> |
| | | | | | | |
| **Ministral 3 8B** | <u>0.706</u> | <u>0.626</u> | 0.591 | 0.793 | <u>0.761</u> | <u>0.681</u> |
| Qwen 3 8B Base | 0.700 | 0.576 | <u>0.596</u> | <u>0.794</u> | 0.760 | 0.639 |
| | | | | | | |
| **Ministral 3 3B** | 0.652 | <u>0.601</u> | 0.511 | 0.735 | 0.707 | 0.592 |
| Qwen 3 4B Base | <u>0.677</u> | 0.405 | <u>0.570</u> | <u>0.759</u> | <u>0.713</u> | 0.530 |
| Gemma 3 4B Base | 0.516 | 0.294 | 0.430 | 0.626 | 0.589 | <u>0.640</u> |

## Usage

The model can be used with the following frameworks:
- [`vllm`](https://github.com/vllm-project/vllm): See [here](#vllm)
- [`transformers`](https://github.com/huggingface/transformers): See [here](#transformers)

### vLLM

We recommend using this model with [vLLM](https://github.com/vllm-project/vllm).

#### Installation

Make sure to install **vllm >= 0.12.0**:

```bash
pip install vllm --upgrade
```

Doing so should automatically install [`mistral_common >= 1.8.6`](https://github.com/mistralai/mistral-common/releases/tag/v1.8.6).

To check:
```bash
python -c "import mistral_common; print(mistral_common.__version__)"
```
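
The one-liner above only prints the installed version. To check both minimum versions stated in this section at once, here is a small sketch built on the standard library's `importlib.metadata`. The helper names are illustrative, and the naive parser keeps only the leading numeric components, so pre-release suffixes such as `rc1` are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions stated in this section (PyPI distribution names).
REQUIREMENTS = {"vllm": "0.12.0", "mistral-common": "1.8.6"}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.8.6' (or '0.12.0rc1') into a tuple of ints such as (1, 8, 6).

    Naive sketch: only the leading dotted numeric part is kept.
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"Unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions component-wise, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


if __name__ == "__main__":
    for package, minimum in REQUIREMENTS.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            print(f"{package}: not installed (need >= {minimum})")
            continue
        status = "OK" if meets_minimum(installed, minimum) else "too old"
        print(f"{package} {installed}: {status} (need >= {minimum})")
```

For a strict comparison that also orders pre-releases correctly, a dedicated version parser (for example the one in the `packaging` library) is the more robust choice.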

You can also use a ready-to-go Docker image, either built from the [Dockerfile](https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile) or pulled from [Docker Hub](https://hub.docker.com/layers/vllm/vllm-openai/latest).

#### Serve

Thanks to their size and the FP8 format of their weights, `Ministral-3-3B-Instruct-2512`, `Ministral-3-8B-Instruct-2512` and `Ministral-3-14B-Instruct-2512` can each run on a single H200 GPU.

A simple launch command is:

```bash
vllm serve mistralai/Ministral-3-3B-Instruct-2512 \
  --tokenizer_mode mistral --config_format mistral --load_format mistral \
  --enable-auto-tool-choice --tool-call-parser mistral
```

Key parameter notes:

* `--enable-auto-tool-choice`: Required when enabling tool usage.
* `--tool-call-parser mistral`: Required when enabling tool usage.

Additional flags:

* You can set `--max-model-len` to reduce memory usage. By default it is set to `262144`, which is quite large and not necessary for most scenarios.
* You can set `--max-num-batched-tokens` to balance throughput and latency: higher values increase throughput but also latency.
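
If you launch the server from a Python script, the flags above can be assembled programmatically. The sketch below uses a hypothetical helper, `build_serve_command` (it is not part of vLLM); it only reproduces the launch command shown above and optionally appends `--max-model-len` and `--max-num-batched-tokens`. The example value of 32768 is illustrative, not a tuned recommendation.

```python
import shlex
from typing import Optional


def build_serve_command(
    model: str = "mistralai/Ministral-3-3B-Instruct-2512",
    max_model_len: Optional[int] = None,
    max_num_batched_tokens: Optional[int] = None,
    enable_tools: bool = True,
) -> list[str]:
    """Return the `vllm serve` argument list used in this section (hypothetical helper)."""
    cmd = [
        "vllm", "serve", model,
        "--tokenizer_mode", "mistral",
        "--config_format", "mistral",
        "--load_format", "mistral",
    ]
    if enable_tools:
        # Both flags are required when enabling tool usage.
        cmd += ["--enable-auto-tool-choice", "--tool-call-parser", "mistral"]
    if max_model_len is not None:
        # A smaller context window lowers memory requirements.
        cmd += ["--max-model-len", str(max_model_len)]
    if max_num_batched_tokens is not None:
        # Higher values favor throughput, lower values favor latency.
        cmd += ["--max-num-batched-tokens", str(max_num_batched_tokens)]
    return cmd


if __name__ == "__main__":
    cmd = build_serve_command(max_model_len=32768)
    print(shlex.join(cmd))
    # To actually start the server (blocks until it exits):
    # import subprocess; subprocess.run(cmd, check=True)
```

Building an argument list rather than a shell string avoids quoting issues when the command is passed to `subprocess.run`.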

#### Usage of the model

Here we assume that the model `mistralai/Ministral-3-3B-Instruct-2512` is served and reachable at `localhost` on port `8000`, which is vLLM's default.

<details>
<summary>Vision Reasoning</summary>

Let's see if Ministral 3 knows when to pick a fight!
```python
from datetime import datetime, timedelta

from openai import OpenAI
from huggingface_hub import hf_hub_download

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

TEMP = 0.15
# Upper bound on generated tokens. The prompt plus max_tokens must fit within the
# server's --max-model-len, so do not set this to the full 262144-token window.
MAX_TOK = 8192

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id


def load_system_prompt(repo_id: str, filename: str) -> str:
    # Download the system prompt shipped with the model and fill in its placeholders.
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    today = datetime.today().strftime("%Y-%m-%d")
    yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
    model_name = repo_id.split("/")[-1]
    return system_prompt.format(name=model_name, today=today, yesterday=yesterday)


SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")
image_url = "https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438"

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.",
            },
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    },
]


response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
)

print(response.choices[0].message.content)
```
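
The example above passes a public image URL. To send a local file instead, a common approach with OpenAI-compatible servers such as vLLM is to embed the image as a base64 `data:` URL. The helper below is a minimal sketch; the function name `image_to_data_url` and the file name in the usage comment are hypothetical. Its return value can be used in place of `image_url` in the message above.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL (hypothetical helper)."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type from {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Usage, assuming a local file named "battle.png" exists:
# image_url = image_to_data_url("battle.png")
# {"type": "image_url", "image_url": {"url": image_url}}
```

The MIME type is inferred from the file extension only, so a file with a misleading extension will be labeled incorrectly.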

</details>

<details>
<summary>Function Calling</summary>

Let's solve some equations with the help of a simple Python calculator tool.
```python
import json
from openai import OpenAI
from huggingface_hub import hf_hub_download

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

TEMP = 0.15
# Upper bound on generated tokens; prompt + max_tokens must fit within --max-model-len.
MAX_TOK = 8192

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id


def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    return system_prompt


SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

image_url = "https://math-coaching.com/img/fiche/46/expressions-mathematiques.jpg"


def my_calculator(expression: str) -> str:
    # Demo only: eval() executes arbitrary code, never use it on untrusted input.
    return str(eval(expression))


tools = [
    {
        "type": "function",
        "function": {
            "name": "my_calculator",
            "description": "A calculator that can evaluate a mathematical expression.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "The mathematical expression to evaluate.",
                    },
                },
                "required": ["expression"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "rewrite",
            "description": "Rewrite a given text for improved clarity",
            "parameters": {
                "type": "object",
                "properties": {
                    "text": {
                        "type": "string",
                        "description": "The input text to rewrite",
                    }
                },
            },
        },
    },
]

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Thanks to your calculator, compute the results for the equations that involve numbers displayed in the image.",
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": image_url,
                },
            },
        ],
    },
]

response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
    tools=tools,
    tool_choice="auto",
)

tool_calls = response.choices[0].message.tool_calls

# Run each requested tool; only `my_calculator` is implemented in this example.
# Every call gets a result so that results stay aligned with tool_calls below.
results = []
for tool_call in tool_calls:
    function_name = tool_call.function.name
    function_args = json.loads(tool_call.function.arguments)
    if function_name == "my_calculator":
        result = my_calculator(**function_args)
    else:
        result = f"Tool {function_name!r} is not implemented."
    results.append(result)

# Add the assistant turn, then one tool message per call.
messages.append({"role": "assistant", "tool_calls": tool_calls})
for tool_call, result in zip(tool_calls, results):
    messages.append(
        {
            "role": "tool",
            "tool_call_id": tool_call.id,
            "name": tool_call.function.name,
            "content": result,
        }
    )


response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
)

print(response.choices[0].message.content)
```
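
`my_calculator` relies on `eval`, which executes any Python expression the model produces. For anything beyond a local demo, a restricted evaluator is safer. The sketch below is one possible drop-in replacement (not part of the original example): it parses the expression with Python's `ast` module and only allows numeric literals, parentheses, `+`, `-`, `*`, `/`, `//`, `%`, `**` and unary signs. The exponent cap of 1000 is an arbitrary assumption to avoid runaway computations.

```python
import ast
import operator

# Whitelisted operators; anything else in the expression is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_MAX_EXPONENT = 1000  # Arbitrary guard against huge powers such as 2 ** 10 ** 10.


def _evaluate(node: ast.AST) -> float:
    if isinstance(node, ast.Expression):
        return _evaluate(node.body)
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)
    ):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        left, right = _evaluate(node.left), _evaluate(node.right)
        if isinstance(node.op, ast.Pow) and abs(right) > _MAX_EXPONENT:
            raise ValueError("Exponent too large")
        return _BINARY_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError(f"Unsupported expression element: {ast.dump(node)}")


def my_calculator(expression: str) -> str:
    """Evaluate a basic arithmetic expression without eval() and return it as text."""
    tree = ast.parse(expression, mode="eval")
    return str(_evaluate(tree))
```

Because it keeps the same signature, it can replace the `eval`-based version in the tool loop unchanged. It raises `ValueError` for disallowed input (or `SyntaxError` for malformed input) instead of executing it, so you may want to catch those errors and return the message to the model as the tool result.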

</details>

<details>
<summary>Text-Only Request</summary>

Ministral 3 can follow your instructions to the letter.
```python
from openai import OpenAI
from huggingface_hub import hf_hub_download

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

TEMP = 0.15
# Upper bound on generated tokens; prompt + max_tokens must fit within --max-model-len.
MAX_TOK = 8192

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id


def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    return system_prompt


SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": "Write me a sentence where every word starts with the next letter in the alphabet - start with 'a' and end with 'z'.",
    },
]

response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
)

assistant_message = response.choices[0].message.content
print(assistant_message)
```
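
Since this prompt has a precise, checkable constraint, you can verify the reply programmatically. The checker below is an illustrative addition, not part of the original example: it extracts the words of `assistant_message` and tests whether their first letters run from "a" to "z" in order. It assumes exactly 26 words and treats punctuation loosely (a hyphenated word, for instance, counts as two words).

```python
import re
import string


def follows_alphabet(sentence: str) -> bool:
    """Return True if the words start with a, b, c, ..., z in order (26 words)."""
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", sentence)
    initials = "".join(word[0].lower() for word in words)
    return initials == string.ascii_lowercase


# Usage with the reply from the example above:
# print(follows_alphabet(assistant_message))
```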

</details>

### Transformers

You can also use Ministral 3 3B Instruct 2512 with `Transformers`!

Transformers recently added support for FP8, so make sure to install it from `main`:

```sh
uv pip install git+https://github.com/huggingface/transformers
```

To make the best use of our model with `Transformers`, make sure you have [installed](https://github.com/mistralai/mistral-common) `mistral-common >= 1.8.6` to use our tokenizer.

```bash
pip install mistral-common --upgrade
```

Try it out by running the following snippet.

> [!Tip]
> On the latest `main` as of December 5, 2025, an FP8 Triton kernel
> (`w8a8_block_fp8_matmul_triton`) is used by default for fast, accelerated
> matmuls without any degradation in accuracy. However, if you want to
> run your model in BF16, see [Transformers BF16](#transformers-bf16).

<details>
<summary>Python snippet</summary>
```python
import torch
from transformers import Mistral3ForConditionalGeneration, MistralCommonBackend

model_id = "mistralai/Ministral-3-3B-Instruct-2512"

tokenizer = MistralCommonBackend.from_pretrained(model_id)
model = Mistral3ForConditionalGeneration.from_pretrained(model_id, device_map="auto")

image_url = "https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438"

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.",
            },
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    },
]

tokenized = tokenizer.apply_chat_template(messages, return_tensors="pt", return_dict=True)

# Move the inputs to the GPU; the image tensor is also cast to bfloat16.
tokenized["input_ids"] = tokenized["input_ids"].to(device="cuda")
tokenized["pixel_values"] = tokenized["pixel_values"].to(dtype=torch.bfloat16, device="cuda")
image_sizes = [tokenized["pixel_values"].shape[-2:]]

output = model.generate(
    **tokenized,
    image_sizes=image_sizes,
    max_new_tokens=512,
)[0]

# Decode only the newly generated tokens, skipping the prompt.
decoded_output = tokenizer.decode(output[len(tokenized["input_ids"][0]):])
print(decoded_output)
```

</details>

#### Transformers BF16

Transformers can automatically convert the FP8 checkpoint to BF16. To do so, load the model as follows:

```py
from transformers import Mistral3ForConditionalGeneration, FineGrainedFP8Config

model_id = "mistralai/Ministral-3-3B-Instruct-2512"
model = Mistral3ForConditionalGeneration.from_pretrained(
    model_id,
    device_map="auto",
    # dequantize=True converts the FP8 weights to BF16 when loading.
    quantization_config=FineGrainedFP8Config(dequantize=True)
)
```

## License

This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).

*You must not use this model in a manner that infringes, misappropriates, or otherwise violates any third party’s rights, including intellectual property rights.*