danielhanchen committed
Commit c1182be · verified · 1 Parent(s): 972a2d2

Add files using upload-large-folder tool

Files changed (1): README.md (+42 -8)
README.md CHANGED
@@ -18,17 +18,20 @@ extra_gated_description: >-
If you want to learn more about how we process your personal data, please read
our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
base_model:
- - mistralai/Mistral-Large-3-675B-Instruct-2512
tags:
- mistral-common
---

# Mistral Large 3 675B Instruct 2512
- From our family of large models, **Mistral Large 3** is a state-of-the-art general-purpose **Multimodal granular Mixture-of-Experts** model with **41B active parameters** and **675B total parameters** trained from the ground up with 3000 H200s

This model is the instruct post-trained version in **FP8**, fine-tuned for instruction tasks, making it ideal for chat, agentic, and instruction-based use cases.
Designed for reliability and long-context comprehension, it is engineered for production-grade assistants, retrieval-augmented systems, scientific workloads, and complex enterprise workflows.

Mistral Large 3 is deployable on-premises in:
- **FP8** on a single node of B200s or H200s.
- [NVFP4](https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4) on a single node of H100s or A100s.
@@ -78,22 +81,27 @@ We recommend deploying Large 3 in a client-server configuration with the followi
We compare Mistral Large 3 to similarly sized models.

- ### Text

- ### Vision

## Usage

The model can be used with the following frameworks:
- [`vllm`](https://github.com/vllm-project/vllm): See [here](#vllm)

### vLLM

We recommend using this model with [vLLM](https://github.com/vllm-project/vllm).

#### Installation

- Make sure to install [`vLLM >= 0.12.0`](https://github.com/vllm-project/vllm/releases/tag/v0.12.0):

```
pip install vllm --upgrade
@@ -106,18 +114,20 @@ To check:
python -c "import mistral_common; print(mistral_common.__version__)"
```

- You can also make use of a ready-to-go [docker image](https://github.com/vllm-project/vllm/blob/main/Dockerfile) or on the [docker hub](https://hub.docker.com/layers/vllm/vllm-openai/latest/images/sha256-de9032a92ffea7b5c007dad80b38fd44aac11eddc31c435f8e52f3b7404bbf39).

#### Serve

The Mistral Large 3 Instruct FP8 format can be used on one 8xH200 node. We recommend using this format if you plan to fine-tune, as it can be more precise than NVFP4 in some situations.

A simple launch command is:

```bash
vllm serve mistralai/Mistral-Large-3-675B-Instruct-2512 \
--tensor-parallel-size 8 \
--enable-auto-tool-choice --tool-call-parser mistral
```
@@ -132,6 +142,30 @@ Additional flags:
* You can set `--max-model-len` to limit memory usage. By default it is set to `262144`, which is quite large but not necessary for most scenarios.
* You can set `--max-num-batched-tokens` to balance throughput and latency; a higher value means higher throughput but also higher latency.

#### Usage of the model

Here we assume that the model `mistralai/Mistral-Large-3-675B-Instruct-2512` is served and reachable at `localhost` on port `8000`, the default for vLLM.
 
The updated sections of README.md read as follows:

@@ -18,17 +18,20 @@
If you want to learn more about how we process your personal data, please read
our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
base_model:
+ - mistralai/Mistral-Large-3-675B-Base-2512
tags:
- mistral-common
+ - compressed-tensors
---

# Mistral Large 3 675B Instruct 2512
+ From our family of large models, **Mistral Large 3** is a state-of-the-art general-purpose **Multimodal granular Mixture-of-Experts** model with **41B active parameters** and **675B total parameters** trained from the ground up with 3000 H200s.

This model is the instruct post-trained version in **FP8**, fine-tuned for instruction tasks, making it ideal for chat, agentic, and instruction-based use cases.
Designed for reliability and long-context comprehension, it is engineered for production-grade assistants, retrieval-augmented systems, scientific workloads, and complex enterprise workflows.

+ Learn more in our blog post [here](https://mistral.ai/news/mistral-3).
+
Mistral Large 3 is deployable on-premises in:
- **FP8** on a single node of B200s or H200s.
- [NVFP4](https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4) on a single node of H100s or A100s.
 
@@ -78,22 +81,27 @@
We compare Mistral Large 3 to similarly sized models.

+ ![image](https://cdn-uploads.huggingface.co/production/uploads/64161701107962562e9b1006/IrPlvUUD-5-Phwi9QSevh.png)
+
+ ![image](https://cdn-uploads.huggingface.co/production/uploads/64161701107962562e9b1006/fDFEymz4HZNsqFARB4u9Y.png)

+ ![image](https://cdn-uploads.huggingface.co/production/uploads/64161701107962562e9b1006/eMdaAPcjOo8VyoGyFKxrE.png)

## Usage

The model can be used with the following frameworks:
- [`vllm`](https://github.com/vllm-project/vllm): See [here](#vllm)
+
+ > [!Note]
+ > We sadly didn't have enough time to add Mistral Large 3 to transformers, but we would be very happy to see a community contribution via a PR to [huggingface/transformers](https://github.com/huggingface/transformers).
+
### vLLM

We recommend using this model with [vLLM](https://github.com/vllm-project/vllm).

#### Installation

+ Make sure to install **vllm >= 1.12.0**:

```
pip install vllm --upgrade
# Check that a compatible version of mistral_common was installed alongside vLLM.
python -c "import mistral_common; print(mistral_common.__version__)"
```
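
If you also want to confirm that the installed vLLM build meets the version requirement above, a quick check (assuming a standard pip installation) is:

```bash
python -c "import vllm; print(vllm.__version__)"
```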
 
+ You can also make use of a ready-to-go [docker image](https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile) or on the [docker hub](https://hub.docker.com/layers/vllm/vllm-openai/latest).
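
As a rough sketch, the prebuilt `vllm/vllm-openai` image is typically launched along these lines; the cache mount, token, and GPU flags are assumptions to adapt to your environment, and the trailing arguments mirror the `vllm serve` command shown below:

```bash
# Launch the OpenAI-compatible vLLM server from the prebuilt image (illustrative).
docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
  -p 8000:8000 \
  --ipc=host \
  vllm/vllm-openai:latest \
  --model mistralai/Mistral-Large-3-675B-Instruct-2512 \
  --tensor-parallel-size 8 \
  --tokenizer_mode mistral --config_format mistral --load_format mistral \
  --enable-auto-tool-choice --tool-call-parser mistral
```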
#### Serve

The Mistral Large 3 Instruct FP8 format can be used on one 8xH200 node. We recommend using this format if you plan to fine-tune, as it can be more precise than NVFP4 in some situations.

+ **Simple**
+
A simple launch command is:

```bash
vllm serve mistralai/Mistral-Large-3-675B-Instruct-2512 \
--tensor-parallel-size 8 \
+ --tokenizer_mode mistral --config_format mistral --load_format mistral \
--enable-auto-tool-choice --tool-call-parser mistral
```

@@ -132,6 +142,30 @@ Additional flags:
* You can set `--max-model-len` to limit memory usage. By default it is set to `262144`, which is quite large but not necessary for most scenarios.
* You can set `--max-num-batched-tokens` to balance throughput and latency; a higher value means higher throughput but also higher latency.
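
For instance, a launch that trades the very large default context window for a smaller memory footprint and larger batches could look like this; the values below are illustrative, not recommendations:

```bash
vllm serve mistralai/Mistral-Large-3-675B-Instruct-2512 \
  --tensor-parallel-size 8 \
  --tokenizer_mode mistral --config_format mistral --load_format mistral \
  --enable-auto-tool-choice --tool-call-parser mistral \
  --max-model-len 65536 \
  --max-num-batched-tokens 8192
```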
 
+ **Accelerated with speculative decoding**
+
+ For maximum performance we recommend serving the checkpoint with its customized draft model [Mistral-Large-3-675B-Instruct-2512-Eagle](https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512-Eagle):
+
+ ```bash
+ vllm serve mistralai/Mistral-Large-3-675B-Instruct-2512 \
+ --tensor-parallel-size 8 \
+ --load-format mistral \
+ --tokenizer-mode mistral \
+ --config-format mistral \
+ --enable-auto-tool-choice \
+ --tool-call-parser mistral \
+ --limit-mm-per-prompt '{"image": 10}' \
+ --speculative_config '{
+ "model": "mistralai/Mistral-Large-3-675B-Instruct-2512-Eagle",
+ "num_speculative_tokens": 3,
+ "method": "eagle",
+ "max_model_len": "16384"
+ }'
+ ```
+
+ For more information on the draft model, please have a look at [Mistral-Large-3-675B-Instruct-2512-Eagle](https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512-Eagle).
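
Whichever launch command you use, you can quickly verify that the server is up and see the exact model name it exposes (assuming the default host and port used in the rest of this section):

```bash
curl http://localhost:8000/v1/models
```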
#### Usage of the model

Here we assume that the model `mistralai/Mistral-Large-3-675B-Instruct-2512` is served and reachable at `localhost` on port `8000`, the default for vLLM.
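
A minimal sketch of a request against vLLM's OpenAI-compatible chat completions endpoint is shown below; the prompt and the `get_weather` tool are purely illustrative:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/Mistral-Large-3-675B-Instruct-2512",
    "messages": [
      {"role": "user", "content": "What is the weather like in Paris today?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a city.",
          "parameters": {
            "type": "object",
            "properties": {
              "city": {"type": "string", "description": "Name of the city."}
            },
            "required": ["city"]
          }
        }
      }
    ],
    "max_tokens": 256
  }'
```

Because the server is launched with `--enable-auto-tool-choice --tool-call-parser mistral`, the response may contain a `tool_calls` entry rather than plain text when the model decides to use the tool.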