If you want to learn more about how we process your personal data, please read
our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
base_model:
- mistralai/Mistral-Large-3-675B-Base-2512
tags:
- mistral-common
- compressed-tensors
---

# Mistral Large 3 675B Instruct 2512

From our family of large models, **Mistral Large 3** is a state-of-the-art general-purpose **Multimodal granular Mixture-of-Experts** model with **41B active parameters** and **675B total parameters**, trained from the ground up on 3000 H200s.

This model is the instruct post-trained version in **FP8**, fine-tuned for instruction tasks, making it ideal for chat, agentic and instruction-based use cases.
Designed for reliability and long-context comprehension, it is engineered for production-grade assistants, retrieval-augmented systems, scientific workloads, and complex enterprise workflows.

Learn more in our blog post [here](https://mistral.ai/news/mistral-3).

Mistral Large 3 is deployable on-premises in:
- **FP8** on a single node of B200s or H200s.
- [NVFP4](https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4) on a single node of H100s or A100s.

We compare Mistral Large 3 to similarly sized models.




## Usage

The model can be used with the following frameworks:
- [`vllm`](https://github.com/vllm-project/vllm): See [here](#vllm)

> [!Note]
> We sadly didn't have enough time to add Mistral Large 3 to transformers, but we would be very happy about a community contribution; feel free to open a PR to [huggingface/transformers](https://github.com/huggingface/transformers).

### vLLM

We recommend using this model with [vLLM](https://github.com/vllm-project/vllm).

#### Installation

Make sure to install **vllm >= 1.12.0**:

```
pip install vllm --upgrade
```
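
To verify the installed vLLM version:

```bash
# Check that the installed vLLM meets the version requirement (>= 1.12.0).
python -c "import vllm; print(vllm.__version__)"
```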

This will also install a compatible version of `mistral_common`. To check:

```
python -c "import mistral_common; print(mistral_common.__version__)"
```

You can also make use of the ready-to-go [docker image](https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile) or pull one from [Docker Hub](https://hub.docker.com/layers/vllm/vllm-openai/latest).
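
For instance, a containerized launch might look like the following sketch; the image tag, cache mount, and GPU flags are assumptions to adapt to your setup, and the model flags mirror the serve command below:

```bash
# Sketch: serve the model from the vLLM OpenAI-compatible image.
# The tag, cache path, and GPU flags are assumptions; adjust for your cluster.
docker run --gpus all --ipc=host -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:latest \
  --model mistralai/Mistral-Large-3-675B-Instruct-2512 \
  --tensor-parallel-size 8 \
  --tokenizer_mode mistral --config_format mistral --load_format mistral \
  --enable-auto-tool-choice --tool-call-parser mistral
```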

#### Serve

The Mistral Large 3 Instruct FP8 checkpoint can be served on one 8xH200 node. We recommend this format if you plan to fine-tune, as it can be more precise than NVFP4 in some situations.

**Simple**

A simple launch command is:

```bash
vllm serve mistralai/Mistral-Large-3-675B-Instruct-2512 \
  --tensor-parallel-size 8 \
  --tokenizer_mode mistral --config_format mistral --load_format mistral \
  --enable-auto-tool-choice --tool-call-parser mistral
```

Additional flags (see the example below):

* You can set `--max-model-len` to save memory. By default it is set to `262144`, which is quite large and unnecessary for most scenarios.
* You can set `--max-num-batched-tokens` to balance throughput and latency: higher values give higher throughput at the cost of higher latency.
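
For example, a launch with both flags set explicitly might look like this sketch; the values are illustrative, not tuned recommendations:

```bash
# Sketch: the simple launch plus an explicit context window and batch budget.
# 65536 and 8192 are illustrative values; tune them for your workload.
vllm serve mistralai/Mistral-Large-3-675B-Instruct-2512 \
  --tensor-parallel-size 8 \
  --tokenizer_mode mistral --config_format mistral --load_format mistral \
  --enable-auto-tool-choice --tool-call-parser mistral \
  --max-model-len 65536 \
  --max-num-batched-tokens 8192
```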

**Accelerated with speculative decoding**

For maximum performance we recommend serving the checkpoint with its customized draft model [Mistral-Large-3-675B-Instruct-2512-Eagle](https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512-Eagle):

```bash
vllm serve mistralai/Mistral-Large-3-675B-Instruct-2512 \
  --tensor-parallel-size 8 \
  --load-format mistral \
  --tokenizer-mode mistral \
  --config-format mistral \
  --enable-auto-tool-choice \
  --tool-call-parser mistral \
  --limit-mm-per-prompt '{"image": 10}' \
  --speculative_config '{
    "model": "mistralai/Mistral-Large-3-675B-Instruct-2512-Eagle",
    "num_speculative_tokens": 3,
    "method": "eagle",
    "max_model_len": 16384
  }'
```

For more information on the draft model, please have a look at [Mistral-Large-3-675B-Instruct-2512-Eagle](https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512-Eagle).

#### Usage of the model

Here we assume that the model `mistralai/Mistral-Large-3-675B-Instruct-2512` is served and reachable at `localhost` on port `8000`, the default for vLLM.
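
For instance, you can send a chat request to the OpenAI-compatible endpoint with `curl` (a minimal sketch; the prompt and sampling parameters are illustrative):

```bash
# Sketch: minimal chat completion request against the local vLLM server.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/Mistral-Large-3-675B-Instruct-2512",
    "messages": [
      {"role": "user", "content": "Give me a short introduction to Mixture-of-Experts models."}
    ],
    "max_tokens": 256,
    "temperature": 0.15
  }'
```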