danielhanchen committed
Commit c1182be · verified · 1 Parent(s): 972a2d2

Add files using upload-large-folder tool

Files changed (1): README.md (+42 -8)
README.md CHANGED
@@ -18,17 +18,20 @@ extra_gated_description: >-
If you want to learn more about how we process your personal data, please read
our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
base_model:
- - mistralai/Mistral-Large-3-675B-Instruct-2512
tags:
- mistral-common
---

# Mistral Large 3 675B Instruct 2512
- From our family of large models, **Mistral Large 3** is a state-of-the-art general-purpose **Multimodal granular Mixture-of-Experts** model with **41B active parameters** and **675B total parameters** trained from the ground up with 3000 H200s

This model is the instruct post-trained version in **FP8**, fine-tuned for instruction tasks, making it ideal for chat, agentic, and instruction-based use cases.
Designed for reliability and long-context comprehension, it is engineered for production-grade assistants, retrieval-augmented systems, scientific workloads, and complex enterprise workflows.

Mistral Large 3 is deployable on-premises in:
- **FP8** on a single node of B200s or H200s.
- [NVFP4](https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4) on a single node of H100s or A100s.
@@ -78,22 +81,27 @@ We recommend deploying Large 3 in a client-server configuration with the followi
We compare Mistral Large 3 to similarly sized models.

- ### Text

- ### Vision

## Usage

The model can be used with the following frameworks:
- [`vllm`](https://github.com/vllm-project/vllm): See [here](#vllm)

### vLLM

We recommend using this model with [vLLM](https://github.com/vllm-project/vllm).

#### Installation

- Make sure to install [`vLLM >= 0.12.0`](https://github.com/vllm-project/vllm/releases/tag/v0.12.0):

```
pip install vllm --upgrade
@@ -106,18 +114,20 @@ To check:
python -c "import mistral_common; print(mistral_common.__version__)"
```

- You can also make use of a ready-to-go [docker image](https://github.com/vllm-project/vllm/blob/main/Dockerfile) or on the [docker hub](https://hub.docker.com/layers/vllm/vllm-openai/latest/images/sha256-de9032a92ffea7b5c007dad80b38fd44aac11eddc31c435f8e52f3b7404bbf39).

#### Serve

The Mistral Large 3 Instruct FP8 format can be used on one 8xH200 node. We recommend using this format if you plan to fine-tune, as it can be more precise than NVFP4 in some situations.

A simple launch command is:

```bash
vllm serve mistralai/Mistral-Large-3-675B-Instruct-2512 \
--tensor-parallel-size 8 \
--enable-auto-tool-choice --tool-call-parser mistral
```
@@ -132,6 +142,30 @@ Additional flags:
* You can set `--max-model-len` to limit memory usage. By default it is set to `262144`, which is quite large but not necessary for most scenarios.
* You can set `--max-num-batched-tokens` to balance throughput and latency; a higher value means higher throughput but also higher latency.

#### Usage of the model

Here we assume that the model `mistralai/Mistral-Large-3-675B-Instruct-2512` is served and reachable at `localhost` on port `8000`, the default for vLLM.
 
The updated sections of README.md read as follows:

@@ -18,17 +18,20 @@
If you want to learn more about how we process your personal data, please read
our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
base_model:
+ - mistralai/Mistral-Large-3-675B-Base-2512
tags:
- mistral-common
+ - compressed-tensors
---

# Mistral Large 3 675B Instruct 2512
+ From our family of large models, **Mistral Large 3** is a state-of-the-art general-purpose **Multimodal granular Mixture-of-Experts** model with **41B active parameters** and **675B total parameters** trained from the ground up with 3000 H200s.

This model is the instruct post-trained version in **FP8**, fine-tuned for instruction tasks, making it ideal for chat, agentic, and instruction-based use cases.
Designed for reliability and long-context comprehension, it is engineered for production-grade assistants, retrieval-augmented systems, scientific workloads, and complex enterprise workflows.

+ Learn more in our blog post [here](https://mistral.ai/news/mistral-3).
+
Mistral Large 3 is deployable on-premises in:
- **FP8** on a single node of B200s or H200s.
- [NVFP4](https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4) on a single node of H100s or A100s.
 
@@ -78,22 +81,27 @@
We compare Mistral Large 3 to similarly sized models.

+ ![image](https://cdn-uploads.huggingface.co/production/uploads/64161701107962562e9b1006/IrPlvUUD-5-Phwi9QSevh.png)
+
+ ![image](https://cdn-uploads.huggingface.co/production/uploads/64161701107962562e9b1006/fDFEymz4HZNsqFARB4u9Y.png)

+ ![image](https://cdn-uploads.huggingface.co/production/uploads/64161701107962562e9b1006/eMdaAPcjOo8VyoGyFKxrE.png)

## Usage

The model can be used with the following frameworks:
- [`vllm`](https://github.com/vllm-project/vllm): See [here](#vllm)
+
+ > [!Note]
+ > We sadly didn't have enough time to add Mistral Large 3 to transformers, but we would be very happy to see a community contribution via a PR to [huggingface/transformers](https://github.com/huggingface/transformers).
+
### vLLM

We recommend using this model with [vLLM](https://github.com/vllm-project/vllm).

#### Installation

+ Make sure to install **vllm >= 1.12.0**:

```
pip install vllm --upgrade
# Check that a compatible version of mistral_common was installed alongside vLLM.
python -c "import mistral_common; print(mistral_common.__version__)"
```
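
If you also want to confirm that the installed vLLM build meets the version requirement above, a quick check (assuming a standard pip installation) is:

```bash
python -c "import vllm; print(vllm.__version__)"
```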
 
+ You can also make use of a ready-to-go [docker image](https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile) or on the [docker hub](https://hub.docker.com/layers/vllm/vllm-openai/latest).
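
As a rough sketch, the prebuilt `vllm/vllm-openai` image is typically launched along these lines; the cache mount, token, and GPU flags are assumptions to adapt to your environment, and the trailing arguments mirror the `vllm serve` command shown below:

```bash
# Launch the OpenAI-compatible vLLM server from the prebuilt image (illustrative).
docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
  -p 8000:8000 \
  --ipc=host \
  vllm/vllm-openai:latest \
  --model mistralai/Mistral-Large-3-675B-Instruct-2512 \
  --tensor-parallel-size 8 \
  --tokenizer_mode mistral --config_format mistral --load_format mistral \
  --enable-auto-tool-choice --tool-call-parser mistral
```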
#### Serve

The Mistral Large 3 Instruct FP8 format can be used on one 8xH200 node. We recommend using this format if you plan to fine-tune, as it can be more precise than NVFP4 in some situations.

+ **Simple**
+
A simple launch command is:

```bash
vllm serve mistralai/Mistral-Large-3-675B-Instruct-2512 \
--tensor-parallel-size 8 \
+ --tokenizer_mode mistral --config_format mistral --load_format mistral \
--enable-auto-tool-choice --tool-call-parser mistral
```

@@ -132,6 +142,30 @@ Additional flags:
* You can set `--max-model-len` to limit memory usage. By default it is set to `262144`, which is quite large but not necessary for most scenarios.
* You can set `--max-num-batched-tokens` to balance throughput and latency; a higher value means higher throughput but also higher latency.
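
For instance, a launch that trades the very large default context window for a smaller memory footprint and larger batches could look like this; the values below are illustrative, not recommendations:

```bash
vllm serve mistralai/Mistral-Large-3-675B-Instruct-2512 \
  --tensor-parallel-size 8 \
  --tokenizer_mode mistral --config_format mistral --load_format mistral \
  --enable-auto-tool-choice --tool-call-parser mistral \
  --max-model-len 65536 \
  --max-num-batched-tokens 8192
```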
 
+ **Accelerated with speculative decoding**
+
+ For maximum performance we recommend serving the checkpoint with its customized draft model [Mistral-Large-3-675B-Instruct-2512-Eagle](https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512-Eagle):
+
+ ```bash
+ vllm serve mistralai/Mistral-Large-3-675B-Instruct-2512 \
+ --tensor-parallel-size 8 \
+ --load-format mistral \
+ --tokenizer-mode mistral \
+ --config-format mistral \
+ --enable-auto-tool-choice \
+ --tool-call-parser mistral \
+ --limit-mm-per-prompt '{"image": 10}' \
+ --speculative_config '{
+ "model": "mistralai/Mistral-Large-3-675B-Instruct-2512-Eagle",
+ "num_speculative_tokens": 3,
+ "method": "eagle",
+ "max_model_len": "16384"
+ }'
+ ```
+
+ For more information on the draft model, please have a look at [Mistral-Large-3-675B-Instruct-2512-Eagle](https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512-Eagle).
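
Whichever launch command you use, you can quickly verify that the server is up and see the exact model name it exposes (assuming the default host and port used in the rest of this section):

```bash
curl http://localhost:8000/v1/models
```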
#### Usage of the model

Here we assume that the model `mistralai/Mistral-Large-3-675B-Instruct-2512` is served and reachable at `localhost` on port `8000`, the default for vLLM.
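
A minimal sketch of a request against vLLM's OpenAI-compatible chat completions endpoint is shown below; the prompt and the `get_weather` tool are purely illustrative:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/Mistral-Large-3-675B-Instruct-2512",
    "messages": [
      {"role": "user", "content": "What is the weather like in Paris today?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a city.",
          "parameters": {
            "type": "object",
            "properties": {
              "city": {"type": "string", "description": "Name of the city."}
            },
            "required": ["city"]
          }
        }
      }
    ],
    "max_tokens": 256
  }'
```

Because the server is launched with `--enable-auto-tool-choice --tool-call-parser mistral`, the response may contain a `tool_calls` entry rather than plain text when the model decides to use the tool.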