ToastyPigeon commited on
Commit
bd1e880
·
verified ·
1 Parent(s): a0c8434

Upload folder using huggingface_hub

Browse files
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
 
 
 
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tekken.json filter=lfs diff=lfs merge=lfs -text
37
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,623 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ library_name: vllm
3
+ language:
4
+ - en
5
+ - fr
6
+ - de
7
+ - es
8
+ - it
9
+ - pt
10
+ - zh
11
+ - ja
12
+ - ru
13
+ - ko
14
+ license: other
15
+ license_name: mrl
16
+ inference: false
17
+ license_link: https://mistral.ai/licenses/MRL-0.1.md
18
+ extra_gated_prompt: >-
19
+ # Mistral AI Research License
20
+
21
+ If You want to use a Mistral Model, a Derivative or an Output for any purpose
22
+ that is not expressly authorized under this Agreement, You must request a
23
+ license from Mistral AI, which Mistral AI may grant to You in Mistral AI's
24
+ sole discretion. To discuss such a license, please contact Mistral AI via the
25
+ website contact form: https://mistral.ai/contact/
26
+
27
+ ## 1. Scope and acceptance
28
+
29
+ **1.1. Scope of the Agreement.** This Agreement applies to any use,
30
+ modification, or Distribution of any Mistral Model by You, regardless of the
31
+ source You obtained a copy of such Mistral Model.
32
+
33
+ **1.2. Acceptance.** By accessing, using, modifying, Distributing a Mistral
34
+ Model, or by creating, using or distributing a Derivative of the Mistral
35
+ Model, You agree to be bound by this Agreement.
36
+
37
+ **1.3. Acceptance on behalf of a third-party.** If You accept this Agreement
38
+ on behalf of Your employer or another person or entity, You warrant and
39
+ represent that You have the authority to act and accept this Agreement on
40
+ their behalf. In such a case, the word "You" in this Agreement will refer to
41
+ Your employer or such other person or entity.
42
+
43
+ ## 2. License
44
+
45
+ **2.1. Grant of rights**. Subject to Section 3 below, Mistral AI hereby
46
+ grants You a non-exclusive, royalty-free, worldwide, non-sublicensable,
47
+ non-transferable, limited license to use, copy, modify, and Distribute under
48
+ the conditions provided in Section 2.2 below, the Mistral Model and any
49
+ Derivatives made by or for Mistral AI and to create Derivatives of the Mistral
50
+ Model.
51
+
52
+ **2.2. Distribution of Mistral Model and Derivatives made by or for Mistral
53
+ AI.** Subject to Section 3 below, You may Distribute copies of the Mistral
54
+ Model and/or Derivatives made by or for Mistral AI, under the following
55
+ conditions: You must make available a copy of this Agreement to third-party
56
+ recipients of the Mistral Models and/or Derivatives made by or for Mistral AI
57
+ you Distribute, it being specified that any rights to use the Mistral Models
58
+ and/or Derivatives made by or for Mistral AI shall be directly granted by
59
+ Mistral AI to said third-party recipients pursuant to the Mistral AI Research
60
+ License agreement executed between these parties; You must retain in all
61
+ copies of the Mistral Models the following attribution notice within a
62
+ "Notice" text file distributed as part of such copies: "Licensed by Mistral AI
63
+ under the Mistral AI Research License".
64
+
65
+ **2.3. Distribution of Derivatives made by or for You.** Subject to Section 3
66
+ below, You may Distribute any Derivatives made by or for You under additional
67
+ or different terms and conditions, provided that: In any event, the use and
68
+ modification of Mistral Model and/or Derivatives made by or for Mistral AI
69
+ shall remain governed by the terms and conditions of this Agreement; You
70
+ include in any such Derivatives made by or for You prominent notices stating
71
+ that You modified the concerned Mistral Model; and Any terms and conditions
72
+ You impose on any third-party recipients relating to Derivatives made by or
73
+ for You shall neither limit such third-party recipients' use of the Mistral
74
+ Model or any Derivatives made by or for Mistral AI in accordance with the
75
+ Mistral AI Research License nor conflict with any of its terms and conditions.
76
+
77
+ ## 3. Limitations
78
+
79
+ **3.1. Misrepresentation.** You must not misrepresent or imply, through any
80
+ means, that the Derivatives made by or for You and/or any modified version of
81
+ the Mistral Model You Distribute under your name and responsibility is an
82
+ official product of Mistral AI or has been endorsed, approved or validated by
83
+ Mistral AI, unless You are authorized by Us to do so in writing.
84
+
85
+ **3.2. Usage Limitation.** You shall only use the Mistral Models, Derivatives
86
+ (whether or not created by Mistral AI) and Outputs for Research Purposes.
87
+
88
+ ## 4. Intellectual Property
89
+
90
+ **4.1. Trademarks.** No trademark licenses are granted under this Agreement,
91
+ and in connection with the Mistral Models, You may not use any name or mark
92
+ owned by or associated with Mistral AI or any of its affiliates, except (i) as
93
+ required for reasonable and customary use in describing and Distributing the
94
+ Mistral Models and Derivatives made by or for Mistral AI and (ii) for
95
+ attribution purposes as required by this Agreement.
96
+
97
+ **4.2. Outputs.** We claim no ownership rights in and to the Outputs. You are
98
+ solely responsible for the Outputs You generate and their subsequent uses in
99
+ accordance with this Agreement. Any Outputs shall be subject to the
100
+ restrictions set out in Section 3 of this Agreement.
101
+
102
+ **4.3. Derivatives.** By entering into this Agreement, You accept that any
103
+ Derivatives that You may create or that may be created for You shall be
104
+ subject to the restrictions set out in Section 3 of this Agreement.
105
+
106
+ ## 5. Liability
107
+
108
+ **5.1. Limitation of liability.** In no event, unless required by applicable
109
+ law (such as deliberate and grossly negligent acts) or agreed to in writing,
110
+ shall Mistral AI be liable to You for damages, including any direct, indirect,
111
+ special, incidental, or consequential damages of any character arising as a
112
+ result of this Agreement or out of the use or inability to use the Mistral
113
+ Models and Derivatives (including but not limited to damages for loss of data,
114
+ loss of goodwill, loss of expected profit or savings, work stoppage, computer
115
+ failure or malfunction, or any damage caused by malware or security breaches),
116
+ even if Mistral AI has been advised of the possibility of such damages.
117
+
118
+ **5.2. Indemnification.** You agree to indemnify and hold harmless Mistral AI
119
+ from and against any claims, damages, or losses arising out of or related to
120
+ Your use or Distribution of the Mistral Models and Derivatives.
121
+
122
+ ## 6. Warranty
123
+
124
+ **6.1. Disclaimer.** Unless required by applicable law or prior agreed to by
125
+ Mistral AI in writing, Mistral AI provides the Mistral Models and Derivatives
126
+ on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
127
+ express or implied, including, without limitation, any warranties or
128
+ conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
129
+ PARTICULAR PURPOSE. Mistral AI does not represent nor warrant that the Mistral
130
+ Models and Derivatives will be error-free, meet Your or any third party's
131
+ requirements, be secure or will allow You or any third party to achieve any
132
+ kind of result or generate any kind of content. You are solely responsible for
133
+ determining the appropriateness of using or Distributing the Mistral Models
134
+ and Derivatives and assume any risks associated with Your exercise of rights
135
+ under this Agreement.
136
+
137
+ ## 7. Termination
138
+
139
+ **7.1. Term.** This Agreement is effective as of the date of your acceptance
140
+ of this Agreement or access to the concerned Mistral Models or Derivatives and
141
+ will continue until terminated in accordance with the following terms.
142
+
143
+ **7.2. Termination.** Mistral AI may terminate this Agreement at any time if
144
+ You are in breach of this Agreement. Upon termination of this Agreement, You
145
+ must cease to use all Mistral Models and Derivatives and shall permanently
146
+ delete any copy thereof. The following provisions, in their relevant parts,
147
+ will survive any termination or expiration of this Agreement, each for the
148
+ duration necessary to achieve its own intended purpose (e.g. the liability
149
+ provision will survive until the end of the applicable limitation
150
+ period):Sections 5 (Liability), 6(Warranty), 7 (Termination) and 8 (General
151
+ Provisions).
152
+
153
+ **7.3. Litigation.** If You initiate any legal action or proceedings against
154
+ Us or any other entity (including a cross-claim or counterclaim in a lawsuit),
155
+ alleging that the Model or a Derivative, or any part thereof, infringe upon
156
+ intellectual property or other rights owned or licensable by You, then any
157
+ licenses granted to You under this Agreement will immediately terminate as of
158
+ the date such legal action or claim is filed or initiated.
159
+
160
+ ## 8. General provisions
161
+
162
+ **8.1. Governing laws.** This Agreement will be governed by the laws of
163
+ France, without regard to choice of law principles, and the UN Convention on
164
+ Contracts for the International Sale of Goods does not apply to this
165
+ Agreement.
166
+
167
+ **8.2. Competent jurisdiction.** The courts of Paris shall have exclusive
168
+ jurisdiction of any dispute arising out of this Agreement.
169
+
170
+ **8.3. Severability.** If any provision of this Agreement is held to be
171
+ invalid, illegal or unenforceable, the remaining provisions shall be
172
+ unaffected thereby and remain valid as if such provision had not been set
173
+ forth herein.
174
+
175
+ ## 9. Definitions
176
+
177
+ "Agreement": means this Mistral AI Research License agreement governing the
178
+ access, use, and Distribution of the Mistral Models, Derivatives and Outputs.
179
+
180
+ "Derivative": means any (i) modified version of the Mistral Model (including
181
+ but not limited to any customized or fine-tuned version thereof), (ii) work
182
+ based on the Mistral Model, or (iii) any other derivative work thereof.
183
+
184
+ "Distribution", "Distributing", "Distribute" or "Distributed": means
185
+ supplying, providing or making available, by any means, a copy of the Mistral
186
+ Models and/or the Derivatives as the case may be, subject to Section 3 of this
187
+ Agreement.
188
+
189
+ "Mistral AI", "We" or "Us": means Mistral AI, a French société par actions
190
+ simplifiée registered in the Paris commercial registry under the number 952
191
+ 418 325, and having its registered seat at 15, rue des Halles, 75001 Paris.
192
+
193
+ "Mistral Model": means the foundational large language model(s), and its
194
+ elements which include algorithms, software, instructed checkpoints,
195
+ parameters, source code (inference code, evaluation code and, if applicable,
196
+ fine-tuning code) and any other elements associated thereto made available by
197
+ Mistral AI under this Agreement, including, if any, the technical
198
+ documentation, manuals and instructions for the use and operation thereof.
199
+
200
+ "Research Purposes": means any use of a Mistral Model, Derivative, or Output
201
+ that is solely for (a) personal, scientific or academic research, and (b) for
202
+ non-profit and non-commercial purposes, and not directly or indirectly
203
+ connected to any commercial activities or business operations. For
204
+ illustration purposes, Research Purposes does not include (1) any usage of the
205
+ Mistral Model, Derivative or Output by individuals or contractors employed in
206
+ or engaged by companies in the context of (a) their daily tasks, or (b) any
207
+ activity (including but not limited to any testing or proof-of-concept) that
208
+ is intended to generate revenue, nor (2) any Distribution by a commercial
209
+ entity of the Mistral Model, Derivative or Output whether in return for
210
+ payment or free of charge, in any medium or form, including but not limited to
211
+ through a hosted or managed service (e.g. SaaS, cloud instances, etc.), or
212
+ behind a software layer.
213
+
214
+ "Outputs": means any content generated by the operation of the Mistral Models
215
+ or the Derivatives from a prompt (i.e., text instructions) provided by users.
216
+ For the avoidance of doubt, Outputs do not include any components of a Mistral
217
+ Models, such as any fine-tuned versions of the Mistral Models, the weights, or
218
+ parameters.
219
+
220
+ "You": means the individual or entity entering into this Agreement with
221
+ Mistral AI.
222
+
223
+
224
+ *Mistral AI processes your personal data below to provide the model and
225
+ enforce its license. If you are affiliated with a commercial entity, we may
226
+ also send you communications about our models. For more information on your
227
+ rights and data handling, please see our <a
228
+ href="https://mistral.ai/terms/">privacy policy</a>.*
229
+ extra_gated_fields:
230
+ First Name: text
231
+ Last Name: text
232
+ Country: country
233
+ Affiliation: text
234
+ Job title: text
235
+ I understand that I can only use the model, any derivative versions and their outputs for non-commercial research purposes: checkbox
236
+ I understand that if I am a commercial entity, I am not permitted to use or distribute the model internally or externally, or expose it in my own offerings without a commercial license: checkbox
237
+ I understand that if I upload the model, or any derivative version, on any platform, I must include the Mistral Research License: checkbox
238
+ I understand that for commercial use of the model, I can contact Mistral or use the Mistral AI API on la Plateforme or any of our cloud provider partners: checkbox
239
+ By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Mistral Privacy Policy: checkbox
240
+ geo: ip_location
241
+ extra_gated_description: >-
242
+ Mistral AI processes your personal data below to provide the model and enforce
243
+ its license. If you are affiliated with a commercial entity, we may also send
244
+ you communications about our models. For more information on your rights and
245
+ data handling, please see our <a href="https://mistral.ai/terms/">privacy
246
+ policy</a>.
247
+ extra_gated_button_content: Submit
248
+ tags:
249
+ - mistral-common
250
+ ---
251
+
252
+ # Model Card for Ministral-8B-Instruct-2410
253
+
254
+ We introduce two new state-of-the-art models for local intelligence, on-device computing, and at-the-edge use cases. We call them les Ministraux: Ministral 3B and Ministral 8B.
255
+
256
+ The Ministral-8B-Instruct-2410 Language Model is an instruct fine-tuned model significantly outperforming existing models of similar size, released under the Mistral Research License.
257
+
258
+ If you are interested in using Ministral-3B or Ministral-8B commercially, outperforming Mistral-7B, [reach out to us](https://mistral.ai/contact/).
259
+
260
+ For more details about les Ministraux please refer to our release [blog post](https://mistral.ai/news/ministraux).
261
+
262
+ ## Ministral 8B Key features
263
+ - Released under the **Mistral Research License**, reach out to us for a commercial license
264
+ - Trained with a **128k context window** with **interleaved sliding-window attention**
265
+ - Trained on a large proportion of **multilingual and code data**
266
+ - Supports **function calling**
267
+ - Vocabulary size of **131k**, using the **V3-Tekken** tokenizer
268
+
269
+ ### Basic Instruct Template (V3-Tekken)
270
+
271
+ ```
272
+ <s>[INST]user message[/INST]assistant response</s>[INST]new user message[/INST]
273
+ ```
274
+
275
+ *For more information about the tokenizer please refer to [mistral-common](https://github.com/mistralai/mistral-common)*
276
+
277
+ ## Ministral 8B Architecture
278
+
279
+ | Feature | Value |
280
+ |:---------------------:|:--------------------:|
281
+ | **Architecture** | Dense Transformer |
282
+ | **Parameters** | 8,019,808,256 |
283
+ | **Layers** | 36 |
284
+ | **Heads** | 32 |
285
+ | **Dim** | 4096 |
286
+ | **KV Heads (GQA)** | 8 |
287
+ | **Hidden Dim** | 12288 |
288
+ | **Head Dim** | 128 |
289
+ | **Vocab Size** | 131,072 |
290
+ | **Context Length** | 128k |
291
+ | **Attention Pattern** | Ragged (128k,32k,32k,32k) |
292
+
293
+ ## Benchmarks
294
+
295
+ #### Base Models
296
+
297
+ <u>Knowledge & Commonsense</u>
298
+
299
+ | Model | MMLU | AGIEval | Winogrande | Arc-c | TriviaQA |
300
+ |:-------------:|:------:|:---------:|:------------:|:-------:|:----------:|
301
+ | Mistral 7B Base | 62.5 | 42.5 | 74.2 | 67.9 | 62.5 |
302
+ | Llama 3.1 8B Base | 64.7 | 44.4 | 74.6 | 46.0 | 60.2 |
303
+ | ***Ministral 8B Base*** | ***<u>65.0</u>*** | ***<u>48.3</u>*** | ***<u>75.3</u>*** | ***<u>71.9</u>*** | ***<u>65.5</u>*** |
304
+ | | | | | | |
305
+ | Gemma 2 2B Base | 52.4 | 33.8 | 68.7 | 42.6 | 47.8 |
306
+ | Llama 3.2 3B Base | 56.2 | 37.4 | 59.6 | 43.1 | 50.7 |
307
+ | ***Ministral 3B Base*** | ***<u>60.9</u>*** | ***<u>42.1</u>*** | ***<u>72.7</u>*** | ***<u>64.2</u>*** | ***<u>56.7</u>*** |
308
+
309
+ <u>Code & Math</u>
310
+
311
+ | Model | HumanEval pass@1 |GSM8K maj@8 |
312
+ |:-------------:|:-------------------:|:---------------:|
313
+ | Mistral 7B Base | 26.8 | 32.0 |
314
+ | Llama 3.1 8B Base | ***<u>37.8</u>*** | 42.2 |
315
+ | ***Ministral 8B Base*** | 34.8 | ***<u>64.5</u>*** |
316
+ | | | |
317
+ | Gemma 2 2B | 20.1 | 35.5 |
318
+ | Llama 3.2 3B | 14.6 | 33.5 |
319
+ | ***Ministral 3B*** | ***<u>34.2</u>*** | ***<u>50.9</u>*** |
320
+
321
+ <u>Multilingual</u>
322
+
323
+ | Model | French MMLU | German MMLU | Spanish MMLU |
324
+ |:-------------:|:-------------:|:-------------:|:-------------:|
325
+ | Mistral 7B Base | 50.6 | 49.6 | 51.4 |
326
+ | Llama 3.1 8B Base | 50.8 | 52.8 | 54.6 |
327
+ | ***Ministral 8B Base*** | ***<u>57.5</u>*** | ***<u>57.4</u>*** | ***<u>59.6</u>*** |
328
+ | | | | |
329
+ | Gemma 2 2B Base | 41.0 | 40.1 | 41.7 |
330
+ | Llama 3.2 3B Base | 42.3 | 42.2 | 43.1 |
331
+ | ***Ministral 3B Base*** | ***<u>49.1</u>*** | ***<u>48.3</u>*** | ***<u>49.5</u>*** |
332
+
333
+ ### Instruct Models
334
+
335
+ <u>Chat/Arena (gpt-4o judge)</u>
336
+
337
+ | Model | MTBench | Arena Hard | Wild bench |
338
+ |:-------------:|:---------:|:------------:|:------------:|
339
+ | Mistral 7B Instruct v0.3 | 6.7 | 44.3 | 33.1 |
340
+ | Llama 3.1 8B Instruct | 7.5 | 62.4 | 37.0 |
341
+ | Gemma 2 9B Instruct | 7.6 | 68.7 | ***<u>43.8</u>*** |
342
+ | ***Ministral 8B Instruct*** | ***<u>8.3</u>*** | ***<u>70.9</u>*** | 41.3 |
343
+ | | | | |
344
+ | Gemma 2 2B Instruct | 7.5 | 51.7 | 32.5 |
345
+ | Llama 3.2 3B Instruct | 7.2 | 46.0 | 27.2 |
346
+ | ***Ministral 3B Instruct*** | ***<u>8.1</u>*** | ***<u>64.3</u>*** | ***<u>36.3</u>*** |
347
+
348
+ <u>Code & Math</u>
349
+
350
+ | Model | MBPP pass@1 | HumanEval pass@1 | Math maj@1 |
351
+ |:-------------:|:-------------:|:------------------:|:-------------:|
352
+ | Mistral 7B Instruct v0.3 | 50.2 | 38.4 | 13.2 |
353
+ | Gemma 2 9B Instruct | 68.5 | 67.7 | 47.4 |
354
+ Llama 3.1 8B Instruct | 69.7 | 67.1 | 49.3 |
355
+ | ***Ministral 8B Instruct*** | ***<u>70.0</u>*** | ***<u>76.8</u>*** | ***<u>54.5</u>*** |
356
+ | | | | |
357
+ | Gemma 2 2B Instruct | 54.5 | 42.7 | 22.8 |
358
+ | Llama 3.2 3B Instruct | 64.6 | 61.0 | 38.4 |
359
+ | ***Ministral 3B* Instruct** | ***<u>67.7</u>*** | ***<u>77.4</u>*** | ***<u>51.7</u>*** |
360
+
361
+ <u>Function calling</u>
362
+
363
+ | Model | Internal bench |
364
+ |:-------------:|:-----------------:|
365
+ | Mistral 7B Instruct v0.3 | 6.9 |
366
+ | Llama 3.1 8B Instruct | N/A |
367
+ | Gemma 2 9B Instruct | N/A |
368
+ | ***Ministral 8B Instruct*** | ***<u>31.6</u>*** |
369
+ | | |
370
+ | Gemma 2 2B Instruct | N/A |
371
+ | Llama 3.2 3B Instruct | N/A |
372
+ | ***Ministral 3B Instruct*** | ***<u>28.4</u>*** |
373
+
374
+ ## Usage Examples
375
+
376
+ ### vLLM (recommended)
377
+
378
+ We recommend using this model with the [vLLM library](https://github.com/vllm-project/vllm)
379
+ to implement production-ready inference pipelines.
380
+
381
+ > [!IMPORTANT]
382
+ > Currently vLLM is capped at 32k context size because interleaved attention kernels for paged attention are not yet implemented in vLLM.
383
+ > Attention kernels for paged attention are being worked on and as soon as it is fully supported in vLLM, this model card will be updated.
384
+ > To take advantage of the full 128k context size we recommend [Mistral Inference](https://huggingface.co/mistralai/Ministral-8B-Instruct-2410#mistral-inference)
385
+
386
+ **_Installation_**
387
+
388
+
389
+ Make sure you install `vLLM >= v0.6.4`:
390
+
391
+ ```
392
+ pip install --upgrade vllm
393
+ ```
394
+
395
+ Also make sure you have `mistral_common >= 1.4.4` installed:
396
+
397
+ ```
398
+ pip install --upgrade mistral_common
399
+ ```
400
+
401
+ You can also make use of a ready-to-go [docker image](https://github.com/vllm-project/vllm/blob/main/Dockerfile).
402
+
403
+ **_Offline_**
404
+
405
+ ```py
406
+ from vllm import LLM
407
+ from vllm.sampling_params import SamplingParams
408
+
409
+ model_name = "mistralai/Ministral-8B-Instruct-2410"
410
+
411
+ sampling_params = SamplingParams(max_tokens=8192)
412
+
413
+ # note that running Ministral 8B on a single GPU requires 24 GB of GPU RAM
414
+ # If you want to divide the GPU requirement over multiple devices, please add *e.g.* `tensor_parallel=2`
415
+ llm = LLM(model=model_name, tokenizer_mode="mistral", config_format="mistral", load_format="mistral")
416
+
417
+ prompt = "Do we need to think for 10 seconds to find the answer of 1 + 1?"
418
+
419
+ messages = [
420
+ {
421
+ "role": "user",
422
+ "content": prompt
423
+ },
424
+ ]
425
+
426
+ outputs = llm.chat(messages, sampling_params=sampling_params)
427
+
428
+ print(outputs[0].outputs[0].text)
429
+ # You don't need to think for 10 seconds to find the answer to 1 + 1. The answer is 2,
430
+ # and you can easily add these two numbers in your mind very quickly without any delay.
431
+ ```
432
+
433
+ **_Server_**
434
+
435
+ You can also use Ministral-8B in a server/client setting.
436
+
437
+ 1. Spin up a server:
438
+
439
+
440
+ ```
441
+ vllm serve mistralai/Ministral-8B-Instruct-2410 --tokenizer_mode mistral --config_format mistral --load_format mistral
442
+ ```
443
+
444
+ **Note:** Running Ministral-8B on a single GPU requires 24 GB of GPU RAM.
445
+
446
+ If you want to divide the GPU requirement over multiple devices, please add *e.g.* `--tensor_parallel=2`
447
+
448
+ 2. And ping the client:
449
+
450
+ ```
451
+ curl --location 'http://<your-node-url>:8000/v1/chat/completions' \
452
+ --header 'Content-Type: application/json' \
453
+ --header 'Authorization: Bearer token' \
454
+ --data '{
455
+ "model": "mistralai/Ministral-8B-Instruct-2410",
456
+ "messages": [
457
+ {
458
+ "role": "user",
459
+ "content": "Do we need to think for 10 seconds to find the answer of 1 + 1?"
460
+ }
461
+ ]
462
+ }'
463
+
464
+ ```
465
+
466
+ ### Mistral-inference
467
+
468
+ We recommend using [mistral-inference](https://github.com/mistralai/mistral-inference) to quickly try out / "vibe-check" the model.
469
+
470
+
471
+ **_Install_**
472
+
473
+ Make sure to have `mistral_inference >= 1.5.0` installed.
474
+
475
+ ```
476
+ pip install mistral_inference --upgrade
477
+ ```
478
+
479
+ **_Download_**
480
+
481
+ ```py
482
+ from huggingface_hub import snapshot_download
483
+ from pathlib import Path
484
+
485
+ mistral_models_path = Path.home().joinpath('mistral_models', '8B-Instruct')
486
+ mistral_models_path.mkdir(parents=True, exist_ok=True)
487
+
488
+ snapshot_download(repo_id="mistralai/Ministral-8B-Instruct-2410", allow_patterns=["params.json", "consolidated.safetensors", "tekken.json"], local_dir=mistral_models_path)
489
+ ```
490
+
491
+ ### Chat
492
+
493
+ After installing `mistral_inference`, a `mistral-chat` CLI command should be available in your environment. You can chat with the model using
494
+
495
+ ```
496
+ mistral-chat $HOME/mistral_models/8B-Instruct --instruct --max_tokens 256
497
+ ```
498
+
499
+ ### Passkey detection
500
+
501
+ > [!IMPORTANT]
502
+ > In this example the passkey message has over >100k tokens and mistral-inference
503
+ > does not have a chunked pre-fill mechanism. Therefore you will need a lot of
504
+ > GPU memory in order to run the below example (80 GB). For a more memory-efficient
505
+ > solution we recommend using vLLM.
506
+
507
+ ```py
508
+ from mistral_inference.transformer import Transformer
509
+ from pathlib import Path
510
+ import json
511
+ from mistral_inference.generate import generate
512
+ from huggingface_hub import hf_hub_download
513
+
514
+ from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
515
+ from mistral_common.protocol.instruct.messages import UserMessage
516
+ from mistral_common.protocol.instruct.request import ChatCompletionRequest
517
+
518
+ def load_passkey_request() -> ChatCompletionRequest:
519
+ passkey_file = hf_hub_download(repo_id="mistralai/Ministral-8B-Instruct-2410", filename="passkey_example.json")
520
+
521
+ with open(passkey_file, "r") as f:
522
+ data = json.load(f)
523
+
524
+ message_content = data["messages"][0]["content"]
525
+ return ChatCompletionRequest(messages=[UserMessage(content=message_content)])
526
+
527
+ tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tekken.json")
528
+ model = Transformer.from_folder(mistral_models_path, softmax_fp32=False)
529
+
530
+ completion_request = load_passkey_request()
531
+
532
+ tokens = tokenizer.encode_chat_completion(completion_request).tokens
533
+
534
+ out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
535
+ result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
536
+
537
+ print(result) # The pass key is 13005.
538
+ ```
539
+
540
+
541
+ ### Instruct following
542
+
543
+ ```py
544
+ from mistral_inference.transformer import Transformer
545
+ from mistral_inference.generate import generate
546
+
547
+ from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
548
+ from mistral_common.protocol.instruct.messages import UserMessage
549
+ from mistral_common.protocol.instruct.request import ChatCompletionRequest
550
+
551
+
552
+ tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tekken.json")
553
+ model = Transformer.from_folder(mistral_models_path)
554
+
555
+ completion_request = ChatCompletionRequest(messages=[UserMessage(content="How often does the letter r occur in Mistral?")])
556
+
557
+ tokens = tokenizer.encode_chat_completion(completion_request).tokens
558
+
559
+ out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
560
+ result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
561
+
562
+ print(result)
563
+ ```
564
+
565
+ ### Function calling
566
+
567
+ ```py
568
+ from mistral_common.protocol.instruct.tool_calls import Function, Tool
569
+ from mistral_inference.transformer import Transformer
570
+ from mistral_inference.generate import generate
571
+
572
+ from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
573
+ from mistral_common.protocol.instruct.messages import UserMessage
574
+ from mistral_common.protocol.instruct.request import ChatCompletionRequest
575
+ from mistral_common.tokens.tokenizers.tekken import SpecialTokenPolicy
576
+
577
+
578
+ tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tekken.json")
579
+ tekken = tokenizer.instruct_tokenizer.tokenizer
580
+ tekken.special_token_policy = SpecialTokenPolicy.IGNORE
581
+
582
+ model = Transformer.from_folder(mistral_models_path)
583
+
584
+ completion_request = ChatCompletionRequest(
585
+ tools=[
586
+ Tool(
587
+ function=Function(
588
+ name="get_current_weather",
589
+ description="Get the current weather",
590
+ parameters={
591
+ "type": "object",
592
+ "properties": {
593
+ "location": {
594
+ "type": "string",
595
+ "description": "The city and state, e.g. San Francisco, CA",
596
+ },
597
+ "format": {
598
+ "type": "string",
599
+ "enum": ["celsius", "fahrenheit"],
600
+ "description": "The temperature unit to use. Infer this from the users location.",
601
+ },
602
+ },
603
+ "required": ["location", "format"],
604
+ },
605
+ )
606
+ )
607
+ ],
608
+ messages=[
609
+ UserMessage(content="What's the weather like today in Paris?"),
610
+ ],
611
+ )
612
+
613
+ tokens = tokenizer.encode_chat_completion(completion_request).tokens
614
+
615
+ out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
616
+ result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
617
+
618
+ print(result)
619
+ ```
620
+
621
+ ## The Mistral AI Team
622
+
623
+ Albert Jiang, Alexandre Abou Chahine, Alexandre Sablayrolles, Alexis Tacnet, Alodie Boissonnet, Alok Kothari, Amélie Héliou, Andy Lo, Anna Peronnin, Antoine Meunier, Antoine Roux, Antonin Faure, Aritra Paul, Arthur Darcet, Arthur Mensch, Audrey Herblin-Stoop, Augustin Garreau, Austin Birky, Avinash Sooriyarachchi, Baptiste Rozière, Barry Conklin, Bastien Bouillon, Blanche Savary de Beauregard, Carole Rambaud, Caroline Feldman, Charles de Freminville, Charline Mauro, Chih-Kuan Yeh, Chris Bamford, Clement Auguy, Corentin Heintz, Cyriaque Dubois, Devendra Singh Chaplot, Diego Las Casas, Diogo Costa, Eléonore Arcelin, Emma Bou Hanna, Etienne Metzger, Fanny Olivier Autran, Francois Lesage, Garance Gourdel, Gaspard Blanchet, Gaspard Donada Vidal, Gianna Maria Lengyel, Guillaume Bour, Guillaume Lample, Gustave Denis, Harizo Rajaona, Himanshu Jaju, Ian Mack, Ian Mathew, Jean-Malo Delignon, Jeremy Facchetti, Jessica Chudnovsky, Joachim Studnia, Justus Murke, Kartik Khandelwal, Kenneth Chiu, Kevin Riera, Leonard Blier, Leonard Suslian, Leonardo Deschaseaux, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Sophia Yang, Margaret Jennings, Marie Pellat, Marie Torelli, Marjorie Janiewicz, Mathis Felardos, Maxime Darrin, Michael Hoff, Mickaël Seznec, Misha Jessel Kenyon, Nayef Derwiche, Nicolas Carmont Zaragoza, Nicolas Faurie, Nicolas Moreau, Nicolas Schuhl, Nikhil Raghuraman, Niklas Muhs, Olivier de Garrigues, Patricia Rozé, Patricia Wang, Patrick von Platen, Paul Jacob, Pauline Buche, Pavankumar Reddy Muddireddy, Perry Savas, Pierre Stock, Pravesh Agrawal, Renaud de Peretti, Romain Sauvestre, Romain Sinthe, Roman Soletskyi, Sagar Vaze, Sandeep Subramanian, Saurabh Garg, Soham Ghosh, Sylvain Regnier, Szymon Antoniak, Teven Le Scao, Theophile Gervet, Thibault Schueller, Thibaut Lavril, Thomas Wang, Timothée Lacroix, Valeriia Nemychnikova, Wendy Shang, William El Sayed, William Marshall
config.json ADDED
@@ -0,0 +1,37 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "architectures": [
3
+ "MistralForCausalLM"
4
+ ],
5
+ "attention_dropout": 0.0,
6
+ "bos_token_id": 1,
7
+ "eos_token_id": 2,
8
+ "head_dim": 128,
9
+ "hidden_act": "silu",
10
+ "hidden_size": 4096,
11
+ "initializer_range": 0.02,
12
+ "intermediate_size": 12288,
13
+ "layer_types": [
14
+ "full_attention", "sliding_attention", "sliding_attention", "sliding_attention",
15
+ "full_attention", "sliding_attention", "sliding_attention", "sliding_attention",
16
+ "full_attention", "sliding_attention", "sliding_attention", "sliding_attention",
17
+ "full_attention", "sliding_attention", "sliding_attention", "sliding_attention",
18
+ "full_attention", "sliding_attention", "sliding_attention", "sliding_attention",
19
+ "full_attention", "sliding_attention", "sliding_attention", "sliding_attention",
20
+ "full_attention", "sliding_attention", "sliding_attention", "sliding_attention",
21
+ "full_attention", "sliding_attention", "sliding_attention", "sliding_attention",
22
+ "full_attention", "sliding_attention", "sliding_attention", "sliding_attention"
23
+ ],
24
+ "max_position_embeddings": 32768,
25
+ "model_type": "mistral",
26
+ "num_attention_heads": 32,
27
+ "num_hidden_layers": 36,
28
+ "num_key_value_heads": 8,
29
+ "rms_norm_eps": 1e-05,
30
+ "rope_theta": 100000000.0,
31
+ "sliding_window": 32768,
32
+ "tie_word_embeddings": false,
33
+ "torch_dtype": "bfloat16",
34
+ "transformers_version": "4.46.0.dev0",
35
+ "use_cache": true,
36
+ "vocab_size": 131072
37
+ }
generation_config.json ADDED
@@ -0,0 +1,6 @@
 
 
 
 
 
 
 
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 1,
4
+ "eos_token_id": 2,
5
+ "transformers_version": "4.46.0.dev0"
6
+ }
model-00001-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7c2954774d8f600560e5d75b4db7f6aa7a148a5c4dea6860148908503f649ecf
3
+ size 4983007904
model-00002-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:888d9728a60744c4ce88984ae86952a6f07f47ee90588aa1faa4a5063951f0c1
3
+ size 4999836776
model-00003-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a10db8c522602eee5fb8be78980217a51c75b160c0584d110a1c694bfd4f671a
3
+ size 4983067960
model-00004-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e75c6183282198d40135032c9e757fb70c2455f8c0f4ded7b0895b49b72bbd5f
3
+ size 1073741952
model.safetensors.index.json ADDED
@@ -0,0 +1,334 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "metadata": {
3
+ "total_size": 16039616512
4
+ },
5
+ "weight_map": {
6
+ "lm_head.weight": "model-00004-of-00004.safetensors",
7
+ "model.embed_tokens.weight": "model-00001-of-00004.safetensors",
8
+ "model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
9
+ "model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
10
+ "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
11
+ "model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
12
+ "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
13
+ "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
14
+ "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
15
+ "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
16
+ "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
17
+ "model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
18
+ "model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
19
+ "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
20
+ "model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
21
+ "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
22
+ "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
23
+ "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
24
+ "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
25
+ "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
26
+ "model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
27
+ "model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
28
+ "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
29
+ "model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
30
+ "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
31
+ "model.layers.10.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
32
+ "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
33
+ "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
34
+ "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
35
+ "model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
36
+ "model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
37
+ "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
38
+ "model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
39
+ "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
40
+ "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
41
+ "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
42
+ "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
43
+ "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
44
+ "model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
45
+ "model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
46
+ "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
47
+ "model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
48
+ "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
49
+ "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
50
+ "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
51
+ "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
52
+ "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
53
+ "model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
54
+ "model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
55
+ "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
56
+ "model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
57
+ "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
58
+ "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
59
+ "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
60
+ "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
61
+ "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
62
+ "model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
63
+ "model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
64
+ "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
65
+ "model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
66
+ "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
67
+ "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
68
+ "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
69
+ "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
70
+ "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
71
+ "model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
72
+ "model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
73
+ "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
74
+ "model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
75
+ "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
76
+ "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
77
+ "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
78
+ "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
79
+ "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
80
+ "model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
81
+ "model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
82
+ "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
83
+ "model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
84
+ "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
85
+ "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
86
+ "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
87
+ "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
88
+ "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
89
+ "model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
90
+ "model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
91
+ "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
92
+ "model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
93
+ "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
94
+ "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
95
+ "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
96
+ "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
97
+ "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
98
+ "model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
99
+ "model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
100
+ "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
101
+ "model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
102
+ "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
103
+ "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
104
+ "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
105
+ "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
106
+ "model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
107
+ "model.layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors",
108
+ "model.layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
109
+ "model.layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
110
+ "model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
111
+ "model.layers.19.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
112
+ "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
113
+ "model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
114
+ "model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
115
+ "model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
116
+ "model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
117
+ "model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
118
+ "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
119
+ "model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
120
+ "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
121
+ "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
122
+ "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
123
+ "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
124
+ "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
125
+ "model.layers.20.input_layernorm.weight": "model-00002-of-00004.safetensors",
126
+ "model.layers.20.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
127
+ "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
128
+ "model.layers.20.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
129
+ "model.layers.20.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
130
+ "model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
131
+ "model.layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
132
+ "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
133
+ "model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
134
+ "model.layers.21.input_layernorm.weight": "model-00002-of-00004.safetensors",
135
+ "model.layers.21.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
136
+ "model.layers.21.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
137
+ "model.layers.21.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
138
+ "model.layers.21.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
139
+ "model.layers.21.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
140
+ "model.layers.21.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
141
+ "model.layers.21.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
142
+ "model.layers.21.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
143
+ "model.layers.22.input_layernorm.weight": "model-00002-of-00004.safetensors",
144
+ "model.layers.22.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
145
+ "model.layers.22.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
146
+ "model.layers.22.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
147
+ "model.layers.22.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
148
+ "model.layers.22.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
149
+ "model.layers.22.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
150
+ "model.layers.22.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
151
+ "model.layers.22.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
152
+ "model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
153
+ "model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
154
+ "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
155
+ "model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
156
+ "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
157
+ "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
158
+ "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
159
+ "model.layers.23.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
160
+ "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
161
+ "model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
162
+ "model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
163
+ "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
164
+ "model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
165
+ "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
166
+ "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
167
+ "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
168
+ "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
169
+ "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
170
+ "model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
171
+ "model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
172
+ "model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
173
+ "model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
174
+ "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
175
+ "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
176
+ "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
177
+ "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
178
+ "model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
179
+ "model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
180
+ "model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
181
+ "model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
182
+ "model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
183
+ "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
184
+ "model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
185
+ "model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
186
+ "model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
187
+ "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
188
+ "model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
189
+ "model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
190
+ "model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
191
+ "model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
192
+ "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
193
+ "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
194
+ "model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
195
+ "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
196
+ "model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
197
+ "model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
198
+ "model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
199
+ "model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
200
+ "model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
201
+ "model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
202
+ "model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
203
+ "model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
204
+ "model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
205
+ "model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
206
+ "model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
207
+ "model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
208
+ "model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
209
+ "model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
210
+ "model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
211
+ "model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
212
+ "model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
213
+ "model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
214
+ "model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
215
+ "model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
216
+ "model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
217
+ "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
218
+ "model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
219
+ "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
220
+ "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
221
+ "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
222
+ "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
223
+ "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
224
+ "model.layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors",
225
+ "model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
226
+ "model.layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
227
+ "model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
228
+ "model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
229
+ "model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
230
+ "model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
231
+ "model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
232
+ "model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
233
+ "model.layers.31.input_layernorm.weight": "model-00003-of-00004.safetensors",
234
+ "model.layers.31.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
235
+ "model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
236
+ "model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
237
+ "model.layers.31.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
238
+ "model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
239
+ "model.layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
240
+ "model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
241
+ "model.layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
242
+ "model.layers.32.input_layernorm.weight": "model-00003-of-00004.safetensors",
243
+ "model.layers.32.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
244
+ "model.layers.32.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
245
+ "model.layers.32.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
246
+ "model.layers.32.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
247
+ "model.layers.32.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
248
+ "model.layers.32.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
249
+ "model.layers.32.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
250
+ "model.layers.32.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
251
+ "model.layers.33.input_layernorm.weight": "model-00003-of-00004.safetensors",
252
+ "model.layers.33.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
253
+ "model.layers.33.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
254
+ "model.layers.33.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
255
+ "model.layers.33.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
256
+ "model.layers.33.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
257
+ "model.layers.33.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
258
+ "model.layers.33.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
259
+ "model.layers.33.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
260
+ "model.layers.34.input_layernorm.weight": "model-00003-of-00004.safetensors",
261
+ "model.layers.34.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
262
+ "model.layers.34.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
263
+ "model.layers.34.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
264
+ "model.layers.34.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
265
+ "model.layers.34.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
266
+ "model.layers.34.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
267
+ "model.layers.34.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
268
+ "model.layers.34.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
269
+ "model.layers.35.input_layernorm.weight": "model-00003-of-00004.safetensors",
270
+ "model.layers.35.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
271
+ "model.layers.35.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
272
+ "model.layers.35.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
273
+ "model.layers.35.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
274
+ "model.layers.35.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
275
+ "model.layers.35.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
276
+ "model.layers.35.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
277
+ "model.layers.35.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
278
+ "model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
279
+ "model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
280
+ "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
281
+ "model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
282
+ "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
283
+ "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
284
+ "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
285
+ "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
286
+ "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
287
+ "model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
288
+ "model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
289
+ "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
290
+ "model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
291
+ "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
292
+ "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
293
+ "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
294
+ "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
295
+ "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
296
+ "model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
297
+ "model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
298
+ "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
299
+ "model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
300
+ "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
301
+ "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
302
+ "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
303
+ "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
304
+ "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
305
+ "model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
306
+ "model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
307
+ "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
308
+ "model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
309
+ "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
310
+ "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
311
+ "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
312
+ "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
313
+ "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
314
+ "model.layers.8.input_layernorm.weight": "model-00001-of-00004.safetensors",
315
+ "model.layers.8.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
316
+ "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
317
+ "model.layers.8.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
318
+ "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
319
+ "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
320
+ "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
321
+ "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
322
+ "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
323
+ "model.layers.9.input_layernorm.weight": "model-00001-of-00004.safetensors",
324
+ "model.layers.9.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
325
+ "model.layers.9.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
326
+ "model.layers.9.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
327
+ "model.layers.9.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
328
+ "model.layers.9.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
329
+ "model.layers.9.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
330
+ "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
331
+ "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
332
+ "model.norm.weight": "model-00003-of-00004.safetensors"
333
+ }
334
+ }
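Note: the weight_map above is how loaders locate the shard that holds each tensor. A minimal sketch of looking one tensor up directly, assuming a local checkout of this repo at `./model` (a hypothetical path):

```python
import json
from safetensors import safe_open  # pip install safetensors

MODEL_DIR = "./model"  # hypothetical local path to this repo

# The index maps each tensor name to the shard file that contains it.
with open(f"{MODEL_DIR}/model.safetensors.index.json") as f:
    index = json.load(f)

name = "model.norm.weight"
shard = index["weight_map"][name]  # "model-00003-of-00004.safetensors"

# Open only the shard we need and read the single tensor lazily.
with safe_open(f"{MODEL_DIR}/{shard}", framework="pt") as f:
    tensor = f.get_tensor(name)

print(name, tuple(tensor.shape))
```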
params.json ADDED
@@ -0,0 +1,13 @@
+ {
+ "dim": 4096,
+ "n_layers": 36,
+ "head_dim": 128,
+ "hidden_dim": 12288,
+ "n_heads": 32,
+ "n_kv_heads": 8,
+ "norm_eps": 1e-05,
+ "vocab_size": 131072,
+ "rope_theta": 100000000.0,
+ "sliding_window": [null, 32768, 32768, 32768],
+ "max_position_embeddings": 131072
+ }
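A few quantities implied by params.json, handy as sanity checks when wiring up inference. This is only a sketch; in particular, the reading of `sliding_window` as a repeating per-layer pattern is an assumption, not something the repo states:

```python
# Derived attention geometry from params.json above.
dim, n_layers = 4096, 36
n_heads, n_kv_heads, head_dim = 32, 8, 128

assert n_heads * head_dim == dim      # 32 * 128 = 4096: query width matches dim
kv_dim = n_kv_heads * head_dim        # 8 * 128 = 1024: grouped-query KV width
gqa_groups = n_heads // n_kv_heads    # 4 query heads share each KV head

# Assumption: "sliding_window": [null, 32768, 32768, 32768] tiles across the
# 36 layers -- one global-attention layer, then three 32k-window layers.
pattern = [None, 32768, 32768, 32768]
per_layer_window = [pattern[i % len(pattern)] for i in range(n_layers)]
print(kv_dim, gqa_groups, per_layer_window[:8])
```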
passkey_example.json ADDED
The diff for this file is too large to render. See raw diff
special_tokens_map.json ADDED
@@ -0,0 +1,23 @@
+ {
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
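These entries should surface unchanged once the tokenizer is loaded; a quick check, assuming a hypothetical local checkout of the repo:

```python
from transformers import AutoTokenizer  # pip install transformers

tok = AutoTokenizer.from_pretrained("./model")  # hypothetical local path

# Mirrors special_tokens_map.json above.
print(tok.bos_token, tok.eos_token, tok.unk_token)  # <s> </s> <unk>
print(tok.bos_token_id, tok.eos_token_id)
```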
tekken.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:eccd1665d2e477697c33cb7f0daa6f6dfefc57a0a6bceb66d4be52952f827516
+ size 14801223
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d7edbeaf20dd7f571b5dd1c54d9ace4f9b6299127cc7ba2afb14a6d51a4a79a4
+ size 17078136
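tekken.json and tokenizer.json are stored as Git LFS pointers: the pointer file records only the blob's SHA-256 and byte size, not the content. A small sketch to verify a downloaded file against its pointer (oid and size copied from the tokenizer.json pointer above):

```python
import hashlib

def verify_lfs_blob(path: str, expected_oid: str, expected_size: int) -> bool:
    """Check a downloaded file against the oid/size from its LFS pointer."""
    h = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
            size += len(chunk)
    return h.hexdigest() == expected_oid and size == expected_size

print(verify_lfs_blob(
    "tokenizer.json",
    "d7edbeaf20dd7f571b5dd1c54d9ace4f9b6299127cc7ba2afb14a6d51a4a79a4",
    17078136,
))
```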
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff