Image-Text-to-Text
Transformers
Safetensors
llava
conversational
amant555 commited on
Commit
9995a75
·
0 Parent(s):

Add: Apriel 1.6

Browse files
.gitattributes ADDED
@@ -0,0 +1,36 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,904 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: mit
3
+ pipeline_tag: image-text-to-text
4
+ library_name: transformers
5
+ ---
6
+
7
+ # Apriel-1.6-15B-Thinker: Cost-efficient Frontier Multimodal Performance
8
+
9
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/63d3095c2727d7888cbb54e2/Lt1t0tOO5emz1X23Azg-E.png" width="120" alt="thumbnail"/> `/ˈɑː.pri.əl/`
10
+
11
+ ---
12
+
13
+ # Table of Contents
14
+
15
+ 1. [Summary](#summary)
16
+ 2. [Evaluation](#evaluation)
17
+ 3. [Intended Use](#intended-use)
18
+ 4. [How to Use](#how-to-use)
19
+ 5. [Training Details](#training-details)
20
+ 6. [Limitations](#limitations)
21
+ 7. [Security and Responsible Use](#security-and-responsible-use)
22
+ 8. [License](#license)
23
+ 9. [Citation](#citation)
24
+
25
+ ---
26
+
27
+ # Summary
28
+
29
+ **Apriel-1.6-15B-Thinker** is an updated multimodal reasoning model in ServiceNow’s Apriel SLM series, building on [**Apriel-1.5-15B-Thinker**](https://huggingface.co/ServiceNow-AI/Apriel-1.5-15b-Thinker).
30
+ With significantly improved text and image reasoning capabilities, Apriel-1.6 achieves competitive performance against models up to 10x its size.
31
+ Like its predecessor, it benefits from extensive continual pre-training across both text and image domains.
32
+ We additionally perform post-training that focuses on Supervised Finetuning (SFT) and Reinforcement Learning (RL).
33
+ Apriel-1.6 obtains frontier performance without sacrificing reasoning token efficiency.
34
+ The model improves or maintains task performance when compared with Apriel-1.5-15B-Thinker, while **reducing reasoning token usage by more than 30\%**.
35
+
36
+ **Highlights**
37
+
38
+ - Achieves a score of **57** on the Artificial Analysis index outperforming models like Gemini 2.5 Flash, Claude Haiku 4.5 and GPT OSS 20b. It obtains a score on par with Qwen3 235B A22B, while being significantly more efficient.
39
+ - **Reduces reasoning token usage by more than 30%**, delivering significantly better efficiency than Apriel-1.5-15B-Thinker.
40
+ - Scores **69** on Tau2 Bench Telecom and **69** on IFBench, which are key benchmarks for the enterprise domain.
41
+ - At 15B parameters, the model fits on a single GPU, making it highly memory-efficient.
42
+ - Based on community feedback on Apriel-1.5-15B-Thinker, we simplified the chat template by removing redundant tags and introduced four special tokens to the tokenizer (`<tool_calls>`, `</tool_calls>`, `[BEGIN FINAL RESPONSE]`, `<|end|>`) for easier output parsing.
43
+
44
+ Please see our [blog post](https://huggingface.co/blog/ServiceNow-AI/apriel-1p6-15b-thinker) for more details
45
+
46
+ ---
47
+
48
+ # Evaluation
49
+
50
+ - Text benchmarks included in the Artificial Analysis Index v3.0 use scores reported by [Artificial Analysis](https://artificialanalysis.ai/). All other benchmarks were evaluated internally.
51
+
52
+ <table>
53
+ <tr>
54
+ <th>Category</th>
55
+ <th>Benchmark</th>
56
+ <th>Apriel-1.6-15B-Thinker</th>
57
+ <th>Apriel-1.5-15B-Thinker</th>
58
+ <th>GPT OSS 120B</th>
59
+ <th>DeepSeek R1 0528</th>
60
+ <th>Gemini 2.5 Flash (Sep)</th>
61
+ <th>GPT 5 mini (high)</th>
62
+ <th>Claude 4.5 Sonnet (thinking)</th>
63
+ <th>o3-mini (high)</th>
64
+ </tr>
65
+
66
+ <!-- Function Calling -->
67
+ <tr>
68
+ <td rowspan="5" class="category">Function Calling</td>
69
+ <td>BFCL v3 only</td>
70
+ <td>63.50</td>
71
+ <td>51.88</td>
72
+ <td>50.62</td>
73
+ <td>39.75</td>
74
+ <td>39.75</td>
75
+ <td>17.62</td>
76
+ <td>-</td>
77
+ <td>50</td>
78
+ </tr>
79
+ <tr>
80
+ <td>Tau2 bench Telecom</td>
81
+ <td>69</td>
82
+ <td>57.8</td>
83
+ <td>66</td>
84
+ <td>37</td>
85
+ <td>32</td>
86
+ <td>68</td>
87
+ <td>50.8</td>
88
+ <td>31</td>
89
+ </tr>
90
+ <tr>
91
+ <td>Tau2 bench Retail</td>
92
+ <td>66.67</td>
93
+ <td>46.78</td>
94
+ <td>61.4</td>
95
+ <td>59.94</td>
96
+ <td>61.69</td>
97
+ <td>73.39</td>
98
+ <td>69.8</td>
99
+ <td>75.73</td>
100
+ </tr>
101
+ <tr>
102
+ <td>Tau2 bench Airline</td>
103
+ <td>58</td>
104
+ <td>52</td>
105
+ <td>45.3</td>
106
+ <td>47.33</td>
107
+ <td>56.66</td>
108
+ <td>59.33</td>
109
+ <td>58</td>
110
+ <td>61.33</td>
111
+ </tr>
112
+ <tr>
113
+ <td>ComplexFuncBench</td>
114
+ <td>33.2</td>
115
+ <td>19</td>
116
+ <td>24.6</td>
117
+ <td>24.2</td>
118
+ <td>26.3</td>
119
+ <td>37.5</td>
120
+ <td>24.6</td>
121
+ <td>18.9</td>
122
+ </tr>
123
+
124
+ <!-- Instruction Following -->
125
+ <tr>
126
+ <td rowspan="4" class="category">Instruction Following</td>
127
+ <td>Agent IF</td>
128
+ <td>57.2</td>
129
+ <td>55</td>
130
+ <td>54.20</td>
131
+ <td>52.20</td>
132
+ <td>49.70</td>
133
+ <td>57.60</td>
134
+ <td>54.50</td>
135
+ <td>54.90</td>
136
+ </tr>
137
+ <tr>
138
+ <td>Multi IF</td>
139
+ <td>83.34</td>
140
+ <td>76.91</td>
141
+ <td>82.95</td>
142
+ <td>73.76</td>
143
+ <td>82.49</td>
144
+ <td>85.37</td>
145
+ <td>84.32</td>
146
+ <td>87.28</td>
147
+ </tr>
148
+ <tr>
149
+ <td>Multi-Challenge</td>
150
+ <td>46.15</td>
151
+ <td>41.39</td>
152
+ <td>46.90</td>
153
+ <td>44.50</td>
154
+ <td>49.08</td>
155
+ <td>57.90</td>
156
+ <td>42.49</td>
157
+ <td>38.46</td>
158
+ </tr>
159
+ <tr>
160
+ <td>IF Bench</td>
161
+ <td>69</td>
162
+ <td>62</td>
163
+ <td>69</td>
164
+ <td>40</td>
165
+ <td>50</td>
166
+ <td>75</td>
167
+ <td>57</td>
168
+ <td>70.07</td>
169
+ </tr>
170
+
171
+ <!-- Math -->
172
+ <tr>
173
+ <td class="category">Math</td>
174
+ <td>AIME 25</td>
175
+ <td>88</td>
176
+ <td>88</td>
177
+ <td>93</td>
178
+ <td>76</td>
179
+ <td>73</td>
180
+ <td>91</td>
181
+ <td>88</td>
182
+ <td>86.67</td>
183
+ </tr>
184
+
185
+ <!-- Coding -->
186
+ <tr>
187
+ <td rowspan="3" class="category">Coding</td>
188
+ <td>Struct Eval</td>
189
+ <td>79</td>
190
+ <td>48.50</td>
191
+ <td>71</td>
192
+ <td>73</td>
193
+ <td>70</td>
194
+ <td>69.92</td>
195
+ <td>76</td>
196
+ <td>73</td>
197
+ </tr>
198
+ <tr>
199
+ <td>LCB</td>
200
+ <td>81</td>
201
+ <td>73</td>
202
+ <td>65</td>
203
+ <td>77</td>
204
+ <td>70</td>
205
+ <td>84</td>
206
+ <td>71</td>
207
+ <td>73</td>
208
+ </tr>
209
+ <tr>
210
+ <td>SciCode</td>
211
+ <td>37</td>
212
+ <td>35</td>
213
+ <td>36</td>
214
+ <td>40</td>
215
+ <td>41</td>
216
+ <td>39</td>
217
+ <td>45</td>
218
+ <td>40</td>
219
+ </tr>
220
+
221
+ <!-- Agentic -->
222
+ <tr>
223
+ <td rowspan="7" class="category">Agentic</td>
224
+ <td>DeepresearchBench</td>
225
+ <td>36.47</td>
226
+ <td>32.73</td>
227
+ <td>36.30</td>
228
+ <td>34.19</td>
229
+ <td>38.15</td>
230
+ <td>-</td>
231
+ <td>-</td>
232
+ <td>33.40</td>
233
+ </tr>
234
+ <tr>
235
+ <td>GAIA</td>
236
+ <td>40</td>
237
+ <td>30.91</td>
238
+ <td>21.21</td>
239
+ <td>32.12</td>
240
+ <td>47.88</td>
241
+ <td>65.45</td>
242
+ <td>69.09</td>
243
+ <td>23.03</td>
244
+ </tr>
245
+ <tr>
246
+ <td>Work-Arena L1</td>
247
+ <td>58</td>
248
+ <td>51.5</td>
249
+ <td>50.9</td>
250
+ <td>63.9</td>
251
+ <td>51.8</td>
252
+ <td>65.5</td>
253
+ <td>62.7</td>
254
+ <td>52.4</td>
255
+ </tr>
256
+ <tr>
257
+ <td>OS World Small</td>
258
+ <td>16.70</td>
259
+ <td>13.90</td>
260
+ <td>16.70</td>
261
+ <td>25</td>
262
+ <td>19.40</td>
263
+ <td>22.20</td>
264
+ <td>30.60</td>
265
+ <td>19.40</td>
266
+ </tr>
267
+ <tr>
268
+ <td>SWE Bench Verified</td>
269
+ <td>23</td>
270
+ <td>16</td>
271
+ <td>31</td>
272
+ <td>29.60</td>
273
+ <td>34.20</td>
274
+ <td>61</td>
275
+ <td>64.2</td>
276
+ <td>22.60</td>
277
+ </tr>
278
+ <tr>
279
+ <td>Terminal Bench</td>
280
+ <td>14</td>
281
+ <td>10</td>
282
+ <td>22</td>
283
+ <td>15</td>
284
+ <td>13</td>
285
+ <td>31</td>
286
+ <td>33</td>
287
+ <td>5.67</td>
288
+ </tr>
289
+ <tr>
290
+ <td>Aider Polyglot</td>
291
+ <td>37.68</td>
292
+ <td>26.37</td>
293
+ <td>42</td>
294
+ <td>71.40</td>
295
+ <td>40</td>
296
+ <td>71.60</td>
297
+ <td>78</td>
298
+ <td>60.40</td>
299
+ </tr>
300
+
301
+ <!-- Knowledge -->
302
+ <tr>
303
+ <td class="category">Knowledge</td>
304
+ <td>MMLU Pro</td>
305
+ <td>79</td>
306
+ <td>77</td>
307
+ <td>85</td>
308
+ <td>85</td>
309
+ <td>83</td>
310
+ <td>84</td>
311
+ <td>88</td>
312
+ <td>80</td>
313
+ </tr>
314
+
315
+ <!-- Creative Writing -->
316
+ <tr>
317
+ <td class="category">Creative Writing</td>
318
+ <td>Creative writing v3 / EQ Bench</td>
319
+ <td>59.73</td>
320
+ <td>60.24</td>
321
+ <td>53.70</td>
322
+ <td>79.40</td>
323
+ <td>74.25</td>
324
+ <td>75.25</td>
325
+ <td>80.70</td>
326
+ <td>30.40</td>
327
+ </tr>
328
+
329
+ <!-- Others -->
330
+ <tr>
331
+ <td rowspan="2" class="category">Others</td>
332
+ <td>GPQA Diamond</td>
333
+ <td>73</td>
334
+ <td>71</td>
335
+ <td>78</td>
336
+ <td>81</td>
337
+ <td>79</td>
338
+ <td>83</td>
339
+ <td>83</td>
340
+ <td>77</td>
341
+ </tr>
342
+ <tr>
343
+ <td>HLE</td>
344
+ <td>10</td>
345
+ <td>12</td>
346
+ <td>18.5</td>
347
+ <td>14.9</td>
348
+ <td>11.1</td>
349
+ <td>19.7</td>
350
+ <td>17.3</td>
351
+ <td>12.3</td>
352
+ </tr>
353
+
354
+ <!-- Long Context -->
355
+ <tr>
356
+ <td class="category">Long Context</td>
357
+ <td>AA LCR</td>
358
+ <td>50*</td>
359
+ <td>20</td>
360
+ <td>51</td>
361
+ <td>55</td>
362
+ <td>62</td>
363
+ <td>68</td>
364
+ <td>66</td>
365
+ <td>-</td>
366
+ </tr>
367
+ </table>
368
+
369
+
370
+
371
+ \* AA LCR score in the table is with [DCA](https://arxiv.org/pdf/2402.17463) enabled. With default config, the model scores 36 on AA LCR.
372
+
373
+ ---
374
+
375
+ - For image benchmarks, we report evaluations obtained by https://github.com/open-compass/VLMEvalKit
376
+
377
+ <table>
378
+ <tr>
379
+ <td><strong>Benchmark</strong>
380
+ </td>
381
+ <td><strong>Apriel-1.6-15B-Thinker</strong>
382
+ </td>
383
+ <td><strong>Apriel-1.5-15B-Thinker</strong>
384
+ </td>
385
+ <td><strong>GPT-5 (high)</strong>
386
+ </td>
387
+ <td><strong>GLM-4.5V (Thinking)</strong>
388
+ </td>
389
+ <td><strong>Gemini 2.5 Flash (high)</strong>
390
+ </td>
391
+ <td><strong>Claude Sonnet 3.7 (Thinking)</strong>
392
+ </td>
393
+ <td><strong>GPT-5 (Minimal)</strong>
394
+ </td>
395
+ <td><strong>Grok 4 Fast (Thinking)</strong>
396
+ </td>
397
+ </tr>
398
+ <tr>
399
+ <td> MMMU (validation)
400
+ </td>
401
+ <td> 72
402
+ </td>
403
+ <td> 70.22
404
+ </td>
405
+ <td> 81.33
406
+ </td>
407
+ <td> 74.33
408
+ </td>
409
+ <td> 70.66
410
+ </td>
411
+ <td> 73.66
412
+ </td>
413
+ <td> 66.66
414
+ </td>
415
+ <td> 70.11
416
+ </td>
417
+ </tr>
418
+ <tr>
419
+ <td>MMMU-PRO (10 choice)
420
+ </td>
421
+ <td> 60.28
422
+ </td>
423
+ <td> 55.38
424
+ </td>
425
+ <td> 74.73
426
+ </td>
427
+ <td> 64.16
428
+ </td>
429
+ <td> 67.86
430
+ </td>
431
+ <td> 64.50
432
+ </td>
433
+ <td> 66.06
434
+ </td>
435
+ <td> 61.61
436
+ </td>
437
+ </tr>
438
+ <tr>
439
+ <td>MMMU-PRO (Vision Only)
440
+ </td>
441
+ <td> 52.89
442
+ </td>
443
+ <td> 48.21
444
+ </td>
445
+ <td> 66.93
446
+ </td>
447
+ <td> 61.50
448
+ </td>
449
+ <td> 56.76
450
+ </td>
451
+ <td> 60.11
452
+ </td>
453
+ <td> 57.68
454
+ </td>
455
+ <td> 22.94
456
+ </td>
457
+ </tr>
458
+ <tr>
459
+ <td>LogicVista
460
+ </td>
461
+ <td> 58.61
462
+ </td>
463
+ <td> 58.39
464
+ </td>
465
+ <td> 69.35
466
+ </td>
467
+ <td> 63.53
468
+ </td>
469
+ <td> 63.75
470
+ </td>
471
+ <td> 69.12
472
+ </td>
473
+ <td> 44.51
474
+ </td>
475
+ <td> 47.42
476
+ </td>
477
+ </tr>
478
+ <tr>
479
+ <td>MathVision
480
+ </td>
481
+ <td> 60.85
482
+ </td>
483
+ <td> 50.99
484
+ </td>
485
+ <td> 67.10
486
+ </td>
487
+ <td> 59.53
488
+ </td>
489
+ <td> 59.21
490
+ </td>
491
+ <td> 50.32
492
+ </td>
493
+ <td> 35.52
494
+ </td>
495
+ <td> 48.35
496
+ </td>
497
+ </tr>
498
+ <tr>
499
+ <td>MathVista
500
+ </td>
501
+ <td> 79.90
502
+ </td>
503
+ <td> 75.50
504
+ </td>
505
+ <td> 83.30
506
+ </td>
507
+ <td> 83.60
508
+ </td>
509
+ <td> 78.50
510
+ </td>
511
+ <td> 74.60
512
+ </td>
513
+ <td> 61.20
514
+ </td>
515
+ <td> 68.20
516
+ </td>
517
+ </tr>
518
+ <tr>
519
+ <td>MathVerse (Vision Dominant)
520
+ </td>
521
+ <td> 66.75
522
+ </td>
523
+ <td> 58.38
524
+ </td>
525
+ <td> 79.82
526
+ </td>
527
+ <td> 68.65
528
+ </td>
529
+ <td> 70.68
530
+ </td>
531
+ <td> 56.09
532
+ </td>
533
+ <td> 39.84
534
+ </td>
535
+ <td> 54.69
536
+ </td>
537
+ </tr>
538
+ <tr>
539
+ <td>MathVerse (Text Dominant)
540
+ </td>
541
+ <td> 79.06
542
+ </td>
543
+ <td> 76.40
544
+ </td>
545
+ <td> 84.64
546
+ </td>
547
+ <td> 77.41
548
+ </td>
549
+ <td> 78.80
550
+ </td>
551
+ <td> 69.28
552
+ </td>
553
+ <td> 43.78
554
+ </td>
555
+ <td> 72.20
556
+ </td>
557
+ </tr>
558
+ <tr>
559
+ <td>MMStar
560
+ </td>
561
+ <td> 70.66
562
+ </td>
563
+ <td> 67.73
564
+ </td>
565
+ <td> 77.74
566
+ </td>
567
+ <td> 74.46
568
+ </td>
569
+ <td> 73.86
570
+ </td>
571
+ <td> 70
572
+ </td>
573
+ <td> 63.60
574
+ </td>
575
+ <td> 64.80
576
+ </td>
577
+ </tr>
578
+ <tr>
579
+ <td>CharXiv (descriptive)
580
+ </td>
581
+ <td> 89.85
582
+ </td>
583
+ <td> 88.20
584
+ </td>
585
+ <td> 91.25
586
+ </td>
587
+ <td> 90.80
588
+ </td>
589
+ <td> 83.60
590
+ </td>
591
+ <td> 93.27
592
+ </td>
593
+ <td> 82.45
594
+ </td>
595
+ <td> 68.15
596
+ </td>
597
+ </tr>
598
+ <tr>
599
+ <td>CharXiv (reasoning)
600
+ </td>
601
+ <td> 56.00
602
+ </td>
603
+ <td> 50.10
604
+ </td>
605
+ <td> 71.50
606
+ </td>
607
+ <td> 63.00
608
+ </td>
609
+ <td> 56.50
610
+ </td>
611
+ <td> 70.90
612
+ </td>
613
+ <td> 52.80
614
+ </td>
615
+ <td> 33.50
616
+ </td>
617
+ </tr>
618
+ <tr>
619
+ <td>AI2D Test
620
+ </td>
621
+ <td> 86.04
622
+ </td>
623
+ <td> 82.87
624
+ </td>
625
+ <td> 90.05
626
+ </td>
627
+ <td> 87.75
628
+ </td>
629
+ <td> 82.09
630
+ </td>
631
+ <td> 84.19
632
+ </td>
633
+ <td> 85.16
634
+ </td>
635
+ <td> 81.86
636
+ </td>
637
+ </tr>
638
+ <tr>
639
+ <td>BLINK
640
+ </td>
641
+ <td> 63.96
642
+ </td>
643
+ <td> 58.71
644
+ </td>
645
+ <td> 70.22
646
+ </td>
647
+ <td> 66.59
648
+ </td>
649
+ <td> 65.64
650
+ </td>
651
+ <td> 64.49
652
+ </td>
653
+ <td> 64.59
654
+ </td>
655
+ <td> 54.39
656
+ </td>
657
+ </tr>
658
+ </table>
659
+
660
+ ---
661
+
662
+ # Intended Use
663
+
664
+ The Apriel family of models are designed for a variety of general-purpose instruction tasks, including:
665
+
666
+ - Code assistance and generation
667
+ - Logical reasoning and multi-step tasks
668
+ - Question answering and information retrieval
669
+ - Function calling, complex instruction following and agent use cases
670
+
671
+ They are **not intended** for use in safety-critical applications without human oversight or in scenarios requiring guaranteed factual accuracy.
672
+
673
+ ---
674
+ # How to Use
675
+
676
+ ```bash
677
+ pip install transformers
678
+ ```
679
+
680
+ ## Running the Reasoning model
681
+
682
+
683
+ Here is a code snippet demonstrating the model's usage with the transformers library's generate function:
684
+
685
+ ```python
686
+ # Tested with transformers==4.48
687
+
688
+ import re
689
+ import requests
690
+ import torch
691
+ from PIL import Image
692
+ from transformers import AutoProcessor, AutoModelForImageTextToText
693
+
694
+ # Load model
695
+ model_id = "ServiceNow-AI/Apriel-1.6-15b-Thinker"
696
+ model = AutoModelForImageTextToText.from_pretrained(
697
+ model_id,
698
+ torch_dtype=torch.bfloat16,
699
+ device_map="auto"
700
+ )
701
+ processor = AutoProcessor.from_pretrained(model_id)
702
+
703
+ # Example 1: Text-only prompt
704
+ chat = [
705
+ {
706
+ "role": "user",
707
+ "content": [
708
+ {"type": "text", "text": "What is the capital for France?"},
709
+ ],
710
+ }
711
+ ]
712
+
713
+ inputs = processor.apply_chat_template(chat, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt")
714
+ inputs = {k: v.to(model.device) if isinstance(v, torch.Tensor) else v for k, v in inputs.items()}
715
+ inputs.pop("token_type_ids", None)
716
+
717
+ with torch.no_grad():
718
+ output_ids = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.6)
719
+
720
+ generated_ids = output_ids[:, inputs['input_ids'].shape[1]:]
721
+ output = processor.decode(generated_ids[0], skip_special_tokens=True)
722
+ response = re.findall(r"\[BEGIN FINAL RESPONSE\](.*?)(?:<\|end\|>)", output, re.DOTALL)[0].strip()
723
+
724
+ print("Text-only Response:", response)
725
+
726
+ # Example 2: Image understanding
727
+ url = "https://picsum.photos/id/237/200/300"
728
+ image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
729
+
730
+ chat = [
731
+ {
732
+ "role": "user",
733
+ "content": [
734
+ {"type": "text", "text": "Which animal is this?"},
735
+ {"type": "image"},
736
+ ],
737
+ }
738
+ ]
739
+
740
+ prompt = processor.apply_chat_template(chat, add_generation_prompt=True, tokenize=False)
741
+ inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)
742
+
743
+ with torch.no_grad():
744
+ output_ids = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.6)
745
+
746
+ generated_ids = output_ids[:, inputs['input_ids'].shape[1]:]
747
+ output = processor.decode(generated_ids[0], skip_special_tokens=True)
748
+ response = re.findall(r"\[BEGIN FINAL RESPONSE\](.*?)(?:<\|end\|>)", output, re.DOTALL)[0].strip()
749
+
750
+ print("Image Response:", response)
751
+
752
+ ```
753
+
754
+ ## Usage Guidelines
755
+ 1. Use the model’s default chat template, which already includes a system prompt.
756
+ 2. We recommend setting temperature to `0.6`.
757
+ 3. We ensure the model starts with `Here are my reasoning steps:\n` during all our evaluations. This is implemented in the default chat template.
758
+ 4. For multi-turn conversations, intermediate turns (historical model outputs) are expected to contain only the final response, without reasoning steps.
759
+
760
+ ---
761
+
762
+ ## Chat Template
763
+
764
+
765
+ ```
766
+ <|begin_system|>
767
+ You are a thoughtful, systematic AI assistant from ServiceNow Language Models (SLAM) lab. Analyze each question carefully, present your reasoning step-by-step, then provide the final response after the marker [BEGIN FINAL RESPONSE].
768
+ <|begin_user|>
769
+ # user message here
770
+ <|begin_assistant|>
771
+ Here are my reasoning steps:
772
+ # thoughts here
773
+ [BEGIN FINAL RESPONSE]
774
+ # assistant response here
775
+ <|end|>
776
+ ```
777
+ The model will first generate its thinking process and then generate its final response, starting with `[BEGIN FINAL RESPONSE]`. Here is a code snippet demonstrating the application of the chat template:
778
+
779
+
780
+
781
+ ```python
782
+ from transformers import AutoTokenizer
783
+ model_name = "ServiceNow-AI/Apriel-1.6-15b-Thinker"
784
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
785
+
786
+ # prepare the model input
787
+ custom_system_prompt = "Answer like a pirate."
788
+ prompt = "You are an expert assistant in the implementation of customer experience management aspect of retail applications \n \nYou will be using Python as the programming language. \n \nYou will utilize a factory design pattern for the implementation and following the dependency inversion principle \n \nYou will modify the implementation based on user requirements. \n \nUpon user request, you will add, update, and remove the features & enhancements in the implementation provided by you. \n \nYou will ask whether the user wants to refactor the provided code or needs a sample implementation for reference. Upon user confirmation, I will proceed accordingly. \n \n**Guidelines:** \n 1. **User Requirements:** \n - You have to ask users about their requirements, clarify the user expectations, and suggest the best possible solution by providing examples of Python code snippets. \n - Ask users about which type of reports they need to assess the AI model's performance, accuracy, and reliability. \n - After providing the solution, you have to ask the user about the trial of the solution and modify the solution based on the user feedback. \n \n 2. **Libraries/Frameworks:** \n - You will be utilizing Python as a programming language. \n - You will be using Flask framework for REST APIS implementation \n \n 3. **Communication Gesture:** \n - Your conversation with the user should be interactive, supportive, courageous, and professional. \n - You have to break down the complex concepts into sub-concepts and try to explain them to the user. \n - You have to ask the user for the required parameters. If the user refuses to provide in 2 attempts, politely exit the conversation. \n - You have to provide your supported parameters to the user, if the user refuses to accept them then you have to put an apology note and exit the conversation. \n - You have to track the conversation about unasked questions by the user. If some/one of the questions remain then you have to remind the user about these questions and proceed to answer them based on the user's confirmation \n \n 4. **Implementation:** \n - Your code/implementations should be reliable, scaleable, modular, and reusable. \n - You will be providing unit tests for the implementation upon user request. \n - You will be following MVC architecture for the applications \n - Your implementations must be well-commented and readable \n \n \n- Today's date is 23rd August 2024. \n- The default sender email is sender-assistant@email.com.\nHi, I am conducting research on retail customer feedback systems and I need assistance with designing and implementing them. Could you kindly provide me with a list of general customer feedback system modules?"
789
+ messages = [
790
+ {"role": "user", "content": custom_system_prompt + "\n\n" + prompt}
791
+ ]
792
+ # example tools
793
+ tools = [{"type": "function", "function": {"name": "getRetailFeedbackModules", "description": "Returns the list of modules usually present in the retail industry", "parameters": {"type": "object", "properties": {"page": {"type": "integer", "description": "The current page number.", "default": 1}, "page_size": {"type": "integer", "description": "The number of items per page.", "default": 3}}}}}, {"type": "function", "function": {"name": "verifyImplementation", "description": "Returns the list of modules usually present in the retail industry", "parameters": {"type": "object", "properties": {"coding_language": {"type": "string", "description": "The supported languages for verification of implementation.", "default": "python", "enum": ["python", "java", "php"]}, "code": {"type": "string", "description": "The code which needs verification"}, "design_pattern": {"type": "string", "description": "The design pattern to verify in the implementation", "enum": ["factory", "strategy", "singleton"]}, "verify_best_practices": {"type": "boolean", "description": "The verification of the coding style based on the language selected", "default": true}}}}}]
794
+ text = tokenizer.apply_chat_template(
795
+ messages,
796
+ tokenize=False,
797
+ add_generation_prompt=True,
798
+ tools=tools
799
+ )
800
+ model_inputs = tokenizer([text], return_tensors="pt")
801
+ ```
802
+
803
+ ---
804
+
805
+ ## Running with vLLM
806
+
807
+ As the upstream PR is not yet merged, you can use this custom image as an alternate way to run the model with tool and reasoning parsers enabled.
808
+
809
+ ### Docker Image
810
+
811
+ ```
812
+ docker.io/amant555/vllm_apriel:latest
813
+ ```
814
+
815
+ ### Start Command
816
+
817
+ ```bash
818
+ python3 -m vllm.entrypoints.openai.api_server \
819
+ --model ServiceNow-AI/Apriel-1.6-15b-Thinker \
820
+ --served-model-name Apriel-1p6-15B-Thinker \
821
+ --trust_remote_code \
822
+ --max-model-len 131072 \
823
+ --enable-auto-tool-choice \
824
+ --tool-call-parser apriel \
825
+ --reasoning-parser apriel
826
+ ```
827
+
828
+ ---
829
+
830
+
831
+ # Training Details
832
+
833
+ **Training stack:** [Fast-LLM](https://github.com/ServiceNow/Fast-LLM), [VERL](https://github.com/volcengine/verl)
834
+
835
+ **Continual Pre-training:** Billions of tokens covering math, code, science, logical reasoning, and multimodal image-text data
836
+
837
+ **SFT:** 2.4M samples spanning math, code, instruction-following, function calling, and conversation, followed by an incremental lightweight multimodal SFT.
838
+
839
+ **RL:** Multi-stage RL with verifiable rewards and [GSPO](https://arxiv.org/abs/2507.18071) on text and vision tasks. Our RL stage optimizes reasoning efficiency: using fewer tokens by discouraging unnecessary intermediate steps, stopping earlier when confident, and giving direct answers on simple queries.
840
+
841
+ For more details on our training methodology, see our [blog post](https://huggingface.co/blog/ServiceNow-AI/apriel-1p6-15b-thinker).
842
+
843
+ ---
844
+
845
+ # Limitations
846
+
847
+ - **Factual accuracy:** May produce incorrect, misleading, or outdated content. Outputs should be verified before use in critical contexts.
848
+ - **Bias:** May reflect societal, cultural, or systemic biases present in training data.
849
+ - **Ethics:** Do not use the model to produce harmful, unlawful, or unethical content.
850
+ - **Language:** Strongest performance is in English. Output quality may degrade in underrepresented languages.
851
+ - **Critical use:** Not suitable for medical, legal, financial, or other high-risk applications without safeguards.
852
+
853
+ ---
854
+
855
+ # Security and Responsible Use
856
+
857
+ **Security Responsibilities:**
858
+ Deployers and users are strongly encouraged to align their security practices with established frameworks and regulatory guidelines such as the EU AI Act and the NIST AI Risk Management Framework (RMF).
859
+
860
+ <details>
861
+ <summary>Guidelines for Deployers</summary>
862
+
863
+ - Regularly conduct robustness assessments to identify and mitigate adversarial inputs.
864
+ - Implement validation and filtering processes to prevent harmful or biased outputs.
865
+ - Continuously perform data privacy checks to guard against unintended data leaks.
866
+ - Document and communicate the model's limitations, intended usage, and known security risks to all end-users.
867
+ - Schedule periodic security reviews and updates to address emerging threats and vulnerabilities.
868
+
869
+ </details>
870
+
871
+ <details>
872
+ <summary>Guidelines for Users</summary>
873
+
874
+ - Follow established security policies and usage guidelines provided by deployers.
875
+ - Protect and manage sensitive information when interacting with the model.
876
+ - Report anomalies, suspicious behavior, or unsafe outputs to deployers or developers.
877
+ - Maintain human oversight and apply judgment to mitigate potential security or ethical risks during interactions.
878
+
879
+ </details>
880
+
881
+ **Disclaimer:**
882
+ Users accept responsibility for securely deploying, managing, and using this open-source LLM. The model is provided "as-is," without explicit or implied warranty regarding security or fitness for any specific application or environment.
883
+
884
+ ---
885
+
886
+ # License
887
+
888
+ MIT
889
+
890
+ ---
891
+
892
+ # Citation
893
+
894
+ ```bibtex
895
+ @misc{radhakrishna2025apriel1515bthinker,
896
+ title={Apriel-1.5-15b-Thinker},
897
+ author={Shruthan Radhakrishna and Aman Tiwari and Aanjaneya Shukla and Masoud Hashemi and Rishabh Maheshwary and Shiva Krishna Reddy Malay and Jash Mehta and Pulkit Pattnaik and Saloni Mittal and Khalil Slimi and Kelechi Ogueji and Akintunde Oladipo and Soham Parikh and Oluwanifemi Bamgbose and Toby Liang and Ahmed Masry and Khyati Mahajan and Sai Rajeswar Mudumba and Vikas Yadav and Sathwik Tejaswi Madhusudhan and Torsten Scholak and Sagar Davasam and Srinivas Sunkara and Nicholas Chapados},
898
+ year={2025},
899
+ eprint={2510.01141},
900
+ archivePrefix={arXiv},
901
+ primaryClass={cs.AI},
902
+ url={https://arxiv.org/abs/2510.01141},
903
+ }
904
+ ```
chat_template.jinja ADDED
@@ -0,0 +1,159 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {# ---------------------------------------------------------------------- #}
2
+ {# ƛƬ Default setup and flags #}
3
+ {# ---------------------------------------------------------------------- #}
4
+ {%- set messages = messages or [] -%}
5
+ {%- set tools = tools or [] -%}
6
+ {%- set add_generation_prompt = add_generation_prompt or false -%}
7
+ {%- set available_tool_string, add_tool_id = '', true -%}
8
+ {%- set add_thoughts = false -%} {# whether to include <thinking> reasoning blocks #}
9
+ {%- set add_generation_prompt = true -%} {# whether to emit reasoning starter before assistant response #}
10
+ {# Optional token placeholders (safe defaults) #}
11
+ {%- set bos_token = bos_token if (bos_token is defined) else '' -%}
12
+ {%- set eos_token = eos_token if (eos_token is defined) else '' -%}
13
+ {# ---------------------------------------------------------------------- #}
14
+ {# Core reasoning prompt and assistant reasoning prefix #}
15
+ {# ---------------------------------------------------------------------- #}
16
+ {%- set reasoning_prompt =
17
+ 'You are a thoughtful, systematic AI assistant from ServiceNow Language Models (SLAM) lab. '
18
+ 'Analyze each question carefully, present your reasoning step-by-step, then provide the final '
19
+ 'response after the marker [BEGIN FINAL RESPONSE].'
20
+ -%}
21
+ {%- set reasoning_asst_turn_start = 'Here are my reasoning steps:\n' -%}
22
+ {# ---------------------------------------------------------------------- #}
23
+ {# Tool list and tool call output format #}
24
+ {# ---------------------------------------------------------------------- #}
25
+ {%- if tools is not none and tools|length > 0 -%}
26
+ {%- set available_tool_string -%}
27
+ You are provided with function signatures within <available_tools></available_tools> XML tags.
28
+ You may call one or more functions to assist with the user query.
29
+ Don't make assumptions about the arguments. You should infer the argument values from previous
30
+ user responses and the system message.
31
+ Here are the available tools:
32
+ <available_tools>
33
+ {% for tool in tools %}{{ tool|string }}{% endfor %}
34
+
35
+ </available_tools>.
36
+
37
+ Return all function calls as a list of JSON objects within <tool_calls></tool_calls> XML tags.
38
+ Each JSON object should contain a function name and arguments as follows:
39
+ <tool_calls>[
40
+ {"name": <function-name-1>, "arguments": <args-dict-1>},
41
+ {"name": <function-name-2>, "arguments": <args-dict-2>},
42
+ ...
43
+ ]</tool_calls>
44
+ {%- endset -%}
45
+ {%- endif -%}
46
+ {# ---------------------------------------------------------------------- #}
47
+ {# Start system block if first message is not system #}
48
+ {# ---------------------------------------------------------------------- #}
49
+ {%- if messages|length > 0 and messages[0]['role'] != 'system' -%}
50
+ {%- if tools is not none and tools|length > 0 -%}
51
+ {{ bos_token + '<|begin_system|>\n' + reasoning_prompt + '\n' + available_tool_string + '\n' }}
52
+ {%- else -%}
53
+ {{ bos_token + '<|begin_system|>\n' + reasoning_prompt + '\n' }}
54
+ {%- endif -%}
55
+ {%- endif -%}
56
+ {# ---------------------------------------------------------------------- #}
57
+ {# Iterate through messages #}
58
+ {# ---------------------------------------------------------------------- #}
59
+ {%- for message in messages -%}
60
+
61
+ {# ---------------- USER MESSAGE ---------------- #}
62
+ {%- if message['role'] == 'user' -%}
63
+ {{ '<|begin_user|>\n' }}
64
+ {%- if message['content'] is not string -%}
65
+ {%- for chunk in message['content'] -%}
66
+ {%- if chunk['type'] == 'text' -%}
67
+ {{ chunk['text'] }}
68
+ {%- elif chunk['type'] in ['image', 'image_url'] -%}
69
+ {{ '[IMG]' }}
70
+ {%- else -%}
71
+ {{ raise_exception('Unrecognized content type!') }}
72
+ {%- endif -%}
73
+ {%- endfor -%}
74
+ {%- else -%}
75
+ {{ message['content'] }}
76
+ {%- endif -%}
77
+
78
+ {# ---------------- SYSTEM MESSAGE ---------------- #}
79
+ {%- elif message['role'] == 'system' -%}
80
+ {%- if message['content'] is not none and message['content']|length > 0 -%}
81
+ {%- if message['content'] is string -%}
82
+ {%- set system_message = message['content'] -%}
83
+ {%- else -%}
84
+ {%- set system_message = message['content'][0]['text'] -%}
85
+ {%- endif -%}
86
+ {%- else -%}
87
+ {%- set system_message = '' -%}
88
+ {%- endif -%}
89
+
90
+ {%- if tools is not none and tools|length > 0 -%}
91
+ {{ bos_token + '<|begin_system|>\n' + reasoning_prompt + '\n' + system_message + '\n' + available_tool_string + '\n' }}
92
+ {%- else -%}
93
+ {{ bos_token + '<|begin_system|>\n' + reasoning_prompt + '\n' + system_message + '\n' }}
94
+ {%- endif -%}
95
+
96
+ {# ---------------- ASSISTANT MESSAGE ---------------- #}
97
+ {%- elif message['role'] == 'assistant' -%}
98
+ {%- if loop.last -%}
99
+ {%- set add_tool_id = false -%}
100
+ {%- endif -%}
101
+
102
+ {{ '\n<|begin_assistant|>\n' }}
103
+
104
+ {%- if add_thoughts and 'thought' in message and message['thought'] is not none -%}
105
+ <thinking>{{ message['thought'] }}</thinking>
106
+ {%- endif -%}
107
+
108
+ {%- if message['content'] is not none and message['content']|length > 0 -%}
109
+ {%- if message['content'] is not string -%}
110
+ {{ message['content'][0]['text'] }}
111
+ {%- else -%}
112
+ {{ message['content'] }}
113
+ {%- endif -%}
114
+ {%- elif message['chosen'] is not none and message['chosen']|length > 0 -%}
115
+ {{ message['chosen'][0] }}
116
+ {%- endif -%}
117
+
118
+ {# Tool call output #}
119
+ {%- if message['tool_calls'] is not none and message['tool_calls']|length > 0 -%}
120
+ {{ '\n<tool_calls>[' }}
121
+ {%- for tool_call in message['tool_calls'] -%}
122
+ {{ '{"name": "' + tool_call['function']['name'] + '", "arguments": ' + tool_call['function']['arguments']|string }}
123
+ {%- if add_tool_id == true and 'id' in tool_call -%}
124
+ {{ ', "id": "' + tool_call['id'] + '"' }}
125
+ {%- endif -%}
126
+ {{ '}' }}
127
+ {%- if not loop.last -%}{{ ', ' }}{%- endif -%}
128
+ {%- endfor -%}
129
+ {{ ']</tool_calls>' }}
130
+ {%- endif -%}
131
+
132
+ {%- if not loop.last or training_prompt -%}
133
+ {{ '\n<|end|>\n' }}
134
+ {%- endif -%}
135
+
136
+ {# ---------------- TOOL RESULT MESSAGE ---------------- #}
137
+ {%- elif message['role'] == 'tool' -%}
138
+ {%- if message['content'] is string -%}
139
+ {%- set tool_message = message['content'] -%}
140
+ {%- else -%}
141
+ {%- set tool_message = message['content'][0]['text'] -%}
142
+ {%- endif -%}
143
+ {{ '<|begin_tool_result|>\n' + tool_message|string + '\n' }}
144
+
145
+ {# ---------------- CONTENT MESSAGE ---------------- #}
146
+ {%- elif message['role'] == 'content' -%}
147
+ {%- if message['content'] is not string -%}
148
+ {{ '<|begin_content|>\n' + message['content'][0]['text'] + '\n' }}
149
+ {%- else -%}
150
+ {{ '<|begin_content|>\n' + message['content'] + '\n' }}
151
+ {%- endif -%}
152
+ {%- endif -%}
153
+
154
+ {# ---------------- REASONING PROMPT BEFORE NEXT ASSISTANT ---------------- #}
155
+ {%- if loop.last and add_generation_prompt and message['role'] != 'assistant' -%}
156
+ {{ '\n<|begin_assistant|>\n' + reasoning_asst_turn_start }}
157
+ {%- endif -%}
158
+
159
+ {%- endfor -%}
config.json ADDED
@@ -0,0 +1,170 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "architectures": [
3
+ "LlavaForConditionalGeneration"
4
+ ],
5
+ "ignore_index": -100,
6
+ "image_seq_length": 1,
7
+ "image_token_index": 10,
8
+ "model_type": "llava",
9
+ "multimodal_projector_bias": true,
10
+ "projector_hidden_act": "gelu",
11
+ "text_config": {
12
+ "_attn_implementation_autoset": false,
13
+ "_name_or_path": "",
14
+ "add_cross_attention": false,
15
+ "architectures": null,
16
+ "attention_dropout": 0.0,
17
+ "bad_words_ids": null,
18
+ "begin_suppress_tokens": null,
19
+ "bos_token_id": 1,
20
+ "chunk_size_feed_forward": 0,
21
+ "cross_attention_hidden_size": null,
22
+ "decoder_start_token_id": null,
23
+ "diversity_penalty": 0.0,
24
+ "do_sample": false,
25
+ "early_stopping": false,
26
+ "encoder_no_repeat_ngram_size": 0,
27
+ "eos_token_id": 2,
28
+ "exponential_decay_length_penalty": null,
29
+ "finetuning_task": null,
30
+ "forced_bos_token_id": null,
31
+ "forced_eos_token_id": null,
32
+ "head_dim": 128,
33
+ "hidden_act": "silu",
34
+ "hidden_size": 5120,
35
+ "id2label": {
36
+ "0": "LABEL_0",
37
+ "1": "LABEL_1"
38
+ },
39
+ "initializer_range": 0.02,
40
+ "intermediate_size": 14336,
41
+ "is_decoder": false,
42
+ "is_encoder_decoder": false,
43
+ "label2id": {
44
+ "LABEL_0": 0,
45
+ "LABEL_1": 1
46
+ },
47
+ "length_penalty": 1.0,
48
+ "max_length": 20,
49
+ "max_position_embeddings": 262400,
50
+ "min_length": 0,
51
+ "model_type": "mistral",
52
+ "no_repeat_ngram_size": 0,
53
+ "num_attention_heads": 32,
54
+ "num_beam_groups": 1,
55
+ "num_beams": 1,
56
+ "num_hidden_layers": 48,
57
+ "num_key_value_heads": 8,
58
+ "num_return_sequences": 1,
59
+ "output_attentions": false,
60
+ "output_hidden_states": false,
61
+ "output_scores": false,
62
+ "pad_token_id": null,
63
+ "prefix": null,
64
+ "problem_type": null,
65
+ "pruned_heads": {},
66
+ "remove_invalid_values": false,
67
+ "repetition_penalty": 1.0,
68
+ "return_dict": true,
69
+ "return_dict_in_generate": false,
70
+ "rms_norm_eps": 1e-05,
71
+ "rope_theta": 1000000000.0,
72
+ "sep_token_id": null,
73
+ "sliding_window": null,
74
+ "suppress_tokens": null,
75
+ "task_specific_params": null,
76
+ "temperature": 1.0,
77
+ "tf_legacy_loss": false,
78
+ "tie_encoder_decoder": false,
79
+ "tie_word_embeddings": false,
80
+ "tokenizer_class": null,
81
+ "top_k": 50,
82
+ "top_p": 1.0,
83
+ "torch_dtype": "bfloat16",
84
+ "torchscript": false,
85
+ "typical_p": 1.0,
86
+ "use_bfloat16": false,
87
+ "use_cache": true,
88
+ "vocab_size": 131072
89
+ },
90
+ "torch_dtype": "bfloat16",
91
+ "transformers_version": "4.49.0",
92
+ "vision_config": {
93
+ "_attn_implementation_autoset": false,
94
+ "_name_or_path": "",
95
+ "add_cross_attention": false,
96
+ "architectures": null,
97
+ "attention_dropout": 0.0,
98
+ "bad_words_ids": null,
99
+ "begin_suppress_tokens": null,
100
+ "bos_token_id": null,
101
+ "chunk_size_feed_forward": 0,
102
+ "cross_attention_hidden_size": null,
103
+ "decoder_start_token_id": null,
104
+ "diversity_penalty": 0.0,
105
+ "do_sample": false,
106
+ "early_stopping": false,
107
+ "encoder_no_repeat_ngram_size": 0,
108
+ "eos_token_id": null,
109
+ "exponential_decay_length_penalty": null,
110
+ "finetuning_task": null,
111
+ "forced_bos_token_id": null,
112
+ "forced_eos_token_id": null,
113
+ "head_dim": 64,
114
+ "hidden_act": "silu",
115
+ "hidden_size": 1024,
116
+ "id2label": {
117
+ "0": "LABEL_0",
118
+ "1": "LABEL_1"
119
+ },
120
+ "image_size": 1024,
121
+ "initializer_range": 0.02,
122
+ "intermediate_size": 4096,
123
+ "is_decoder": false,
124
+ "is_encoder_decoder": false,
125
+ "label2id": {
126
+ "LABEL_0": 0,
127
+ "LABEL_1": 1
128
+ },
129
+ "length_penalty": 1.0,
130
+ "max_length": 20,
131
+ "min_length": 0,
132
+ "model_type": "pixtral",
133
+ "no_repeat_ngram_size": 0,
134
+ "num_attention_heads": 16,
135
+ "num_beam_groups": 1,
136
+ "num_beams": 1,
137
+ "num_channels": 3,
138
+ "num_hidden_layers": 24,
139
+ "num_return_sequences": 1,
140
+ "output_attentions": false,
141
+ "output_hidden_states": false,
142
+ "output_scores": false,
143
+ "pad_token_id": null,
144
+ "patch_size": 16,
145
+ "prefix": null,
146
+ "problem_type": null,
147
+ "pruned_heads": {},
148
+ "remove_invalid_values": false,
149
+ "repetition_penalty": 1.0,
150
+ "return_dict": true,
151
+ "return_dict_in_generate": false,
152
+ "rope_theta": 10000.0,
153
+ "sep_token_id": null,
154
+ "suppress_tokens": null,
155
+ "task_specific_params": null,
156
+ "temperature": 1.0,
157
+ "tf_legacy_loss": false,
158
+ "tie_encoder_decoder": false,
159
+ "tie_word_embeddings": true,
160
+ "tokenizer_class": null,
161
+ "top_k": 50,
162
+ "top_p": 1.0,
163
+ "torch_dtype": "bfloat16",
164
+ "torchscript": false,
165
+ "typical_p": 1.0,
166
+ "use_bfloat16": false
167
+ },
168
+ "vision_feature_layer": -1,
169
+ "vision_feature_select_strategy": "full"
170
+ }
generation_config.json ADDED
@@ -0,0 +1,6 @@
 
 
 
 
 
 
 
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 1,
4
+ "eos_token_id": 2,
5
+ "transformers_version": "4.49.0"
6
+ }
model-00001-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d9149ddf3576f6158dc820e02d845fa5cd62799daf1cff11bb222f1c52f43885
3
+ size 4990957080
model-00002-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d2ff720f335f220ede02d90b2dc26dacd6da47cead08211eb9273addea13c7e6
3
+ size 4959959696
model-00003-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b5ff378334665cd4d603972dce1fc1859f3c92b2e30aa266f42bc5032b3e168b
3
+ size 4907530672
model-00004-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:11edd8e8be8c42bb0ff2f0ee40f418b172f3d15e3f451a7e586a6d0afe7aedcb
3
+ size 4907530672
model-00005-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:93bd3526a124d28a027d89ac344750d77e199a7f1a5ace95a26fff78fa25725a
3
+ size 4907530672
model-00006-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:68c219bb05eaf34145f53f3dd173f9981afab86f14c46c8e028f23f7f7d8e1aa
3
+ size 3712120512
model-00007-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c324e3a00f0ad9864e42dd972ac746e86654275aa71ad57164ce42bbd1d24e07
3
+ size 1342177424
model.safetensors.index.json ADDED
@@ -0,0 +1,664 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "metadata": {
3
+ "total_size": 29727719424
4
+ },
5
+ "weight_map": {
6
+ "language_model.lm_head.weight": "model-00007-of-00007.safetensors",
7
+ "language_model.model.embed_tokens.weight": "model-00001-of-00007.safetensors",
8
+ "language_model.model.layers.0.input_layernorm.weight": "model-00001-of-00007.safetensors",
9
+ "language_model.model.layers.0.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
10
+ "language_model.model.layers.0.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
11
+ "language_model.model.layers.0.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
12
+ "language_model.model.layers.0.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
13
+ "language_model.model.layers.0.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
14
+ "language_model.model.layers.0.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
15
+ "language_model.model.layers.0.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
16
+ "language_model.model.layers.0.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
17
+ "language_model.model.layers.1.input_layernorm.weight": "model-00001-of-00007.safetensors",
18
+ "language_model.model.layers.1.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
19
+ "language_model.model.layers.1.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
20
+ "language_model.model.layers.1.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
21
+ "language_model.model.layers.1.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
22
+ "language_model.model.layers.1.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
23
+ "language_model.model.layers.1.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
24
+ "language_model.model.layers.1.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
25
+ "language_model.model.layers.1.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
26
+ "language_model.model.layers.10.input_layernorm.weight": "model-00002-of-00007.safetensors",
27
+ "language_model.model.layers.10.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
28
+ "language_model.model.layers.10.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
29
+ "language_model.model.layers.10.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
30
+ "language_model.model.layers.10.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
31
+ "language_model.model.layers.10.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
32
+ "language_model.model.layers.10.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
33
+ "language_model.model.layers.10.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
34
+ "language_model.model.layers.10.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
35
+ "language_model.model.layers.11.input_layernorm.weight": "model-00002-of-00007.safetensors",
36
+ "language_model.model.layers.11.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
37
+ "language_model.model.layers.11.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
38
+ "language_model.model.layers.11.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
39
+ "language_model.model.layers.11.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
40
+ "language_model.model.layers.11.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
41
+ "language_model.model.layers.11.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
42
+ "language_model.model.layers.11.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
43
+ "language_model.model.layers.11.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
44
+ "language_model.model.layers.12.input_layernorm.weight": "model-00002-of-00007.safetensors",
45
+ "language_model.model.layers.12.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
46
+ "language_model.model.layers.12.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
47
+ "language_model.model.layers.12.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
48
+ "language_model.model.layers.12.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
49
+ "language_model.model.layers.12.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
50
+ "language_model.model.layers.12.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
51
+ "language_model.model.layers.12.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
52
+ "language_model.model.layers.12.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
53
+ "language_model.model.layers.13.input_layernorm.weight": "model-00002-of-00007.safetensors",
54
+ "language_model.model.layers.13.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
55
+ "language_model.model.layers.13.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
56
+ "language_model.model.layers.13.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
57
+ "language_model.model.layers.13.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
58
+ "language_model.model.layers.13.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
59
+ "language_model.model.layers.13.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
60
+ "language_model.model.layers.13.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
61
+ "language_model.model.layers.13.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
62
+ "language_model.model.layers.14.input_layernorm.weight": "model-00003-of-00007.safetensors",
63
+ "language_model.model.layers.14.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
64
+ "language_model.model.layers.14.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
65
+ "language_model.model.layers.14.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
66
+ "language_model.model.layers.14.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
67
+ "language_model.model.layers.14.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
68
+ "language_model.model.layers.14.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
69
+ "language_model.model.layers.14.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
70
+ "language_model.model.layers.14.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
71
+ "language_model.model.layers.15.input_layernorm.weight": "model-00003-of-00007.safetensors",
72
+ "language_model.model.layers.15.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
73
+ "language_model.model.layers.15.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
74
+ "language_model.model.layers.15.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
75
+ "language_model.model.layers.15.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
76
+ "language_model.model.layers.15.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
77
+ "language_model.model.layers.15.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
78
+ "language_model.model.layers.15.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
79
+ "language_model.model.layers.15.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
80
+ "language_model.model.layers.16.input_layernorm.weight": "model-00003-of-00007.safetensors",
81
+ "language_model.model.layers.16.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
82
+ "language_model.model.layers.16.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
83
+ "language_model.model.layers.16.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
84
+ "language_model.model.layers.16.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
85
+ "language_model.model.layers.16.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
86
+ "language_model.model.layers.16.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
87
+ "language_model.model.layers.16.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
88
+ "language_model.model.layers.16.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
89
+ "language_model.model.layers.17.input_layernorm.weight": "model-00003-of-00007.safetensors",
90
+ "language_model.model.layers.17.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
91
+ "language_model.model.layers.17.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
92
+ "language_model.model.layers.17.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
93
+ "language_model.model.layers.17.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
94
+ "language_model.model.layers.17.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
95
+ "language_model.model.layers.17.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
96
+ "language_model.model.layers.17.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
97
+ "language_model.model.layers.17.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
98
+ "language_model.model.layers.18.input_layernorm.weight": "model-00003-of-00007.safetensors",
99
+ "language_model.model.layers.18.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
100
+ "language_model.model.layers.18.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
101
+ "language_model.model.layers.18.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
102
+ "language_model.model.layers.18.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
103
+ "language_model.model.layers.18.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
104
+ "language_model.model.layers.18.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
105
+ "language_model.model.layers.18.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
106
+ "language_model.model.layers.18.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
107
+ "language_model.model.layers.19.input_layernorm.weight": "model-00003-of-00007.safetensors",
108
+ "language_model.model.layers.19.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
109
+ "language_model.model.layers.19.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
110
+ "language_model.model.layers.19.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
111
+ "language_model.model.layers.19.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
112
+ "language_model.model.layers.19.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
113
+ "language_model.model.layers.19.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
114
+ "language_model.model.layers.19.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
115
+ "language_model.model.layers.19.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
116
+ "language_model.model.layers.2.input_layernorm.weight": "model-00001-of-00007.safetensors",
117
+ "language_model.model.layers.2.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
118
+ "language_model.model.layers.2.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
119
+ "language_model.model.layers.2.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
120
+ "language_model.model.layers.2.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
121
+ "language_model.model.layers.2.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
122
+ "language_model.model.layers.2.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
123
+ "language_model.model.layers.2.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
124
+ "language_model.model.layers.2.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
125
+ "language_model.model.layers.20.input_layernorm.weight": "model-00003-of-00007.safetensors",
126
+ "language_model.model.layers.20.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
127
+ "language_model.model.layers.20.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
128
+ "language_model.model.layers.20.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
129
+ "language_model.model.layers.20.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
130
+ "language_model.model.layers.20.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
131
+ "language_model.model.layers.20.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
132
+ "language_model.model.layers.20.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
133
+ "language_model.model.layers.20.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
134
+ "language_model.model.layers.21.input_layernorm.weight": "model-00003-of-00007.safetensors",
135
+ "language_model.model.layers.21.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
136
+ "language_model.model.layers.21.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
137
+ "language_model.model.layers.21.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
138
+ "language_model.model.layers.21.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
139
+ "language_model.model.layers.21.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
140
+ "language_model.model.layers.21.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
141
+ "language_model.model.layers.21.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
142
+ "language_model.model.layers.21.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
143
+ "language_model.model.layers.22.input_layernorm.weight": "model-00003-of-00007.safetensors",
144
+ "language_model.model.layers.22.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
145
+ "language_model.model.layers.22.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
146
+ "language_model.model.layers.22.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
147
+ "language_model.model.layers.22.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
148
+ "language_model.model.layers.22.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
149
+ "language_model.model.layers.22.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
150
+ "language_model.model.layers.22.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
151
+ "language_model.model.layers.22.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
152
+ "language_model.model.layers.23.input_layernorm.weight": "model-00004-of-00007.safetensors",
153
+ "language_model.model.layers.23.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
154
+ "language_model.model.layers.23.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
155
+ "language_model.model.layers.23.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
156
+ "language_model.model.layers.23.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
157
+ "language_model.model.layers.23.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
158
+ "language_model.model.layers.23.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
159
+ "language_model.model.layers.23.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
160
+ "language_model.model.layers.23.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
161
+ "language_model.model.layers.24.input_layernorm.weight": "model-00004-of-00007.safetensors",
162
+ "language_model.model.layers.24.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
163
+ "language_model.model.layers.24.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
164
+ "language_model.model.layers.24.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
165
+ "language_model.model.layers.24.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
166
+ "language_model.model.layers.24.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
167
+ "language_model.model.layers.24.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
168
+ "language_model.model.layers.24.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
169
+ "language_model.model.layers.24.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
170
+ "language_model.model.layers.25.input_layernorm.weight": "model-00004-of-00007.safetensors",
171
+ "language_model.model.layers.25.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
172
+ "language_model.model.layers.25.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
173
+ "language_model.model.layers.25.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
174
+ "language_model.model.layers.25.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
175
+ "language_model.model.layers.25.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
176
+ "language_model.model.layers.25.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
177
+ "language_model.model.layers.25.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
178
+ "language_model.model.layers.25.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
179
+ "language_model.model.layers.26.input_layernorm.weight": "model-00004-of-00007.safetensors",
180
+ "language_model.model.layers.26.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
181
+ "language_model.model.layers.26.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
182
+ "language_model.model.layers.26.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
183
+ "language_model.model.layers.26.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
184
+ "language_model.model.layers.26.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
185
+ "language_model.model.layers.26.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
186
+ "language_model.model.layers.26.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
187
+ "language_model.model.layers.26.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
188
+ "language_model.model.layers.27.input_layernorm.weight": "model-00004-of-00007.safetensors",
189
+ "language_model.model.layers.27.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
190
+ "language_model.model.layers.27.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
191
+ "language_model.model.layers.27.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
192
+ "language_model.model.layers.27.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
193
+ "language_model.model.layers.27.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
194
+ "language_model.model.layers.27.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
195
+ "language_model.model.layers.27.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
196
+ "language_model.model.layers.27.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
197
+ "language_model.model.layers.28.input_layernorm.weight": "model-00004-of-00007.safetensors",
198
+ "language_model.model.layers.28.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
199
+ "language_model.model.layers.28.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
200
+ "language_model.model.layers.28.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
201
+ "language_model.model.layers.28.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
202
+ "language_model.model.layers.28.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
203
+ "language_model.model.layers.28.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
204
+ "language_model.model.layers.28.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
205
+ "language_model.model.layers.28.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
206
+ "language_model.model.layers.29.input_layernorm.weight": "model-00004-of-00007.safetensors",
207
+ "language_model.model.layers.29.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
208
+ "language_model.model.layers.29.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
209
+ "language_model.model.layers.29.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
210
+ "language_model.model.layers.29.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
211
+ "language_model.model.layers.29.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
212
+ "language_model.model.layers.29.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
213
+ "language_model.model.layers.29.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
214
+ "language_model.model.layers.29.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
215
+ "language_model.model.layers.3.input_layernorm.weight": "model-00001-of-00007.safetensors",
216
+ "language_model.model.layers.3.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
217
+ "language_model.model.layers.3.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
218
+ "language_model.model.layers.3.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
219
+ "language_model.model.layers.3.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
220
+ "language_model.model.layers.3.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
221
+ "language_model.model.layers.3.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
222
+ "language_model.model.layers.3.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
223
+ "language_model.model.layers.3.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
224
+ "language_model.model.layers.30.input_layernorm.weight": "model-00004-of-00007.safetensors",
225
+ "language_model.model.layers.30.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
226
+ "language_model.model.layers.30.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
227
+ "language_model.model.layers.30.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
228
+ "language_model.model.layers.30.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
229
+ "language_model.model.layers.30.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
230
+ "language_model.model.layers.30.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
231
+ "language_model.model.layers.30.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
232
+ "language_model.model.layers.30.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
233
+ "language_model.model.layers.31.input_layernorm.weight": "model-00004-of-00007.safetensors",
234
+ "language_model.model.layers.31.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
235
+ "language_model.model.layers.31.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
236
+ "language_model.model.layers.31.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
237
+ "language_model.model.layers.31.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
238
+ "language_model.model.layers.31.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
239
+ "language_model.model.layers.31.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
240
+ "language_model.model.layers.31.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
241
+ "language_model.model.layers.31.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
242
+ "language_model.model.layers.32.input_layernorm.weight": "model-00005-of-00007.safetensors",
243
+ "language_model.model.layers.32.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
244
+ "language_model.model.layers.32.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
245
+ "language_model.model.layers.32.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
246
+ "language_model.model.layers.32.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
247
+ "language_model.model.layers.32.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
248
+ "language_model.model.layers.32.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
249
+ "language_model.model.layers.32.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
250
+ "language_model.model.layers.32.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
251
+ "language_model.model.layers.33.input_layernorm.weight": "model-00005-of-00007.safetensors",
252
+ "language_model.model.layers.33.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
253
+ "language_model.model.layers.33.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
254
+ "language_model.model.layers.33.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
255
+ "language_model.model.layers.33.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
256
+ "language_model.model.layers.33.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
257
+ "language_model.model.layers.33.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
258
+ "language_model.model.layers.33.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
259
+ "language_model.model.layers.33.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
260
+ "language_model.model.layers.34.input_layernorm.weight": "model-00005-of-00007.safetensors",
261
+ "language_model.model.layers.34.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
262
+ "language_model.model.layers.34.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
263
+ "language_model.model.layers.34.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
264
+ "language_model.model.layers.34.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
265
+ "language_model.model.layers.34.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
266
+ "language_model.model.layers.34.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
267
+ "language_model.model.layers.34.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
268
+ "language_model.model.layers.34.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
269
+ "language_model.model.layers.35.input_layernorm.weight": "model-00005-of-00007.safetensors",
270
+ "language_model.model.layers.35.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
271
+ "language_model.model.layers.35.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
272
+ "language_model.model.layers.35.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
273
+ "language_model.model.layers.35.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
274
+ "language_model.model.layers.35.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
275
+ "language_model.model.layers.35.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
276
+ "language_model.model.layers.35.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
277
+ "language_model.model.layers.35.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
278
+ "language_model.model.layers.36.input_layernorm.weight": "model-00005-of-00007.safetensors",
279
+ "language_model.model.layers.36.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
280
+ "language_model.model.layers.36.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
281
+ "language_model.model.layers.36.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
282
+ "language_model.model.layers.36.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
283
+ "language_model.model.layers.36.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
284
+ "language_model.model.layers.36.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
285
+ "language_model.model.layers.36.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
286
+ "language_model.model.layers.36.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
287
+ "language_model.model.layers.37.input_layernorm.weight": "model-00005-of-00007.safetensors",
288
+ "language_model.model.layers.37.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
289
+ "language_model.model.layers.37.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
290
+ "language_model.model.layers.37.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
291
+ "language_model.model.layers.37.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
292
+ "language_model.model.layers.37.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
293
+ "language_model.model.layers.37.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
294
+ "language_model.model.layers.37.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
295
+ "language_model.model.layers.37.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
296
+ "language_model.model.layers.38.input_layernorm.weight": "model-00005-of-00007.safetensors",
297
+ "language_model.model.layers.38.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
298
+ "language_model.model.layers.38.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
299
+ "language_model.model.layers.38.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
300
+ "language_model.model.layers.38.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
301
+ "language_model.model.layers.38.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
302
+ "language_model.model.layers.38.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
303
+ "language_model.model.layers.38.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
304
+ "language_model.model.layers.38.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
305
+ "language_model.model.layers.39.input_layernorm.weight": "model-00005-of-00007.safetensors",
306
+ "language_model.model.layers.39.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
307
+ "language_model.model.layers.39.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
308
+ "language_model.model.layers.39.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
309
+ "language_model.model.layers.39.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
310
+ "language_model.model.layers.39.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
311
+ "language_model.model.layers.39.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
312
+ "language_model.model.layers.39.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
313
+ "language_model.model.layers.39.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
314
+ "language_model.model.layers.4.input_layernorm.weight": "model-00001-of-00007.safetensors",
315
+ "language_model.model.layers.4.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
316
+ "language_model.model.layers.4.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
317
+ "language_model.model.layers.4.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
318
+ "language_model.model.layers.4.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
319
+ "language_model.model.layers.4.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
320
+ "language_model.model.layers.4.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
321
+ "language_model.model.layers.4.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
322
+ "language_model.model.layers.4.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
323
+ "language_model.model.layers.40.input_layernorm.weight": "model-00005-of-00007.safetensors",
324
+ "language_model.model.layers.40.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
325
+ "language_model.model.layers.40.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
326
+ "language_model.model.layers.40.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
327
+ "language_model.model.layers.40.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
328
+ "language_model.model.layers.40.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
329
+ "language_model.model.layers.40.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
330
+ "language_model.model.layers.40.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
331
+ "language_model.model.layers.40.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
332
+ "language_model.model.layers.41.input_layernorm.weight": "model-00006-of-00007.safetensors",
333
+ "language_model.model.layers.41.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
334
+ "language_model.model.layers.41.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
335
+ "language_model.model.layers.41.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
336
+ "language_model.model.layers.41.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
337
+ "language_model.model.layers.41.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
338
+ "language_model.model.layers.41.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
339
+ "language_model.model.layers.41.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
340
+ "language_model.model.layers.41.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
341
+ "language_model.model.layers.42.input_layernorm.weight": "model-00006-of-00007.safetensors",
342
+ "language_model.model.layers.42.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
343
+ "language_model.model.layers.42.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
344
+ "language_model.model.layers.42.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
345
+ "language_model.model.layers.42.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
346
+ "language_model.model.layers.42.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
347
+ "language_model.model.layers.42.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
348
+ "language_model.model.layers.42.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
349
+ "language_model.model.layers.42.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
350
+ "language_model.model.layers.43.input_layernorm.weight": "model-00006-of-00007.safetensors",
351
+ "language_model.model.layers.43.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
352
+ "language_model.model.layers.43.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
353
+ "language_model.model.layers.43.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
354
+ "language_model.model.layers.43.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
355
+ "language_model.model.layers.43.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
356
+ "language_model.model.layers.43.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
357
+ "language_model.model.layers.43.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
358
+ "language_model.model.layers.43.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
359
+ "language_model.model.layers.44.input_layernorm.weight": "model-00006-of-00007.safetensors",
360
+ "language_model.model.layers.44.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
361
+ "language_model.model.layers.44.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
362
+ "language_model.model.layers.44.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
363
+ "language_model.model.layers.44.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
364
+ "language_model.model.layers.44.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
365
+ "language_model.model.layers.44.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
366
+ "language_model.model.layers.44.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
367
+ "language_model.model.layers.44.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
368
+ "language_model.model.layers.45.input_layernorm.weight": "model-00006-of-00007.safetensors",
369
+ "language_model.model.layers.45.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
370
+ "language_model.model.layers.45.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
371
+ "language_model.model.layers.45.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
372
+ "language_model.model.layers.45.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
373
+ "language_model.model.layers.45.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
374
+ "language_model.model.layers.45.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
375
+ "language_model.model.layers.45.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
376
+ "language_model.model.layers.45.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
377
+ "language_model.model.layers.46.input_layernorm.weight": "model-00006-of-00007.safetensors",
378
+ "language_model.model.layers.46.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
379
+ "language_model.model.layers.46.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
380
+ "language_model.model.layers.46.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
381
+ "language_model.model.layers.46.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
382
+ "language_model.model.layers.46.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
383
+ "language_model.model.layers.46.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
384
+ "language_model.model.layers.46.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
385
+ "language_model.model.layers.46.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
386
+ "language_model.model.layers.47.input_layernorm.weight": "model-00006-of-00007.safetensors",
387
+ "language_model.model.layers.47.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
388
+ "language_model.model.layers.47.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
389
+ "language_model.model.layers.47.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
390
+ "language_model.model.layers.47.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
391
+ "language_model.model.layers.47.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
392
+ "language_model.model.layers.47.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
393
+ "language_model.model.layers.47.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
394
+ "language_model.model.layers.47.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
395
+ "language_model.model.layers.5.input_layernorm.weight": "model-00002-of-00007.safetensors",
396
+ "language_model.model.layers.5.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
397
+ "language_model.model.layers.5.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
398
+ "language_model.model.layers.5.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
399
+ "language_model.model.layers.5.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
400
+ "language_model.model.layers.5.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
401
+ "language_model.model.layers.5.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
402
+ "language_model.model.layers.5.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
403
+ "language_model.model.layers.5.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
404
+ "language_model.model.layers.6.input_layernorm.weight": "model-00002-of-00007.safetensors",
405
+ "language_model.model.layers.6.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
406
+ "language_model.model.layers.6.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
407
+ "language_model.model.layers.6.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
408
+ "language_model.model.layers.6.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
409
+ "language_model.model.layers.6.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
410
+ "language_model.model.layers.6.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
411
+ "language_model.model.layers.6.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
412
+ "language_model.model.layers.6.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
413
+ "language_model.model.layers.7.input_layernorm.weight": "model-00002-of-00007.safetensors",
414
+ "language_model.model.layers.7.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
415
+ "language_model.model.layers.7.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
416
+ "language_model.model.layers.7.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
417
+ "language_model.model.layers.7.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
418
+ "language_model.model.layers.7.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
419
+ "language_model.model.layers.7.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
420
+ "language_model.model.layers.7.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
421
+ "language_model.model.layers.7.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
422
+ "language_model.model.layers.8.input_layernorm.weight": "model-00002-of-00007.safetensors",
423
+ "language_model.model.layers.8.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
424
+ "language_model.model.layers.8.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
425
+ "language_model.model.layers.8.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
426
+ "language_model.model.layers.8.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
427
+ "language_model.model.layers.8.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
428
+ "language_model.model.layers.8.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
429
+ "language_model.model.layers.8.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
430
+ "language_model.model.layers.8.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
431
+ "language_model.model.layers.9.input_layernorm.weight": "model-00002-of-00007.safetensors",
432
+ "language_model.model.layers.9.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
433
+ "language_model.model.layers.9.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
434
+ "language_model.model.layers.9.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
435
+ "language_model.model.layers.9.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
436
+ "language_model.model.layers.9.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
437
+ "language_model.model.layers.9.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
438
+ "language_model.model.layers.9.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
439
+ "language_model.model.layers.9.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
440
+ "language_model.model.norm.weight": "model-00006-of-00007.safetensors",
441
+ "multi_modal_projector.linear_1.bias": "model-00001-of-00007.safetensors",
442
+ "multi_modal_projector.linear_1.weight": "model-00001-of-00007.safetensors",
443
+ "multi_modal_projector.linear_2.bias": "model-00001-of-00007.safetensors",
444
+ "multi_modal_projector.linear_2.weight": "model-00001-of-00007.safetensors",
445
+ "vision_tower.ln_pre.weight": "model-00001-of-00007.safetensors",
446
+ "vision_tower.patch_conv.weight": "model-00001-of-00007.safetensors",
447
+ "vision_tower.transformer.layers.0.attention.k_proj.weight": "model-00001-of-00007.safetensors",
448
+ "vision_tower.transformer.layers.0.attention.o_proj.weight": "model-00001-of-00007.safetensors",
449
+ "vision_tower.transformer.layers.0.attention.q_proj.weight": "model-00001-of-00007.safetensors",
450
+ "vision_tower.transformer.layers.0.attention.v_proj.weight": "model-00001-of-00007.safetensors",
451
+ "vision_tower.transformer.layers.0.attention_norm.weight": "model-00001-of-00007.safetensors",
452
+ "vision_tower.transformer.layers.0.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
453
+ "vision_tower.transformer.layers.0.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
454
+ "vision_tower.transformer.layers.0.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
455
+ "vision_tower.transformer.layers.0.ffn_norm.weight": "model-00001-of-00007.safetensors",
456
+ "vision_tower.transformer.layers.1.attention.k_proj.weight": "model-00001-of-00007.safetensors",
457
+ "vision_tower.transformer.layers.1.attention.o_proj.weight": "model-00001-of-00007.safetensors",
458
+ "vision_tower.transformer.layers.1.attention.q_proj.weight": "model-00001-of-00007.safetensors",
459
+ "vision_tower.transformer.layers.1.attention.v_proj.weight": "model-00001-of-00007.safetensors",
460
+ "vision_tower.transformer.layers.1.attention_norm.weight": "model-00001-of-00007.safetensors",
461
+ "vision_tower.transformer.layers.1.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
462
+ "vision_tower.transformer.layers.1.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
463
+ "vision_tower.transformer.layers.1.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
464
+ "vision_tower.transformer.layers.1.ffn_norm.weight": "model-00001-of-00007.safetensors",
465
+ "vision_tower.transformer.layers.10.attention.k_proj.weight": "model-00001-of-00007.safetensors",
466
+ "vision_tower.transformer.layers.10.attention.o_proj.weight": "model-00001-of-00007.safetensors",
467
+ "vision_tower.transformer.layers.10.attention.q_proj.weight": "model-00001-of-00007.safetensors",
468
+ "vision_tower.transformer.layers.10.attention.v_proj.weight": "model-00001-of-00007.safetensors",
469
+ "vision_tower.transformer.layers.10.attention_norm.weight": "model-00001-of-00007.safetensors",
470
+ "vision_tower.transformer.layers.10.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
471
+ "vision_tower.transformer.layers.10.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
472
+ "vision_tower.transformer.layers.10.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
473
+ "vision_tower.transformer.layers.10.ffn_norm.weight": "model-00001-of-00007.safetensors",
474
+ "vision_tower.transformer.layers.11.attention.k_proj.weight": "model-00001-of-00007.safetensors",
475
+ "vision_tower.transformer.layers.11.attention.o_proj.weight": "model-00001-of-00007.safetensors",
476
+ "vision_tower.transformer.layers.11.attention.q_proj.weight": "model-00001-of-00007.safetensors",
477
+ "vision_tower.transformer.layers.11.attention.v_proj.weight": "model-00001-of-00007.safetensors",
478
+ "vision_tower.transformer.layers.11.attention_norm.weight": "model-00001-of-00007.safetensors",
479
+ "vision_tower.transformer.layers.11.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
480
+ "vision_tower.transformer.layers.11.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
481
+ "vision_tower.transformer.layers.11.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
482
+ "vision_tower.transformer.layers.11.ffn_norm.weight": "model-00001-of-00007.safetensors",
483
+ "vision_tower.transformer.layers.12.attention.k_proj.weight": "model-00001-of-00007.safetensors",
484
+ "vision_tower.transformer.layers.12.attention.o_proj.weight": "model-00001-of-00007.safetensors",
485
+ "vision_tower.transformer.layers.12.attention.q_proj.weight": "model-00001-of-00007.safetensors",
486
+ "vision_tower.transformer.layers.12.attention.v_proj.weight": "model-00001-of-00007.safetensors",
487
+ "vision_tower.transformer.layers.12.attention_norm.weight": "model-00001-of-00007.safetensors",
488
+ "vision_tower.transformer.layers.12.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
489
+ "vision_tower.transformer.layers.12.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
490
+ "vision_tower.transformer.layers.12.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
491
+ "vision_tower.transformer.layers.12.ffn_norm.weight": "model-00001-of-00007.safetensors",
492
+ "vision_tower.transformer.layers.13.attention.k_proj.weight": "model-00001-of-00007.safetensors",
493
+ "vision_tower.transformer.layers.13.attention.o_proj.weight": "model-00001-of-00007.safetensors",
494
+ "vision_tower.transformer.layers.13.attention.q_proj.weight": "model-00001-of-00007.safetensors",
495
+ "vision_tower.transformer.layers.13.attention.v_proj.weight": "model-00001-of-00007.safetensors",
496
+ "vision_tower.transformer.layers.13.attention_norm.weight": "model-00001-of-00007.safetensors",
497
+ "vision_tower.transformer.layers.13.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
498
+ "vision_tower.transformer.layers.13.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
499
+ "vision_tower.transformer.layers.13.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
500
+ "vision_tower.transformer.layers.13.ffn_norm.weight": "model-00001-of-00007.safetensors",
501
+ "vision_tower.transformer.layers.14.attention.k_proj.weight": "model-00001-of-00007.safetensors",
502
+ "vision_tower.transformer.layers.14.attention.o_proj.weight": "model-00001-of-00007.safetensors",
503
+ "vision_tower.transformer.layers.14.attention.q_proj.weight": "model-00001-of-00007.safetensors",
504
+ "vision_tower.transformer.layers.14.attention.v_proj.weight": "model-00001-of-00007.safetensors",
505
+ "vision_tower.transformer.layers.14.attention_norm.weight": "model-00001-of-00007.safetensors",
506
+ "vision_tower.transformer.layers.14.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
507
+ "vision_tower.transformer.layers.14.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
508
+ "vision_tower.transformer.layers.14.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
509
+ "vision_tower.transformer.layers.14.ffn_norm.weight": "model-00001-of-00007.safetensors",
510
+ "vision_tower.transformer.layers.15.attention.k_proj.weight": "model-00001-of-00007.safetensors",
511
+ "vision_tower.transformer.layers.15.attention.o_proj.weight": "model-00001-of-00007.safetensors",
512
+ "vision_tower.transformer.layers.15.attention.q_proj.weight": "model-00001-of-00007.safetensors",
513
+ "vision_tower.transformer.layers.15.attention.v_proj.weight": "model-00001-of-00007.safetensors",
514
+ "vision_tower.transformer.layers.15.attention_norm.weight": "model-00001-of-00007.safetensors",
515
+ "vision_tower.transformer.layers.15.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
516
+ "vision_tower.transformer.layers.15.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
517
+ "vision_tower.transformer.layers.15.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
518
+ "vision_tower.transformer.layers.15.ffn_norm.weight": "model-00001-of-00007.safetensors",
519
+ "vision_tower.transformer.layers.16.attention.k_proj.weight": "model-00001-of-00007.safetensors",
520
+ "vision_tower.transformer.layers.16.attention.o_proj.weight": "model-00001-of-00007.safetensors",
521
+ "vision_tower.transformer.layers.16.attention.q_proj.weight": "model-00001-of-00007.safetensors",
522
+ "vision_tower.transformer.layers.16.attention.v_proj.weight": "model-00001-of-00007.safetensors",
523
+ "vision_tower.transformer.layers.16.attention_norm.weight": "model-00001-of-00007.safetensors",
524
+ "vision_tower.transformer.layers.16.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
525
+ "vision_tower.transformer.layers.16.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
526
+ "vision_tower.transformer.layers.16.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
527
+ "vision_tower.transformer.layers.16.ffn_norm.weight": "model-00001-of-00007.safetensors",
528
+ "vision_tower.transformer.layers.17.attention.k_proj.weight": "model-00001-of-00007.safetensors",
529
+ "vision_tower.transformer.layers.17.attention.o_proj.weight": "model-00001-of-00007.safetensors",
530
+ "vision_tower.transformer.layers.17.attention.q_proj.weight": "model-00001-of-00007.safetensors",
531
+ "vision_tower.transformer.layers.17.attention.v_proj.weight": "model-00001-of-00007.safetensors",
532
+ "vision_tower.transformer.layers.17.attention_norm.weight": "model-00001-of-00007.safetensors",
533
+ "vision_tower.transformer.layers.17.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
534
+ "vision_tower.transformer.layers.17.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
535
+ "vision_tower.transformer.layers.17.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
536
+ "vision_tower.transformer.layers.17.ffn_norm.weight": "model-00001-of-00007.safetensors",
537
+ "vision_tower.transformer.layers.18.attention.k_proj.weight": "model-00001-of-00007.safetensors",
538
+ "vision_tower.transformer.layers.18.attention.o_proj.weight": "model-00001-of-00007.safetensors",
539
+ "vision_tower.transformer.layers.18.attention.q_proj.weight": "model-00001-of-00007.safetensors",
540
+ "vision_tower.transformer.layers.18.attention.v_proj.weight": "model-00001-of-00007.safetensors",
541
+ "vision_tower.transformer.layers.18.attention_norm.weight": "model-00001-of-00007.safetensors",
542
+ "vision_tower.transformer.layers.18.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
543
+ "vision_tower.transformer.layers.18.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
544
+ "vision_tower.transformer.layers.18.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
545
+ "vision_tower.transformer.layers.18.ffn_norm.weight": "model-00001-of-00007.safetensors",
546
+ "vision_tower.transformer.layers.19.attention.k_proj.weight": "model-00001-of-00007.safetensors",
547
+ "vision_tower.transformer.layers.19.attention.o_proj.weight": "model-00001-of-00007.safetensors",
548
+ "vision_tower.transformer.layers.19.attention.q_proj.weight": "model-00001-of-00007.safetensors",
549
+ "vision_tower.transformer.layers.19.attention.v_proj.weight": "model-00001-of-00007.safetensors",
550
+ "vision_tower.transformer.layers.19.attention_norm.weight": "model-00001-of-00007.safetensors",
551
+ "vision_tower.transformer.layers.19.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
552
+ "vision_tower.transformer.layers.19.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
553
+ "vision_tower.transformer.layers.19.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
554
+ "vision_tower.transformer.layers.19.ffn_norm.weight": "model-00001-of-00007.safetensors",
555
+ "vision_tower.transformer.layers.2.attention.k_proj.weight": "model-00001-of-00007.safetensors",
556
+ "vision_tower.transformer.layers.2.attention.o_proj.weight": "model-00001-of-00007.safetensors",
557
+ "vision_tower.transformer.layers.2.attention.q_proj.weight": "model-00001-of-00007.safetensors",
558
+ "vision_tower.transformer.layers.2.attention.v_proj.weight": "model-00001-of-00007.safetensors",
559
+ "vision_tower.transformer.layers.2.attention_norm.weight": "model-00001-of-00007.safetensors",
560
+ "vision_tower.transformer.layers.2.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
561
+ "vision_tower.transformer.layers.2.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
562
+ "vision_tower.transformer.layers.2.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
563
+ "vision_tower.transformer.layers.2.ffn_norm.weight": "model-00001-of-00007.safetensors",
564
+ "vision_tower.transformer.layers.20.attention.k_proj.weight": "model-00001-of-00007.safetensors",
565
+ "vision_tower.transformer.layers.20.attention.o_proj.weight": "model-00001-of-00007.safetensors",
566
+ "vision_tower.transformer.layers.20.attention.q_proj.weight": "model-00001-of-00007.safetensors",
567
+ "vision_tower.transformer.layers.20.attention.v_proj.weight": "model-00001-of-00007.safetensors",
568
+ "vision_tower.transformer.layers.20.attention_norm.weight": "model-00001-of-00007.safetensors",
569
+ "vision_tower.transformer.layers.20.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
570
+ "vision_tower.transformer.layers.20.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
571
+ "vision_tower.transformer.layers.20.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
572
+ "vision_tower.transformer.layers.20.ffn_norm.weight": "model-00001-of-00007.safetensors",
573
+ "vision_tower.transformer.layers.21.attention.k_proj.weight": "model-00001-of-00007.safetensors",
574
+ "vision_tower.transformer.layers.21.attention.o_proj.weight": "model-00001-of-00007.safetensors",
575
+ "vision_tower.transformer.layers.21.attention.q_proj.weight": "model-00001-of-00007.safetensors",
576
+ "vision_tower.transformer.layers.21.attention.v_proj.weight": "model-00001-of-00007.safetensors",
577
+ "vision_tower.transformer.layers.21.attention_norm.weight": "model-00001-of-00007.safetensors",
578
+ "vision_tower.transformer.layers.21.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
579
+ "vision_tower.transformer.layers.21.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
580
+ "vision_tower.transformer.layers.21.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
581
+ "vision_tower.transformer.layers.21.ffn_norm.weight": "model-00001-of-00007.safetensors",
582
+ "vision_tower.transformer.layers.22.attention.k_proj.weight": "model-00001-of-00007.safetensors",
583
+ "vision_tower.transformer.layers.22.attention.o_proj.weight": "model-00001-of-00007.safetensors",
584
+ "vision_tower.transformer.layers.22.attention.q_proj.weight": "model-00001-of-00007.safetensors",
585
+ "vision_tower.transformer.layers.22.attention.v_proj.weight": "model-00001-of-00007.safetensors",
586
+ "vision_tower.transformer.layers.22.attention_norm.weight": "model-00001-of-00007.safetensors",
587
+ "vision_tower.transformer.layers.22.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
588
+ "vision_tower.transformer.layers.22.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
589
+ "vision_tower.transformer.layers.22.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
590
+ "vision_tower.transformer.layers.22.ffn_norm.weight": "model-00001-of-00007.safetensors",
591
+ "vision_tower.transformer.layers.23.attention.k_proj.weight": "model-00001-of-00007.safetensors",
592
+ "vision_tower.transformer.layers.23.attention.o_proj.weight": "model-00001-of-00007.safetensors",
593
+ "vision_tower.transformer.layers.23.attention.q_proj.weight": "model-00001-of-00007.safetensors",
594
+ "vision_tower.transformer.layers.23.attention.v_proj.weight": "model-00001-of-00007.safetensors",
595
+ "vision_tower.transformer.layers.23.attention_norm.weight": "model-00001-of-00007.safetensors",
596
+ "vision_tower.transformer.layers.23.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
597
+ "vision_tower.transformer.layers.23.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
598
+ "vision_tower.transformer.layers.23.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
599
+ "vision_tower.transformer.layers.23.ffn_norm.weight": "model-00001-of-00007.safetensors",
600
+ "vision_tower.transformer.layers.3.attention.k_proj.weight": "model-00001-of-00007.safetensors",
601
+ "vision_tower.transformer.layers.3.attention.o_proj.weight": "model-00001-of-00007.safetensors",
602
+ "vision_tower.transformer.layers.3.attention.q_proj.weight": "model-00001-of-00007.safetensors",
603
+ "vision_tower.transformer.layers.3.attention.v_proj.weight": "model-00001-of-00007.safetensors",
604
+ "vision_tower.transformer.layers.3.attention_norm.weight": "model-00001-of-00007.safetensors",
605
+ "vision_tower.transformer.layers.3.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
606
+ "vision_tower.transformer.layers.3.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
607
+ "vision_tower.transformer.layers.3.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
608
+ "vision_tower.transformer.layers.3.ffn_norm.weight": "model-00001-of-00007.safetensors",
609
+ "vision_tower.transformer.layers.4.attention.k_proj.weight": "model-00001-of-00007.safetensors",
610
+ "vision_tower.transformer.layers.4.attention.o_proj.weight": "model-00001-of-00007.safetensors",
611
+ "vision_tower.transformer.layers.4.attention.q_proj.weight": "model-00001-of-00007.safetensors",
612
+ "vision_tower.transformer.layers.4.attention.v_proj.weight": "model-00001-of-00007.safetensors",
613
+ "vision_tower.transformer.layers.4.attention_norm.weight": "model-00001-of-00007.safetensors",
614
+ "vision_tower.transformer.layers.4.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
615
+ "vision_tower.transformer.layers.4.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
616
+ "vision_tower.transformer.layers.4.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
617
+ "vision_tower.transformer.layers.4.ffn_norm.weight": "model-00001-of-00007.safetensors",
618
+ "vision_tower.transformer.layers.5.attention.k_proj.weight": "model-00001-of-00007.safetensors",
619
+ "vision_tower.transformer.layers.5.attention.o_proj.weight": "model-00001-of-00007.safetensors",
620
+ "vision_tower.transformer.layers.5.attention.q_proj.weight": "model-00001-of-00007.safetensors",
621
+ "vision_tower.transformer.layers.5.attention.v_proj.weight": "model-00001-of-00007.safetensors",
622
+ "vision_tower.transformer.layers.5.attention_norm.weight": "model-00001-of-00007.safetensors",
623
+ "vision_tower.transformer.layers.5.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
624
+ "vision_tower.transformer.layers.5.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
625
+ "vision_tower.transformer.layers.5.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
626
+ "vision_tower.transformer.layers.5.ffn_norm.weight": "model-00001-of-00007.safetensors",
627
+ "vision_tower.transformer.layers.6.attention.k_proj.weight": "model-00001-of-00007.safetensors",
628
+ "vision_tower.transformer.layers.6.attention.o_proj.weight": "model-00001-of-00007.safetensors",
629
+ "vision_tower.transformer.layers.6.attention.q_proj.weight": "model-00001-of-00007.safetensors",
630
+ "vision_tower.transformer.layers.6.attention.v_proj.weight": "model-00001-of-00007.safetensors",
631
+ "vision_tower.transformer.layers.6.attention_norm.weight": "model-00001-of-00007.safetensors",
632
+ "vision_tower.transformer.layers.6.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
633
+ "vision_tower.transformer.layers.6.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
634
+ "vision_tower.transformer.layers.6.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
635
+ "vision_tower.transformer.layers.6.ffn_norm.weight": "model-00001-of-00007.safetensors",
636
+ "vision_tower.transformer.layers.7.attention.k_proj.weight": "model-00001-of-00007.safetensors",
637
+ "vision_tower.transformer.layers.7.attention.o_proj.weight": "model-00001-of-00007.safetensors",
638
+ "vision_tower.transformer.layers.7.attention.q_proj.weight": "model-00001-of-00007.safetensors",
639
+ "vision_tower.transformer.layers.7.attention.v_proj.weight": "model-00001-of-00007.safetensors",
640
+ "vision_tower.transformer.layers.7.attention_norm.weight": "model-00001-of-00007.safetensors",
641
+ "vision_tower.transformer.layers.7.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
642
+ "vision_tower.transformer.layers.7.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
643
+ "vision_tower.transformer.layers.7.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
644
+ "vision_tower.transformer.layers.7.ffn_norm.weight": "model-00001-of-00007.safetensors",
645
+ "vision_tower.transformer.layers.8.attention.k_proj.weight": "model-00001-of-00007.safetensors",
646
+ "vision_tower.transformer.layers.8.attention.o_proj.weight": "model-00001-of-00007.safetensors",
647
+ "vision_tower.transformer.layers.8.attention.q_proj.weight": "model-00001-of-00007.safetensors",
648
+ "vision_tower.transformer.layers.8.attention.v_proj.weight": "model-00001-of-00007.safetensors",
649
+ "vision_tower.transformer.layers.8.attention_norm.weight": "model-00001-of-00007.safetensors",
650
+ "vision_tower.transformer.layers.8.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
651
+ "vision_tower.transformer.layers.8.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
652
+ "vision_tower.transformer.layers.8.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
653
+ "vision_tower.transformer.layers.8.ffn_norm.weight": "model-00001-of-00007.safetensors",
654
+ "vision_tower.transformer.layers.9.attention.k_proj.weight": "model-00001-of-00007.safetensors",
655
+ "vision_tower.transformer.layers.9.attention.o_proj.weight": "model-00001-of-00007.safetensors",
656
+ "vision_tower.transformer.layers.9.attention.q_proj.weight": "model-00001-of-00007.safetensors",
657
+ "vision_tower.transformer.layers.9.attention.v_proj.weight": "model-00001-of-00007.safetensors",
658
+ "vision_tower.transformer.layers.9.attention_norm.weight": "model-00001-of-00007.safetensors",
659
+ "vision_tower.transformer.layers.9.feed_forward.down_proj.weight": "model-00001-of-00007.safetensors",
660
+ "vision_tower.transformer.layers.9.feed_forward.gate_proj.weight": "model-00001-of-00007.safetensors",
661
+ "vision_tower.transformer.layers.9.feed_forward.up_proj.weight": "model-00001-of-00007.safetensors",
662
+ "vision_tower.transformer.layers.9.ffn_norm.weight": "model-00001-of-00007.safetensors"
663
+ }
664
+ }
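The `weight_map` above ties every parameter name to one of the seven shards, and shard boundaries fall mid-layer (layer 23's attention projections sit in `model-00003-of-00007.safetensors` while its layernorms and MLP sit in shard 4), so the index rather than the layer number is the source of truth. A minimal sketch of reading a single tensor through the index, assuming the shards are cloned into a hypothetical local directory `./checkpoint`:

```python
import json
from safetensors import safe_open  # pip install safetensors

CKPT_DIR = "./checkpoint"  # hypothetical local clone of this repo

# "weight_map" in the index ties each parameter name to its shard file.
with open(f"{CKPT_DIR}/model.safetensors.index.json") as f:
    weight_map = json.load(f)["weight_map"]

name = "language_model.model.layers.23.self_attn.k_proj.weight"
shard = weight_map[name]  # -> "model-00003-of-00007.safetensors"

# safetensors can open a shard and read one tensor lazily,
# without loading the whole file into memory.
with safe_open(f"{CKPT_DIR}/{shard}", framework="pt", device="cpu") as f:
    tensor = f.get_tensor(name)
print(shard, tuple(tensor.shape))
```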
preprocessor_config.json ADDED
@@ -0,0 +1,27 @@
+ {
+ "do_convert_rgb": true,
+ "do_normalize": true,
+ "do_rescale": true,
+ "do_resize": true,
+ "image_mean": [
+ 0.48145466,
+ 0.4578275,
+ 0.40821073
+ ],
+ "image_processor_type": "PixtralImageProcessor",
+ "image_std": [
+ 0.26862954,
+ 0.26130258,
+ 0.27577711
+ ],
+ "patch_size": {
+ "height": 16,
+ "width": 16
+ },
+ "processor_class": "PixtralProcessor",
+ "resample": 3,
+ "rescale_factor": 0.00392156862745098,
+ "size": {
+ "longest_edge": 1024
+ }
+ }
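Here `rescale_factor` is 1/255, `resample: 3` is PIL's bicubic filter, and the mean/std are the usual CLIP normalization statistics. A rough numpy sketch of the rescale-then-normalize step these flags describe (the real `PixtralImageProcessor` also handles the RGB conversion and the longest-edge-1024 resize):

```python
import numpy as np

# Values copied from preprocessor_config.json above.
IMAGE_MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
IMAGE_STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)
RESCALE = 0.00392156862745098  # == 1/255

def normalize(img_u8: np.ndarray) -> np.ndarray:
    """img_u8: HxWx3 uint8 image, already resized so its longest
    edge is <= 1024 (the processor handles that step too)."""
    x = img_u8.astype(np.float32) * RESCALE  # do_rescale
    return (x - IMAGE_MEAN) / IMAGE_STD      # do_normalize

pixels = normalize(np.zeros((512, 768, 3), dtype=np.uint8))
print(pixels.shape, pixels.dtype)
```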
processor_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+ "image_break_token": "[IMG_BREAK]",
+ "image_end_token": "[IMG_END]",
+ "image_token": "[IMG]",
+ "patch_size": 16,
+ "processor_class": "PixtralProcessor"
+ }
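With `patch_size: 16`, a resized H×W image becomes an (H/16)×(W/16) grid of patches, and the three tokens above mark its place in the prompt. A hedged sketch of the placeholder layout Pixtral-style processors are generally described as using — one `[IMG]` per patch, `[IMG_BREAK]` closing each patch row, and `[IMG_END]` replacing the final break:

```python
def image_placeholders(height: int, width: int, patch: int = 16) -> list[str]:
    """Build the placeholder sequence for one image: a run of [IMG]
    per patch row, [IMG_BREAK] between rows, [IMG_END] closing it."""
    rows, cols = height // patch, width // patch
    tokens = (["[IMG]"] * cols + ["[IMG_BREAK]"]) * rows
    tokens[-1] = "[IMG_END]"  # the last row's break becomes the end token
    return tokens

# A 64x96 image -> 4x6 patch grid -> 4 rows of 6 [IMG] tokens.
print(" ".join(image_placeholders(64, 96)))
```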
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
+ {
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
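A quick sanity check that these `<s>`/`</s>`/`<pad>`/`<unk>` roles are wired up, assuming a hypothetical local clone of this repository in the current directory:

```python
from transformers import AutoTokenizer

# Hypothetical local path to a clone of this repository.
tok = AutoTokenizer.from_pretrained("./")

# These come straight from special_tokens_map.json.
print(tok.bos_token, tok.eos_token, tok.pad_token, tok.unk_token)
print(tok.convert_tokens_to_ids(["<s>", "</s>", "<pad>", "<unk>"]))
```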
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:50c4196bd3d61abf4a6f9d116435140b8ac0606e15eb4d02e235f9036257dc3e
+ size 17077327
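`tokenizer.json` is tracked by Git LFS, so the committed blob is just this pointer: spec version, SHA-256 of the real file, and its size in bytes. A small standard-library sketch for checking a downloaded copy against the pointer:

```python
import hashlib
import os

EXPECTED_SHA = "50c4196bd3d61abf4a6f9d116435140b8ac0606e15eb4d02e235f9036257dc3e"
EXPECTED_SIZE = 17077327

path = "tokenizer.json"  # the real file, after `git lfs pull` or a hub download
assert os.path.getsize(path) == EXPECTED_SIZE

h = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)
assert h.hexdigest() == EXPECTED_SHA
print("tokenizer.json matches its LFS pointer")
```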
tokenizer_config.json ADDED
The diff for this file is too large to render.