IParraMartin committed
Commit 78a5251 · verified · 1 Parent(s): b2780bd

End of training

Files changed (3)
  1. README.md +438 -0
  2. generation_config.json +6 -0
  3. model.safetensors +1 -1
README.md ADDED
@@ -0,0 +1,438 @@
+ ---
+ library_name: transformers
+ tags:
+ - generated_from_trainer
+ model-index:
+ - name: impossible-llms-spanish-random
+ results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # impossible-llms-spanish-random
+
+ This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 8.1724
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.0001
+ - train_batch_size: 12
+ - eval_batch_size: 8
+ - seed: 0
+ - distributed_type: multi-GPU
+ - num_devices: 4
+ - gradient_accumulation_steps: 8
+ - total_train_batch_size: 384
+ - total_eval_batch_size: 32
+ - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - training_steps: 3000
+ - mixed_precision_training: Native AMP
+ - label_smoothing_factor: 0.1
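
The derived values in the list above follow from the per-device settings: 12 × 4 devices × 8 accumulation steps gives the total train batch size of 384, and 8 × 4 gives the total eval batch size of 32. The sketch below reproduces that arithmetic and outlines a cosine schedule with linear warmup. It is a minimal illustration, not the Trainer's own code: the function name `cosine_lr`, the rounding used for the warmup length, and the decay-to-zero shape are assumptions.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 12            # per device
eval_batch_size = 8              # per device
num_devices = 4
gradient_accumulation_steps = 8
learning_rate = 1e-4
warmup_ratio = 0.1
training_steps = 3000

# Effective batch sizes across all devices.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

# Assumption: the warmup length is the warmup ratio applied to the total step count.
warmup_steps = round(training_steps * warmup_ratio)


def cosine_lr(step: int) -> float:
    """Hypothetical helper: linear warmup, then half-cosine decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    progress = (step - warmup_steps) / (training_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("total train batch size:", total_train_batch_size)
    print("total eval batch size:", total_eval_batch_size)
    print("warmup steps:", warmup_steps)
    for step in (0, 150, 300, 1650, 3000):
        print(step, cosine_lr(step))
```

Under these assumptions the learning rate peaks at 0.0001 once warmup ends and decays toward zero by step 3000.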
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-----:|:----:|:---------------:|
+ | 82.972 | 1.0 | 8 | 10.1437 |
+ | 75.8665 | 2.0 | 16 | 9.4079 |
+ | 72.9421 | 3.0 | 24 | 9.0817 |
+ | 71.5898 | 4.0 | 32 | 8.9488 |
+ | 70.6476 | 5.0 | 40 | 8.8065 |
+ | 68.8115 | 6.0 | 48 | 8.6534 |
+ | 67.8914 | 7.0 | 56 | 8.4742 |
+ | 66.0709 | 8.0 | 64 | 8.2686 |
+ | 64.3698 | 9.0 | 72 | 8.0523 |
+ | 62.7776 | 10.0 | 80 | 7.8382 |
+ | 60.7833 | 11.0 | 88 | 7.6234 |
+ | 59.2516 | 12.0 | 96 | 7.4120 |
+ | 57.4926 | 13.0 | 104 | 7.2052 |
+ | 55.9101 | 14.0 | 112 | 7.0040 |
+ | 54.4078 | 15.0 | 120 | 6.8264 |
+ | 53.194 | 16.0 | 128 | 6.6852 |
+ | 52.4471 | 17.0 | 136 | 6.5875 |
+ | 51.8722 | 18.0 | 144 | 6.5158 |
+ | 51.3478 | 19.0 | 152 | 6.4705 |
+ | 50.9916 | 20.0 | 160 | 6.4311 |
+ | 50.7082 | 21.0 | 168 | 6.3975 |
+ | 50.6366 | 22.0 | 176 | 6.3702 |
+ | 50.343 | 23.0 | 184 | 6.3389 |
+ | 49.9145 | 24.0 | 192 | 6.3057 |
+ | 49.838 | 25.0 | 200 | 6.2873 |
+ | 49.7563 | 26.0 | 208 | 6.2540 |
+ | 49.5158 | 27.0 | 216 | 6.2257 |
+ | 49.2432 | 28.0 | 224 | 6.2086 |
+ | 48.762 | 29.0 | 232 | 6.1814 |
+ | 48.8961 | 30.0 | 240 | 6.1708 |
+ | 48.4002 | 31.0 | 248 | 6.1538 |
+ | 48.3542 | 32.0 | 256 | 6.1337 |
+ | 48.1679 | 33.0 | 264 | 6.1212 |
+ | 47.8023 | 34.0 | 272 | 6.1096 |
+ | 47.7961 | 35.0 | 280 | 6.0889 |
+ | 47.7335 | 36.0 | 288 | 6.0733 |
+ | 47.5702 | 37.0 | 296 | 6.0695 |
+ | 47.2494 | 38.0 | 304 | 6.0505 |
+ | 47.2174 | 39.0 | 312 | 6.0415 |
+ | 47.1788 | 40.0 | 320 | 6.0221 |
+ | 46.554 | 41.0 | 328 | 6.0133 |
+ | 46.7689 | 42.0 | 336 | 5.9990 |
+ | 46.4913 | 43.0 | 344 | 5.9892 |
+ | 46.1306 | 44.0 | 352 | 5.9711 |
+ | 46.1097 | 45.0 | 360 | 5.9614 |
+ | 45.7164 | 46.0 | 368 | 5.9520 |
+ | 45.5182 | 47.0 | 376 | 5.9496 |
+ | 45.4885 | 48.0 | 384 | 5.9320 |
+ | 45.1577 | 49.0 | 392 | 5.9272 |
+ | 45.0292 | 50.0 | 400 | 5.9191 |
+ | 44.8174 | 51.0 | 408 | 5.9093 |
+ | 44.6519 | 52.0 | 416 | 5.9019 |
+ | 44.6193 | 53.0 | 424 | 5.8935 |
+ | 44.3462 | 54.0 | 432 | 5.8878 |
+ | 44.0637 | 55.0 | 440 | 5.8897 |
+ | 44.1169 | 56.0 | 448 | 5.8865 |
+ | 43.9161 | 57.0 | 456 | 5.8805 |
+ | 43.7147 | 58.0 | 464 | 5.8861 |
+ | 43.7084 | 59.0 | 472 | 5.8782 |
+ | 43.2609 | 60.0 | 480 | 5.8831 |
+ | 43.2273 | 61.0 | 488 | 5.8801 |
+ | 42.7276 | 62.0 | 496 | 5.8861 |
+ | 43.1133 | 63.0 | 504 | 5.8880 |
+ | 42.7384 | 64.0 | 512 | 5.8816 |
+ | 42.5384 | 65.0 | 520 | 5.8900 |
+ | 42.2051 | 66.0 | 528 | 5.8990 |
+ | 41.9476 | 67.0 | 536 | 5.9014 |
+ | 41.9329 | 68.0 | 544 | 5.9072 |
+ | 41.7742 | 69.0 | 552 | 5.9140 |
+ | 41.463 | 70.0 | 560 | 5.9210 |
+ | 41.4958 | 71.0 | 568 | 5.9284 |
+ | 41.136 | 72.0 | 576 | 5.9301 |
+ | 40.956 | 73.0 | 584 | 5.9540 |
+ | 40.8498 | 74.0 | 592 | 5.9539 |
+ | 40.5533 | 75.0 | 600 | 5.9616 |
+ | 40.2274 | 76.0 | 608 | 5.9715 |
+ | 40.3252 | 77.0 | 616 | 5.9883 |
+ | 39.8976 | 78.0 | 624 | 6.0010 |
+ | 39.8581 | 79.0 | 632 | 6.0068 |
+ | 39.8272 | 80.0 | 640 | 6.0155 |
+ | 39.4982 | 81.0 | 648 | 6.0321 |
+ | 39.5351 | 82.0 | 656 | 6.0410 |
+ | 38.9904 | 83.0 | 664 | 6.0553 |
+ | 39.1677 | 84.0 | 672 | 6.0767 |
+ | 38.8723 | 85.0 | 680 | 6.0767 |
+ | 38.5522 | 86.0 | 688 | 6.0952 |
+ | 38.4087 | 87.0 | 696 | 6.1166 |
+ | 38.1894 | 88.0 | 704 | 6.1213 |
+ | 38.0162 | 89.0 | 712 | 6.1408 |
+ | 37.7106 | 90.0 | 720 | 6.1528 |
+ | 37.5611 | 91.0 | 728 | 6.1629 |
+ | 37.5819 | 92.0 | 736 | 6.1787 |
+ | 37.2784 | 93.0 | 744 | 6.2005 |
+ | 37.1369 | 94.0 | 752 | 6.2101 |
+ | 36.9646 | 95.0 | 760 | 6.2348 |
+ | 36.5899 | 96.0 | 768 | 6.2418 |
+ | 36.4853 | 97.0 | 776 | 6.2503 |
+ | 36.2162 | 98.0 | 784 | 6.2787 |
+ | 36.0686 | 99.0 | 792 | 6.2848 |
+ | 36.0744 | 100.0 | 800 | 6.2993 |
+ | 35.6287 | 101.0 | 808 | 6.3291 |
+ | 35.7513 | 102.0 | 816 | 6.3400 |
+ | 35.4872 | 103.0 | 824 | 6.3426 |
+ | 35.3318 | 104.0 | 832 | 6.3629 |
+ | 34.9538 | 105.0 | 840 | 6.3858 |
+ | 34.691 | 106.0 | 848 | 6.3975 |
+ | 34.5705 | 107.0 | 856 | 6.4129 |
+ | 34.4758 | 108.0 | 864 | 6.4285 |
+ | 34.2078 | 109.0 | 872 | 6.4490 |
+ | 34.1076 | 110.0 | 880 | 6.4658 |
+ | 34.0043 | 111.0 | 888 | 6.4769 |
+ | 34.0075 | 112.0 | 896 | 6.4995 |
+ | 33.6314 | 113.0 | 904 | 6.5134 |
+ | 33.3817 | 114.0 | 912 | 6.5242 |
+ | 33.1647 | 115.0 | 920 | 6.5635 |
+ | 33.0683 | 116.0 | 928 | 6.5620 |
+ | 32.7977 | 117.0 | 936 | 6.5616 |
+ | 32.8607 | 118.0 | 944 | 6.5835 |
+ | 32.6613 | 119.0 | 952 | 6.6042 |
+ | 32.4106 | 120.0 | 960 | 6.6099 |
+ | 32.3315 | 121.0 | 968 | 6.6254 |
+ | 32.1132 | 122.0 | 976 | 6.6463 |
+ | 31.9501 | 123.0 | 984 | 6.6597 |
+ | 31.7851 | 124.0 | 992 | 6.6751 |
+ | 31.4313 | 125.0 | 1000 | 6.6976 |
+ | 31.5001 | 126.0 | 1008 | 6.7109 |
+ | 31.3214 | 127.0 | 1016 | 6.7316 |
+ | 31.1682 | 128.0 | 1024 | 6.7486 |
+ | 30.9942 | 129.0 | 1032 | 6.7701 |
+ | 30.8465 | 130.0 | 1040 | 6.7661 |
+ | 30.7709 | 131.0 | 1048 | 6.7948 |
+ | 30.398 | 132.0 | 1056 | 6.8041 |
+ | 30.3257 | 133.0 | 1064 | 6.8126 |
+ | 30.3051 | 134.0 | 1072 | 6.8455 |
+ | 30.0202 | 135.0 | 1080 | 6.8592 |
+ | 29.9442 | 136.0 | 1088 | 6.8462 |
+ | 29.7463 | 137.0 | 1096 | 6.8768 |
+ | 29.6297 | 138.0 | 1104 | 6.8716 |
+ | 29.4535 | 139.0 | 1112 | 6.9012 |
+ | 29.4787 | 140.0 | 1120 | 6.9159 |
+ | 29.1604 | 141.0 | 1128 | 6.9202 |
+ | 29.0881 | 142.0 | 1136 | 6.9359 |
+ | 28.9373 | 143.0 | 1144 | 6.9679 |
+ | 28.8773 | 144.0 | 1152 | 6.9782 |
+ | 28.7431 | 145.0 | 1160 | 6.9778 |
+ | 28.6143 | 146.0 | 1168 | 6.9963 |
+ | 28.3066 | 147.0 | 1176 | 7.0068 |
+ | 28.2248 | 148.0 | 1184 | 7.0118 |
+ | 28.2809 | 149.0 | 1192 | 7.0317 |
+ | 28.0549 | 150.0 | 1200 | 7.0388 |
+ | 27.9011 | 151.0 | 1208 | 7.0743 |
+ | 27.8213 | 152.0 | 1216 | 7.0790 |
+ | 27.6611 | 153.0 | 1224 | 7.0833 |
+ | 27.4538 | 154.0 | 1232 | 7.1158 |
+ | 27.4113 | 155.0 | 1240 | 7.1376 |
+ | 27.3292 | 156.0 | 1248 | 7.1283 |
+ | 27.1707 | 157.0 | 1256 | 7.1569 |
+ | 27.0338 | 158.0 | 1264 | 7.1534 |
+ | 26.9888 | 159.0 | 1272 | 7.1748 |
+ | 26.8195 | 160.0 | 1280 | 7.1789 |
+ | 26.5915 | 161.0 | 1288 | 7.1898 |
+ | 26.59 | 162.0 | 1296 | 7.1993 |
+ | 26.3942 | 163.0 | 1304 | 7.2154 |
+ | 26.4012 | 164.0 | 1312 | 7.2217 |
+ | 26.174 | 165.0 | 1320 | 7.2349 |
+ | 25.9937 | 166.0 | 1328 | 7.2536 |
+ | 26.0928 | 167.0 | 1336 | 7.2645 |
+ | 25.9871 | 168.0 | 1344 | 7.2783 |
+ | 25.7454 | 169.0 | 1352 | 7.2914 |
+ | 25.6648 | 170.0 | 1360 | 7.2922 |
+ | 25.5124 | 171.0 | 1368 | 7.3141 |
+ | 25.3788 | 172.0 | 1376 | 7.3069 |
+ | 25.465 | 173.0 | 1384 | 7.3387 |
+ | 25.2841 | 174.0 | 1392 | 7.3320 |
+ | 25.0767 | 175.0 | 1400 | 7.3440 |
+ | 25.1146 | 176.0 | 1408 | 7.3695 |
+ | 25.0539 | 177.0 | 1416 | 7.3708 |
+ | 24.9689 | 178.0 | 1424 | 7.3734 |
+ | 24.601 | 179.0 | 1432 | 7.4054 |
+ | 24.6286 | 180.0 | 1440 | 7.4016 |
+ | 24.5223 | 181.0 | 1448 | 7.4134 |
+ | 24.5607 | 182.0 | 1456 | 7.4229 |
+ | 24.4185 | 183.0 | 1464 | 7.4404 |
+ | 24.1607 | 184.0 | 1472 | 7.4465 |
+ | 24.1936 | 185.0 | 1480 | 7.4559 |
+ | 24.0903 | 186.0 | 1488 | 7.4554 |
+ | 24.1294 | 187.0 | 1496 | 7.4836 |
+ | 24.0333 | 188.0 | 1504 | 7.4779 |
+ | 23.9775 | 189.0 | 1512 | 7.4905 |
+ | 23.734 | 190.0 | 1520 | 7.5246 |
+ | 23.665 | 191.0 | 1528 | 7.5111 |
+ | 23.5242 | 192.0 | 1536 | 7.5182 |
+ | 23.4995 | 193.0 | 1544 | 7.5242 |
+ | 23.3869 | 194.0 | 1552 | 7.5368 |
+ | 23.3194 | 195.0 | 1560 | 7.5561 |
+ | 23.2282 | 196.0 | 1568 | 7.5571 |
+ | 23.1417 | 197.0 | 1576 | 7.5627 |
+ | 23.1608 | 198.0 | 1584 | 7.5738 |
+ | 22.9937 | 199.0 | 1592 | 7.5883 |
+ | 22.9023 | 200.0 | 1600 | 7.5910 |
+ | 22.8301 | 201.0 | 1608 | 7.6062 |
+ | 22.839 | 202.0 | 1616 | 7.6199 |
+ | 22.6699 | 203.0 | 1624 | 7.6279 |
+ | 22.6976 | 204.0 | 1632 | 7.6251 |
+ | 22.4869 | 205.0 | 1640 | 7.6344 |
+ | 22.5602 | 206.0 | 1648 | 7.6426 |
+ | 22.3682 | 207.0 | 1656 | 7.6510 |
+ | 22.3643 | 208.0 | 1664 | 7.6597 |
+ | 22.4216 | 209.0 | 1672 | 7.6692 |
+ | 22.295 | 210.0 | 1680 | 7.6737 |
+ | 22.1837 | 211.0 | 1688 | 7.6762 |
+ | 21.9896 | 212.0 | 1696 | 7.7008 |
+ | 22.0444 | 213.0 | 1704 | 7.6955 |
+ | 22.0932 | 214.0 | 1712 | 7.7042 |
+ | 22.0176 | 215.0 | 1720 | 7.7077 |
+ | 21.7476 | 216.0 | 1728 | 7.7200 |
+ | 21.7287 | 217.0 | 1736 | 7.7299 |
+ | 21.7611 | 218.0 | 1744 | 7.7431 |
+ | 21.6926 | 219.0 | 1752 | 7.7508 |
+ | 21.6303 | 220.0 | 1760 | 7.7481 |
+ | 21.6038 | 221.0 | 1768 | 7.7627 |
+ | 21.4299 | 222.0 | 1776 | 7.7623 |
+ | 21.3599 | 223.0 | 1784 | 7.7765 |
+ | 21.4066 | 224.0 | 1792 | 7.7826 |
+ | 21.1809 | 225.0 | 1800 | 7.7892 |
+ | 21.2748 | 226.0 | 1808 | 7.8009 |
+ | 21.3174 | 227.0 | 1816 | 7.8090 |
+ | 21.0702 | 228.0 | 1824 | 7.8045 |
+ | 21.0049 | 229.0 | 1832 | 7.8039 |
+ | 20.9373 | 230.0 | 1840 | 7.8277 |
+ | 21.0195 | 231.0 | 1848 | 7.8261 |
+ | 20.9109 | 232.0 | 1856 | 7.8388 |
+ | 20.8391 | 233.0 | 1864 | 7.8440 |
+ | 20.8554 | 234.0 | 1872 | 7.8535 |
+ | 20.6738 | 235.0 | 1880 | 7.8542 |
+ | 20.6079 | 236.0 | 1888 | 7.8567 |
+ | 20.6093 | 237.0 | 1896 | 7.8668 |
+ | 20.5409 | 238.0 | 1904 | 7.8744 |
+ | 20.4727 | 239.0 | 1912 | 7.8772 |
+ | 20.4992 | 240.0 | 1920 | 7.8811 |
+ | 20.3505 | 241.0 | 1928 | 7.8905 |
+ | 20.4625 | 242.0 | 1936 | 7.8894 |
+ | 20.3406 | 243.0 | 1944 | 7.8973 |
+ | 20.2562 | 244.0 | 1952 | 7.9066 |
+ | 20.1959 | 245.0 | 1960 | 7.9082 |
+ | 20.1324 | 246.0 | 1968 | 7.9125 |
+ | 20.1758 | 247.0 | 1976 | 7.9254 |
+ | 20.1901 | 248.0 | 1984 | 7.9210 |
+ | 20.0953 | 249.0 | 1992 | 7.9278 |
+ | 19.9865 | 250.0 | 2000 | 7.9338 |
+ | 19.9955 | 251.0 | 2008 | 7.9386 |
+ | 20.0445 | 252.0 | 2016 | 7.9394 |
+ | 19.7181 | 253.0 | 2024 | 7.9515 |
+ | 19.8769 | 254.0 | 2032 | 7.9557 |
+ | 19.7927 | 255.0 | 2040 | 7.9631 |
+ | 19.7656 | 256.0 | 2048 | 7.9625 |
+ | 19.729 | 257.0 | 2056 | 7.9690 |
+ | 19.7746 | 258.0 | 2064 | 7.9742 |
+ | 19.7607 | 259.0 | 2072 | 7.9804 |
+ | 19.577 | 260.0 | 2080 | 7.9826 |
+ | 19.5543 | 261.0 | 2088 | 7.9884 |
+ | 19.5187 | 262.0 | 2096 | 7.9923 |
+ | 19.5525 | 263.0 | 2104 | 7.9918 |
+ | 19.4421 | 264.0 | 2112 | 8.0028 |
+ | 19.4744 | 265.0 | 2120 | 7.9992 |
+ | 19.4247 | 266.0 | 2128 | 8.0032 |
+ | 19.3781 | 267.0 | 2136 | 8.0096 |
+ | 19.3096 | 268.0 | 2144 | 8.0175 |
+ | 19.3122 | 269.0 | 2152 | 8.0165 |
+ | 19.2698 | 270.0 | 2160 | 8.0216 |
+ | 19.3156 | 271.0 | 2168 | 8.0266 |
+ | 19.218 | 272.0 | 2176 | 8.0304 |
+ | 19.1812 | 273.0 | 2184 | 8.0270 |
+ | 19.1861 | 274.0 | 2192 | 8.0371 |
+ | 19.2505 | 275.0 | 2200 | 8.0363 |
+ | 19.0715 | 276.0 | 2208 | 8.0451 |
+ | 19.0956 | 277.0 | 2216 | 8.0520 |
+ | 19.0811 | 278.0 | 2224 | 8.0517 |
+ | 18.9746 | 279.0 | 2232 | 8.0554 |
+ | 19.0338 | 280.0 | 2240 | 8.0611 |
+ | 18.9882 | 281.0 | 2248 | 8.0619 |
+ | 18.894 | 282.0 | 2256 | 8.0615 |
+ | 18.8913 | 283.0 | 2264 | 8.0667 |
+ | 18.9493 | 284.0 | 2272 | 8.0657 |
+ | 18.8434 | 285.0 | 2280 | 8.0686 |
+ | 18.8559 | 286.0 | 2288 | 8.0732 |
+ | 18.8983 | 287.0 | 2296 | 8.0761 |
+ | 18.7152 | 288.0 | 2304 | 8.0779 |
+ | 18.7383 | 289.0 | 2312 | 8.0811 |
+ | 18.7166 | 290.0 | 2320 | 8.0861 |
+ | 18.6856 | 291.0 | 2328 | 8.0899 |
+ | 18.7324 | 292.0 | 2336 | 8.0886 |
+ | 18.6808 | 293.0 | 2344 | 8.0957 |
+ | 18.5322 | 294.0 | 2352 | 8.0955 |
+ | 18.6197 | 295.0 | 2360 | 8.0970 |
+ | 18.496 | 296.0 | 2368 | 8.1006 |
+ | 18.6525 | 297.0 | 2376 | 8.1045 |
+ | 18.5264 | 298.0 | 2384 | 8.1058 |
+ | 18.5063 | 299.0 | 2392 | 8.1092 |
+ | 18.5643 | 300.0 | 2400 | 8.1089 |
+ | 18.586 | 301.0 | 2408 | 8.1150 |
+ | 18.4556 | 302.0 | 2416 | 8.1101 |
+ | 18.4819 | 303.0 | 2424 | 8.1163 |
+ | 18.378 | 304.0 | 2432 | 8.1196 |
+ | 18.4967 | 305.0 | 2440 | 8.1191 |
+ | 18.3321 | 306.0 | 2448 | 8.1202 |
+ | 18.4337 | 307.0 | 2456 | 8.1193 |
+ | 18.3281 | 308.0 | 2464 | 8.1272 |
+ | 18.3006 | 309.0 | 2472 | 8.1311 |
+ | 18.2738 | 310.0 | 2480 | 8.1309 |
+ | 18.3395 | 311.0 | 2488 | 8.1294 |
+ | 18.292 | 312.0 | 2496 | 8.1290 |
+ | 18.2963 | 313.0 | 2504 | 8.1356 |
+ | 18.2223 | 314.0 | 2512 | 8.1344 |
+ | 18.1901 | 315.0 | 2520 | 8.1355 |
+ | 18.3431 | 316.0 | 2528 | 8.1415 |
+ | 18.2177 | 317.0 | 2536 | 8.1419 |
+ | 18.2611 | 318.0 | 2544 | 8.1423 |
+ | 18.2588 | 319.0 | 2552 | 8.1423 |
+ | 18.1111 | 320.0 | 2560 | 8.1466 |
+ | 18.2206 | 321.0 | 2568 | 8.1455 |
+ | 18.0979 | 322.0 | 2576 | 8.1510 |
+ | 18.0469 | 323.0 | 2584 | 8.1501 |
+ | 18.2613 | 324.0 | 2592 | 8.1539 |
+ | 18.0968 | 325.0 | 2600 | 8.1486 |
+ | 18.1836 | 326.0 | 2608 | 8.1546 |
+ | 18.1618 | 327.0 | 2616 | 8.1538 |
+ | 18.1316 | 328.0 | 2624 | 8.1556 |
+ | 18.0576 | 329.0 | 2632 | 8.1540 |
+ | 18.0833 | 330.0 | 2640 | 8.1584 |
+ | 18.1602 | 331.0 | 2648 | 8.1636 |
+ | 18.1423 | 332.0 | 2656 | 8.1587 |
+ | 18.0728 | 333.0 | 2664 | 8.1621 |
+ | 18.1527 | 334.0 | 2672 | 8.1604 |
+ | 18.0712 | 335.0 | 2680 | 8.1610 |
+ | 17.9818 | 336.0 | 2688 | 8.1640 |
+ | 18.0128 | 337.0 | 2696 | 8.1609 |
+ | 18.1254 | 338.0 | 2704 | 8.1635 |
+ | 18.078 | 339.0 | 2712 | 8.1650 |
+ | 17.9944 | 340.0 | 2720 | 8.1635 |
+ | 18.0741 | 341.0 | 2728 | 8.1650 |
+ | 18.0014 | 342.0 | 2736 | 8.1663 |
+ | 18.0411 | 343.0 | 2744 | 8.1675 |
+ | 17.983 | 344.0 | 2752 | 8.1654 |
+ | 17.8898 | 345.0 | 2760 | 8.1692 |
+ | 17.9698 | 346.0 | 2768 | 8.1691 |
+ | 17.9761 | 347.0 | 2776 | 8.1677 |
+ | 18.009 | 348.0 | 2784 | 8.1698 |
+ | 17.9711 | 349.0 | 2792 | 8.1693 |
+ | 18.0451 | 350.0 | 2800 | 8.1700 |
+ | 18.0299 | 351.0 | 2808 | 8.1699 |
+ | 18.0121 | 352.0 | 2816 | 8.1702 |
+ | 17.957 | 353.0 | 2824 | 8.1698 |
+ | 17.9758 | 354.0 | 2832 | 8.1699 |
+ | 18.0933 | 355.0 | 2840 | 8.1711 |
+ | 18.0287 | 356.0 | 2848 | 8.1707 |
+ | 17.9982 | 357.0 | 2856 | 8.1711 |
+ | 17.9634 | 358.0 | 2864 | 8.1715 |
+ | 18.0232 | 359.0 | 2872 | 8.1713 |
+ | 17.9229 | 360.0 | 2880 | 8.1713 |
+ | 18.0117 | 361.0 | 2888 | 8.1711 |
+ | 18.0183 | 362.0 | 2896 | 8.1708 |
+ | 17.9465 | 363.0 | 2904 | 8.1713 |
+ | 17.9505 | 364.0 | 2912 | 8.1719 |
+ | 17.9601 | 365.0 | 2920 | 8.1718 |
+ | 17.9163 | 366.0 | 2928 | 8.1719 |
+ | 18.0543 | 367.0 | 2936 | 8.1720 |
+ | 17.9808 | 368.0 | 2944 | 8.1722 |
+ | 17.9907 | 369.0 | 2952 | 8.1722 |
+ | 17.9784 | 370.0 | 2960 | 8.1724 |
+ | 17.9297 | 371.0 | 2968 | 8.1724 |
+ | 17.9071 | 372.0 | 2976 | 8.1725 |
+ | 18.0145 | 373.0 | 2984 | 8.1724 |
+ | 17.9581 | 374.0 | 2992 | 8.1724 |
+ | 18.0314 | 375.0 | 3000 | 8.1724 |
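
In the table above, the validation loss reaches its lowest value, 5.8782, at epoch 59 (step 472) and then climbs to 8.1724 by step 3000 while the training loss keeps falling. The sketch below shows one way to locate that minimum programmatically. The names `Row`, `parse_rows`, and `TABLE_EXCERPT` are illustrative assumptions, and only a handful of rows are copied in; pasting the full table should give the same best step.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float


# A few rows copied verbatim from the table above (not the full log).
TABLE_EXCERPT = """
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 82.972 | 1.0 | 8 | 10.1437 |
| 43.7147 | 58.0 | 464 | 5.8861 |
| 43.7084 | 59.0 | 472 | 5.8782 |
| 43.2609 | 60.0 | 480 | 5.8831 |
| 31.4313 | 125.0 | 1000 | 6.6976 |
| 18.0314 | 375.0 | 3000 | 8.1724 |
"""


def parse_rows(markdown: str) -> List[Row]:
    """Parse data rows of a four-column Markdown table, skipping header and separator."""
    rows = []
    for line in markdown.splitlines():
        # Tolerate a leading diff marker ("+ ") in front of each table row.
        line = line.lstrip("+ ").strip()
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 4:
            continue
        try:
            rows.append(Row(float(cells[0]), float(cells[1]), int(cells[2]), float(cells[3])))
        except ValueError:
            continue  # header or alignment row
    return rows


rows = parse_rows(TABLE_EXCERPT)
best = min(rows, key=lambda r: r.val_loss)
final = rows[-1]

if __name__ == "__main__":
    print("best checkpoint by validation loss:", best)
    print("final validation loss:", final.val_loss)
    print("increase since best:", final.val_loss - best.val_loss)
```

A widening gap of this kind is commonly read as overfitting, so an earlier checkpoint may be preferable if one was saved; the commit does not say which checkpoint the uploaded weights correspond to.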
+
+
+ ### Framework versions
+
+ - Transformers 4.49.0
+ - Pytorch 2.4.0+cu121
+ - Datasets 3.4.0
+ - Tokenizers 0.21.0
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+ "_from_model_config": true,
+ "bos_token_id": 0,
+ "eos_token_id": 0,
+ "transformers_version": "4.49.0"
+ }
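
The generation config added here sets the beginning-of-sequence and end-of-sequence token IDs to the same value, 0. A minimal sketch, using only the standard library, that reads the JSON as shown in the diff and checks this; the variable names are illustrative assumptions, and nothing is loaded from the Hub.

```python
import json

# JSON content copied from the generation_config.json diff above.
GENERATION_CONFIG_TEXT = """
{
  "_from_model_config": true,
  "bos_token_id": 0,
  "eos_token_id": 0,
  "transformers_version": "4.49.0"
}
"""

config = json.loads(GENERATION_CONFIG_TEXT)

# True when one special token serves as both sequence start and sequence end.
shared_bos_eos = config["bos_token_id"] == config["eos_token_id"]

if __name__ == "__main__":
    print("BOS and EOS share an ID:", shared_bos_eos)
    print("written by transformers", config["transformers_version"])
```

Sharing one ID for both roles suggests that generation will stop when token 0 is produced, though this depends on the tokenizer, which is not part of this commit.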
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:32ccb73d84a1dc9868a8e06000df2d4d52bdfe5e59d75b04df19a002a52f92a1
+ oid sha256:f447c726c7ece27dbc4e815699da2146ec81eb456e3a40b2b3a98d9c7ac09b1c
  size 503128704
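
The `model.safetensors` change touches only the Git LFS pointer: the `oid` (SHA-256 of the file content) differs while `size` stays at 503128704 bytes. The sketch below parses both pointers as plain key/value lines and compares them; `parse_lfs_pointer` is a hypothetical helper written for this illustration, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer file into a dict entry."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    return fields


# Pointer contents before and after the commit, copied from the diff above.
OLD_POINTER = """
version https://git-lfs.github.com/spec/v1
oid sha256:32ccb73d84a1dc9868a8e06000df2d4d52bdfe5e59d75b04df19a002a52f92a1
size 503128704
"""

NEW_POINTER = """
version https://git-lfs.github.com/spec/v1
oid sha256:f447c726c7ece27dbc4e815699da2146ec81eb456e3a40b2b3a98d9c7ac09b1c
size 503128704
"""

old = parse_lfs_pointer(OLD_POINTER)
new = parse_lfs_pointer(NEW_POINTER)

content_changed = old["oid"] != new["oid"]
same_size = int(old["size"]) == int(new["size"])

if __name__ == "__main__":
    print("content changed:", content_changed)
    print("same size:", same_size)
    print("size in MiB:", int(new["size"]) / 2**20)
```

An unchanged byte count with a new hash is what one would expect when tensors of the same shapes and dtypes are overwritten with updated values, although the pointer alone cannot confirm that.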