ereniko committed
Commit 58697bb · verified · 1 Parent(s): 8e1f017

Delete checkpoint-500

checkpoint-500/README.md DELETED
@@ -1,209 +0,0 @@
- ---
- base_model: HuggingFaceTB/SmolLM2-135M-Instruct
- library_name: peft
- pipeline_tag: text-generation
- tags:
- - base_model:adapter:HuggingFaceTB/SmolLM2-135M-Instruct
- - lora
- - sft
- - transformers
- - trl
- ---
-
- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-
-
-
- ## Model Details
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
-
-
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
-
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]
-
- ## Model Card Contact
-
- [More Information Needed]
- ### Framework versions
-
- - PEFT 0.18.1
checkpoint-500/adapter_config.json DELETED
@@ -1,43 +0,0 @@
- {
- "alora_invocation_tokens": null,
- "alpha_pattern": {},
- "arrow_config": null,
- "auto_mapping": null,
- "base_model_name_or_path": "HuggingFaceTB/SmolLM2-135M-Instruct",
- "bias": "none",
- "corda_config": null,
- "ensure_weight_tying": false,
- "eva_config": null,
- "exclude_modules": null,
- "fan_in_fan_out": false,
- "inference_mode": true,
- "init_lora_weights": true,
- "layer_replication": null,
- "layers_pattern": null,
- "layers_to_transform": null,
- "loftq_config": {},
- "lora_alpha": 32,
- "lora_bias": false,
- "lora_dropout": 0.05,
- "megatron_config": null,
- "megatron_core": "megatron.core",
- "modules_to_save": null,
- "peft_type": "LORA",
- "peft_version": "0.18.1",
- "qalora_group_size": 16,
- "r": 16,
- "rank_pattern": {},
- "revision": null,
- "target_modules": [
- "o_proj",
- "k_proj",
- "q_proj",
- "v_proj"
- ],
- "target_parameters": null,
- "task_type": "CAUSAL_LM",
- "trainable_token_indices": null,
- "use_dora": false,
- "use_qalora": false,
- "use_rslora": false
- }
checkpoint-500/adapter_model.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:fa7990f97f806a5ca35e6c6fbeb73528ef072b484aa5b018c59c13c78e90f90e
- size 7404368
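A quick way to see where the ~7.4 MB adapter file comes from is to count the LoRA parameters implied by the deleted adapter_config.json (r=16, target modules q/k/v/o_proj). This is a back-of-the-envelope sketch: the SmolLM2-135M dimensions below (hidden size 576, 30 layers, 9 attention heads, 3 KV heads) are assumptions taken from the base model's config, not from this diff, and it assumes the adapter weights are stored in fp32.

```python
# Count trainable LoRA parameters implied by the adapter config above.
# Assumed SmolLM2-135M dimensions (from the base model's config, not this diff):
hidden, layers, heads, kv_heads, r = 576, 30, 9, 3, 16
head_dim = hidden // heads   # 64
kv_dim = kv_heads * head_dim # 192 (GQA: k_proj/v_proj project to the KV width)

def lora_params(d_in, d_out, r):
    # A LoRA pair adds a (d_in x r) matrix A and an (r x d_out) matrix B.
    return r * (d_in + d_out)

per_layer = (
    lora_params(hidden, hidden, r)    # q_proj
    + lora_params(hidden, kv_dim, r)  # k_proj
    + lora_params(hidden, kv_dim, r)  # v_proj
    + lora_params(hidden, hidden, r)  # o_proj
)
total = per_layer * layers
print(total)      # 1,843,200 trainable parameters
print(total * 4)  # 7,372,800 bytes in fp32
```

At 4 bytes per parameter this gives ~7.37 MB, which is consistent with the 7,404,368-byte safetensors pointer above once the file header is included.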
checkpoint-500/chat_template.jinja DELETED
@@ -1,6 +0,0 @@
- {% for message in messages %}{% if loop.first and messages[0]['role'] != 'system' %}{{ '<|im_start|>system
- You are a helpful AI assistant named SmolLM, trained by Hugging Face<|im_end|>
- ' }}{% endif %}{{'<|im_start|>' + message['role'] + '
- ' + message['content'] + '<|im_end|>' + '
- '}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
- ' }}{% endif %}
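Rendered, the template above produces standard ChatML turns and injects a default SmolLM system prompt whenever the conversation does not start with a system message. A minimal pure-Python sketch of the same logic (reimplemented by hand here so it runs without a Jinja dependency):

```python
# Hand-rolled equivalent of the deleted ChatML-style chat template above.
DEFAULT_SYSTEM = ("<|im_start|>system\nYou are a helpful AI assistant named "
                  "SmolLM, trained by Hugging Face<|im_end|>\n")

def render(messages, add_generation_prompt=False):
    out = ""
    # The template prepends a default system prompt if none is supplied.
    if messages and messages[0]["role"] != "system":
        out += DEFAULT_SYSTEM
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    # Optionally open an assistant turn for generation.
    if add_generation_prompt:
        out += "<|im_start|>assistant\n"
    return out

print(render([{"role": "user", "content": "Hi"}], add_generation_prompt=True))
```

In practice this rendering is what `tokenizer.apply_chat_template` performs from the Jinja file itself; the function here only illustrates the output format.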
checkpoint-500/optimizer.pt DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:40159c5fd3460141b13169d253859137073f9b816cb234a93ddaebb9cced807a
- size 14950667
checkpoint-500/rng_state.pth DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:71fdddcc073255adb0886fcc6bc840ad5473c35f42a4a6358de3fb1ebfc0b168
- size 14645
checkpoint-500/scaler.pt DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:f77569c2e850b04af982cc8c1389f1430851448915c593b69e5da36ce05b71d7
- size 1383
checkpoint-500/scheduler.pt DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:f93153ebd2954a5fdb05744476d618441b613584136cf9a87571f2c05d8f3e7c
- size 1465
checkpoint-500/tokenizer.json DELETED
The diff for this file is too large to render. See raw diff
 
checkpoint-500/tokenizer_config.json DELETED
@@ -1,17 +0,0 @@
- {
- "add_prefix_space": false,
- "backend": "tokenizers",
- "bos_token": "<|im_start|>",
- "clean_up_tokenization_spaces": false,
- "eos_token": "<|im_end|>",
- "extra_special_tokens": [
- "<|im_start|>",
- "<|im_end|>"
- ],
- "is_local": false,
- "model_max_length": 8192,
- "pad_token": "<|im_end|>",
- "tokenizer_class": "TokenizersBackend",
- "unk_token": "<|endoftext|>",
- "vocab_size": 49152
- }
checkpoint-500/trainer_state.json DELETED
@@ -1,534 +0,0 @@
- {
- "best_global_step": null,
- "best_metric": null,
- "best_model_checkpoint": null,
- "epoch": 0.42955326460481097,
- "eval_steps": 500,
- "global_step": 500,
- "is_hyper_param_search": false,
- "is_local_process_zero": true,
- "is_world_process_zero": true,
- "log_history": [
- {
- "entropy": 1.1566081464290618,
- "epoch": 0.00859106529209622,
- "grad_norm": 0.15214066207408905,
- "learning_rate": 0.00029884020618556697,
- "loss": 1.226055908203125,
- "mean_token_accuracy": 0.7206311166286469,
- "num_tokens": 26541.0,
- "step": 10
- },
- {
- "entropy": 1.2002840995788575,
- "epoch": 0.01718213058419244,
- "grad_norm": 0.1833968460559845,
- "learning_rate": 0.00029755154639175255,
- "loss": 1.2007878303527832,
- "mean_token_accuracy": 0.7304173231124877,
- "num_tokens": 52280.0,
- "step": 20
- },
- {
- "entropy": 1.083837980031967,
- "epoch": 0.02577319587628866,
- "grad_norm": 0.15183115005493164,
- "learning_rate": 0.0002962628865979381,
- "loss": 1.1582960128784179,
- "mean_token_accuracy": 0.7341346919536591,
- "num_tokens": 79603.0,
- "step": 30
- },
- {
- "entropy": 0.9830838143825531,
- "epoch": 0.03436426116838488,
- "grad_norm": 0.1640024185180664,
- "learning_rate": 0.00029497422680412364,
- "loss": 0.989743423461914,
- "mean_token_accuracy": 0.7652481079101563,
- "num_tokens": 106301.0,
- "step": 40
- },
- {
- "entropy": 0.9237875938415527,
- "epoch": 0.0429553264604811,
- "grad_norm": 0.1496010422706604,
- "learning_rate": 0.0002936855670103092,
- "loss": 0.9489053726196289,
- "mean_token_accuracy": 0.7682253539562225,
- "num_tokens": 131791.0,
- "step": 50
- },
- {
- "entropy": 0.945674192905426,
- "epoch": 0.05154639175257732,
- "grad_norm": 0.16326577961444855,
- "learning_rate": 0.0002923969072164948,
- "loss": 0.9549709320068359,
- "mean_token_accuracy": 0.7691607773303986,
- "num_tokens": 156376.0,
- "step": 60
- },
- {
- "entropy": 0.9946551442146301,
- "epoch": 0.06013745704467354,
- "grad_norm": 0.14451684057712555,
- "learning_rate": 0.00029110824742268037,
- "loss": 0.9979846954345704,
- "mean_token_accuracy": 0.7634965121746063,
- "num_tokens": 183753.0,
- "step": 70
- },
- {
- "entropy": 0.9941512286663056,
- "epoch": 0.06872852233676977,
- "grad_norm": 0.15088772773742676,
- "learning_rate": 0.00028981958762886595,
- "loss": 0.9917948722839356,
- "mean_token_accuracy": 0.7609466493129731,
- "num_tokens": 209154.0,
- "step": 80
- },
- {
- "entropy": 0.899951022863388,
- "epoch": 0.07731958762886598,
- "grad_norm": 0.1635286509990692,
- "learning_rate": 0.0002885309278350515,
- "loss": 0.9084607124328613,
- "mean_token_accuracy": 0.773688817024231,
- "num_tokens": 232285.0,
- "step": 90
- },
- {
- "entropy": 0.9840596318244934,
- "epoch": 0.0859106529209622,
- "grad_norm": 0.1687513142824173,
- "learning_rate": 0.0002872422680412371,
- "loss": 1.005988883972168,
- "mean_token_accuracy": 0.7565966248512268,
- "num_tokens": 258165.0,
- "step": 100
- },
- {
- "entropy": 0.9256654202938079,
- "epoch": 0.09450171821305842,
- "grad_norm": 0.15977086126804352,
- "learning_rate": 0.0002859536082474227,
- "loss": 0.8968193054199218,
- "mean_token_accuracy": 0.7728870630264282,
- "num_tokens": 283613.0,
- "step": 110
- },
- {
- "entropy": 0.9430408120155335,
- "epoch": 0.10309278350515463,
- "grad_norm": 0.14840301871299744,
- "learning_rate": 0.0002846649484536082,
- "loss": 0.9234983444213867,
- "mean_token_accuracy": 0.7652122378349304,
- "num_tokens": 308257.0,
- "step": 120
- },
- {
- "entropy": 0.9537142395973206,
- "epoch": 0.11168384879725086,
- "grad_norm": 0.1754530817270279,
- "learning_rate": 0.0002833762886597938,
- "loss": 0.949033260345459,
- "mean_token_accuracy": 0.7616437077522278,
- "num_tokens": 333700.0,
- "step": 130
- },
- {
- "entropy": 0.9385378897190094,
- "epoch": 0.12027491408934708,
- "grad_norm": 0.16817958652973175,
- "learning_rate": 0.00028208762886597935,
- "loss": 0.9222366333007812,
- "mean_token_accuracy": 0.7721138775348664,
- "num_tokens": 359746.0,
- "step": 140
- },
- {
- "entropy": 0.8476657152175904,
- "epoch": 0.12886597938144329,
- "grad_norm": 0.15769453346729279,
- "learning_rate": 0.0002807989690721649,
- "loss": 0.8225071907043457,
- "mean_token_accuracy": 0.7880932271480561,
- "num_tokens": 383468.0,
- "step": 150
- },
- {
- "entropy": 0.8862757325172425,
- "epoch": 0.13745704467353953,
- "grad_norm": 0.17620904743671417,
- "learning_rate": 0.0002795103092783505,
- "loss": 0.8871653556823731,
- "mean_token_accuracy": 0.7726393282413483,
- "num_tokens": 409660.0,
- "step": 160
- },
- {
- "entropy": 0.9064472615718842,
- "epoch": 0.14604810996563575,
- "grad_norm": 0.15342549979686737,
- "learning_rate": 0.0002782216494845361,
- "loss": 0.9092922210693359,
- "mean_token_accuracy": 0.7713103413581848,
- "num_tokens": 437071.0,
- "step": 170
- },
- {
- "entropy": 0.9095455348491669,
- "epoch": 0.15463917525773196,
- "grad_norm": 0.17525391280651093,
- "learning_rate": 0.00027693298969072165,
- "loss": 0.8956180572509765,
- "mean_token_accuracy": 0.7772489190101624,
- "num_tokens": 460852.0,
- "step": 180
- },
- {
- "entropy": 0.8514094173908233,
- "epoch": 0.16323024054982818,
- "grad_norm": 0.15829823911190033,
- "learning_rate": 0.0002756443298969072,
- "loss": 0.8513413429260254,
- "mean_token_accuracy": 0.779003256559372,
- "num_tokens": 486114.0,
- "step": 190
- },
- {
- "entropy": 0.8995061337947845,
- "epoch": 0.1718213058419244,
- "grad_norm": 0.15622206032276154,
- "learning_rate": 0.00027435567010309275,
- "loss": 0.898340892791748,
- "mean_token_accuracy": 0.7708433449268342,
- "num_tokens": 512397.0,
- "step": 200
- },
- {
- "entropy": 0.8732269763946533,
- "epoch": 0.18041237113402062,
- "grad_norm": 0.17759552597999573,
- "learning_rate": 0.0002730670103092783,
- "loss": 0.8695596694946289,
- "mean_token_accuracy": 0.7793173313140869,
- "num_tokens": 536554.0,
- "step": 210
- },
- {
- "entropy": 0.9366106212139129,
- "epoch": 0.18900343642611683,
- "grad_norm": 0.1767597794532776,
- "learning_rate": 0.0002717783505154639,
- "loss": 0.9249121665954589,
- "mean_token_accuracy": 0.7724991559982299,
- "num_tokens": 561637.0,
- "step": 220
- },
- {
- "entropy": 0.8857321858406066,
- "epoch": 0.19759450171821305,
- "grad_norm": 0.20788735151290894,
- "learning_rate": 0.0002704896907216495,
- "loss": 0.8790401458740235,
- "mean_token_accuracy": 0.7778670430183411,
- "num_tokens": 586614.0,
- "step": 230
- },
- {
- "entropy": 0.9273143649101258,
- "epoch": 0.20618556701030927,
- "grad_norm": 0.16100187599658966,
- "learning_rate": 0.00026920103092783505,
- "loss": 0.9232232093811035,
- "mean_token_accuracy": 0.7677364766597747,
- "num_tokens": 612428.0,
- "step": 240
- },
- {
- "entropy": 0.8786617696285248,
- "epoch": 0.21477663230240548,
- "grad_norm": 0.1608886867761612,
- "learning_rate": 0.00026791237113402063,
- "loss": 0.8883329391479492,
- "mean_token_accuracy": 0.7775621056556702,
- "num_tokens": 638564.0,
- "step": 250
- },
- {
- "entropy": 0.9227964103221893,
- "epoch": 0.22336769759450173,
- "grad_norm": 0.1510680615901947,
- "learning_rate": 0.00026662371134020615,
- "loss": 0.9296038627624512,
- "mean_token_accuracy": 0.7714014649391174,
- "num_tokens": 665263.0,
- "step": 260
- },
- {
- "entropy": 0.9126189887523651,
- "epoch": 0.23195876288659795,
- "grad_norm": 0.16434459388256073,
- "learning_rate": 0.00026533505154639173,
- "loss": 0.9191689491271973,
- "mean_token_accuracy": 0.7705935597419739,
- "num_tokens": 691956.0,
- "step": 270
- },
- {
- "entropy": 0.8797955513000488,
- "epoch": 0.24054982817869416,
- "grad_norm": 0.1714707911014557,
- "learning_rate": 0.0002640463917525773,
- "loss": 0.8856119155883789,
- "mean_token_accuracy": 0.7766761302947998,
- "num_tokens": 716820.0,
- "step": 280
- },
- {
- "entropy": 0.877093505859375,
- "epoch": 0.24914089347079038,
- "grad_norm": 0.15326625108718872,
- "learning_rate": 0.0002627577319587629,
- "loss": 0.8566678047180176,
- "mean_token_accuracy": 0.7812119901180268,
- "num_tokens": 742221.0,
- "step": 290
- },
- {
- "entropy": 0.87042076587677,
- "epoch": 0.25773195876288657,
- "grad_norm": 0.1710847020149231,
- "learning_rate": 0.00026146907216494846,
- "loss": 0.8829298973083496,
- "mean_token_accuracy": 0.7770332932472229,
- "num_tokens": 765839.0,
- "step": 300
- },
- {
- "entropy": 0.8417458057403564,
- "epoch": 0.2663230240549828,
- "grad_norm": 0.1496821790933609,
- "learning_rate": 0.00026018041237113403,
- "loss": 0.8433545112609864,
- "mean_token_accuracy": 0.7857938230037689,
- "num_tokens": 791131.0,
- "step": 310
- },
- {
- "entropy": 0.9158999443054199,
- "epoch": 0.27491408934707906,
- "grad_norm": 0.17929035425186157,
- "learning_rate": 0.00025889175257731955,
- "loss": 0.9123490333557129,
- "mean_token_accuracy": 0.775556480884552,
- "num_tokens": 816984.0,
- "step": 320
- },
- {
- "entropy": 0.8314016401767731,
- "epoch": 0.28350515463917525,
- "grad_norm": 0.16051091253757477,
- "learning_rate": 0.00025760309278350513,
- "loss": 0.8279628753662109,
- "mean_token_accuracy": 0.7862712442874908,
- "num_tokens": 842220.0,
- "step": 330
- },
- {
- "entropy": 0.8677021145820618,
- "epoch": 0.2920962199312715,
- "grad_norm": 0.19157563149929047,
- "learning_rate": 0.0002563144329896907,
- "loss": 0.8552864074707032,
- "mean_token_accuracy": 0.7835110187530517,
- "num_tokens": 867868.0,
- "step": 340
- },
- {
- "entropy": 0.8919401407241822,
- "epoch": 0.3006872852233677,
- "grad_norm": 0.14258626103401184,
- "learning_rate": 0.0002550257731958763,
- "loss": 0.9019416809082031,
- "mean_token_accuracy": 0.7760707855224609,
- "num_tokens": 894676.0,
- "step": 350
- },
- {
- "entropy": 0.8415017545223236,
- "epoch": 0.30927835051546393,
- "grad_norm": 0.14830808341503143,
- "learning_rate": 0.00025373711340206186,
- "loss": 0.8072587013244629,
- "mean_token_accuracy": 0.7912742376327515,
- "num_tokens": 921632.0,
- "step": 360
- },
- {
- "entropy": 0.8371863842010498,
- "epoch": 0.3178694158075601,
- "grad_norm": 0.15538977086544037,
- "learning_rate": 0.0002524484536082474,
- "loss": 0.8308550834655761,
- "mean_token_accuracy": 0.7849089920520782,
- "num_tokens": 945955.0,
- "step": 370
- },
- {
- "entropy": 0.8338176369667053,
- "epoch": 0.32646048109965636,
- "grad_norm": 0.1711844801902771,
- "learning_rate": 0.00025115979381443295,
- "loss": 0.8479556083679199,
- "mean_token_accuracy": 0.7850408136844635,
- "num_tokens": 971508.0,
- "step": 380
- },
- {
- "entropy": 0.8876297354698182,
- "epoch": 0.33505154639175255,
- "grad_norm": 0.14902909100055695,
- "learning_rate": 0.00024987113402061853,
- "loss": 0.8821809768676758,
- "mean_token_accuracy": 0.7764640510082245,
- "num_tokens": 995124.0,
- "step": 390
- },
- {
- "entropy": 0.8212514638900756,
- "epoch": 0.3436426116838488,
- "grad_norm": 0.17950233817100525,
- "learning_rate": 0.0002485824742268041,
- "loss": 0.8255527496337891,
- "mean_token_accuracy": 0.7870513379573822,
- "num_tokens": 1019735.0,
- "step": 400
- },
- {
- "entropy": 0.9294387817382812,
- "epoch": 0.35223367697594504,
- "grad_norm": 0.15237703919410706,
- "learning_rate": 0.0002472938144329897,
- "loss": 0.9235431671142578,
- "mean_token_accuracy": 0.7703655660152435,
- "num_tokens": 1046295.0,
- "step": 410
- },
- {
- "entropy": 0.8429355442523956,
- "epoch": 0.36082474226804123,
- "grad_norm": 0.1589658558368683,
- "learning_rate": 0.0002460051546391752,
- "loss": 0.8329957008361817,
- "mean_token_accuracy": 0.7897172749042511,
- "num_tokens": 1072020.0,
- "step": 420
- },
- {
- "entropy": 0.8407810032367706,
- "epoch": 0.3694158075601375,
- "grad_norm": 0.17521116137504578,
- "learning_rate": 0.0002447164948453608,
- "loss": 0.8535200119018554,
- "mean_token_accuracy": 0.7809976935386658,
- "num_tokens": 1098420.0,
- "step": 430
- },
- {
- "entropy": 0.8515082120895385,
- "epoch": 0.37800687285223367,
- "grad_norm": 0.15218117833137512,
- "learning_rate": 0.00024342783505154638,
- "loss": 0.8459153175354004,
- "mean_token_accuracy": 0.7833445191383361,
- "num_tokens": 1124235.0,
- "step": 440
- },
- {
- "entropy": 0.8506015419960022,
- "epoch": 0.3865979381443299,
- "grad_norm": 0.1809423863887787,
- "learning_rate": 0.00024213917525773193,
- "loss": 0.8384982109069824,
- "mean_token_accuracy": 0.7841372966766358,
- "num_tokens": 1149636.0,
- "step": 450
- },
- {
- "entropy": 0.9110018074512481,
- "epoch": 0.3951890034364261,
- "grad_norm": 0.15085963904857635,
- "learning_rate": 0.0002408505154639175,
- "loss": 0.9191832542419434,
- "mean_token_accuracy": 0.7757422685623169,
- "num_tokens": 1173711.0,
- "step": 460
- },
- {
- "entropy": 0.9000201225280762,
- "epoch": 0.40378006872852235,
- "grad_norm": 0.16748838126659393,
- "learning_rate": 0.00023956185567010308,
- "loss": 0.9107230186462403,
- "mean_token_accuracy": 0.7768173336982727,
- "num_tokens": 1199340.0,
- "step": 470
- },
- {
- "entropy": 0.8751251935958863,
- "epoch": 0.41237113402061853,
- "grad_norm": 0.1593722254037857,
- "learning_rate": 0.00023827319587628863,
- "loss": 0.8804635047912598,
- "mean_token_accuracy": 0.7814023733139038,
- "num_tokens": 1226032.0,
- "step": 480
- },
- {
- "entropy": 0.8843371510505676,
- "epoch": 0.4209621993127148,
- "grad_norm": 0.12278851121664047,
- "learning_rate": 0.0002369845360824742,
- "loss": 0.8643080711364746,
- "mean_token_accuracy": 0.7809630572795868,
- "num_tokens": 1252769.0,
- "step": 490
- },
- {
- "entropy": 0.8834336459636688,
- "epoch": 0.42955326460481097,
- "grad_norm": 0.1367531716823578,
- "learning_rate": 0.00023569587628865976,
- "loss": 0.8869522094726563,
- "mean_token_accuracy": 0.7775909185409546,
- "num_tokens": 1278979.0,
- "step": 500
- }
- ],
- "logging_steps": 10,
- "max_steps": 2328,
- "num_input_tokens_seen": 0,
- "num_train_epochs": 2,
- "save_steps": 500,
- "stateful_callbacks": {
- "TrainerControl": {
- "args": {
- "should_epoch_stop": false,
- "should_evaluate": false,
- "should_log": false,
- "should_save": true,
- "should_training_stop": false
- },
- "attributes": {}
- }
- },
- "total_flos": 2129667010873344.0,
- "train_batch_size": 16,
- "trial_name": null,
- "trial_params": null
- }
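The deleted trainer state is internally consistent: with max_steps=2328 over num_train_epochs=2, one epoch is 1164 optimizer steps, which reproduces the fractional epoch values in the log_history above. A small sanity-check sketch:

```python
# Sanity check on the trainer_state.json values above.
max_steps, num_train_epochs = 2328, 2
steps_per_epoch = max_steps // num_train_epochs  # 1164

def epoch_at(step):
    # Fractional epoch as the Trainer reports it at a given global step.
    return step / steps_per_epoch

print(epoch_at(500))  # ~0.4296, the "epoch" recorded at global_step 500
print(epoch_at(10))   # ~0.0086, matching the first log_history entry
```

This also explains why the checkpoint sits at step 500: save_steps=500, so this was the first saved checkpoint, less than halfway through the first of two epochs.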
checkpoint-500/training_args.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:f2af45870b4c4d611381846df6bdd4a32cb292a7b5897b79b7e5f196be57c596
- size 5585