NBAmine committed
Commit 4b80fbe · verified · Parent: c483f91

Clean up: Remove redundant training checkpoint folder

last-checkpoint/README.md DELETED
@@ -1,209 +0,0 @@
- ---
- base_model: mistralai/Mistral-Nemo-Instruct-2407
- library_name: peft
- pipeline_tag: text-generation
- tags:
- - base_model:adapter:mistralai/Mistral-Nemo-Instruct-2407
- - lora
- - sft
- - transformers
- - trl
- ---
-
- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-
-
-
- ## Model Details
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
-
-
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
-
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]
-
- ## Model Card Contact
-
- [More Information Needed]
- ### Framework versions
-
- - PEFT 0.18.1
 
last-checkpoint/adapter_config.json DELETED
@@ -1,46 +0,0 @@
- {
- "alora_invocation_tokens": null,
- "alpha_pattern": {},
- "arrow_config": null,
- "auto_mapping": null,
- "base_model_name_or_path": "mistralai/Mistral-Nemo-Instruct-2407",
- "bias": "none",
- "corda_config": null,
- "ensure_weight_tying": false,
- "eva_config": null,
- "exclude_modules": null,
- "fan_in_fan_out": false,
- "inference_mode": true,
- "init_lora_weights": true,
- "layer_replication": null,
- "layers_pattern": null,
- "layers_to_transform": null,
- "loftq_config": {},
- "lora_alpha": 32,
- "lora_bias": false,
- "lora_dropout": 0.05,
- "megatron_config": null,
- "megatron_core": "megatron.core",
- "modules_to_save": null,
- "peft_type": "LORA",
- "peft_version": "0.18.1",
- "qalora_group_size": 16,
- "r": 16,
- "rank_pattern": {},
- "revision": null,
- "target_modules": [
- "up_proj",
- "q_proj",
- "down_proj",
- "o_proj",
- "v_proj",
- "gate_proj",
- "k_proj"
- ],
- "target_parameters": null,
- "task_type": "CAUSAL_LM",
- "trainable_token_indices": null,
- "use_dora": false,
- "use_qalora": false,
- "use_rslora": false
- }
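For context on what this checkpoint contained: the deleted adapter_config.json is a standard PEFT LoRA configuration. Below is a minimal, stdlib-only sketch of reading the key hyperparameters back out of such a file; the JSON is a hand-copied subset of the diff above, not loaded from the repository.

```python
import json

# Subset of the deleted adapter_config.json (values copied from the diff above).
config_text = """
{
  "base_model_name_or_path": "mistralai/Mistral-Nemo-Instruct-2407",
  "peft_type": "LORA",
  "r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.05,
  "target_modules": ["up_proj", "q_proj", "down_proj", "o_proj",
                     "v_proj", "gate_proj", "k_proj"],
  "task_type": "CAUSAL_LM"
}
"""

config = json.loads(config_text)

# Effective LoRA scaling factor is alpha / r (2.0 for this checkpoint).
scaling = config["lora_alpha"] / config["r"]
print(config["peft_type"], "rank", config["r"], "scaling", scaling)
print("adapted modules:", sorted(config["target_modules"]))
```

PEFT's `LoraConfig` consumes these same fields (`r`, `lora_alpha`, `lora_dropout`, `target_modules`, `task_type`) when the adapter is reloaded.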
 
last-checkpoint/adapter_model.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:cd11b39803251198dcb7e030bb69c10b05cece6a9e45160afcc921794cb790cc
- size 228140600
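The deleted adapter_model.safetensors entry above is not the weights themselves but a Git LFS pointer file (a version/oid/size triple). A small sketch of parsing such a pointer, using the exact text from the diff:

```python
# Parse a Git LFS pointer file ("key value" lines) into a dict.
# The pointer text below is copied from the deleted adapter_model.safetensors entry.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:cd11b39803251198dcb7e030bb69c10b05cece6a9e45160afcc921794cb790cc
size 228140600
"""

def parse_lfs_pointer(text: str) -> dict:
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}

info = parse_lfs_pointer(pointer)
print(f"{info['size'] / 1e6:.1f} MB, {info['algo']}:{info['digest'][:12]}")
```

The same parser applies to the optimizer, RNG-state, scaler, scheduler, and tokenizer pointer files deleted further down; only the oid and size differ.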
 
last-checkpoint/chat_template.jinja DELETED
@@ -1,87 +0,0 @@
- {%- if messages[0]["role"] == "system" %}
- {%- set system_message = messages[0]["content"] %}
- {%- set loop_messages = messages[1:] %}
- {%- else %}
- {%- set loop_messages = messages %}
- {%- endif %}
- {%- if not tools is defined %}
- {%- set tools = none %}
- {%- endif %}
- {%- set user_messages = loop_messages | selectattr("role", "equalto", "user") | list %}
-
- {#- This block checks for alternating user/assistant messages, skipping tool calling messages #}
- {%- set ns = namespace() %}
- {%- set ns.index = 0 %}
- {%- for message in loop_messages %}
- {%- if not (message.role == "tool" or message.role == "tool_results" or (message.tool_calls is defined and message.tool_calls is not none)) %}
- {%- if (message["role"] == "user") != (ns.index % 2 == 0) %}
- {{- raise_exception("After the optional system message, conversation roles must alternate user/assistant/user/assistant/...") }}
- {%- endif %}
- {%- set ns.index = ns.index + 1 %}
- {%- endif %}
- {%- endfor %}
-
- {{- bos_token }}
- {%- for message in loop_messages %}
- {%- if message["role"] == "user" %}
- {%- if tools is not none and (message == user_messages[-1]) %}
- {{- "[AVAILABLE_TOOLS][" }}
- {%- for tool in tools %}
- {%- set tool = tool.function %}
- {{- '{"type": "function", "function": {' }}
- {%- for key, val in tool.items() if key != "return" %}
- {%- if val is string %}
- {{- '"' + key + '": "' + val + '"' }}
- {%- else %}
- {{- '"' + key + '": ' + val|tojson }}
- {%- endif %}
- {%- if not loop.last %}
- {{- ", " }}
- {%- endif %}
- {%- endfor %}
- {{- "}}" }}
- {%- if not loop.last %}
- {{- ", " }}
- {%- else %}
- {{- "]" }}
- {%- endif %}
- {%- endfor %}
- {{- "[/AVAILABLE_TOOLS]" }}
- {%- endif %}
- {%- if loop.last and system_message is defined %}
- {{- "[INST]" + system_message + "\n\n" + message["content"] + "[/INST]" }}
- {%- else %}
- {{- "[INST]" + message["content"] + "[/INST]" }}
- {%- endif %}
- {%- elif (message.tool_calls is defined and message.tool_calls is not none) %}
- {{- "[TOOL_CALLS][" }}
- {%- for tool_call in message.tool_calls %}
- {%- set out = tool_call.function|tojson %}
- {{- out[:-1] }}
- {%- if not tool_call.id is defined or tool_call.id|length != 9 %}
- {{- raise_exception("Tool call IDs should be alphanumeric strings with length 9!") }}
- {%- endif %}
- {{- ', "id": "' + tool_call.id + '"}' }}
- {%- if not loop.last %}
- {{- ", " }}
- {%- else %}
- {{- "]" + eos_token }}
- {%- endif %}
- {%- endfor %}
- {%- elif message["role"] == "assistant" %}
- {{- message["content"] + eos_token}}
- {%- elif message["role"] == "tool_results" or message["role"] == "tool" %}
- {%- if message.content is defined and message.content.content is defined %}
- {%- set content = message.content.content %}
- {%- else %}
- {%- set content = message.content %}
- {%- endif %}
- {{- '[TOOL_RESULTS]{"content": ' + content|string + ", " }}
- {%- if not message.tool_call_id is defined or message.tool_call_id|length != 9 %}
- {{- raise_exception("Tool call IDs should be alphanumeric strings with length 9!") }}
- {%- endif %}
- {{- '"call_id": "' + message.tool_call_id + '"}[/TOOL_RESULTS]' }}
- {%- else %}
- {{- raise_exception("Only user and assistant roles are supported, with the exception of an initial optional system message!") }}
- {%- endif %}
- {%- endfor %}
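For readers reconstructing what this template produced: ignoring the tool-calling branches, the deleted Jinja template wraps each user turn in [INST]…[/INST], folds an optional system message into the final user turn, and terminates each assistant turn with the EOS token. A rough stdlib-only Python sketch of that plain-conversation path (this is an approximation, not the template itself, and it skips all tool handling):

```python
# Minimal sketch of the no-tools path of the deleted Mistral chat template:
# BOS, then user turns wrapped in [INST]...[/INST] (system message folded
# into the last user turn), and assistant turns followed by EOS.
BOS, EOS = "<s>", "</s>"

def render(messages):
    system = None
    if messages and messages[0]["role"] == "system":
        system = messages[0]["content"]
        messages = messages[1:]
    user_turns = [m for m in messages if m["role"] == "user"]
    out = BOS
    for m in messages:
        if m["role"] == "user":
            content = m["content"]
            if system is not None and m is user_turns[-1]:
                content = system + "\n\n" + content
            out += "[INST]" + content + "[/INST]"
        elif m["role"] == "assistant":
            out += m["content"] + EOS
        else:
            raise ValueError("unsupported role: " + m["role"])
    return out

print(render([{"role": "system", "content": "Be brief."},
              {"role": "user", "content": "Hi"}]))
```

The template also enforces strict user/assistant alternation and nine-character tool-call IDs; those checks are omitted here.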
 
last-checkpoint/optimizer.pt DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:2acc6b93233f66c6ddb8b195904fe7cd974047004ffcd02f1d993e85ebc0a677
- size 116484839
 
last-checkpoint/rng_state.pth DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:b7883d803ebcafeb5684e5f2bcceb39f2a54258143c0c4972785bf0a17a36dc8
- size 14645
 
last-checkpoint/scaler.pt DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:7e188a4cd7f588ff088ff68a7d9c18ed5ca570c5b11d6790654dcb4e3accb81e
- size 1383
 
last-checkpoint/scheduler.pt DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:08f9e08af1aa8eb785ad1df11d9714b6c859fed11b125506168e50ec9ce7af28
- size 1465
 
last-checkpoint/special_tokens_map.json DELETED
@@ -1,24 +0,0 @@
- {
- "bos_token": {
- "content": "<s>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false
- },
- "eos_token": {
- "content": "</s>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false
- },
- "pad_token": "<unk>",
- "unk_token": {
- "content": "<unk>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false
- }
- }
 
last-checkpoint/tokenizer.json DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:b0240ce510f08e6c2041724e9043e33be9d251d1e4a4d94eb68cd47b954b61d2
- size 17078292
 
last-checkpoint/tokenizer_config.json DELETED
The diff for this file is too large to render.
 
last-checkpoint/trainer_state.json DELETED
@@ -1,839 +0,0 @@
- {
- "best_global_step": 750,
- "best_metric": 0.5089643597602844,
- "best_model_checkpoint": "./adapter-phase1/checkpoint-750",
- "epoch": 1.2,
- "eval_steps": 150,
- "global_step": 750,
- "is_hyper_param_search": false,
- "is_local_process_zero": true,
- "is_world_process_zero": true,
- "log_history": [
- {
- "entropy": 0.5730658559128642,
- "epoch": 0.016,
- "grad_norm": 0.7699093818664551,
- "learning_rate": 9.9744e-05,
- "loss": 0.5612,
- "mean_token_accuracy": 0.8584013111889363,
- "num_tokens": 41645.0,
- "step": 10
- },
- {
- "entropy": 0.5554142463952303,
- "epoch": 0.032,
- "grad_norm": 0.4457343518733978,
- "learning_rate": 9.9424e-05,
- "loss": 0.5273,
- "mean_token_accuracy": 0.8591317173093558,
- "num_tokens": 70428.0,
- "step": 20
- },
- {
- "entropy": 0.5915529232472181,
- "epoch": 0.048,
- "grad_norm": 0.4764552116394043,
- "learning_rate": 9.9104e-05,
- "loss": 0.571,
- "mean_token_accuracy": 0.8499542407691478,
- "num_tokens": 93459.0,
- "step": 30
- },
- {
- "entropy": 0.6596924969926476,
- "epoch": 0.064,
- "grad_norm": 0.5651283264160156,
- "learning_rate": 9.878400000000001e-05,
- "loss": 0.6238,
- "mean_token_accuracy": 0.8377377673983574,
- "num_tokens": 111867.0,
- "step": 40
- },
- {
- "entropy": 0.7530947024002671,
- "epoch": 0.08,
- "grad_norm": 1.0816445350646973,
- "learning_rate": 9.8464e-05,
- "loss": 0.7105,
- "mean_token_accuracy": 0.8222946774214506,
- "num_tokens": 124704.0,
- "step": 50
- },
- {
- "entropy": 0.4857567956671119,
- "epoch": 0.096,
- "grad_norm": 0.6103881001472473,
- "learning_rate": 9.8144e-05,
- "loss": 0.4722,
- "mean_token_accuracy": 0.8663083262741565,
- "num_tokens": 164155.0,
- "step": 60
- },
- {
- "entropy": 0.49499469716101885,
- "epoch": 0.112,
- "grad_norm": 0.4260229170322418,
- "learning_rate": 9.7824e-05,
- "loss": 0.4937,
- "mean_token_accuracy": 0.8636182025074959,
- "num_tokens": 191809.0,
- "step": 70
- },
- {
- "entropy": 0.5589337946847082,
- "epoch": 0.128,
- "grad_norm": 0.5363767147064209,
- "learning_rate": 9.750400000000001e-05,
- "loss": 0.5229,
- "mean_token_accuracy": 0.8587806306779384,
- "num_tokens": 214088.0,
- "step": 80
- },
- {
- "entropy": 0.6311248591169715,
- "epoch": 0.144,
- "grad_norm": 0.5147454142570496,
- "learning_rate": 9.718400000000001e-05,
- "loss": 0.5924,
- "mean_token_accuracy": 0.8405886068940163,
- "num_tokens": 231940.0,
- "step": 90
- },
- {
- "entropy": 0.7393803182989359,
- "epoch": 0.16,
- "grad_norm": 0.8111797571182251,
- "learning_rate": 9.6864e-05,
- "loss": 0.6967,
- "mean_token_accuracy": 0.8172304671257734,
- "num_tokens": 244562.0,
- "step": 100
- },
- {
- "entropy": 0.4579536892473698,
- "epoch": 0.176,
- "grad_norm": 0.3394778072834015,
- "learning_rate": 9.6544e-05,
- "loss": 0.447,
- "mean_token_accuracy": 0.8702179156243801,
- "num_tokens": 284741.0,
- "step": 110
- },
- {
- "entropy": 0.49405701775103805,
- "epoch": 0.192,
- "grad_norm": 0.4252355694770813,
- "learning_rate": 9.6224e-05,
- "loss": 0.4859,
- "mean_token_accuracy": 0.8623054280877114,
- "num_tokens": 313346.0,
- "step": 120
- },
- {
- "entropy": 0.5604451755061746,
- "epoch": 0.208,
- "grad_norm": 0.4283740222454071,
- "learning_rate": 9.590400000000001e-05,
- "loss": 0.5192,
- "mean_token_accuracy": 0.854694677516818,
- "num_tokens": 336313.0,
- "step": 130
- },
- {
- "entropy": 0.5832516210153699,
- "epoch": 0.224,
- "grad_norm": 0.5574604272842407,
- "learning_rate": 9.558400000000001e-05,
- "loss": 0.5639,
- "mean_token_accuracy": 0.8488325711339713,
- "num_tokens": 354930.0,
- "step": 140
- },
- {
- "entropy": 0.7139101274311542,
- "epoch": 0.24,
- "grad_norm": 0.9666448831558228,
- "learning_rate": 9.526400000000001e-05,
- "loss": 0.6833,
- "mean_token_accuracy": 0.8205961517989635,
- "num_tokens": 367969.0,
- "step": 150
- },
- {
- "epoch": 0.24,
- "eval_entropy": 0.5783390111327171,
- "eval_loss": 0.571293830871582,
- "eval_mean_token_accuracy": 0.8406577571630478,
- "eval_num_tokens": 367969.0,
- "eval_runtime": 949.8374,
- "eval_samples_per_second": 2.106,
- "eval_steps_per_second": 0.526,
- "step": 150
- },
- {
- "entropy": 0.4597647111862898,
- "epoch": 0.256,
- "grad_norm": 0.3567339777946472,
- "learning_rate": 9.4944e-05,
- "loss": 0.4466,
- "mean_token_accuracy": 0.8714417025446892,
- "num_tokens": 408832.0,
- "step": 160
- },
- {
- "entropy": 0.5107372861355544,
- "epoch": 0.272,
- "grad_norm": 0.365077406167984,
- "learning_rate": 9.462400000000001e-05,
- "loss": 0.4913,
- "mean_token_accuracy": 0.8607851468026638,
- "num_tokens": 437025.0,
- "step": 170
- },
- {
- "entropy": 0.549993178807199,
- "epoch": 0.288,
- "grad_norm": 0.921683669090271,
- "learning_rate": 9.4304e-05,
- "loss": 0.5275,
- "mean_token_accuracy": 0.8546455435454845,
- "num_tokens": 459640.0,
- "step": 180
- },
- {
- "entropy": 0.5974416058510542,
- "epoch": 0.304,
- "grad_norm": 0.5437716245651245,
- "learning_rate": 9.3984e-05,
- "loss": 0.5801,
- "mean_token_accuracy": 0.8437417767941952,
- "num_tokens": 477698.0,
- "step": 190
- },
- {
- "entropy": 0.7215298766270279,
- "epoch": 0.32,
- "grad_norm": 0.8138436079025269,
- "learning_rate": 9.3664e-05,
- "loss": 0.6601,
- "mean_token_accuracy": 0.823812685161829,
- "num_tokens": 490139.0,
- "step": 200
- },
- {
- "entropy": 0.41703027142211796,
- "epoch": 0.336,
- "grad_norm": 0.3584943115711212,
- "learning_rate": 9.3344e-05,
- "loss": 0.4339,
- "mean_token_accuracy": 0.8721328835934401,
- "num_tokens": 529888.0,
- "step": 210
- },
- {
- "entropy": 0.49080247841775415,
- "epoch": 0.352,
- "grad_norm": 0.384344220161438,
- "learning_rate": 9.3024e-05,
- "loss": 0.4712,
- "mean_token_accuracy": 0.8645332992076874,
- "num_tokens": 558603.0,
- "step": 220
- },
- {
- "entropy": 0.5349763100966811,
- "epoch": 0.368,
- "grad_norm": 0.45451048016548157,
- "learning_rate": 9.2704e-05,
- "loss": 0.5037,
- "mean_token_accuracy": 0.8568188078701496,
- "num_tokens": 581795.0,
- "step": 230
- },
- {
- "entropy": 0.5738583486527205,
- "epoch": 0.384,
- "grad_norm": 0.5654199123382568,
- "learning_rate": 9.2384e-05,
- "loss": 0.5418,
- "mean_token_accuracy": 0.853222968429327,
- "num_tokens": 600644.0,
- "step": 240
- },
- {
- "entropy": 0.7203033225610852,
- "epoch": 0.4,
- "grad_norm": 0.7510268688201904,
- "learning_rate": 9.2064e-05,
- "loss": 0.6766,
- "mean_token_accuracy": 0.8211973272264004,
- "num_tokens": 613463.0,
- "step": 250
- },
- {
- "entropy": 0.44393815845251083,
- "epoch": 0.416,
- "grad_norm": 0.28410282731056213,
- "learning_rate": 9.174400000000001e-05,
- "loss": 0.44,
- "mean_token_accuracy": 0.8719362560659647,
- "num_tokens": 653916.0,
- "step": 260
- },
- {
- "entropy": 0.49695795457810166,
- "epoch": 0.432,
- "grad_norm": 0.37558332085609436,
- "learning_rate": 9.142400000000001e-05,
- "loss": 0.4716,
- "mean_token_accuracy": 0.8661885127425194,
- "num_tokens": 682316.0,
- "step": 270
- },
- {
- "entropy": 0.5323772713541984,
- "epoch": 0.448,
- "grad_norm": 0.38234350085258484,
- "learning_rate": 9.1104e-05,
- "loss": 0.5153,
- "mean_token_accuracy": 0.8568883746862411,
- "num_tokens": 705595.0,
- "step": 280
- },
- {
- "entropy": 0.5631753627210856,
- "epoch": 0.464,
- "grad_norm": 0.5723901391029358,
- "learning_rate": 9.0784e-05,
- "loss": 0.5255,
- "mean_token_accuracy": 0.8554483871906996,
- "num_tokens": 724261.0,
- "step": 290
- },
- {
- "entropy": 0.7213884828612208,
- "epoch": 0.48,
- "grad_norm": 0.8089677095413208,
- "learning_rate": 9.0464e-05,
- "loss": 0.6531,
- "mean_token_accuracy": 0.8282416380941868,
- "num_tokens": 736888.0,
- "step": 300
- },
- {
- "epoch": 0.48,
- "eval_entropy": 0.5434884746074676,
- "eval_loss": 0.5595113635063171,
- "eval_mean_token_accuracy": 0.8425255041122437,
- "eval_num_tokens": 736888.0,
- "eval_runtime": 960.1391,
- "eval_samples_per_second": 2.083,
- "eval_steps_per_second": 0.521,
- "step": 300
- },
- {
- "entropy": 0.4070850105956197,
- "epoch": 0.496,
- "grad_norm": 0.3164316713809967,
- "learning_rate": 9.014400000000001e-05,
- "loss": 0.4347,
- "mean_token_accuracy": 0.8718821201473475,
- "num_tokens": 777614.0,
- "step": 310
- },
- {
- "entropy": 0.4795668950304389,
- "epoch": 0.512,
- "grad_norm": 0.36268115043640137,
- "learning_rate": 8.982400000000001e-05,
- "loss": 0.4528,
- "mean_token_accuracy": 0.8704403065145015,
- "num_tokens": 806367.0,
- "step": 320
- },
- {
- "entropy": 0.5111968521028757,
- "epoch": 0.528,
- "grad_norm": 0.46226978302001953,
- "learning_rate": 8.9504e-05,
- "loss": 0.493,
- "mean_token_accuracy": 0.8627706792205572,
- "num_tokens": 829451.0,
- "step": 330
- },
- {
- "entropy": 0.5876390922814607,
- "epoch": 0.544,
- "grad_norm": 0.5480667352676392,
- "learning_rate": 8.9184e-05,
- "loss": 0.553,
- "mean_token_accuracy": 0.8471334714442491,
- "num_tokens": 848096.0,
- "step": 340
- },
- {
- "entropy": 0.6881475947797299,
- "epoch": 0.56,
- "grad_norm": 0.7945592403411865,
- "learning_rate": 8.886400000000001e-05,
- "loss": 0.6313,
- "mean_token_accuracy": 0.8284968301653862,
- "num_tokens": 861213.0,
- "step": 350
- },
- {
- "entropy": 0.41089317156001925,
- "epoch": 0.576,
- "grad_norm": 0.42293551564216614,
- "learning_rate": 8.854400000000001e-05,
- "loss": 0.4347,
- "mean_token_accuracy": 0.872112188860774,
- "num_tokens": 902600.0,
- "step": 360
- },
- {
- "entropy": 0.4904096835292876,
- "epoch": 0.592,
- "grad_norm": 0.36379531025886536,
- "learning_rate": 8.8224e-05,
- "loss": 0.4607,
- "mean_token_accuracy": 0.8698205541819334,
- "num_tokens": 931252.0,
- "step": 370
- },
- {
- "entropy": 0.5374009614810348,
- "epoch": 0.608,
- "grad_norm": 0.39369404315948486,
- "learning_rate": 8.7904e-05,
- "loss": 0.5105,
- "mean_token_accuracy": 0.8589953418821097,
- "num_tokens": 954440.0,
- "step": 380
- },
- {
- "entropy": 0.5679504619911313,
- "epoch": 0.624,
- "grad_norm": 0.4653433859348297,
- "learning_rate": 8.7584e-05,
- "loss": 0.5375,
- "mean_token_accuracy": 0.8520914990454912,
- "num_tokens": 973561.0,
- "step": 390
- },
- {
- "entropy": 0.6884654611349106,
- "epoch": 0.64,
- "grad_norm": 0.75364750623703,
- "learning_rate": 8.7264e-05,
- "loss": 0.6424,
- "mean_token_accuracy": 0.8248051077127456,
- "num_tokens": 986663.0,
- "step": 400
- },
- {
- "entropy": 0.4144328683614731,
- "epoch": 0.656,
- "grad_norm": 0.3373418152332306,
- "learning_rate": 8.6944e-05,
- "loss": 0.4187,
- "mean_token_accuracy": 0.8739797297865153,
- "num_tokens": 1027401.0,
- "step": 410
- },
- {
- "entropy": 0.4791171012446284,
- "epoch": 0.672,
- "grad_norm": 0.3650929033756256,
- "learning_rate": 8.6624e-05,
- "loss": 0.4467,
- "mean_token_accuracy": 0.8709899630397558,
- "num_tokens": 1056257.0,
- "step": 420
- },
- {
- "entropy": 0.5098350465297699,
- "epoch": 0.688,
- "grad_norm": 0.3901676833629608,
- "learning_rate": 8.6304e-05,
- "loss": 0.4884,
- "mean_token_accuracy": 0.8597885746508837,
- "num_tokens": 1079455.0,
- "step": 430
- },
- {
- "entropy": 0.5697799691930413,
- "epoch": 0.704,
- "grad_norm": 0.4802677631378174,
- "learning_rate": 8.598400000000001e-05,
- "loss": 0.5419,
- "mean_token_accuracy": 0.8520946379750967,
- "num_tokens": 1098043.0,
- "step": 440
- },
- {
- "entropy": 0.6666812863200903,
- "epoch": 0.72,
- "grad_norm": 0.683847188949585,
- "learning_rate": 8.5664e-05,
- "loss": 0.6184,
- "mean_token_accuracy": 0.8354901738464833,
- "num_tokens": 1111075.0,
- "step": 450
- },
- {
- "epoch": 0.72,
- "eval_entropy": 0.5324991160035133,
- "eval_loss": 0.528014600276947,
- "eval_mean_token_accuracy": 0.8489134066104889,
- "eval_num_tokens": 1111075.0,
- "eval_runtime": 945.2593,
- "eval_samples_per_second": 2.116,
- "eval_steps_per_second": 0.529,
- "step": 450
- },
- {
- "entropy": 0.42183027137070894,
- "epoch": 0.736,
- "grad_norm": 0.30442023277282715,
- "learning_rate": 8.5344e-05,
- "loss": 0.4258,
- "mean_token_accuracy": 0.8741536162793636,
- "num_tokens": 1151218.0,
- "step": 460
- },
- {
- "entropy": 0.48677743785083294,
- "epoch": 0.752,
- "grad_norm": 0.3515753746032715,
- "learning_rate": 8.5024e-05,
- "loss": 0.4589,
- "mean_token_accuracy": 0.8665938679128885,
- "num_tokens": 1179788.0,
- "step": 470
- },
- {
- "entropy": 0.5252734636887908,
- "epoch": 0.768,
- "grad_norm": 0.396014004945755,
- "learning_rate": 8.470400000000001e-05,
- "loss": 0.5021,
- "mean_token_accuracy": 0.8596790559589863,
- "num_tokens": 1202980.0,
- "step": 480
- },
- {
- "entropy": 0.5485310789197684,
- "epoch": 0.784,
- "grad_norm": 0.4635639488697052,
- "learning_rate": 8.438400000000001e-05,
- "loss": 0.5269,
- "mean_token_accuracy": 0.8574780374765396,
- "num_tokens": 1221687.0,
- "step": 490
- },
- {
- "entropy": 0.6833245273679495,
- "epoch": 0.8,
- "grad_norm": 0.700869619846344,
- "learning_rate": 8.406400000000001e-05,
- "loss": 0.6387,
- "mean_token_accuracy": 0.8237500067800283,
- "num_tokens": 1234514.0,
- "step": 500
- },
- {
- "entropy": 0.40601076018065213,
- "epoch": 0.816,
- "grad_norm": 0.32708939909935,
- "learning_rate": 8.3744e-05,
- "loss": 0.408,
- "mean_token_accuracy": 0.8766346644610167,
- "num_tokens": 1275596.0,
- "step": 510
- },
- {
- "entropy": 0.4850410340353847,
- "epoch": 0.832,
- "grad_norm": 0.3244037926197052,
- "learning_rate": 8.3424e-05,
- "loss": 0.4573,
- "mean_token_accuracy": 0.8669606879353523,
- "num_tokens": 1304145.0,
- "step": 520
- },
- {
- "entropy": 0.515147590637207,
- "epoch": 0.848,
- "grad_norm": 0.38329896330833435,
- "learning_rate": 8.310400000000001e-05,
- "loss": 0.4863,
- "mean_token_accuracy": 0.8632290612906217,
- "num_tokens": 1326982.0,
- "step": 530
- },
- {
- "entropy": 0.5729153681546449,
- "epoch": 0.864,
- "grad_norm": 0.4813045263290405,
- "learning_rate": 8.278400000000001e-05,
- "loss": 0.5479,
- "mean_token_accuracy": 0.847094003856182,
- "num_tokens": 1345175.0,
- "step": 540
- },
- {
- "entropy": 0.6926866250112653,
- "epoch": 0.88,
- "grad_norm": 0.8769316673278809,
- "learning_rate": 8.2464e-05,
- "loss": 0.6399,
- "mean_token_accuracy": 0.8273460377007723,
- "num_tokens": 1357612.0,
- "step": 550
- },
- {
- "entropy": 0.43388050645589826,
- "epoch": 0.896,
- "grad_norm": 0.3002051115036011,
- "learning_rate": 8.2144e-05,
- "loss": 0.4221,
- "mean_token_accuracy": 0.8723263714462519,
- "num_tokens": 1395656.0,
- "step": 560
- },
- {
- "entropy": 0.47867969051003456,
- "epoch": 0.912,
- "grad_norm": 0.3462761640548706,
- "learning_rate": 8.1824e-05,
- "loss": 0.4537,
- "mean_token_accuracy": 0.8681479040533304,
- "num_tokens": 1423390.0,
- "step": 570
- },
- {
- "entropy": 0.5229433966800571,
- "epoch": 0.928,
- "grad_norm": 0.3625573217868805,
- "learning_rate": 8.1504e-05,
- "loss": 0.4876,
- "mean_token_accuracy": 0.8632072422653436,
- "num_tokens": 1446208.0,
- "step": 580
- },
- {
- "entropy": 0.5398129008710384,
- "epoch": 0.944,
- "grad_norm": 0.47574466466903687,
- "learning_rate": 8.1184e-05,
- "loss": 0.5157,
- "mean_token_accuracy": 0.8563049588352442,
- "num_tokens": 1464656.0,
- "step": 590
- },
- {
- "entropy": 0.6804929848760366,
- "epoch": 0.96,
- "grad_norm": 0.8095116019248962,
- "learning_rate": 8.0864e-05,
- "loss": 0.6355,
- "mean_token_accuracy": 0.8217808835208416,
- "num_tokens": 1477330.0,
- "step": 600
- },
- {
- "epoch": 0.96,
- "eval_entropy": 0.5289207182526589,
- "eval_loss": 0.5149514675140381,
- "eval_mean_token_accuracy": 0.8529882525205612,
- "eval_num_tokens": 1477330.0,
- "eval_runtime": 926.8896,
- "eval_samples_per_second": 2.158,
- "eval_steps_per_second": 0.539,
- "step": 600
- },
- {
- "entropy": 0.4441436754539609,
- "epoch": 0.976,
- "grad_norm": 0.3561602830886841,
- "learning_rate": 8.0544e-05,
- "loss": 0.4314,
- "mean_token_accuracy": 0.8683189991861582,
- "num_tokens": 1511581.0,
- "step": 610
- },
- {
- "entropy": 0.5468204509466886,
- "epoch": 0.992,
- "grad_norm": 0.4606572985649109,
- "learning_rate": 8.0224e-05,
- "loss": 0.5182,
- "mean_token_accuracy": 0.8496101208031177,
- "num_tokens": 1531489.0,
- "step": 620
- },
- {
- "entropy": 0.5519248092547059,
- "epoch": 1.008,
- "grad_norm": 0.2525905966758728,
- "learning_rate": 7.9904e-05,
- "loss": 0.5057,
- "mean_token_accuracy": 0.8535349532961846,
- "num_tokens": 1562607.0,
- "step": 630
- },
- {
- "entropy": 0.40866237645968795,
- "epoch": 1.024,
- "grad_norm": 0.3288176357746124,
- "learning_rate": 7.9584e-05,
- "loss": 0.3852,
- "mean_token_accuracy": 0.8830868355929852,
- "num_tokens": 1596809.0,
- "step": 640
- },
- {
- "entropy": 0.45947086848318575,
- "epoch": 1.04,
- "grad_norm": 0.3615362048149109,
- "learning_rate": 7.9264e-05,
- "loss": 0.427,
- "mean_token_accuracy": 0.8752098884433508,
- "num_tokens": 1623093.0,
- "step": 650
- },
- {
- "entropy": 0.48522304855287074,
- "epoch": 1.056,
- "grad_norm": 0.6554524898529053,
- "learning_rate": 7.894400000000001e-05,
- "loss": 0.4329,
- "mean_token_accuracy": 0.8734084539115429,
- "num_tokens": 1644709.0,
- "step": 660
- },
- {
- "entropy": 0.49721850007772445,
- "epoch": 1.072,
- "grad_norm": 0.5433140397071838,
- "learning_rate": 7.862400000000001e-05,
- "loss": 0.4597,
- "mean_token_accuracy": 0.8716968420892954,
- "num_tokens": 1661390.0,
- "step": 670
- },
- {
- "entropy": 0.48405226683244107,
- "epoch": 1.088,
- "grad_norm": 0.3505966365337372,
- "learning_rate": 7.8304e-05,
731
- "loss": 0.462,
732
- "mean_token_accuracy": 0.8606270607560873,
733
- "num_tokens": 1689962.0,
734
- "step": 680
735
- },
736
- {
737
- "entropy": 0.4111426337622106,
738
- "epoch": 1.104,
739
- "grad_norm": 0.3443625271320343,
740
- "learning_rate": 7.7984e-05,
741
- "loss": 0.379,
742
- "mean_token_accuracy": 0.8858214061707258,
743
- "num_tokens": 1723548.0,
744
- "step": 690
745
- },
746
- {
747
- "entropy": 0.45519505683332684,
748
- "epoch": 1.12,
749
- "grad_norm": 0.43360981345176697,
750
- "learning_rate": 7.766400000000001e-05,
751
- "loss": 0.417,
752
- "mean_token_accuracy": 0.8762232139706612,
753
- "num_tokens": 1749751.0,
754
- "step": 700
755
- },
756
- {
757
- "entropy": 0.47432731073349715,
758
- "epoch": 1.1360000000000001,
759
- "grad_norm": 0.48396071791648865,
760
- "learning_rate": 7.734400000000001e-05,
761
- "loss": 0.4423,
762
- "mean_token_accuracy": 0.8694507408887148,
763
- "num_tokens": 1770706.0,
764
- "step": 710
765
- },
766
- {
767
- "entropy": 0.5454745849594473,
768
- "epoch": 1.152,
769
- "grad_norm": 0.6095965504646301,
770
- "learning_rate": 7.702400000000001e-05,
771
- "loss": 0.491,
772
- "mean_token_accuracy": 0.8575698807835579,
773
- "num_tokens": 1786806.0,
774
- "step": 720
775
- },
776
- {
777
- "entropy": 0.4902514209970832,
778
- "epoch": 1.168,
779
- "grad_norm": 0.3669983148574829,
780
- "learning_rate": 7.6704e-05,
781
- "loss": 0.4784,
782
- "mean_token_accuracy": 0.8593317933380604,
783
- "num_tokens": 1814214.0,
784
- "step": 730
785
- },
786
- {
787
- "entropy": 0.42295527271926403,
788
- "epoch": 1.184,
789
- "grad_norm": 0.37635281682014465,
790
- "learning_rate": 7.6384e-05,
791
- "loss": 0.4006,
792
- "mean_token_accuracy": 0.8772692024707794,
793
- "num_tokens": 1845867.0,
794
- "step": 740
795
- },
796
- {
797
- "entropy": 0.45570146273821593,
798
- "epoch": 1.2,
799
- "grad_norm": 0.4157540798187256,
800
- "learning_rate": 7.6064e-05,
801
- "loss": 0.4131,
802
- "mean_token_accuracy": 0.8780728686600924,
803
- "num_tokens": 1870856.0,
804
- "step": 750
805
- },
806
- {
807
- "epoch": 1.2,
808
- "eval_entropy": 0.4963426645994186,
809
- "eval_loss": 0.5089643597602844,
810
- "eval_mean_token_accuracy": 0.8573515907526016,
811
- "eval_num_tokens": 1870856.0,
812
- "eval_runtime": 949.8047,
813
- "eval_samples_per_second": 2.106,
814
- "eval_steps_per_second": 0.526,
815
- "step": 750
816
- }
817
- ],
818
- "logging_steps": 10,
819
- "max_steps": 3125,
820
- "num_input_tokens_seen": 0,
821
- "num_train_epochs": 5,
822
- "save_steps": 150,
823
- "stateful_callbacks": {
824
- "TrainerControl": {
825
- "args": {
826
- "should_epoch_stop": false,
827
- "should_evaluate": false,
828
- "should_log": false,
829
- "should_save": true,
830
- "should_training_stop": false
831
- },
832
- "attributes": {}
833
- }
834
- },
835
- "total_flos": 1.3058997783257088e+17,
836
- "train_batch_size": 1,
837
- "trial_name": null,
838
- "trial_params": null
839
- }
last-checkpoint/training_args.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:d19add453be896fb8010267a01d849597b52aecb53969dce6ab3000e56f1b7d0
- size 6353