Arvyniitb committed on
Commit 7121ba4 · verified · 1 Parent(s): 2fd1f7f

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,207 @@
+ ---
+ base_model: mistralai/Mistral-7B-Instruct-v0.3
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - base_model:adapter:mistralai/Mistral-7B-Instruct-v0.3
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.17.1
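
The card's "How to Get Started" section is still a placeholder; a minimal sketch of the standard PEFT loading pattern for this kind of adapter follows. The base-model id comes from the YAML header above; `adapter_path` (wherever this repo is downloaded or cached) is an assumption, and `transformers`/`peft` must be installed.

```python
def load_adapter(adapter_path: str):
    """Load the LoRA adapter on top of its Mistral base model.

    Standard PEFT pattern, shown as an illustrative sketch; the
    `adapter_path` argument is hypothetical (point it at this repo).
    """
    # Imports kept inside the function so the sketch itself can be
    # read or imported without the heavy dependencies installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(
        "mistralai/Mistral-7B-Instruct-v0.3")
    tokenizer = AutoTokenizer.from_pretrained(
        "mistralai/Mistral-7B-Instruct-v0.3")
    # Attaches the adapter weights (adapter_model.safetensors) to the base.
    model = PeftModel.from_pretrained(base, adapter_path)
    return model, tokenizer
```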
adapter_config.json ADDED
@@ -0,0 +1,37 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "mistralai/Mistral-7B-Instruct-v0.3",
+ "bias": "none",
+ "corda_config": null,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 32,
+ "lora_bias": false,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "qalora_group_size": 16,
+ "r": 16,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "v_proj",
+ "q_proj"
+ ],
+ "target_parameters": null,
+ "task_type": "CAUSAL_LM",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_qalora": false,
+ "use_rslora": false
+ }
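
A back-of-the-envelope check that this config (r=16, LoRA on `q_proj` and `v_proj`) accounts for the 27,280,152-byte `adapter_model.safetensors` below. The Mistral-7B-Instruct-v0.3 dimensions used here (hidden_size 4096, 32 layers, head_dim 128, 8 KV heads) are assumptions taken from the base model's published config, not from this commit.

```python
r = 16            # "r" in adapter_config.json
hidden = 4096     # assumed q_proj/v_proj input width
n_layers = 32     # assumed number of decoder layers
q_out = 4096      # q_proj output width
v_out = 8 * 128   # v_proj output width (KV heads x head_dim)

# Each LoRA pair (A: in->r, B: r->out) adds r*(in + out) parameters.
per_layer = r * (hidden + q_out) + r * (hidden + v_out)
total_lora_params = n_layers * per_layer   # 6,815,744
bytes_fp32 = 4 * total_lora_params         # 27,262,976

# The ~17 KB left over versus the 27,280,152-byte safetensors file
# is consistent with the safetensors JSON header.
```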
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:02e3b8172a56ce4c92a61a6ee49e572c2e100bb1bced833c487cdc427d8cce20
+ size 27280152
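
The three lines above are not the weights themselves but a Git LFS pointer; a minimal sketch of parsing its "key value" lines (format per the LFS pointer spec) shows what each field carries:

```python
# Git LFS pointer as committed in place of the real ~26 MB weights file.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:02e3b8172a56ce4c92a61a6ee49e572c2e100bb1bced833c487cdc427d8cce20
size 27280152"""

# Each line is "key value"; split on the first space only.
fields = dict(line.split(" ", 1) for line in pointer_text.splitlines())
algo, digest = fields["oid"].split(":", 1)   # hash algorithm + hex digest
size_bytes = int(fields["size"])             # size of the actual object
```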
chat_template.jinja ADDED
@@ -0,0 +1,87 @@
+ {%- if messages[0]["role"] == "system" %}
+ {%- set system_message = messages[0]["content"] %}
+ {%- set loop_messages = messages[1:] %}
+ {%- else %}
+ {%- set loop_messages = messages %}
+ {%- endif %}
+ {%- if not tools is defined %}
+ {%- set tools = none %}
+ {%- endif %}
+ {%- set user_messages = loop_messages | selectattr("role", "equalto", "user") | list %}
+
+ {#- This block checks for alternating user/assistant messages, skipping tool calling messages #}
+ {%- set ns = namespace() %}
+ {%- set ns.index = 0 %}
+ {%- for message in loop_messages %}
+ {%- if not (message.role == "tool" or message.role == "tool_results" or (message.tool_calls is defined and message.tool_calls is not none)) %}
+ {%- if (message["role"] == "user") != (ns.index % 2 == 0) %}
+ {{- raise_exception("After the optional system message, conversation roles must alternate user/assistant/user/assistant/...") }}
+ {%- endif %}
+ {%- set ns.index = ns.index + 1 %}
+ {%- endif %}
+ {%- endfor %}
+
+ {{- bos_token }}
+ {%- for message in loop_messages %}
+ {%- if message["role"] == "user" %}
+ {%- if tools is not none and (message == user_messages[-1]) %}
+ {{- "[AVAILABLE_TOOLS] [" }}
+ {%- for tool in tools %}
+ {%- set tool = tool.function %}
+ {{- '{"type": "function", "function": {' }}
+ {%- for key, val in tool.items() if key != "return" %}
+ {%- if val is string %}
+ {{- '"' + key + '": "' + val + '"' }}
+ {%- else %}
+ {{- '"' + key + '": ' + val|tojson }}
+ {%- endif %}
+ {%- if not loop.last %}
+ {{- ", " }}
+ {%- endif %}
+ {%- endfor %}
+ {{- "}}" }}
+ {%- if not loop.last %}
+ {{- ", " }}
+ {%- else %}
+ {{- "]" }}
+ {%- endif %}
+ {%- endfor %}
+ {{- "[/AVAILABLE_TOOLS]" }}
+ {%- endif %}
+ {%- if loop.last and system_message is defined %}
+ {{- "[INST] " + system_message + "\n\n" + message["content"] + "[/INST]" }}
+ {%- else %}
+ {{- "[INST] " + message["content"] + "[/INST]" }}
+ {%- endif %}
+ {%- elif message.tool_calls is defined and message.tool_calls is not none %}
+ {{- "[TOOL_CALLS] [" }}
+ {%- for tool_call in message.tool_calls %}
+ {%- set out = tool_call.function|tojson %}
+ {{- out[:-1] }}
+ {%- if not tool_call.id is defined or tool_call.id|length != 9 %}
+ {{- raise_exception("Tool call IDs should be alphanumeric strings with length 9!") }}
+ {%- endif %}
+ {{- ', "id": "' + tool_call.id + '"}' }}
+ {%- if not loop.last %}
+ {{- ", " }}
+ {%- else %}
+ {{- "]" + eos_token }}
+ {%- endif %}
+ {%- endfor %}
+ {%- elif message["role"] == "assistant" %}
+ {{- " " + message["content"]|trim + eos_token}}
+ {%- elif message["role"] == "tool_results" or message["role"] == "tool" %}
+ {%- if message.content is defined and message.content.content is defined %}
+ {%- set content = message.content.content %}
+ {%- else %}
+ {%- set content = message.content %}
+ {%- endif %}
+ {{- '[TOOL_RESULTS] {"content": ' + content|string + ", " }}
+ {%- if not message.tool_call_id is defined or message.tool_call_id|length != 9 %}
+ {{- raise_exception("Tool call IDs should be alphanumeric strings with length 9!") }}
+ {%- endif %}
+ {{- '"call_id": "' + message.tool_call_id + '"}[/TOOL_RESULTS]' }}
+ {%- else %}
+ {{- raise_exception("Only user and assistant roles are supported, with the exception of an initial optional system message!") }}
+ {%- endif %}
+ {%- endfor %}
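
To make the template's output concrete, here is an illustrative Python re-implementation of just its tool-free path (no `[AVAILABLE_TOOLS]`/`[TOOL_CALLS]` handling): an optional system message is folded into the *last* user turn, user turns are wrapped in `[INST] ... [/INST]`, assistant turns are prefixed with a space and closed with EOS, and BOS appears once at the start. This is a sketch for understanding the format, not a substitute for the template itself.

```python
def render_mistral_chat(messages, bos="<s>", eos="</s>"):
    # Optional leading system message, merged into the final user turn
    # exactly as chat_template.jinja does.
    system = None
    if messages and messages[0]["role"] == "system":
        system = messages[0]["content"]
        messages = messages[1:]
    out = bos
    for i, m in enumerate(messages):
        last = i == len(messages) - 1
        if m["role"] == "user":
            if last and system is not None:
                out += "[INST] " + system + "\n\n" + m["content"] + "[/INST]"
            else:
                out += "[INST] " + m["content"] + "[/INST]"
        elif m["role"] == "assistant":
            out += " " + m["content"].strip() + eos
    return out

example = render_mistral_chat([
    {"role": "system", "content": "You are helpful"},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Bye"},
])
# example == "<s>[INST] Hi[/INST] Hello!</s>[INST] You are helpful\n\nBye[/INST]"
```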
checkpoint-1614/README.md ADDED
@@ -0,0 +1,207 @@
checkpoint-1614/adapter_config.json ADDED
@@ -0,0 +1,37 @@
checkpoint-1614/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:138bc4a53345f66bbe9dccfeb764693d683d74f325bff087469b821689714e14
+ size 27280152
checkpoint-1614/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0509d41125bc3f40db1342bb896becf5ff467cfb1d9d137b1e7922644297578d
+ size 54636235
checkpoint-1614/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5ff31d94a59a6800c19956b59e113de14dff0ed2c8b60ba63a119bfa9ddbce42
+ size 14645
checkpoint-1614/scaler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7347ad12e6e15aa1ca55f9964eb23b0a99161fe4d54b9b40659277f87dce33e3
+ size 1383
checkpoint-1614/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ef8631dd3c4ba7a3b65d5660a164f683f3ab8aee8c1b69d0593430f4c2690f41
+ size 1465
checkpoint-1614/trainer_state.json ADDED
@@ -0,0 +1,1177 @@
+ {
+ "best_global_step": null,
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 2.0,
+ "eval_steps": 500,
+ "global_step": 1614,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.012397334573066791,
+ "grad_norm": 0.6915941834449768,
+ "learning_rate": 0.00018,
+ "loss": 2.3006,
+ "step": 10
+ },
+ {
+ "epoch": 0.024794669146133583,
+ "grad_norm": 0.6273291707038879,
+ "learning_rate": 0.0001992534218166736,
+ "loss": 2.1317,
+ "step": 20
+ },
+ {
+ "epoch": 0.037192003719200374,
+ "grad_norm": 0.45228877663612366,
+ "learning_rate": 0.00019842389050186645,
+ "loss": 1.9808,
+ "step": 30
+ },
+ {
+ "epoch": 0.049589338292267165,
+ "grad_norm": 0.3886250853538513,
+ "learning_rate": 0.00019759435918705932,
+ "loss": 1.8968,
+ "step": 40
+ },
+ {
+ "epoch": 0.06198667286533395,
+ "grad_norm": 0.3263673484325409,
+ "learning_rate": 0.00019676482787225219,
+ "loss": 1.9159,
+ "step": 50
+ },
+ {
+ "epoch": 0.07438400743840075,
+ "grad_norm": 0.3357856273651123,
+ "learning_rate": 0.00019593529655744505,
+ "loss": 1.9052,
+ "step": 60
+ },
+ {
+ "epoch": 0.08678134201146753,
+ "grad_norm": 0.3885687291622162,
+ "learning_rate": 0.00019510576524263792,
+ "loss": 1.8788,
+ "step": 70
+ },
+ {
+ "epoch": 0.09917867658453433,
+ "grad_norm": 0.33165615797042847,
+ "learning_rate": 0.00019427623392783078,
+ "loss": 1.876,
+ "step": 80
+ },
+ {
+ "epoch": 0.11157601115760112,
+ "grad_norm": 0.3510221242904663,
+ "learning_rate": 0.00019344670261302365,
+ "loss": 1.7992,
+ "step": 90
+ },
+ {
+ "epoch": 0.1239733457306679,
+ "grad_norm": 0.35817059874534607,
+ "learning_rate": 0.0001926171712982165,
+ "loss": 1.7386,
+ "step": 100
+ },
+ {
+ "epoch": 0.1363706803037347,
+ "grad_norm": 0.3222461938858032,
+ "learning_rate": 0.00019178763998340938,
+ "loss": 1.8766,
+ "step": 110
+ },
+ {
+ "epoch": 0.1487680148768015,
+ "grad_norm": 0.302379310131073,
+ "learning_rate": 0.00019095810866860224,
+ "loss": 1.833,
+ "step": 120
+ },
+ {
+ "epoch": 0.16116534944986827,
+ "grad_norm": 0.3621092736721039,
+ "learning_rate": 0.00019012857735379514,
+ "loss": 1.7433,
+ "step": 130
+ },
+ {
+ "epoch": 0.17356268402293507,
+ "grad_norm": 0.35103851556777954,
+ "learning_rate": 0.00018929904603898798,
+ "loss": 1.8552,
+ "step": 140
+ },
+ {
+ "epoch": 0.18596001859600186,
+ "grad_norm": 0.3761669993400574,
+ "learning_rate": 0.00018846951472418084,
+ "loss": 1.8072,
+ "step": 150
+ },
+ {
+ "epoch": 0.19835735316906866,
+ "grad_norm": 0.2791743576526642,
+ "learning_rate": 0.00018763998340937373,
+ "loss": 1.882,
+ "step": 160
+ },
+ {
+ "epoch": 0.21075468774213543,
+ "grad_norm": 0.28887104988098145,
+ "learning_rate": 0.00018681045209456657,
+ "loss": 1.945,
+ "step": 170
+ },
+ {
+ "epoch": 0.22315202231520223,
+ "grad_norm": 0.3621903955936432,
+ "learning_rate": 0.00018598092077975944,
+ "loss": 1.7506,
+ "step": 180
+ },
+ {
+ "epoch": 0.23554935688826903,
+ "grad_norm": 0.36899787187576294,
+ "learning_rate": 0.00018515138946495233,
+ "loss": 1.8959,
+ "step": 190
+ },
+ {
+ "epoch": 0.2479466914613358,
+ "grad_norm": 0.3354776203632355,
+ "learning_rate": 0.00018432185815014517,
+ "loss": 1.9282,
+ "step": 200
+ },
+ {
+ "epoch": 0.2603440260344026,
+ "grad_norm": 0.3030059337615967,
+ "learning_rate": 0.00018349232683533803,
+ "loss": 1.9876,
+ "step": 210
+ },
+ {
+ "epoch": 0.2727413606074694,
+ "grad_norm": 0.380134254693985,
+ "learning_rate": 0.00018266279552053093,
+ "loss": 1.795,
+ "step": 220
+ },
+ {
+ "epoch": 0.2851386951805362,
+ "grad_norm": 0.3484257161617279,
+ "learning_rate": 0.00018183326420572376,
+ "loss": 1.9131,
+ "step": 230
+ },
+ {
+ "epoch": 0.297536029753603,
+ "grad_norm": 0.3204387426376343,
+ "learning_rate": 0.00018100373289091663,
+ "loss": 1.9829,
+ "step": 240
+ },
+ {
+ "epoch": 0.3099333643266698,
+ "grad_norm": 0.3759450912475586,
+ "learning_rate": 0.00018017420157610952,
+ "loss": 1.9002,
+ "step": 250
+ },
+ {
+ "epoch": 0.32233069889973653,
+ "grad_norm": 0.3721698820590973,
+ "learning_rate": 0.00017934467026130236,
+ "loss": 1.8657,
+ "step": 260
+ },
+ {
+ "epoch": 0.33472803347280333,
+ "grad_norm": 0.35085615515708923,
+ "learning_rate": 0.00017851513894649523,
+ "loss": 1.8623,
+ "step": 270
+ },
+ {
+ "epoch": 0.34712536804587013,
+ "grad_norm": 0.35750696063041687,
+ "learning_rate": 0.00017768560763168812,
+ "loss": 1.8743,
+ "step": 280
+ },
+ {
+ "epoch": 0.35952270261893693,
+ "grad_norm": 0.3072109520435333,
+ "learning_rate": 0.00017685607631688096,
+ "loss": 1.9626,
+ "step": 290
+ },
+ {
+ "epoch": 0.3719200371920037,
+ "grad_norm": 0.40647512674331665,
+ "learning_rate": 0.00017602654500207382,
+ "loss": 1.8034,
+ "step": 300
+ },
+ {
+ "epoch": 0.3843173717650705,
+ "grad_norm": 0.3000311851501465,
+ "learning_rate": 0.00017519701368726672,
+ "loss": 1.8995,
+ "step": 310
+ },
+ {
+ "epoch": 0.3967147063381373,
+ "grad_norm": 0.36904624104499817,
+ "learning_rate": 0.00017436748237245955,
+ "loss": 1.82,
+ "step": 320
+ },
+ {
+ "epoch": 0.40911204091120407,
+ "grad_norm": 0.337799072265625,
+ "learning_rate": 0.00017353795105765242,
+ "loss": 1.8706,
+ "step": 330
+ },
+ {
+ "epoch": 0.42150937548427087,
+ "grad_norm": 0.4223800003528595,
+ "learning_rate": 0.0001727084197428453,
+ "loss": 1.8584,
+ "step": 340
+ },
+ {
+ "epoch": 0.43390671005733766,
+ "grad_norm": 0.34497585892677307,
+ "learning_rate": 0.00017187888842803818,
+ "loss": 1.8417,
+ "step": 350
+ },
+ {
+ "epoch": 0.44630404463040446,
+ "grad_norm": 0.34032031893730164,
+ "learning_rate": 0.00017104935711323102,
+ "loss": 1.7917,
+ "step": 360
+ },
+ {
+ "epoch": 0.45870137920347126,
+ "grad_norm": 0.4158859848976135,
+ "learning_rate": 0.0001702198257984239,
+ "loss": 1.8749,
+ "step": 370
+ },
+ {
+ "epoch": 0.47109871377653806,
+ "grad_norm": 0.36545196175575256,
+ "learning_rate": 0.00016939029448361678,
+ "loss": 1.8337,
+ "step": 380
+ },
+ {
+ "epoch": 0.48349604834960486,
+ "grad_norm": 0.32123473286628723,
+ "learning_rate": 0.00016856076316880961,
+ "loss": 1.9321,
+ "step": 390
+ },
+ {
+ "epoch": 0.4958933829226716,
+ "grad_norm": 0.45476439595222473,
+ "learning_rate": 0.0001677312318540025,
+ "loss": 1.8185,
+ "step": 400
+ },
+ {
+ "epoch": 0.5082907174957384,
+ "grad_norm": 0.3410905599594116,
+ "learning_rate": 0.00016690170053919537,
+ "loss": 1.9082,
+ "step": 410
+ },
+ {
+ "epoch": 0.5206880520688052,
+ "grad_norm": 0.3436656892299652,
+ "learning_rate": 0.0001660721692243882,
+ "loss": 1.7821,
+ "step": 420
+ },
+ {
+ "epoch": 0.533085386641872,
+ "grad_norm": 0.34343594312667847,
+ "learning_rate": 0.0001652426379095811,
+ "loss": 1.8184,
+ "step": 430
+ },
+ {
+ "epoch": 0.5454827212149388,
+ "grad_norm": 0.4309318959712982,
+ "learning_rate": 0.00016441310659477397,
+ "loss": 1.9124,
+ "step": 440
+ },
+ {
+ "epoch": 0.5578800557880056,
+ "grad_norm": 0.4032953679561615,
+ "learning_rate": 0.0001635835752799668,
+ "loss": 1.7962,
+ "step": 450
+ },
+ {
+ "epoch": 0.5702773903610724,
+ "grad_norm": 0.3726664185523987,
+ "learning_rate": 0.0001627540439651597,
+ "loss": 1.8158,
+ "step": 460
+ },
+ {
+ "epoch": 0.5826747249341392,
+ "grad_norm": 0.36948224902153015,
+ "learning_rate": 0.00016192451265035257,
+ "loss": 1.7539,
+ "step": 470
+ },
+ {
+ "epoch": 0.595072059507206,
+ "grad_norm": 0.33594000339508057,
+ "learning_rate": 0.0001610949813355454,
+ "loss": 1.9287,
+ "step": 480
+ },
+ {
+ "epoch": 0.6074693940802728,
+ "grad_norm": 0.3209264576435089,
+ "learning_rate": 0.0001602654500207383,
+ "loss": 1.8375,
+ "step": 490
+ },
+ {
+ "epoch": 0.6198667286533396,
+ "grad_norm": 0.38256484270095825,
+ "learning_rate": 0.00015943591870593116,
+ "loss": 1.817,
+ "step": 500
+ },
+ {
+ "epoch": 0.6322640632264063,
+ "grad_norm": 0.38966864347457886,
+ "learning_rate": 0.000158606387391124,
+ "loss": 1.7397,
+ "step": 510
+ },
+ {
+ "epoch": 0.6446613977994731,
+ "grad_norm": 0.45601052045822144,
+ "learning_rate": 0.0001577768560763169,
+ "loss": 1.7448,
+ "step": 520
+ },
+ {
+ "epoch": 0.6570587323725399,
+ "grad_norm": 0.39306697249412537,
+ "learning_rate": 0.00015694732476150976,
+ "loss": 1.8505,
+ "step": 530
+ },
+ {
+ "epoch": 0.6694560669456067,
+ "grad_norm": 0.4036141633987427,
+ "learning_rate": 0.00015611779344670262,
+ "loss": 1.9946,
+ "step": 540
+ },
+ {
+ "epoch": 0.6818534015186735,
+ "grad_norm": 0.34463125467300415,
+ "learning_rate": 0.0001552882621318955,
+ "loss": 2.0461,
+ "step": 550
+ },
+ {
+ "epoch": 0.6942507360917403,
+ "grad_norm": 0.3309987485408783,
+ "learning_rate": 0.00015445873081708835,
+ "loss": 1.8714,
+ "step": 560
+ },
+ {
+ "epoch": 0.7066480706648071,
+ "grad_norm": 0.40711745619773865,
+ "learning_rate": 0.00015362919950228122,
+ "loss": 1.9515,
+ "step": 570
+ },
+ {
+ "epoch": 0.7190454052378739,
+ "grad_norm": 0.4855351150035858,
+ "learning_rate": 0.00015279966818747409,
+ "loss": 1.928,
+ "step": 580
+ },
+ {
+ "epoch": 0.7314427398109407,
+ "grad_norm": 0.3159841299057007,
+ "learning_rate": 0.00015197013687266695,
+ "loss": 1.8648,
+ "step": 590
+ },
+ {
+ "epoch": 0.7438400743840075,
+ "grad_norm": 0.34454017877578735,
+ "learning_rate": 0.00015114060555785982,
429
+ "loss": 1.7614,
430
+ "step": 600
431
+ },
432
+ {
433
+ "epoch": 0.7562374089570743,
434
+ "grad_norm": 0.42112237215042114,
435
+ "learning_rate": 0.00015031107424305268,
436
+ "loss": 1.7943,
437
+ "step": 610
438
+ },
439
+ {
440
+ "epoch": 0.768634743530141,
441
+ "grad_norm": 0.4868924617767334,
442
+ "learning_rate": 0.00014948154292824555,
443
+ "loss": 1.8236,
444
+ "step": 620
445
+ },
446
+ {
447
+ "epoch": 0.7810320781032078,
448
+ "grad_norm": 0.27235159277915955,
449
+ "learning_rate": 0.00014865201161343841,
450
+ "loss": 1.8792,
451
+ "step": 630
452
+ },
453
+ {
454
+ "epoch": 0.7934294126762746,
455
+ "grad_norm": 0.36492735147476196,
456
+ "learning_rate": 0.00014782248029863128,
457
+ "loss": 1.8139,
458
+ "step": 640
459
+ },
460
+ {
461
+ "epoch": 0.8058267472493414,
462
+ "grad_norm": 0.3278910517692566,
463
+ "learning_rate": 0.00014699294898382414,
464
+ "loss": 1.9303,
465
+ "step": 650
466
+ },
467
+ {
468
+ "epoch": 0.8182240818224081,
469
+ "grad_norm": 0.4410141110420227,
470
+ "learning_rate": 0.000146163417669017,
471
+ "loss": 1.7458,
472
+ "step": 660
473
+ },
474
+ {
475
+ "epoch": 0.8306214163954749,
476
+ "grad_norm": 0.44660821557044983,
477
+ "learning_rate": 0.00014533388635420988,
478
+ "loss": 1.6484,
479
+ "step": 670
480
+ },
481
+ {
482
+ "epoch": 0.8430187509685417,
483
+ "grad_norm": 0.36396560072898865,
484
+ "learning_rate": 0.00014450435503940274,
485
+ "loss": 1.7457,
486
+ "step": 680
487
+ },
488
+ {
489
+ "epoch": 0.8554160855416085,
490
+ "grad_norm": 0.4536712169647217,
491
+ "learning_rate": 0.0001436748237245956,
492
+ "loss": 1.7763,
493
+ "step": 690
494
+ },
495
+ {
496
+ "epoch": 0.8678134201146753,
497
+ "grad_norm": 0.45438772439956665,
498
+ "learning_rate": 0.00014284529240978847,
499
+ "loss": 1.8736,
500
+ "step": 700
501
+ },
502
+ {
503
+ "epoch": 0.8802107546877421,
504
+ "grad_norm": 0.331462562084198,
505
+ "learning_rate": 0.00014201576109498134,
506
+ "loss": 1.9909,
507
+ "step": 710
508
+ },
509
+ {
510
+ "epoch": 0.8926080892608089,
511
+ "grad_norm": 0.29686763882637024,
512
+ "learning_rate": 0.00014118622978017423,
513
+ "loss": 1.9242,
514
+ "step": 720
515
+ },
516
+ {
517
+ "epoch": 0.9050054238338757,
518
+ "grad_norm": 0.4546560049057007,
519
+ "learning_rate": 0.00014035669846536707,
520
+ "loss": 1.8032,
521
+ "step": 730
522
+ },
523
+ {
524
+ "epoch": 0.9174027584069425,
525
+ "grad_norm": 0.3135245442390442,
526
+ "learning_rate": 0.00013952716715055993,
527
+ "loss": 1.835,
528
+ "step": 740
529
+ },
530
+ {
531
+ "epoch": 0.9298000929800093,
532
+ "grad_norm": 0.6448049545288086,
533
+ "learning_rate": 0.00013869763583575283,
534
+ "loss": 1.9127,
535
+ "step": 750
536
+ },
537
+ {
538
+ "epoch": 0.9421974275530761,
539
+ "grad_norm": 0.39725756645202637,
540
+ "learning_rate": 0.00013786810452094567,
541
+ "loss": 1.8041,
542
+ "step": 760
543
+ },
544
+ {
545
+ "epoch": 0.9545947621261429,
546
+ "grad_norm": 0.3762451708316803,
547
+ "learning_rate": 0.00013703857320613853,
548
+ "loss": 1.8003,
549
+ "step": 770
550
+ },
551
+ {
552
+ "epoch": 0.9669920966992097,
553
+ "grad_norm": 0.35813263058662415,
554
+ "learning_rate": 0.00013620904189133142,
555
+ "loss": 1.8474,
556
+ "step": 780
557
+ },
558
+ {
559
+ "epoch": 0.9793894312722765,
560
+ "grad_norm": 0.29999616742134094,
561
+ "learning_rate": 0.00013537951057652426,
562
+ "loss": 1.8705,
563
+ "step": 790
564
+ },
565
+ {
566
+ "epoch": 0.9917867658453432,
567
+ "grad_norm": 0.3202720880508423,
568
+ "learning_rate": 0.00013454997926171713,
569
+ "loss": 1.7752,
570
+ "step": 800
571
+ },
572
+ {
573
+ "epoch": 1.0,
574
+ "eval_loss": 1.8193774223327637,
575
+ "eval_runtime": 78.8064,
576
+ "eval_samples_per_second": 9.098,
577
+ "eval_steps_per_second": 1.142,
578
+ "step": 807
579
+ },
580
+ {
581
+ "epoch": 1.00371920037192,
582
+ "grad_norm": 0.30241405963897705,
583
+ "learning_rate": 0.00013372044794691002,
584
+ "loss": 1.8318,
585
+ "step": 810
586
+ },
587
+ {
588
+ "epoch": 1.0161165349449868,
589
+ "grad_norm": 0.3700416386127472,
590
+ "learning_rate": 0.00013289091663210286,
591
+ "loss": 1.8938,
592
+ "step": 820
593
+ },
594
+ {
595
+ "epoch": 1.0285138695180536,
596
+ "grad_norm": 0.4154430329799652,
597
+ "learning_rate": 0.00013206138531729572,
598
+ "loss": 1.8733,
599
+ "step": 830
600
+ },
601
+ {
602
+ "epoch": 1.0409112040911204,
603
+ "grad_norm": 0.38313189148902893,
604
+ "learning_rate": 0.00013123185400248862,
605
+ "loss": 1.8571,
606
+ "step": 840
607
+ },
608
+ {
609
+ "epoch": 1.0533085386641872,
610
+ "grad_norm": 0.23230139911174774,
611
+ "learning_rate": 0.00013040232268768146,
612
+ "loss": 1.798,
613
+ "step": 850
614
+ },
615
+ {
616
+ "epoch": 1.065705873237254,
617
+ "grad_norm": 0.3701108992099762,
618
+ "learning_rate": 0.00012957279137287432,
619
+ "loss": 1.8533,
620
+ "step": 860
621
+ },
622
+ {
623
+ "epoch": 1.0781032078103208,
624
+ "grad_norm": 0.29064834117889404,
625
+ "learning_rate": 0.00012874326005806721,
626
+ "loss": 1.7453,
627
+ "step": 870
628
+ },
629
+ {
630
+ "epoch": 1.0905005423833876,
631
+ "grad_norm": 0.3150763213634491,
632
+ "learning_rate": 0.00012791372874326005,
633
+ "loss": 1.8977,
634
+ "step": 880
635
+ },
636
+ {
637
+ "epoch": 1.1028978769564544,
638
+ "grad_norm": 0.428843230009079,
639
+ "learning_rate": 0.00012708419742845292,
640
+ "loss": 1.8688,
641
+ "step": 890
642
+ },
643
+ {
644
+ "epoch": 1.1152952115295212,
645
+ "grad_norm": 0.2608051896095276,
646
+ "learning_rate": 0.0001262546661136458,
647
+ "loss": 1.7242,
648
+ "step": 900
649
+ },
650
+ {
651
+ "epoch": 1.127692546102588,
652
+ "grad_norm": 0.3821583688259125,
653
+ "learning_rate": 0.00012542513479883865,
654
+ "loss": 1.9037,
655
+ "step": 910
656
+ },
657
+ {
658
+ "epoch": 1.1400898806756548,
659
+ "grad_norm": 0.28013911843299866,
660
+ "learning_rate": 0.00012459560348403151,
661
+ "loss": 1.8215,
662
+ "step": 920
663
+ },
664
+ {
665
+ "epoch": 1.1524872152487216,
666
+ "grad_norm": 0.30506208539009094,
667
+ "learning_rate": 0.0001237660721692244,
668
+ "loss": 1.8197,
669
+ "step": 930
670
+ },
671
+ {
672
+ "epoch": 1.1648845498217884,
673
+ "grad_norm": 0.29327717423439026,
674
+ "learning_rate": 0.00012293654085441727,
675
+ "loss": 1.7784,
676
+ "step": 940
677
+ },
678
+ {
679
+ "epoch": 1.1772818843948551,
680
+ "grad_norm": 0.23550163209438324,
681
+ "learning_rate": 0.0001221070095396101,
682
+ "loss": 1.9318,
683
+ "step": 950
684
+ },
685
+ {
686
+ "epoch": 1.189679218967922,
687
+ "grad_norm": 0.21349768340587616,
688
+ "learning_rate": 0.000121277478224803,
689
+ "loss": 1.8084,
690
+ "step": 960
691
+ },
692
+ {
693
+ "epoch": 1.2020765535409887,
694
+ "grad_norm": 0.34790855646133423,
695
+ "learning_rate": 0.00012044794690999586,
696
+ "loss": 1.7587,
697
+ "step": 970
698
+ },
699
+ {
700
+ "epoch": 1.2144738881140555,
701
+ "grad_norm": 0.2519979774951935,
702
+ "learning_rate": 0.00011961841559518872,
703
+ "loss": 1.8055,
704
+ "step": 980
705
+ },
706
+ {
707
+ "epoch": 1.2268712226871223,
708
+ "grad_norm": 0.3781174123287201,
709
+ "learning_rate": 0.0001187888842803816,
710
+ "loss": 1.8436,
711
+ "step": 990
712
+ },
713
+ {
714
+ "epoch": 1.2392685572601891,
715
+ "grad_norm": 0.26533016562461853,
716
+ "learning_rate": 0.00011795935296557445,
717
+ "loss": 1.6512,
718
+ "step": 1000
719
+ },
720
+ {
721
+ "epoch": 1.2516658918332557,
722
+ "grad_norm": 0.2862655818462372,
723
+ "learning_rate": 0.00011712982165076732,
724
+ "loss": 2.0418,
725
+ "step": 1010
726
+ },
727
+ {
728
+ "epoch": 1.2640632264063227,
729
+ "grad_norm": 0.27094656229019165,
730
+ "learning_rate": 0.0001163002903359602,
731
+ "loss": 1.7278,
732
+ "step": 1020
733
+ },
734
+ {
735
+ "epoch": 1.2764605609793893,
736
+ "grad_norm": 0.29184144735336304,
737
+ "learning_rate": 0.00011547075902115305,
738
+ "loss": 1.6938,
739
+ "step": 1030
740
+ },
741
+ {
742
+ "epoch": 1.2888578955524563,
743
+ "grad_norm": 0.30606576800346375,
744
+ "learning_rate": 0.00011464122770634591,
745
+ "loss": 1.6627,
746
+ "step": 1040
747
+ },
748
+ {
749
+ "epoch": 1.301255230125523,
750
+ "grad_norm": 0.23827411234378815,
751
+ "learning_rate": 0.0001138116963915388,
752
+ "loss": 1.7671,
753
+ "step": 1050
754
+ },
755
+ {
756
+ "epoch": 1.31365256469859,
757
+ "grad_norm": 0.3038440942764282,
758
+ "learning_rate": 0.00011298216507673165,
759
+ "loss": 1.9029,
760
+ "step": 1060
761
+ },
762
+ {
763
+ "epoch": 1.3260498992716565,
764
+ "grad_norm": 0.22325201332569122,
765
+ "learning_rate": 0.00011215263376192451,
766
+ "loss": 1.7413,
767
+ "step": 1070
768
+ },
769
+ {
770
+ "epoch": 1.3384472338447233,
771
+ "grad_norm": 0.30867546796798706,
772
+ "learning_rate": 0.00011132310244711739,
773
+ "loss": 1.8252,
774
+ "step": 1080
775
+ },
776
+ {
777
+ "epoch": 1.35084456841779,
778
+ "grad_norm": 0.3236595392227173,
779
+ "learning_rate": 0.00011049357113231024,
780
+ "loss": 1.8856,
781
+ "step": 1090
782
+ },
783
+ {
784
+ "epoch": 1.363241902990857,
785
+ "grad_norm": 0.32089242339134216,
786
+ "learning_rate": 0.00010966403981750311,
787
+ "loss": 1.8437,
788
+ "step": 1100
789
+ },
790
+ {
791
+ "epoch": 1.3756392375639237,
792
+ "grad_norm": 0.28864434361457825,
793
+ "learning_rate": 0.00010883450850269599,
794
+ "loss": 1.7875,
795
+ "step": 1110
796
+ },
797
+ {
798
+ "epoch": 1.3880365721369905,
799
+ "grad_norm": 0.3817216455936432,
800
+ "learning_rate": 0.00010800497718788884,
801
+ "loss": 1.8455,
802
+ "step": 1120
803
+ },
804
+ {
805
+ "epoch": 1.4004339067100573,
806
+ "grad_norm": 0.22240202128887177,
807
+ "learning_rate": 0.0001071754458730817,
808
+ "loss": 1.7744,
809
+ "step": 1130
810
+ },
811
+ {
812
+ "epoch": 1.412831241283124,
813
+ "grad_norm": 0.28633448481559753,
814
+ "learning_rate": 0.00010634591455827458,
815
+ "loss": 1.765,
816
+ "step": 1140
817
+ },
818
+ {
819
+ "epoch": 1.425228575856191,
820
+ "grad_norm": 0.19945985078811646,
821
+ "learning_rate": 0.00010551638324346745,
822
+ "loss": 1.9739,
823
+ "step": 1150
824
+ },
825
+ {
826
+ "epoch": 1.4376259104292577,
827
+ "grad_norm": 0.2815242111682892,
828
+ "learning_rate": 0.0001046868519286603,
829
+ "loss": 1.8369,
830
+ "step": 1160
831
+ },
832
+ {
833
+ "epoch": 1.4500232450023245,
834
+ "grad_norm": 0.20174123346805573,
835
+ "learning_rate": 0.00010385732061385318,
836
+ "loss": 1.9351,
837
+ "step": 1170
838
+ },
839
+ {
840
+ "epoch": 1.4624205795753913,
841
+ "grad_norm": 0.27899351716041565,
842
+ "learning_rate": 0.00010302778929904605,
843
+ "loss": 1.9544,
844
+ "step": 1180
845
+ },
846
+ {
847
+ "epoch": 1.474817914148458,
848
+ "grad_norm": 0.22028954327106476,
849
+ "learning_rate": 0.00010219825798423892,
850
+ "loss": 1.852,
851
+ "step": 1190
852
+ },
853
+ {
854
+ "epoch": 1.4872152487215249,
855
+ "grad_norm": 0.2520454525947571,
856
+ "learning_rate": 0.00010136872666943178,
857
+ "loss": 1.6232,
858
+ "step": 1200
859
+ },
860
+ {
861
+ "epoch": 1.4996125832945917,
862
+ "grad_norm": 0.34104567766189575,
863
+ "learning_rate": 0.00010053919535462464,
864
+ "loss": 1.9285,
865
+ "step": 1210
866
+ },
867
+ {
868
+ "epoch": 1.5120099178676585,
869
+ "grad_norm": 0.3216676414012909,
870
+ "learning_rate": 9.970966403981751e-05,
871
+ "loss": 1.6644,
872
+ "step": 1220
873
+ },
874
+ {
875
+ "epoch": 1.5244072524407253,
876
+ "grad_norm": 0.22132286429405212,
877
+ "learning_rate": 9.888013272501037e-05,
878
+ "loss": 1.796,
879
+ "step": 1230
880
+ },
881
+ {
882
+ "epoch": 1.536804587013792,
883
+ "grad_norm": 0.35185569524765015,
884
+ "learning_rate": 9.805060141020324e-05,
885
+ "loss": 1.7821,
886
+ "step": 1240
887
+ },
888
+ {
889
+ "epoch": 1.5492019215868589,
890
+ "grad_norm": 0.16420242190361023,
891
+ "learning_rate": 9.72210700953961e-05,
892
+ "loss": 1.8054,
893
+ "step": 1250
894
+ },
895
+ {
896
+ "epoch": 1.5615992561599255,
897
+ "grad_norm": 0.32100507616996765,
898
+ "learning_rate": 9.639153878058897e-05,
899
+ "loss": 1.8707,
900
+ "step": 1260
901
+ },
902
+ {
903
+ "epoch": 1.5739965907329925,
904
+ "grad_norm": 0.2014254331588745,
905
+ "learning_rate": 9.556200746578184e-05,
906
+ "loss": 1.8087,
907
+ "step": 1270
908
+ },
909
+ {
910
+ "epoch": 1.586393925306059,
911
+ "grad_norm": 0.23392640054225922,
912
+ "learning_rate": 9.47324761509747e-05,
913
+ "loss": 1.7327,
914
+ "step": 1280
915
+ },
916
+ {
917
+ "epoch": 1.598791259879126,
918
+ "grad_norm": 0.19987891614437103,
919
+ "learning_rate": 9.390294483616757e-05,
920
+ "loss": 1.8399,
921
+ "step": 1290
922
+ },
923
+ {
924
+ "epoch": 1.6111885944521926,
925
+ "grad_norm": 0.13928192853927612,
926
+ "learning_rate": 9.307341352136043e-05,
927
+ "loss": 1.9271,
928
+ "step": 1300
929
+ },
930
+ {
931
+ "epoch": 1.6235859290252597,
932
+ "grad_norm": 0.3423463702201843,
933
+ "learning_rate": 9.22438822065533e-05,
934
+ "loss": 1.81,
935
+ "step": 1310
936
+ },
937
+ {
938
+ "epoch": 1.6359832635983262,
939
+ "grad_norm": 0.31117212772369385,
940
+ "learning_rate": 9.141435089174618e-05,
941
+ "loss": 1.8919,
942
+ "step": 1320
943
+ },
944
+ {
945
+ "epoch": 1.6483805981713933,
946
+ "grad_norm": 0.1769014447927475,
947
+ "learning_rate": 9.058481957693903e-05,
948
+ "loss": 1.692,
949
+ "step": 1330
950
+ },
951
+ {
952
+ "epoch": 1.6607779327444598,
953
+ "grad_norm": 0.1725306212902069,
954
+ "learning_rate": 8.97552882621319e-05,
955
+ "loss": 1.7833,
956
+ "step": 1340
957
+ },
958
+ {
959
+ "epoch": 1.6731752673175269,
960
+ "grad_norm": 0.15333665907382965,
961
+ "learning_rate": 8.892575694732477e-05,
962
+ "loss": 1.7129,
963
+ "step": 1350
964
+ },
965
+ {
966
+ "epoch": 1.6855726018905934,
967
+ "grad_norm": 0.225737065076828,
968
+ "learning_rate": 8.809622563251764e-05,
969
+ "loss": 2.0034,
970
+ "step": 1360
971
+ },
972
+ {
973
+ "epoch": 1.6979699364636605,
974
+ "grad_norm": 0.27135154604911804,
975
+ "learning_rate": 8.726669431771049e-05,
976
+ "loss": 1.7641,
977
+ "step": 1370
978
+ },
979
+ {
980
+ "epoch": 1.710367271036727,
981
+ "grad_norm": 0.19318152964115143,
982
+ "learning_rate": 8.643716300290337e-05,
983
+ "loss": 1.9714,
984
+ "step": 1380
985
+ },
986
+ {
987
+ "epoch": 1.722764605609794,
988
+ "grad_norm": 0.09128980338573456,
989
+ "learning_rate": 8.560763168809624e-05,
990
+ "loss": 1.6477,
991
+ "step": 1390
992
+ },
993
+ {
994
+ "epoch": 1.7351619401828606,
995
+ "grad_norm": 0.20176462829113007,
996
+ "learning_rate": 8.477810037328909e-05,
997
+ "loss": 1.8707,
998
+ "step": 1400
999
+ },
1000
+ {
1001
+ "epoch": 1.7475592747559274,
1002
+ "grad_norm": 0.1746564656496048,
1003
+ "learning_rate": 8.394856905848197e-05,
1004
+ "loss": 1.8979,
1005
+ "step": 1410
1006
+ },
1007
+ {
1008
+ "epoch": 1.7599566093289942,
1009
+ "grad_norm": 0.8505027294158936,
1010
+ "learning_rate": 8.311903774367483e-05,
1011
+ "loss": 1.71,
1012
+ "step": 1420
1013
+ },
1014
+ {
1015
+ "epoch": 1.772353943902061,
1016
+ "grad_norm": 0.18071773648262024,
1017
+ "learning_rate": 8.22895064288677e-05,
1018
+ "loss": 1.9505,
1019
+ "step": 1430
1020
+ },
1021
+ {
1022
+ "epoch": 1.7847512784751278,
1023
+ "grad_norm": 0.29732322692871094,
1024
+ "learning_rate": 8.145997511406056e-05,
1025
+ "loss": 1.8041,
1026
+ "step": 1440
1027
+ },
1028
+ {
1029
+ "epoch": 1.7971486130481946,
1030
+ "grad_norm": 0.21795178949832916,
1031
+ "learning_rate": 8.063044379925343e-05,
1032
+ "loss": 1.7381,
1033
+ "step": 1450
1034
+ },
1035
+ {
1036
+ "epoch": 1.8095459476212614,
1037
+ "grad_norm": 0.23459061980247498,
1038
+ "learning_rate": 7.98009124844463e-05,
1039
+ "loss": 1.6636,
1040
+ "step": 1460
1041
+ },
1042
+ {
1043
+ "epoch": 1.8219432821943282,
1044
+ "grad_norm": 0.20895689725875854,
1045
+ "learning_rate": 7.897138116963916e-05,
1046
+ "loss": 1.8257,
1047
+ "step": 1470
1048
+ },
1049
+ {
1050
+ "epoch": 1.834340616767395,
1051
+ "grad_norm": 0.18129809200763702,
1052
+ "learning_rate": 7.814184985483203e-05,
1053
+ "loss": 1.9791,
1054
+ "step": 1480
1055
+ },
1056
+ {
1057
+ "epoch": 1.8467379513404618,
1058
+ "grad_norm": 0.1758158951997757,
1059
+ "learning_rate": 7.731231854002489e-05,
1060
+ "loss": 1.7657,
1061
+ "step": 1490
1062
+ },
1063
+ {
1064
+ "epoch": 1.8591352859135286,
1065
+ "grad_norm": 0.14756101369857788,
1066
+ "learning_rate": 7.648278722521776e-05,
1067
+ "loss": 1.7483,
1068
+ "step": 1500
1069
+ },
1070
+ {
1071
+ "epoch": 1.8715326204865954,
1072
+ "grad_norm": 0.14168867468833923,
1073
+ "learning_rate": 7.565325591041062e-05,
1074
+ "loss": 1.8844,
1075
+ "step": 1510
1076
+ },
1077
+ {
1078
+ "epoch": 1.8839299550596622,
1079
+ "grad_norm": 0.1295221447944641,
1080
+ "learning_rate": 7.482372459560349e-05,
1081
+ "loss": 1.8121,
1082
+ "step": 1520
1083
+ },
1084
+ {
1085
+ "epoch": 1.896327289632729,
1086
+ "grad_norm": 0.19356060028076172,
1087
+ "learning_rate": 7.399419328079635e-05,
1088
+ "loss": 1.5851,
1089
+ "step": 1530
1090
+ },
1091
+ {
1092
+ "epoch": 1.9087246242057958,
1093
+ "grad_norm": 0.20035363733768463,
1094
+ "learning_rate": 7.316466196598922e-05,
1095
+ "loss": 1.7063,
1096
+ "step": 1540
1097
+ },
1098
+ {
1099
+ "epoch": 1.9211219587788624,
1100
+ "grad_norm": 0.12139635533094406,
1101
+ "learning_rate": 7.233513065118208e-05,
1102
+ "loss": 1.7962,
1103
+ "step": 1550
1104
+ },
1105
+ {
1106
+ "epoch": 1.9335192933519294,
1107
+ "grad_norm": 0.10587511211633682,
1108
+ "learning_rate": 7.150559933637495e-05,
1109
+ "loss": 1.7143,
1110
+ "step": 1560
1111
+ },
1112
+ {
1113
+ "epoch": 1.945916627924996,
1114
+ "grad_norm": 0.18997065722942352,
1115
+ "learning_rate": 7.067606802156782e-05,
1116
+ "loss": 1.7636,
1117
+ "step": 1570
1118
+ },
1119
+ {
1120
+ "epoch": 1.958313962498063,
1121
+ "grad_norm": 0.15760765969753265,
1122
+ "learning_rate": 6.984653670676068e-05,
1123
+ "loss": 1.7672,
1124
+ "step": 1580
1125
+ },
1126
+ {
1127
+ "epoch": 1.9707112970711296,
1128
+ "grad_norm": 0.1490117609500885,
1129
+ "learning_rate": 6.901700539195355e-05,
1130
+ "loss": 1.6405,
1131
+ "step": 1590
1132
+ },
1133
+ {
1134
+ "epoch": 1.9831086316441966,
1135
+ "grad_norm": 0.09482846409082413,
1136
+ "learning_rate": 6.818747407714641e-05,
1137
+ "loss": 1.8141,
1138
+ "step": 1600
1139
+ },
1140
+ {
1141
+ "epoch": 1.9955059662172632,
1142
+ "grad_norm": 0.182535782456398,
1143
+ "learning_rate": 6.735794276233928e-05,
1144
+ "loss": 1.7745,
1145
+ "step": 1610
1146
+ },
1147
+ {
1148
+ "epoch": 2.0,
1149
+ "eval_loss": 1.8065738677978516,
1150
+ "eval_runtime": 78.829,
1151
+ "eval_samples_per_second": 9.096,
1152
+ "eval_steps_per_second": 1.142,
1153
+ "step": 1614
1154
+ }
1155
+ ],
1156
+ "logging_steps": 10,
1157
+ "max_steps": 2421,
1158
+ "num_input_tokens_seen": 0,
1159
+ "num_train_epochs": 3,
1160
+ "save_steps": 500,
1161
+ "stateful_callbacks": {
1162
+ "TrainerControl": {
1163
+ "args": {
1164
+ "should_epoch_stop": false,
1165
+ "should_evaluate": false,
1166
+ "should_log": false,
1167
+ "should_save": true,
1168
+ "should_training_stop": false
1169
+ },
1170
+ "attributes": {}
1171
+ }
1172
+ },
1173
+ "total_flos": 5.6462587058139955e+17,
1174
+ "train_batch_size": 1,
1175
+ "trial_name": null,
1176
+ "trial_params": null
1177
+ }
checkpoint-1614/training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:15ed81f35b71a674f101f42b5d39caf5a7c94955958bbf088fb20abcabaa812f
3
+ size 5905
checkpoint-2421/README.md ADDED
@@ -0,0 +1,207 @@
1
+ ---
2
+ base_model: mistralai/Mistral-7B-Instruct-v0.3
3
+ library_name: peft
4
+ pipeline_tag: text-generation
5
+ tags:
6
+ - base_model:adapter:mistralai/Mistral-7B-Instruct-v0.3
7
+ - lora
8
+ - transformers
9
+ ---
10
+
11
+ # Model Card for Model ID
12
+
13
+ <!-- Provide a quick summary of what the model is/does. -->
14
+
15
+
16
+
17
+ ## Model Details
18
+
19
+ ### Model Description
20
+
21
+ <!-- Provide a longer summary of what this model is. -->
22
+
23
+
24
+
25
+ - **Developed by:** [More Information Needed]
26
+ - **Funded by [optional]:** [More Information Needed]
27
+ - **Shared by [optional]:** [More Information Needed]
28
+ - **Model type:** [More Information Needed]
29
+ - **Language(s) (NLP):** [More Information Needed]
30
+ - **License:** [More Information Needed]
31
+ - **Finetuned from model [optional]:** [More Information Needed]
32
+
33
+ ### Model Sources [optional]
34
+
35
+ <!-- Provide the basic links for the model. -->
36
+
37
+ - **Repository:** [More Information Needed]
38
+ - **Paper [optional]:** [More Information Needed]
39
+ - **Demo [optional]:** [More Information Needed]
40
+
41
+ ## Uses
42
+
43
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
44
+
45
+ ### Direct Use
46
+
47
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
48
+
49
+ [More Information Needed]
50
+
51
+ ### Downstream Use [optional]
52
+
53
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
54
+
55
+ [More Information Needed]
56
+
57
+ ### Out-of-Scope Use
58
+
59
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
60
+
61
+ [More Information Needed]
62
+
63
+ ## Bias, Risks, and Limitations
64
+
65
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
66
+
67
+ [More Information Needed]
68
+
69
+ ### Recommendations
70
+
71
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
72
+
73
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
74
+
75
+ ## How to Get Started with the Model
76
+
77
+ Use the code below to get started with the model.
78
+
79
+ [More Information Needed]
80
+
81
+ ## Training Details
82
+
83
+ ### Training Data
84
+
85
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
86
+
87
+ [More Information Needed]
88
+
89
+ ### Training Procedure
90
+
91
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
92
+
93
+ #### Preprocessing [optional]
94
+
95
+ [More Information Needed]
96
+
97
+
98
+ #### Training Hyperparameters
99
+
100
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
101
+
102
+ #### Speeds, Sizes, Times [optional]
103
+
104
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
105
+
106
+ [More Information Needed]
107
+
108
+ ## Evaluation
109
+
110
+ <!-- This section describes the evaluation protocols and provides the results. -->
111
+
112
+ ### Testing Data, Factors & Metrics
113
+
114
+ #### Testing Data
115
+
116
+ <!-- This should link to a Dataset Card if possible. -->
117
+
118
+ [More Information Needed]
119
+
120
+ #### Factors
121
+
122
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
123
+
124
+ [More Information Needed]
125
+
126
+ #### Metrics
127
+
128
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
129
+
130
+ [More Information Needed]
131
+
132
+ ### Results
133
+
134
+ [More Information Needed]
135
+
136
+ #### Summary
137
+
138
+
139
+
140
+ ## Model Examination [optional]
141
+
142
+ <!-- Relevant interpretability work for the model goes here -->
143
+
144
+ [More Information Needed]
145
+
146
+ ## Environmental Impact
147
+
148
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
149
+
150
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
151
+
152
+ - **Hardware Type:** [More Information Needed]
153
+ - **Hours used:** [More Information Needed]
154
+ - **Cloud Provider:** [More Information Needed]
155
+ - **Compute Region:** [More Information Needed]
156
+ - **Carbon Emitted:** [More Information Needed]
157
+
158
+ ## Technical Specifications [optional]
159
+
160
+ ### Model Architecture and Objective
161
+
162
+ [More Information Needed]
163
+
164
+ ### Compute Infrastructure
165
+
166
+ [More Information Needed]
167
+
168
+ #### Hardware
169
+
170
+ [More Information Needed]
171
+
172
+ #### Software
173
+
174
+ [More Information Needed]
175
+
176
+ ## Citation [optional]
177
+
178
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
179
+
180
+ **BibTeX:**
181
+
182
+ [More Information Needed]
183
+
184
+ **APA:**
185
+
186
+ [More Information Needed]
187
+
188
+ ## Glossary [optional]
189
+
190
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
191
+
192
+ [More Information Needed]
193
+
194
+ ## More Information [optional]
195
+
196
+ [More Information Needed]
197
+
198
+ ## Model Card Authors [optional]
199
+
200
+ [More Information Needed]
201
+
202
+ ## Model Card Contact
203
+
204
+ [More Information Needed]
205
+ ### Framework versions
206
+
207
+ - PEFT 0.17.1
checkpoint-2421/adapter_config.json ADDED
@@ -0,0 +1,37 @@
1
+ {
2
+ "alpha_pattern": {},
3
+ "auto_mapping": null,
4
+ "base_model_name_or_path": "mistralai/Mistral-7B-Instruct-v0.3",
5
+ "bias": "none",
6
+ "corda_config": null,
7
+ "eva_config": null,
8
+ "exclude_modules": null,
9
+ "fan_in_fan_out": false,
10
+ "inference_mode": true,
11
+ "init_lora_weights": true,
12
+ "layer_replication": null,
13
+ "layers_pattern": null,
14
+ "layers_to_transform": null,
15
+ "loftq_config": {},
16
+ "lora_alpha": 32,
17
+ "lora_bias": false,
18
+ "lora_dropout": 0.05,
19
+ "megatron_config": null,
20
+ "megatron_core": "megatron.core",
21
+ "modules_to_save": null,
22
+ "peft_type": "LORA",
23
+ "qalora_group_size": 16,
24
+ "r": 16,
25
+ "rank_pattern": {},
26
+ "revision": null,
27
+ "target_modules": [
28
+ "v_proj",
29
+ "q_proj"
30
+ ],
31
+ "target_parameters": null,
32
+ "task_type": "CAUSAL_LM",
33
+ "trainable_token_indices": null,
34
+ "use_dora": false,
35
+ "use_qalora": false,
36
+ "use_rslora": false
37
+ }
checkpoint-2421/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:02e3b8172a56ce4c92a61a6ee49e572c2e100bb1bced833c487cdc427d8cce20
3
+ size 27280152
checkpoint-2421/optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6c0f2a03fdaec93cd8b1bbbb4ae4ae2138788f5a4dfcc94be4cd3edf2df96724
3
+ size 54636235
checkpoint-2421/rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8b821cc82cb3337448207066c689df8aec7433b98d3f9262e3045f36daf7c17f
3
+ size 14645
checkpoint-2421/scaler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e6b2ab93d9c86cf9164261edee3306a3c8daeb04d79e7e4c3654c207511443a7
3
+ size 1383
checkpoint-2421/scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5cfafc6475c9407a62a66770a84fe63099f25c87eacd07edc3d21fce1ea0b35a
3
+ size 1465
checkpoint-2421/trainer_state.json ADDED
@@ -0,0 +1,1752 @@
+ {
+ "best_global_step": null,
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 3.0,
+ "eval_steps": 500,
+ "global_step": 2421,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.012397334573066791,
+ "grad_norm": 0.6915941834449768,
+ "learning_rate": 0.00018,
+ "loss": 2.3006,
+ "step": 10
+ },
+ {
+ "epoch": 0.024794669146133583,
+ "grad_norm": 0.6273291707038879,
+ "learning_rate": 0.0001992534218166736,
+ "loss": 2.1317,
+ "step": 20
+ },
+ {
+ "epoch": 0.037192003719200374,
+ "grad_norm": 0.45228877663612366,
+ "learning_rate": 0.00019842389050186645,
+ "loss": 1.9808,
+ "step": 30
+ },
+ {
+ "epoch": 0.049589338292267165,
+ "grad_norm": 0.3886250853538513,
+ "learning_rate": 0.00019759435918705932,
+ "loss": 1.8968,
+ "step": 40
+ },
+ {
+ "epoch": 0.06198667286533395,
+ "grad_norm": 0.3263673484325409,
+ "learning_rate": 0.00019676482787225219,
+ "loss": 1.9159,
+ "step": 50
+ },
+ {
+ "epoch": 0.07438400743840075,
+ "grad_norm": 0.3357856273651123,
+ "learning_rate": 0.00019593529655744505,
+ "loss": 1.9052,
+ "step": 60
+ },
+ {
+ "epoch": 0.08678134201146753,
+ "grad_norm": 0.3885687291622162,
+ "learning_rate": 0.00019510576524263792,
+ "loss": 1.8788,
+ "step": 70
+ },
+ {
+ "epoch": 0.09917867658453433,
+ "grad_norm": 0.33165615797042847,
+ "learning_rate": 0.00019427623392783078,
+ "loss": 1.876,
+ "step": 80
+ },
+ {
+ "epoch": 0.11157601115760112,
+ "grad_norm": 0.3510221242904663,
+ "learning_rate": 0.00019344670261302365,
+ "loss": 1.7992,
+ "step": 90
+ },
+ {
+ "epoch": 0.1239733457306679,
+ "grad_norm": 0.35817059874534607,
+ "learning_rate": 0.0001926171712982165,
+ "loss": 1.7386,
+ "step": 100
+ },
+ {
+ "epoch": 0.1363706803037347,
+ "grad_norm": 0.3222461938858032,
+ "learning_rate": 0.00019178763998340938,
+ "loss": 1.8766,
+ "step": 110
+ },
+ {
+ "epoch": 0.1487680148768015,
+ "grad_norm": 0.302379310131073,
+ "learning_rate": 0.00019095810866860224,
+ "loss": 1.833,
+ "step": 120
+ },
+ {
+ "epoch": 0.16116534944986827,
+ "grad_norm": 0.3621092736721039,
+ "learning_rate": 0.00019012857735379514,
+ "loss": 1.7433,
+ "step": 130
+ },
+ {
+ "epoch": 0.17356268402293507,
+ "grad_norm": 0.35103851556777954,
+ "learning_rate": 0.00018929904603898798,
+ "loss": 1.8552,
+ "step": 140
+ },
+ {
+ "epoch": 0.18596001859600186,
+ "grad_norm": 0.3761669993400574,
+ "learning_rate": 0.00018846951472418084,
+ "loss": 1.8072,
+ "step": 150
+ },
+ {
+ "epoch": 0.19835735316906866,
+ "grad_norm": 0.2791743576526642,
+ "learning_rate": 0.00018763998340937373,
+ "loss": 1.882,
+ "step": 160
+ },
+ {
+ "epoch": 0.21075468774213543,
+ "grad_norm": 0.28887104988098145,
+ "learning_rate": 0.00018681045209456657,
+ "loss": 1.945,
+ "step": 170
+ },
+ {
+ "epoch": 0.22315202231520223,
+ "grad_norm": 0.3621903955936432,
+ "learning_rate": 0.00018598092077975944,
+ "loss": 1.7506,
+ "step": 180
+ },
+ {
+ "epoch": 0.23554935688826903,
+ "grad_norm": 0.36899787187576294,
+ "learning_rate": 0.00018515138946495233,
+ "loss": 1.8959,
+ "step": 190
+ },
+ {
+ "epoch": 0.2479466914613358,
+ "grad_norm": 0.3354776203632355,
+ "learning_rate": 0.00018432185815014517,
+ "loss": 1.9282,
+ "step": 200
+ },
+ {
+ "epoch": 0.2603440260344026,
+ "grad_norm": 0.3030059337615967,
+ "learning_rate": 0.00018349232683533803,
+ "loss": 1.9876,
+ "step": 210
+ },
+ {
+ "epoch": 0.2727413606074694,
+ "grad_norm": 0.380134254693985,
+ "learning_rate": 0.00018266279552053093,
+ "loss": 1.795,
+ "step": 220
+ },
+ {
+ "epoch": 0.2851386951805362,
+ "grad_norm": 0.3484257161617279,
+ "learning_rate": 0.00018183326420572376,
+ "loss": 1.9131,
+ "step": 230
+ },
+ {
+ "epoch": 0.297536029753603,
+ "grad_norm": 0.3204387426376343,
+ "learning_rate": 0.00018100373289091663,
+ "loss": 1.9829,
+ "step": 240
+ },
+ {
+ "epoch": 0.3099333643266698,
+ "grad_norm": 0.3759450912475586,
+ "learning_rate": 0.00018017420157610952,
+ "loss": 1.9002,
+ "step": 250
+ },
+ {
+ "epoch": 0.32233069889973653,
+ "grad_norm": 0.3721698820590973,
+ "learning_rate": 0.00017934467026130236,
+ "loss": 1.8657,
+ "step": 260
+ },
+ {
+ "epoch": 0.33472803347280333,
+ "grad_norm": 0.35085615515708923,
+ "learning_rate": 0.00017851513894649523,
+ "loss": 1.8623,
+ "step": 270
+ },
+ {
+ "epoch": 0.34712536804587013,
+ "grad_norm": 0.35750696063041687,
+ "learning_rate": 0.00017768560763168812,
+ "loss": 1.8743,
+ "step": 280
+ },
+ {
+ "epoch": 0.35952270261893693,
+ "grad_norm": 0.3072109520435333,
+ "learning_rate": 0.00017685607631688096,
+ "loss": 1.9626,
+ "step": 290
+ },
+ {
+ "epoch": 0.3719200371920037,
+ "grad_norm": 0.40647512674331665,
+ "learning_rate": 0.00017602654500207382,
+ "loss": 1.8034,
+ "step": 300
+ },
+ {
+ "epoch": 0.3843173717650705,
+ "grad_norm": 0.3000311851501465,
+ "learning_rate": 0.00017519701368726672,
+ "loss": 1.8995,
+ "step": 310
+ },
+ {
+ "epoch": 0.3967147063381373,
+ "grad_norm": 0.36904624104499817,
+ "learning_rate": 0.00017436748237245955,
+ "loss": 1.82,
+ "step": 320
+ },
+ {
+ "epoch": 0.40911204091120407,
+ "grad_norm": 0.337799072265625,
+ "learning_rate": 0.00017353795105765242,
+ "loss": 1.8706,
+ "step": 330
+ },
+ {
+ "epoch": 0.42150937548427087,
+ "grad_norm": 0.4223800003528595,
+ "learning_rate": 0.0001727084197428453,
+ "loss": 1.8584,
+ "step": 340
+ },
+ {
+ "epoch": 0.43390671005733766,
+ "grad_norm": 0.34497585892677307,
+ "learning_rate": 0.00017187888842803818,
+ "loss": 1.8417,
+ "step": 350
+ },
+ {
+ "epoch": 0.44630404463040446,
+ "grad_norm": 0.34032031893730164,
+ "learning_rate": 0.00017104935711323102,
+ "loss": 1.7917,
+ "step": 360
+ },
+ {
+ "epoch": 0.45870137920347126,
+ "grad_norm": 0.4158859848976135,
+ "learning_rate": 0.0001702198257984239,
+ "loss": 1.8749,
+ "step": 370
+ },
+ {
+ "epoch": 0.47109871377653806,
+ "grad_norm": 0.36545196175575256,
+ "learning_rate": 0.00016939029448361678,
+ "loss": 1.8337,
+ "step": 380
+ },
+ {
+ "epoch": 0.48349604834960486,
+ "grad_norm": 0.32123473286628723,
+ "learning_rate": 0.00016856076316880961,
+ "loss": 1.9321,
+ "step": 390
+ },
+ {
+ "epoch": 0.4958933829226716,
+ "grad_norm": 0.45476439595222473,
+ "learning_rate": 0.0001677312318540025,
+ "loss": 1.8185,
+ "step": 400
+ },
+ {
+ "epoch": 0.5082907174957384,
+ "grad_norm": 0.3410905599594116,
+ "learning_rate": 0.00016690170053919537,
+ "loss": 1.9082,
+ "step": 410
+ },
+ {
+ "epoch": 0.5206880520688052,
+ "grad_norm": 0.3436656892299652,
+ "learning_rate": 0.0001660721692243882,
+ "loss": 1.7821,
+ "step": 420
+ },
+ {
+ "epoch": 0.533085386641872,
+ "grad_norm": 0.34343594312667847,
+ "learning_rate": 0.0001652426379095811,
+ "loss": 1.8184,
+ "step": 430
+ },
+ {
+ "epoch": 0.5454827212149388,
+ "grad_norm": 0.4309318959712982,
+ "learning_rate": 0.00016441310659477397,
+ "loss": 1.9124,
+ "step": 440
+ },
+ {
+ "epoch": 0.5578800557880056,
+ "grad_norm": 0.4032953679561615,
+ "learning_rate": 0.0001635835752799668,
+ "loss": 1.7962,
+ "step": 450
+ },
+ {
+ "epoch": 0.5702773903610724,
+ "grad_norm": 0.3726664185523987,
+ "learning_rate": 0.0001627540439651597,
+ "loss": 1.8158,
+ "step": 460
+ },
+ {
+ "epoch": 0.5826747249341392,
+ "grad_norm": 0.36948224902153015,
+ "learning_rate": 0.00016192451265035257,
+ "loss": 1.7539,
+ "step": 470
+ },
+ {
+ "epoch": 0.595072059507206,
+ "grad_norm": 0.33594000339508057,
+ "learning_rate": 0.0001610949813355454,
+ "loss": 1.9287,
+ "step": 480
+ },
+ {
+ "epoch": 0.6074693940802728,
+ "grad_norm": 0.3209264576435089,
+ "learning_rate": 0.0001602654500207383,
+ "loss": 1.8375,
+ "step": 490
+ },
+ {
+ "epoch": 0.6198667286533396,
+ "grad_norm": 0.38256484270095825,
+ "learning_rate": 0.00015943591870593116,
+ "loss": 1.817,
+ "step": 500
+ },
+ {
+ "epoch": 0.6322640632264063,
+ "grad_norm": 0.38966864347457886,
+ "learning_rate": 0.000158606387391124,
+ "loss": 1.7397,
+ "step": 510
+ },
+ {
+ "epoch": 0.6446613977994731,
+ "grad_norm": 0.45601052045822144,
+ "learning_rate": 0.0001577768560763169,
+ "loss": 1.7448,
+ "step": 520
+ },
+ {
+ "epoch": 0.6570587323725399,
+ "grad_norm": 0.39306697249412537,
+ "learning_rate": 0.00015694732476150976,
+ "loss": 1.8505,
+ "step": 530
+ },
+ {
+ "epoch": 0.6694560669456067,
+ "grad_norm": 0.4036141633987427,
+ "learning_rate": 0.00015611779344670262,
+ "loss": 1.9946,
+ "step": 540
+ },
+ {
+ "epoch": 0.6818534015186735,
+ "grad_norm": 0.34463125467300415,
+ "learning_rate": 0.0001552882621318955,
+ "loss": 2.0461,
+ "step": 550
+ },
+ {
+ "epoch": 0.6942507360917403,
+ "grad_norm": 0.3309987485408783,
+ "learning_rate": 0.00015445873081708835,
+ "loss": 1.8714,
+ "step": 560
+ },
+ {
+ "epoch": 0.7066480706648071,
+ "grad_norm": 0.40711745619773865,
+ "learning_rate": 0.00015362919950228122,
+ "loss": 1.9515,
+ "step": 570
+ },
+ {
+ "epoch": 0.7190454052378739,
+ "grad_norm": 0.4855351150035858,
+ "learning_rate": 0.00015279966818747409,
+ "loss": 1.928,
+ "step": 580
+ },
+ {
+ "epoch": 0.7314427398109407,
+ "grad_norm": 0.3159841299057007,
+ "learning_rate": 0.00015197013687266695,
+ "loss": 1.8648,
+ "step": 590
+ },
+ {
+ "epoch": 0.7438400743840075,
+ "grad_norm": 0.34454017877578735,
+ "learning_rate": 0.00015114060555785982,
+ "loss": 1.7614,
+ "step": 600
+ },
+ {
+ "epoch": 0.7562374089570743,
+ "grad_norm": 0.42112237215042114,
+ "learning_rate": 0.00015031107424305268,
+ "loss": 1.7943,
+ "step": 610
+ },
+ {
+ "epoch": 0.768634743530141,
+ "grad_norm": 0.4868924617767334,
+ "learning_rate": 0.00014948154292824555,
+ "loss": 1.8236,
+ "step": 620
+ },
+ {
+ "epoch": 0.7810320781032078,
+ "grad_norm": 0.27235159277915955,
+ "learning_rate": 0.00014865201161343841,
+ "loss": 1.8792,
+ "step": 630
+ },
+ {
+ "epoch": 0.7934294126762746,
+ "grad_norm": 0.36492735147476196,
+ "learning_rate": 0.00014782248029863128,
+ "loss": 1.8139,
+ "step": 640
+ },
+ {
+ "epoch": 0.8058267472493414,
+ "grad_norm": 0.3278910517692566,
+ "learning_rate": 0.00014699294898382414,
+ "loss": 1.9303,
+ "step": 650
+ },
+ {
+ "epoch": 0.8182240818224081,
+ "grad_norm": 0.4410141110420227,
+ "learning_rate": 0.000146163417669017,
+ "loss": 1.7458,
+ "step": 660
+ },
+ {
+ "epoch": 0.8306214163954749,
+ "grad_norm": 0.44660821557044983,
+ "learning_rate": 0.00014533388635420988,
+ "loss": 1.6484,
+ "step": 670
+ },
+ {
+ "epoch": 0.8430187509685417,
+ "grad_norm": 0.36396560072898865,
+ "learning_rate": 0.00014450435503940274,
+ "loss": 1.7457,
+ "step": 680
+ },
+ {
+ "epoch": 0.8554160855416085,
+ "grad_norm": 0.4536712169647217,
+ "learning_rate": 0.0001436748237245956,
+ "loss": 1.7763,
+ "step": 690
+ },
+ {
+ "epoch": 0.8678134201146753,
+ "grad_norm": 0.45438772439956665,
+ "learning_rate": 0.00014284529240978847,
+ "loss": 1.8736,
+ "step": 700
+ },
+ {
+ "epoch": 0.8802107546877421,
+ "grad_norm": 0.331462562084198,
+ "learning_rate": 0.00014201576109498134,
+ "loss": 1.9909,
+ "step": 710
+ },
+ {
+ "epoch": 0.8926080892608089,
+ "grad_norm": 0.29686763882637024,
+ "learning_rate": 0.00014118622978017423,
+ "loss": 1.9242,
+ "step": 720
+ },
+ {
+ "epoch": 0.9050054238338757,
+ "grad_norm": 0.4546560049057007,
+ "learning_rate": 0.00014035669846536707,
+ "loss": 1.8032,
+ "step": 730
+ },
+ {
+ "epoch": 0.9174027584069425,
+ "grad_norm": 0.3135245442390442,
+ "learning_rate": 0.00013952716715055993,
+ "loss": 1.835,
+ "step": 740
+ },
+ {
+ "epoch": 0.9298000929800093,
+ "grad_norm": 0.6448049545288086,
+ "learning_rate": 0.00013869763583575283,
+ "loss": 1.9127,
+ "step": 750
+ },
+ {
+ "epoch": 0.9421974275530761,
+ "grad_norm": 0.39725756645202637,
+ "learning_rate": 0.00013786810452094567,
+ "loss": 1.8041,
+ "step": 760
+ },
+ {
+ "epoch": 0.9545947621261429,
+ "grad_norm": 0.3762451708316803,
+ "learning_rate": 0.00013703857320613853,
+ "loss": 1.8003,
+ "step": 770
+ },
+ {
+ "epoch": 0.9669920966992097,
+ "grad_norm": 0.35813263058662415,
+ "learning_rate": 0.00013620904189133142,
+ "loss": 1.8474,
+ "step": 780
+ },
+ {
+ "epoch": 0.9793894312722765,
+ "grad_norm": 0.29999616742134094,
+ "learning_rate": 0.00013537951057652426,
+ "loss": 1.8705,
+ "step": 790
+ },
+ {
+ "epoch": 0.9917867658453432,
+ "grad_norm": 0.3202720880508423,
+ "learning_rate": 0.00013454997926171713,
+ "loss": 1.7752,
+ "step": 800
+ },
+ {
+ "epoch": 1.0,
+ "eval_loss": 1.8193774223327637,
+ "eval_runtime": 78.8064,
+ "eval_samples_per_second": 9.098,
+ "eval_steps_per_second": 1.142,
+ "step": 807
+ },
+ {
+ "epoch": 1.00371920037192,
+ "grad_norm": 0.30241405963897705,
+ "learning_rate": 0.00013372044794691002,
+ "loss": 1.8318,
+ "step": 810
+ },
+ {
+ "epoch": 1.0161165349449868,
+ "grad_norm": 0.3700416386127472,
+ "learning_rate": 0.00013289091663210286,
+ "loss": 1.8938,
+ "step": 820
+ },
+ {
+ "epoch": 1.0285138695180536,
+ "grad_norm": 0.4154430329799652,
+ "learning_rate": 0.00013206138531729572,
+ "loss": 1.8733,
+ "step": 830
+ },
+ {
+ "epoch": 1.0409112040911204,
+ "grad_norm": 0.38313189148902893,
+ "learning_rate": 0.00013123185400248862,
+ "loss": 1.8571,
+ "step": 840
+ },
+ {
+ "epoch": 1.0533085386641872,
+ "grad_norm": 0.23230139911174774,
+ "learning_rate": 0.00013040232268768146,
+ "loss": 1.798,
+ "step": 850
+ },
+ {
+ "epoch": 1.065705873237254,
+ "grad_norm": 0.3701108992099762,
+ "learning_rate": 0.00012957279137287432,
+ "loss": 1.8533,
+ "step": 860
+ },
+ {
+ "epoch": 1.0781032078103208,
+ "grad_norm": 0.29064834117889404,
+ "learning_rate": 0.00012874326005806721,
+ "loss": 1.7453,
+ "step": 870
+ },
+ {
+ "epoch": 1.0905005423833876,
+ "grad_norm": 0.3150763213634491,
+ "learning_rate": 0.00012791372874326005,
+ "loss": 1.8977,
+ "step": 880
+ },
+ {
+ "epoch": 1.1028978769564544,
+ "grad_norm": 0.428843230009079,
+ "learning_rate": 0.00012708419742845292,
+ "loss": 1.8688,
+ "step": 890
+ },
+ {
+ "epoch": 1.1152952115295212,
+ "grad_norm": 0.2608051896095276,
+ "learning_rate": 0.0001262546661136458,
+ "loss": 1.7242,
+ "step": 900
+ },
+ {
+ "epoch": 1.127692546102588,
+ "grad_norm": 0.3821583688259125,
+ "learning_rate": 0.00012542513479883865,
+ "loss": 1.9037,
+ "step": 910
+ },
+ {
+ "epoch": 1.1400898806756548,
+ "grad_norm": 0.28013911843299866,
+ "learning_rate": 0.00012459560348403151,
+ "loss": 1.8215,
+ "step": 920
+ },
+ {
+ "epoch": 1.1524872152487216,
+ "grad_norm": 0.30506208539009094,
+ "learning_rate": 0.0001237660721692244,
+ "loss": 1.8197,
+ "step": 930
+ },
+ {
+ "epoch": 1.1648845498217884,
+ "grad_norm": 0.29327717423439026,
+ "learning_rate": 0.00012293654085441727,
+ "loss": 1.7784,
+ "step": 940
+ },
+ {
+ "epoch": 1.1772818843948551,
+ "grad_norm": 0.23550163209438324,
+ "learning_rate": 0.0001221070095396101,
+ "loss": 1.9318,
+ "step": 950
+ },
+ {
+ "epoch": 1.189679218967922,
+ "grad_norm": 0.21349768340587616,
+ "learning_rate": 0.000121277478224803,
+ "loss": 1.8084,
+ "step": 960
+ },
+ {
+ "epoch": 1.2020765535409887,
+ "grad_norm": 0.34790855646133423,
+ "learning_rate": 0.00012044794690999586,
+ "loss": 1.7587,
+ "step": 970
+ },
+ {
+ "epoch": 1.2144738881140555,
+ "grad_norm": 0.2519979774951935,
+ "learning_rate": 0.00011961841559518872,
+ "loss": 1.8055,
+ "step": 980
+ },
+ {
+ "epoch": 1.2268712226871223,
+ "grad_norm": 0.3781174123287201,
+ "learning_rate": 0.0001187888842803816,
+ "loss": 1.8436,
+ "step": 990
+ },
+ {
+ "epoch": 1.2392685572601891,
+ "grad_norm": 0.26533016562461853,
+ "learning_rate": 0.00011795935296557445,
+ "loss": 1.6512,
+ "step": 1000
+ },
+ {
+ "epoch": 1.2516658918332557,
+ "grad_norm": 0.2862655818462372,
+ "learning_rate": 0.00011712982165076732,
+ "loss": 2.0418,
+ "step": 1010
+ },
+ {
+ "epoch": 1.2640632264063227,
+ "grad_norm": 0.27094656229019165,
+ "learning_rate": 0.0001163002903359602,
+ "loss": 1.7278,
+ "step": 1020
+ },
+ {
+ "epoch": 1.2764605609793893,
+ "grad_norm": 0.29184144735336304,
+ "learning_rate": 0.00011547075902115305,
+ "loss": 1.6938,
+ "step": 1030
+ },
+ {
+ "epoch": 1.2888578955524563,
+ "grad_norm": 0.30606576800346375,
+ "learning_rate": 0.00011464122770634591,
+ "loss": 1.6627,
+ "step": 1040
+ },
+ {
+ "epoch": 1.301255230125523,
+ "grad_norm": 0.23827411234378815,
+ "learning_rate": 0.0001138116963915388,
+ "loss": 1.7671,
+ "step": 1050
+ },
+ {
+ "epoch": 1.31365256469859,
+ "grad_norm": 0.3038440942764282,
+ "learning_rate": 0.00011298216507673165,
+ "loss": 1.9029,
+ "step": 1060
+ },
+ {
+ "epoch": 1.3260498992716565,
+ "grad_norm": 0.22325201332569122,
+ "learning_rate": 0.00011215263376192451,
+ "loss": 1.7413,
+ "step": 1070
+ },
+ {
+ "epoch": 1.3384472338447233,
+ "grad_norm": 0.30867546796798706,
+ "learning_rate": 0.00011132310244711739,
+ "loss": 1.8252,
+ "step": 1080
+ },
+ {
+ "epoch": 1.35084456841779,
+ "grad_norm": 0.3236595392227173,
+ "learning_rate": 0.00011049357113231024,
+ "loss": 1.8856,
+ "step": 1090
+ },
+ {
+ "epoch": 1.363241902990857,
+ "grad_norm": 0.32089242339134216,
+ "learning_rate": 0.00010966403981750311,
+ "loss": 1.8437,
+ "step": 1100
+ },
+ {
+ "epoch": 1.3756392375639237,
+ "grad_norm": 0.28864434361457825,
+ "learning_rate": 0.00010883450850269599,
+ "loss": 1.7875,
+ "step": 1110
+ },
+ {
+ "epoch": 1.3880365721369905,
+ "grad_norm": 0.3817216455936432,
+ "learning_rate": 0.00010800497718788884,
+ "loss": 1.8455,
+ "step": 1120
+ },
+ {
+ "epoch": 1.4004339067100573,
+ "grad_norm": 0.22240202128887177,
+ "learning_rate": 0.0001071754458730817,
+ "loss": 1.7744,
+ "step": 1130
+ },
+ {
+ "epoch": 1.412831241283124,
+ "grad_norm": 0.28633448481559753,
+ "learning_rate": 0.00010634591455827458,
+ "loss": 1.765,
+ "step": 1140
+ },
+ {
+ "epoch": 1.425228575856191,
+ "grad_norm": 0.19945985078811646,
+ "learning_rate": 0.00010551638324346745,
+ "loss": 1.9739,
+ "step": 1150
+ },
+ {
+ "epoch": 1.4376259104292577,
+ "grad_norm": 0.2815242111682892,
+ "learning_rate": 0.0001046868519286603,
+ "loss": 1.8369,
+ "step": 1160
+ },
+ {
+ "epoch": 1.4500232450023245,
+ "grad_norm": 0.20174123346805573,
+ "learning_rate": 0.00010385732061385318,
+ "loss": 1.9351,
+ "step": 1170
+ },
+ {
+ "epoch": 1.4624205795753913,
+ "grad_norm": 0.27899351716041565,
+ "learning_rate": 0.00010302778929904605,
+ "loss": 1.9544,
+ "step": 1180
+ },
+ {
+ "epoch": 1.474817914148458,
+ "grad_norm": 0.22028954327106476,
+ "learning_rate": 0.00010219825798423892,
+ "loss": 1.852,
+ "step": 1190
+ },
+ {
+ "epoch": 1.4872152487215249,
+ "grad_norm": 0.2520454525947571,
+ "learning_rate": 0.00010136872666943178,
+ "loss": 1.6232,
+ "step": 1200
+ },
+ {
+ "epoch": 1.4996125832945917,
+ "grad_norm": 0.34104567766189575,
+ "learning_rate": 0.00010053919535462464,
+ "loss": 1.9285,
+ "step": 1210
+ },
+ {
+ "epoch": 1.5120099178676585,
+ "grad_norm": 0.3216676414012909,
+ "learning_rate": 9.970966403981751e-05,
+ "loss": 1.6644,
+ "step": 1220
+ },
+ {
+ "epoch": 1.5244072524407253,
+ "grad_norm": 0.22132286429405212,
+ "learning_rate": 9.888013272501037e-05,
+ "loss": 1.796,
+ "step": 1230
+ },
+ {
+ "epoch": 1.536804587013792,
+ "grad_norm": 0.35185569524765015,
+ "learning_rate": 9.805060141020324e-05,
+ "loss": 1.7821,
+ "step": 1240
+ },
+ {
+ "epoch": 1.5492019215868589,
+ "grad_norm": 0.16420242190361023,
+ "learning_rate": 9.72210700953961e-05,
+ "loss": 1.8054,
+ "step": 1250
+ },
+ {
+ "epoch": 1.5615992561599255,
+ "grad_norm": 0.32100507616996765,
+ "learning_rate": 9.639153878058897e-05,
+ "loss": 1.8707,
+ "step": 1260
+ },
+ {
+ "epoch": 1.5739965907329925,
+ "grad_norm": 0.2014254331588745,
+ "learning_rate": 9.556200746578184e-05,
+ "loss": 1.8087,
+ "step": 1270
+ },
+ {
+ "epoch": 1.586393925306059,
+ "grad_norm": 0.23392640054225922,
+ "learning_rate": 9.47324761509747e-05,
+ "loss": 1.7327,
+ "step": 1280
+ },
+ {
+ "epoch": 1.598791259879126,
+ "grad_norm": 0.19987891614437103,
+ "learning_rate": 9.390294483616757e-05,
+ "loss": 1.8399,
+ "step": 1290
+ },
+ {
+ "epoch": 1.6111885944521926,
+ "grad_norm": 0.13928192853927612,
+ "learning_rate": 9.307341352136043e-05,
+ "loss": 1.9271,
+ "step": 1300
+ },
+ {
+ "epoch": 1.6235859290252597,
+ "grad_norm": 0.3423463702201843,
+ "learning_rate": 9.22438822065533e-05,
+ "loss": 1.81,
+ "step": 1310
+ },
+ {
+ "epoch": 1.6359832635983262,
+ "grad_norm": 0.31117212772369385,
+ "learning_rate": 9.141435089174618e-05,
+ "loss": 1.8919,
+ "step": 1320
+ },
+ {
+ "epoch": 1.6483805981713933,
+ "grad_norm": 0.1769014447927475,
+ "learning_rate": 9.058481957693903e-05,
+ "loss": 1.692,
+ "step": 1330
+ },
+ {
+ "epoch": 1.6607779327444598,
+ "grad_norm": 0.1725306212902069,
+ "learning_rate": 8.97552882621319e-05,
+ "loss": 1.7833,
+ "step": 1340
+ },
+ {
+ "epoch": 1.6731752673175269,
+ "grad_norm": 0.15333665907382965,
+ "learning_rate": 8.892575694732477e-05,
+ "loss": 1.7129,
+ "step": 1350
+ },
+ {
+ "epoch": 1.6855726018905934,
+ "grad_norm": 0.225737065076828,
+ "learning_rate": 8.809622563251764e-05,
+ "loss": 2.0034,
+ "step": 1360
+ },
+ {
+ "epoch": 1.6979699364636605,
+ "grad_norm": 0.27135154604911804,
+ "learning_rate": 8.726669431771049e-05,
+ "loss": 1.7641,
+ "step": 1370
+ },
+ {
+ "epoch": 1.710367271036727,
+ "grad_norm": 0.19318152964115143,
+ "learning_rate": 8.643716300290337e-05,
+ "loss": 1.9714,
+ "step": 1380
+ },
+ {
+ "epoch": 1.722764605609794,
+ "grad_norm": 0.09128980338573456,
+ "learning_rate": 8.560763168809624e-05,
+ "loss": 1.6477,
+ "step": 1390
+ },
+ {
+ "epoch": 1.7351619401828606,
+ "grad_norm": 0.20176462829113007,
+ "learning_rate": 8.477810037328909e-05,
+ "loss": 1.8707,
+ "step": 1400
+ },
+ {
+ "epoch": 1.7475592747559274,
+ "grad_norm": 0.1746564656496048,
+ "learning_rate": 8.394856905848197e-05,
+ "loss": 1.8979,
+ "step": 1410
+ },
+ {
+ "epoch": 1.7599566093289942,
+ "grad_norm": 0.8505027294158936,
+ "learning_rate": 8.311903774367483e-05,
+ "loss": 1.71,
+ "step": 1420
+ },
+ {
+ "epoch": 1.772353943902061,
+ "grad_norm": 0.18071773648262024,
+ "learning_rate": 8.22895064288677e-05,
+ "loss": 1.9505,
+ "step": 1430
+ },
+ {
+ "epoch": 1.7847512784751278,
+ "grad_norm": 0.29732322692871094,
+ "learning_rate": 8.145997511406056e-05,
+ "loss": 1.8041,
+ "step": 1440
+ },
+ {
+ "epoch": 1.7971486130481946,
+ "grad_norm": 0.21795178949832916,
+ "learning_rate": 8.063044379925343e-05,
+ "loss": 1.7381,
+ "step": 1450
+ },
+ {
+ "epoch": 1.8095459476212614,
+ "grad_norm": 0.23459061980247498,
+ "learning_rate": 7.98009124844463e-05,
+ "loss": 1.6636,
+ "step": 1460
+ },
+ {
+ "epoch": 1.8219432821943282,
+ "grad_norm": 0.20895689725875854,
+ "learning_rate": 7.897138116963916e-05,
+ "loss": 1.8257,
+ "step": 1470
+ },
+ {
+ "epoch": 1.834340616767395,
+ "grad_norm": 0.18129809200763702,
+ "learning_rate": 7.814184985483203e-05,
+ "loss": 1.9791,
+ "step": 1480
+ },
+ {
+ "epoch": 1.8467379513404618,
+ "grad_norm": 0.1758158951997757,
+ "learning_rate": 7.731231854002489e-05,
+ "loss": 1.7657,
+ "step": 1490
+ },
+ {
+ "epoch": 1.8591352859135286,
+ "grad_norm": 0.14756101369857788,
+ "learning_rate": 7.648278722521776e-05,
+ "loss": 1.7483,
+ "step": 1500
+ },
+ {
+ "epoch": 1.8715326204865954,
+ "grad_norm": 0.14168867468833923,
+ "learning_rate": 7.565325591041062e-05,
+ "loss": 1.8844,
+ "step": 1510
+ },
+ {
+ "epoch": 1.8839299550596622,
+ "grad_norm": 0.1295221447944641,
+ "learning_rate": 7.482372459560349e-05,
+ "loss": 1.8121,
+ "step": 1520
+ },
+ {
+ "epoch": 1.896327289632729,
+ "grad_norm": 0.19356060028076172,
+ "learning_rate": 7.399419328079635e-05,
+ "loss": 1.5851,
+ "step": 1530
+ },
+ {
+ "epoch": 1.9087246242057958,
+ "grad_norm": 0.20035363733768463,
+ "learning_rate": 7.316466196598922e-05,
+ "loss": 1.7063,
+ "step": 1540
+ },
+ {
+ "epoch": 1.9211219587788624,
+ "grad_norm": 0.12139635533094406,
+ "learning_rate": 7.233513065118208e-05,
+ "loss": 1.7962,
+ "step": 1550
+ },
+ {
+ "epoch": 1.9335192933519294,
+ "grad_norm": 0.10587511211633682,
+ "learning_rate": 7.150559933637495e-05,
+ "loss": 1.7143,
+ "step": 1560
+ },
+ {
+ "epoch": 1.945916627924996,
+ "grad_norm": 0.18997065722942352,
+ "learning_rate": 7.067606802156782e-05,
+ "loss": 1.7636,
+ "step": 1570
+ },
+ {
+ "epoch": 1.958313962498063,
+ "grad_norm": 0.15760765969753265,
+ "learning_rate": 6.984653670676068e-05,
+ "loss": 1.7672,
+ "step": 1580
+ },
+ {
+ "epoch": 1.9707112970711296,
+ "grad_norm": 0.1490117609500885,
+ "learning_rate": 6.901700539195355e-05,
+ "loss": 1.6405,
+ "step": 1590
+ },
+ {
+ "epoch": 1.9831086316441966,
+ "grad_norm": 0.09482846409082413,
+ "learning_rate": 6.818747407714641e-05,
+ "loss": 1.8141,
+ "step": 1600
+ },
+ {
+ "epoch": 1.9955059662172632,
+ "grad_norm": 0.182535782456398,
+ "learning_rate": 6.735794276233928e-05,
+ "loss": 1.7745,
+ "step": 1610
+ },
+ {
+ "epoch": 2.0,
+ "eval_loss": 1.8065738677978516,
+ "eval_runtime": 78.829,
+ "eval_samples_per_second": 9.096,
+ "eval_steps_per_second": 1.142,
+ "step": 1614
+ },
+ {
+ "epoch": 2.00743840074384,
+ "grad_norm": 0.24950097501277924,
+ "learning_rate": 6.652841144753214e-05,
+ "loss": 1.7537,
+ "step": 1620
+ },
+ {
+ "epoch": 2.019835735316907,
+ "grad_norm": 0.12352726608514786,
+ "learning_rate": 6.569888013272502e-05,
+ "loss": 1.7048,
+ "step": 1630
+ },
+ {
+ "epoch": 2.0322330698899735,
+ "grad_norm": 0.23885582387447357,
+ "learning_rate": 6.486934881791787e-05,
+ "loss": 1.7903,
+ "step": 1640
+ },
+ {
+ "epoch": 2.0446304044630406,
+ "grad_norm": 0.08030597865581512,
+ "learning_rate": 6.403981750311074e-05,
+ "loss": 1.8085,
+ "step": 1650
+ },
+ {
+ "epoch": 2.057027739036107,
+ "grad_norm": 0.12623316049575806,
+ "learning_rate": 6.321028618830362e-05,
+ "loss": 1.7876,
+ "step": 1660
+ },
+ {
+ "epoch": 2.069425073609174,
+ "grad_norm": 0.08768967539072037,
+ "learning_rate": 6.238075487349647e-05,
+ "loss": 1.767,
+ "step": 1670
+ },
+ {
+ "epoch": 2.0818224081822407,
+ "grad_norm": 0.39565062522888184,
+ "learning_rate": 6.155122355868934e-05,
+ "loss": 1.7355,
+ "step": 1680
+ },
+ {
+ "epoch": 2.0942197427553078,
+ "grad_norm": 0.1476745903491974,
+ "learning_rate": 6.072169224388221e-05,
+ "loss": 1.8694,
+ "step": 1690
+ },
+ {
+ "epoch": 2.1066170773283743,
+ "grad_norm": 0.10529963672161102,
+ "learning_rate": 5.989216092907508e-05,
+ "loss": 1.8461,
+ "step": 1700
+ },
+ {
+ "epoch": 2.1190144119014414,
+ "grad_norm": 0.11380179971456528,
+ "learning_rate": 5.906262961426794e-05,
+ "loss": 1.7705,
+ "step": 1710
+ },
+ {
+ "epoch": 2.131411746474508,
+ "grad_norm": 0.16582056879997253,
+ "learning_rate": 5.823309829946081e-05,
+ "loss": 1.7271,
+ "step": 1720
+ },
+ {
+ "epoch": 2.143809081047575,
+ "grad_norm": 0.09009672701358795,
+ "learning_rate": 5.740356698465368e-05,
+ "loss": 1.9473,
+ "step": 1730
+ },
+ {
+ "epoch": 2.1562064156206415,
+ "grad_norm": 0.09548009186983109,
+ "learning_rate": 5.6574035669846536e-05,
+ "loss": 1.8999,
+ "step": 1740
+ },
+ {
+ "epoch": 2.1686037501937085,
+ "grad_norm": 0.08916931599378586,
+ "learning_rate": 5.574450435503941e-05,
+ "loss": 1.7541,
+ "step": 1750
+ },
+ {
+ "epoch": 2.181001084766775,
+ "grad_norm": 0.09346842020750046,
+ "learning_rate": 5.4914973040232274e-05,
+ "loss": 1.7848,
1258
+ "step": 1760
1259
+ },
1260
+ {
+ "epoch": 2.193398419339842,
+ "grad_norm": 0.09803362190723419,
+ "learning_rate": 5.408544172542513e-05,
+ "loss": 1.8555,
+ "step": 1770
+ },
+ {
+ "epoch": 2.2057957539129087,
+ "grad_norm": 0.23660095036029816,
+ "learning_rate": 5.3255910410618005e-05,
+ "loss": 1.8483,
+ "step": 1780
+ },
+ {
+ "epoch": 2.2181930884859753,
+ "grad_norm": 0.0797749012708664,
+ "learning_rate": 5.242637909581087e-05,
+ "loss": 1.8747,
+ "step": 1790
+ },
+ {
+ "epoch": 2.2305904230590423,
+ "grad_norm": 0.13405947387218475,
+ "learning_rate": 5.159684778100373e-05,
+ "loss": 1.874,
+ "step": 1800
+ },
+ {
+ "epoch": 2.242987757632109,
+ "grad_norm": 0.0895572081208229,
+ "learning_rate": 5.07673164661966e-05,
+ "loss": 1.7637,
+ "step": 1810
+ },
+ {
+ "epoch": 2.255385092205176,
+ "grad_norm": 0.1163070872426033,
+ "learning_rate": 4.993778515138947e-05,
+ "loss": 1.9363,
+ "step": 1820
+ },
+ {
+ "epoch": 2.2677824267782425,
+ "grad_norm": 0.08817609399557114,
+ "learning_rate": 4.910825383658233e-05,
+ "loss": 1.8142,
+ "step": 1830
+ },
+ {
+ "epoch": 2.2801797613513095,
+ "grad_norm": 0.1023801639676094,
+ "learning_rate": 4.82787225217752e-05,
+ "loss": 1.9672,
+ "step": 1840
+ },
+ {
+ "epoch": 2.292577095924376,
+ "grad_norm": 0.12135002017021179,
+ "learning_rate": 4.7449191206968064e-05,
+ "loss": 1.9507,
+ "step": 1850
+ },
+ {
+ "epoch": 2.304974430497443,
+ "grad_norm": 0.11531046777963638,
+ "learning_rate": 4.6619659892160936e-05,
+ "loss": 1.6656,
+ "step": 1860
+ },
+ {
+ "epoch": 2.3173717650705097,
+ "grad_norm": 0.08688156306743622,
+ "learning_rate": 4.5790128577353795e-05,
+ "loss": 1.6901,
+ "step": 1870
+ },
+ {
+ "epoch": 2.3297690996435767,
+ "grad_norm": 0.12937931716442108,
+ "learning_rate": 4.496059726254666e-05,
+ "loss": 1.6826,
+ "step": 1880
+ },
+ {
+ "epoch": 2.3421664342166433,
+ "grad_norm": 0.08356551826000214,
+ "learning_rate": 4.413106594773953e-05,
+ "loss": 1.7614,
+ "step": 1890
+ },
+ {
+ "epoch": 2.3545637687897103,
+ "grad_norm": 0.08626966178417206,
+ "learning_rate": 4.330153463293239e-05,
+ "loss": 1.924,
+ "step": 1900
+ },
+ {
+ "epoch": 2.366961103362777,
+ "grad_norm": 0.12821203470230103,
+ "learning_rate": 4.247200331812526e-05,
+ "loss": 1.6473,
+ "step": 1910
+ },
+ {
+ "epoch": 2.379358437935844,
+ "grad_norm": 0.08293508738279343,
+ "learning_rate": 4.164247200331813e-05,
+ "loss": 1.8332,
+ "step": 1920
+ },
+ {
+ "epoch": 2.3917557725089105,
+ "grad_norm": 0.07418167591094971,
+ "learning_rate": 4.0812940688510995e-05,
+ "loss": 1.8255,
+ "step": 1930
+ },
+ {
+ "epoch": 2.4041531070819775,
+ "grad_norm": 0.10695914179086685,
+ "learning_rate": 3.998340937370386e-05,
+ "loss": 1.7543,
+ "step": 1940
+ },
+ {
+ "epoch": 2.416550441655044,
+ "grad_norm": 0.0908871591091156,
+ "learning_rate": 3.9153878058896726e-05,
+ "loss": 1.7794,
+ "step": 1950
+ },
+ {
+ "epoch": 2.428947776228111,
+ "grad_norm": 0.11525363475084305,
+ "learning_rate": 3.832434674408959e-05,
+ "loss": 1.6607,
+ "step": 1960
+ },
+ {
+ "epoch": 2.4413451108011777,
+ "grad_norm": 0.07907426357269287,
+ "learning_rate": 3.749481542928246e-05,
+ "loss": 1.7764,
+ "step": 1970
+ },
+ {
+ "epoch": 2.4537424453742447,
+ "grad_norm": 0.0990724042057991,
+ "learning_rate": 3.666528411447532e-05,
+ "loss": 1.857,
+ "step": 1980
+ },
+ {
+ "epoch": 2.4661397799473113,
+ "grad_norm": 0.06209372356534004,
+ "learning_rate": 3.583575279966819e-05,
+ "loss": 1.9036,
+ "step": 1990
+ },
+ {
+ "epoch": 2.4785371145203783,
+ "grad_norm": 0.07287060469388962,
+ "learning_rate": 3.5006221484861054e-05,
+ "loss": 1.8647,
+ "step": 2000
+ },
1428
+ {
+ "epoch": 2.490934449093445,
+ "grad_norm": 0.06776302307844162,
+ "learning_rate": 3.417669017005392e-05,
+ "loss": 1.8703,
+ "step": 2010
+ },
+ {
+ "epoch": 2.5033317836665114,
+ "grad_norm": 0.08577609807252884,
+ "learning_rate": 3.3347158855246785e-05,
+ "loss": 1.8978,
+ "step": 2020
+ },
+ {
+ "epoch": 2.5157291182395785,
+ "grad_norm": 0.087737075984478,
+ "learning_rate": 3.251762754043966e-05,
+ "loss": 1.7975,
+ "step": 2030
+ },
+ {
+ "epoch": 2.5281264528126455,
+ "grad_norm": 0.07157925516366959,
+ "learning_rate": 3.1688096225632516e-05,
+ "loss": 1.8695,
+ "step": 2040
+ },
+ {
+ "epoch": 2.540523787385712,
+ "grad_norm": 0.0907244011759758,
+ "learning_rate": 3.085856491082538e-05,
+ "loss": 1.8752,
+ "step": 2050
+ },
+ {
+ "epoch": 2.5529211219587786,
+ "grad_norm": 0.0730370432138443,
+ "learning_rate": 3.0029033596018254e-05,
+ "loss": 1.6779,
+ "step": 2060
+ },
+ {
+ "epoch": 2.5653184565318456,
+ "grad_norm": 0.09164728224277496,
+ "learning_rate": 2.9199502281211116e-05,
+ "loss": 1.7662,
+ "step": 2070
+ },
+ {
+ "epoch": 2.5777157911049127,
+ "grad_norm": 0.08118163049221039,
+ "learning_rate": 2.8369970966403985e-05,
+ "loss": 1.7423,
+ "step": 2080
+ },
+ {
+ "epoch": 2.5901131256779792,
+ "grad_norm": 0.07020942121744156,
+ "learning_rate": 2.754043965159685e-05,
+ "loss": 1.6809,
+ "step": 2090
+ },
+ {
+ "epoch": 2.602510460251046,
+ "grad_norm": 0.08827990293502808,
+ "learning_rate": 2.6710908336789713e-05,
+ "loss": 1.5771,
+ "step": 2100
+ },
+ {
+ "epoch": 2.614907794824113,
+ "grad_norm": 0.11959416419267654,
+ "learning_rate": 2.5881377021982585e-05,
+ "loss": 1.756,
+ "step": 2110
+ },
+ {
+ "epoch": 2.62730512939718,
+ "grad_norm": 0.18080498278141022,
+ "learning_rate": 2.5051845707175447e-05,
+ "loss": 1.7437,
+ "step": 2120
+ },
+ {
+ "epoch": 2.6397024639702464,
+ "grad_norm": 0.08301777392625809,
+ "learning_rate": 2.4222314392368316e-05,
+ "loss": 1.9331,
+ "step": 2130
+ },
+ {
+ "epoch": 2.652099798543313,
+ "grad_norm": 0.08320324867963791,
+ "learning_rate": 2.3392783077561178e-05,
+ "loss": 1.6681,
+ "step": 2140
+ },
+ {
+ "epoch": 2.66449713311638,
+ "grad_norm": 0.09895749390125275,
+ "learning_rate": 2.2563251762754044e-05,
+ "loss": 1.7887,
+ "step": 2150
+ },
+ {
+ "epoch": 2.6768944676894466,
+ "grad_norm": 0.08482034504413605,
+ "learning_rate": 2.1733720447946913e-05,
+ "loss": 1.8258,
+ "step": 2160
+ },
+ {
+ "epoch": 2.6892918022625136,
+ "grad_norm": 0.08512965589761734,
+ "learning_rate": 2.0904189133139775e-05,
+ "loss": 1.8644,
+ "step": 2170
+ },
+ {
+ "epoch": 2.70168913683558,
+ "grad_norm": 0.0718398168683052,
+ "learning_rate": 2.0074657818332644e-05,
+ "loss": 1.875,
+ "step": 2180
+ },
+ {
+ "epoch": 2.7140864714086472,
+ "grad_norm": 0.07293003052473068,
+ "learning_rate": 1.924512650352551e-05,
+ "loss": 1.905,
+ "step": 2190
+ },
+ {
+ "epoch": 2.726483805981714,
+ "grad_norm": 0.07507035881280899,
+ "learning_rate": 1.8415595188718375e-05,
+ "loss": 1.8567,
+ "step": 2200
+ },
+ {
+ "epoch": 2.738881140554781,
+ "grad_norm": 0.07978376746177673,
+ "learning_rate": 1.758606387391124e-05,
+ "loss": 1.763,
+ "step": 2210
+ },
+ {
+ "epoch": 2.7512784751278474,
+ "grad_norm": 0.06767022609710693,
+ "learning_rate": 1.6756532559104106e-05,
+ "loss": 1.7365,
+ "step": 2220
+ },
+ {
+ "epoch": 2.7636758097009144,
+ "grad_norm": 0.08018273115158081,
+ "learning_rate": 1.5927001244296975e-05,
+ "loss": 1.7595,
+ "step": 2230
+ },
+ {
+ "epoch": 2.776073144273981,
+ "grad_norm": 0.09407597035169601,
+ "learning_rate": 1.5097469929489839e-05,
+ "loss": 1.6947,
+ "step": 2240
+ },
+ {
+ "epoch": 2.788470478847048,
+ "grad_norm": 0.0730314627289772,
+ "learning_rate": 1.4267938614682704e-05,
+ "loss": 1.8826,
+ "step": 2250
+ },
1603
+ {
+ "epoch": 2.8008678134201146,
+ "grad_norm": 0.06652677804231644,
+ "learning_rate": 1.3438407299875571e-05,
+ "loss": 1.8177,
+ "step": 2260
+ },
+ {
+ "epoch": 2.8132651479931816,
+ "grad_norm": 0.07838471233844757,
+ "learning_rate": 1.2608875985068435e-05,
+ "loss": 1.8317,
+ "step": 2270
+ },
+ {
+ "epoch": 2.825662482566248,
+ "grad_norm": 0.0950925275683403,
+ "learning_rate": 1.1779344670261303e-05,
+ "loss": 1.7767,
+ "step": 2280
+ },
+ {
+ "epoch": 2.838059817139315,
+ "grad_norm": 0.08422578871250153,
+ "learning_rate": 1.094981335545417e-05,
+ "loss": 1.8927,
+ "step": 2290
+ },
+ {
+ "epoch": 2.850457151712382,
+ "grad_norm": 0.08127064257860184,
+ "learning_rate": 1.0120282040647035e-05,
+ "loss": 1.8953,
+ "step": 2300
+ },
+ {
+ "epoch": 2.862854486285449,
+ "grad_norm": 0.0867166519165039,
+ "learning_rate": 9.290750725839901e-06,
+ "loss": 1.7904,
+ "step": 2310
+ },
+ {
+ "epoch": 2.8752518208585154,
+ "grad_norm": 0.09853280335664749,
+ "learning_rate": 8.461219411032766e-06,
+ "loss": 1.7074,
+ "step": 2320
+ },
+ {
+ "epoch": 2.887649155431582,
+ "grad_norm": 0.0887097492814064,
+ "learning_rate": 7.631688096225632e-06,
+ "loss": 1.7587,
+ "step": 2330
+ },
+ {
+ "epoch": 2.900046490004649,
+ "grad_norm": 0.0924605056643486,
+ "learning_rate": 6.802156781418499e-06,
+ "loss": 1.9213,
+ "step": 2340
+ },
+ {
+ "epoch": 2.912443824577716,
+ "grad_norm": 0.08043299615383148,
+ "learning_rate": 5.972625466611365e-06,
+ "loss": 1.8088,
+ "step": 2350
+ },
+ {
+ "epoch": 2.9248411591507826,
+ "grad_norm": 0.08430881798267365,
+ "learning_rate": 5.143094151804231e-06,
+ "loss": 1.8026,
+ "step": 2360
+ },
+ {
+ "epoch": 2.937238493723849,
+ "grad_norm": 0.07314042001962662,
+ "learning_rate": 4.313562836997097e-06,
+ "loss": 1.7945,
+ "step": 2370
+ },
+ {
+ "epoch": 2.949635828296916,
+ "grad_norm": 0.07442809641361237,
+ "learning_rate": 3.484031522189963e-06,
+ "loss": 1.6588,
+ "step": 2380
+ },
+ {
+ "epoch": 2.962033162869983,
+ "grad_norm": 0.08359961956739426,
+ "learning_rate": 2.6545002073828286e-06,
+ "loss": 1.7927,
+ "step": 2390
+ },
+ {
+ "epoch": 2.9744304974430498,
+ "grad_norm": 0.07364208996295929,
+ "learning_rate": 1.8249688925756948e-06,
+ "loss": 1.7357,
+ "step": 2400
+ },
+ {
+ "epoch": 2.9868278320161163,
+ "grad_norm": 0.07607521116733551,
+ "learning_rate": 9.95437577768561e-07,
+ "loss": 1.8463,
+ "step": 2410
+ },
+ {
+ "epoch": 2.9992251665891834,
+ "grad_norm": 0.08674101531505585,
+ "learning_rate": 1.6590626296142679e-07,
+ "loss": 1.9422,
+ "step": 2420
+ },
+ {
+ "epoch": 3.0,
+ "eval_loss": 1.8053849935531616,
+ "eval_runtime": 78.8378,
+ "eval_samples_per_second": 9.095,
+ "eval_steps_per_second": 1.142,
+ "step": 2421
+ }
+ ],
+ "logging_steps": 10,
+ "max_steps": 2421,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 3,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 8.469388058720993e+17,
+ "train_batch_size": 1,
+ "trial_name": null,
+ "trial_params": null
+ }
checkpoint-2421/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:15ed81f35b71a674f101f42b5d39caf5a7c94955958bbf088fb20abcabaa812f
+ size 5905
runs/Oct31_19-53-38_66a1d7a2b568/events.out.tfevents.1761940446.66a1d7a2b568.942.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c565ed1e25f4e26b1a87b2a9ca699bb6587e0f3132f8d40f7927babad4c430a7
+ size 57740
special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": "</s>",
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:37f00374dea48658ee8f5d0f21895b9bc55cb0103939607c8185bfd1c6ca1f89
+ size 587404
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff