jacobpol committed on
Commit 9522cf5 · verified · 1 Parent(s): 51bf0d3

Delete person_lora/checkpoint-1224

person_lora/checkpoint-1224/README.md DELETED
@@ -1,207 +0,0 @@
- ---
- base_model: Qwen/Qwen3-4B-Instruct-2507
- library_name: peft
- pipeline_tag: text-generation
- tags:
- - base_model:adapter:Qwen/Qwen3-4B-Instruct-2507
- - lora
- - transformers
- ---
- 
- # Model Card for Model ID
- 
- <!-- Provide a quick summary of what the model is/does. -->
- 
- 
- 
- ## Model Details
- 
- ### Model Description
- 
- <!-- Provide a longer summary of what this model is. -->
- 
- 
- 
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
- 
- ### Model Sources [optional]
- 
- <!-- Provide the basic links for the model. -->
- 
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
- 
- ## Uses
- 
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
- 
- ### Direct Use
- 
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
- 
- [More Information Needed]
- 
- ### Downstream Use [optional]
- 
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
- 
- [More Information Needed]
- 
- ### Out-of-Scope Use
- 
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
- 
- [More Information Needed]
- 
- ## Bias, Risks, and Limitations
- 
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
- 
- [More Information Needed]
- 
- ### Recommendations
- 
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
- 
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
- 
- ## How to Get Started with the Model
- 
- Use the code below to get started with the model.
- 
- [More Information Needed]
- 
- ## Training Details
- 
- ### Training Data
- 
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
- 
- [More Information Needed]
- 
- ### Training Procedure
- 
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
- 
- #### Preprocessing [optional]
- 
- [More Information Needed]
- 
- 
- #### Training Hyperparameters
- 
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
- 
- #### Speeds, Sizes, Times [optional]
- 
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
- 
- [More Information Needed]
- 
- ## Evaluation
- 
- <!-- This section describes the evaluation protocols and provides the results. -->
- 
- ### Testing Data, Factors & Metrics
- 
- #### Testing Data
- 
- <!-- This should link to a Dataset Card if possible. -->
- 
- [More Information Needed]
- 
- #### Factors
- 
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
- 
- [More Information Needed]
- 
- #### Metrics
- 
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
- 
- [More Information Needed]
- 
- ### Results
- 
- [More Information Needed]
- 
- #### Summary
- 
- 
- 
- ## Model Examination [optional]
- 
- <!-- Relevant interpretability work for the model goes here -->
- 
- [More Information Needed]
- 
- ## Environmental Impact
- 
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
- 
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- 
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
- 
- ## Technical Specifications [optional]
- 
- ### Model Architecture and Objective
- 
- [More Information Needed]
- 
- ### Compute Infrastructure
- 
- [More Information Needed]
- 
- #### Hardware
- 
- [More Information Needed]
- 
- #### Software
- 
- [More Information Needed]
- 
- ## Citation [optional]
- 
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
- 
- **BibTeX:**
- 
- [More Information Needed]
- 
- **APA:**
- 
- [More Information Needed]
- 
- ## Glossary [optional]
- 
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
- 
- [More Information Needed]
- 
- ## More Information [optional]
- 
- [More Information Needed]
- 
- ## Model Card Authors [optional]
- 
- [More Information Needed]
- 
- ## Model Card Contact
- 
- [More Information Needed]
- ### Framework versions
- 
- - PEFT 0.18.1

person_lora/checkpoint-1224/adapter_config.json DELETED
@@ -1,46 +0,0 @@
- {
-   "alora_invocation_tokens": null,
-   "alpha_pattern": {},
-   "arrow_config": null,
-   "auto_mapping": null,
-   "base_model_name_or_path": "Qwen/Qwen3-4B-Instruct-2507",
-   "bias": "none",
-   "corda_config": null,
-   "ensure_weight_tying": false,
-   "eva_config": null,
-   "exclude_modules": null,
-   "fan_in_fan_out": false,
-   "inference_mode": true,
-   "init_lora_weights": true,
-   "layer_replication": null,
-   "layers_pattern": null,
-   "layers_to_transform": null,
-   "loftq_config": {},
-   "lora_alpha": 16,
-   "lora_bias": false,
-   "lora_dropout": 0.05,
-   "megatron_config": null,
-   "megatron_core": "megatron.core",
-   "modules_to_save": null,
-   "peft_type": "LORA",
-   "peft_version": "0.18.1",
-   "qalora_group_size": 16,
-   "r": 64,
-   "rank_pattern": {},
-   "revision": null,
-   "target_modules": [
-     "o_proj",
-     "v_proj",
-     "q_proj",
-     "gate_proj",
-     "k_proj",
-     "up_proj",
-     "down_proj"
-   ],
-   "target_parameters": null,
-   "task_type": "CAUSAL_LM",
-   "trainable_token_indices": null,
-   "use_dora": false,
-   "use_qalora": false,
-   "use_rslora": false
- }

person_lora/checkpoint-1224/adapter_model.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:9e462d70d1737df420c59a95a85eecb7a98820e8686f9394e7b35e1332b259cc
- size 528550256

person_lora/checkpoint-1224/added_tokens.json DELETED
@@ -1,28 +0,0 @@
- {
-   "</think>": 151668,
-   "</tool_call>": 151658,
-   "</tool_response>": 151666,
-   "<think>": 151667,
-   "<tool_call>": 151657,
-   "<tool_response>": 151665,
-   "<|box_end|>": 151649,
-   "<|box_start|>": 151648,
-   "<|endoftext|>": 151643,
-   "<|file_sep|>": 151664,
-   "<|fim_middle|>": 151660,
-   "<|fim_pad|>": 151662,
-   "<|fim_prefix|>": 151659,
-   "<|fim_suffix|>": 151661,
-   "<|im_end|>": 151645,
-   "<|im_start|>": 151644,
-   "<|image_pad|>": 151655,
-   "<|object_ref_end|>": 151647,
-   "<|object_ref_start|>": 151646,
-   "<|quad_end|>": 151651,
-   "<|quad_start|>": 151650,
-   "<|repo_name|>": 151663,
-   "<|video_pad|>": 151656,
-   "<|vision_end|>": 151653,
-   "<|vision_pad|>": 151654,
-   "<|vision_start|>": 151652
- }

person_lora/checkpoint-1224/merges.txt DELETED
The diff for this file is too large to render. See raw diff
 
person_lora/checkpoint-1224/optimizer.pt DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:b9889e82104ffb5dbd1e64acdbdcb75b0ccf7d682b4fc7b15b57005d0a66d166
- size 1057390923

person_lora/checkpoint-1224/rng_state.pth DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:43ea5b16363545b09472729b7073fe5c4a5944f59878facbc92d227389518462
- size 14645

person_lora/checkpoint-1224/scheduler.pt DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:6a956ea83f52d68346214fb4c1400f4629bb44b9f3494ea151b52b5e43015afd
- size 1465

person_lora/checkpoint-1224/special_tokens_map.json DELETED
@@ -1,31 +0,0 @@
- {
-   "additional_special_tokens": [
-     "<|im_start|>",
-     "<|im_end|>",
-     "<|object_ref_start|>",
-     "<|object_ref_end|>",
-     "<|box_start|>",
-     "<|box_end|>",
-     "<|quad_start|>",
-     "<|quad_end|>",
-     "<|vision_start|>",
-     "<|vision_end|>",
-     "<|vision_pad|>",
-     "<|image_pad|>",
-     "<|video_pad|>"
-   ],
-   "eos_token": {
-     "content": "<|im_end|>",
-     "lstrip": false,
-     "normalized": false,
-     "rstrip": false,
-     "single_word": false
-   },
-   "pad_token": {
-     "content": "<|endoftext|>",
-     "lstrip": false,
-     "normalized": false,
-     "rstrip": false,
-     "single_word": false
-   }
- }

person_lora/checkpoint-1224/tokenizer.json DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:2f1298e298f2fe0059aba46f037697a339ccba45a1908780ce8ca14b45582f23
- size 11422753

person_lora/checkpoint-1224/tokenizer_config.json DELETED
@@ -1,240 +0,0 @@
- {
-   "add_bos_token": false,
-   "add_prefix_space": false,
-   "added_tokens_decoder": {
-     "151643": {
-       "content": "<|endoftext|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151644": {
-       "content": "<|im_start|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151645": {
-       "content": "<|im_end|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151646": {
-       "content": "<|object_ref_start|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151647": {
-       "content": "<|object_ref_end|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151648": {
-       "content": "<|box_start|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151649": {
-       "content": "<|box_end|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151650": {
-       "content": "<|quad_start|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151651": {
-       "content": "<|quad_end|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151652": {
-       "content": "<|vision_start|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151653": {
-       "content": "<|vision_end|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151654": {
-       "content": "<|vision_pad|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151655": {
-       "content": "<|image_pad|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151656": {
-       "content": "<|video_pad|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151657": {
-       "content": "<tool_call>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "151658": {
-       "content": "</tool_call>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "151659": {
-       "content": "<|fim_prefix|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "151660": {
-       "content": "<|fim_middle|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "151661": {
-       "content": "<|fim_suffix|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "151662": {
-       "content": "<|fim_pad|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "151663": {
-       "content": "<|repo_name|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "151664": {
-       "content": "<|file_sep|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "151665": {
-       "content": "<tool_response>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "151666": {
-       "content": "</tool_response>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "151667": {
-       "content": "<think>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "151668": {
-       "content": "</think>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     }
-   },
-   "additional_special_tokens": [
-     "<|im_start|>",
-     "<|im_end|>",
-     "<|object_ref_start|>",
-     "<|object_ref_end|>",
-     "<|box_start|>",
-     "<|box_end|>",
-     "<|quad_start|>",
-     "<|quad_end|>",
-     "<|vision_start|>",
-     "<|vision_end|>",
-     "<|vision_pad|>",
-     "<|image_pad|>",
-     "<|video_pad|>"
-   ],
-   "bos_token": null,
-   "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0].role == 'system' %}\n        {{- messages[0].content + '\\n\\n' }}\n    {%- endif %}\n    {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0].role == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if message.content is string %}\n        {%- set content = message.content %}\n    {%- else %}\n        {%- set content = '' %}\n    {%- endif %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n        {{- '<|im_start|>' + message.role + '\\n' + content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role + '\\n' + content }}\n        {%- if message.tool_calls %}\n            {%- for tool_call in message.tool_calls %}\n                {%- if (loop.first and content) or (not loop.first) %}\n                    {{- '\\n' }}\n                {%- endif %}\n                {%- if tool_call.function %}\n                    {%- set tool_call = tool_call.function %}\n                {%- endif %}\n                {{- '<tool_call>\\n{\"name\": \"' }}\n                {{- tool_call.name }}\n                {{- '\", \"arguments\": ' }}\n                {%- if tool_call.arguments is string %}\n                    {{- tool_call.arguments }}\n                {%- else %}\n                    {{- tool_call.arguments | tojson }}\n                {%- endif %}\n                {{- '}\\n</tool_call>' }}\n            {%- endfor %}\n        {%- endif %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}",
-   "clean_up_tokenization_spaces": false,
-   "eos_token": "<|im_end|>",
-   "errors": "replace",
-   "extra_special_tokens": {},
-   "model_max_length": 1010000,
-   "pad_token": "<|endoftext|>",
-   "split_special_tokens": false,
-   "tokenizer_class": "Qwen2Tokenizer",
-   "unk_token": null
- }

person_lora/checkpoint-1224/trainer_state.json DELETED
@@ -1,1036 +0,0 @@
- {
-   "best_global_step": null,
-   "best_metric": null,
-   "best_model_checkpoint": null,
-   "epoch": 4.0,
-   "eval_steps": 200,
-   "global_step": 1224,
-   "is_hyper_param_search": false,
-   "is_local_process_zero": true,
-   "is_world_process_zero": true,
-   "log_history": [
-     {
-       "epoch": 0.032679738562091505,
-       "grad_norm": 1.453621506690979,
-       "learning_rate": 4.8648648648648654e-05,
-       "loss": 0.6269,
-       "step": 10
-     },
-     {
-       "epoch": 0.06535947712418301,
-       "grad_norm": 0.7020823955535889,
-       "learning_rate": 0.0001027027027027027,
-       "loss": 0.2545,
-       "step": 20
-     },
-     {
-       "epoch": 0.09803921568627451,
-       "grad_norm": 0.2977686822414398,
-       "learning_rate": 0.00015675675675675676,
-       "loss": 0.1214,
-       "step": 30
-     },
-     {
-       "epoch": 0.13071895424836602,
-       "grad_norm": 0.11059613525867462,
-       "learning_rate": 0.00019999859903498856,
-       "loss": 0.1248,
-       "step": 40
-     },
-     {
-       "epoch": 0.16339869281045752,
-       "grad_norm": 0.20021583139896393,
-       "learning_rate": 0.00019994956938114075,
-       "loss": 0.1142,
-       "step": 50
-     },
-     {
-       "epoch": 0.19607843137254902,
-       "grad_norm": 0.21602283418178558,
-       "learning_rate": 0.00019983053072583596,
-       "loss": 0.1006,
-       "step": 60
-     },
-     {
-       "epoch": 0.22875816993464052,
-       "grad_norm": 0.1741652488708496,
-       "learning_rate": 0.00019964156644889706,
-       "loss": 0.098,
-       "step": 70
-     },
-     {
-       "epoch": 0.26143790849673204,
-       "grad_norm": 0.20578978955745697,
-       "learning_rate": 0.0001993828089090768,
-       "loss": 0.1017,
-       "step": 80
-     },
-     {
-       "epoch": 0.29411764705882354,
-       "grad_norm": 0.152154341340065,
-       "learning_rate": 0.00019905443935134791,
-       "loss": 0.0716,
-       "step": 90
-     },
-     {
-       "epoch": 0.32679738562091504,
-       "grad_norm": 0.20485801994800568,
-       "learning_rate": 0.00019865668777995147,
-       "loss": 0.0839,
-       "step": 100
-     },
-     {
-       "epoch": 0.35947712418300654,
-       "grad_norm": 0.3825615644454956,
-       "learning_rate": 0.0001981898327972918,
-       "loss": 0.0907,
-       "step": 110
-     },
-     {
-       "epoch": 0.39215686274509803,
-       "grad_norm": 0.34921425580978394,
-       "learning_rate": 0.00019765420140879135,
-       "loss": 0.0598,
-       "step": 120
-     },
-     {
-       "epoch": 0.42483660130718953,
-       "grad_norm": 0.3151702582836151,
-       "learning_rate": 0.00019705016879384201,
-       "loss": 0.0975,
-       "step": 130
-     },
-     {
-       "epoch": 0.45751633986928103,
-       "grad_norm": 0.1447100192308426,
-       "learning_rate": 0.00019637815804301315,
-       "loss": 0.0702,
-       "step": 140
-     },
-     {
-       "epoch": 0.49019607843137253,
-       "grad_norm": 0.09320889413356781,
-       "learning_rate": 0.00019563863986170077,
-       "loss": 0.0687,
-       "step": 150
-     },
-     {
-       "epoch": 0.5228758169934641,
-       "grad_norm": 0.3998013436794281,
-       "learning_rate": 0.000194832132240425,
-       "loss": 0.0715,
-       "step": 160
-     },
-     {
-       "epoch": 0.5555555555555556,
-       "grad_norm": 0.0680910125374794,
-       "learning_rate": 0.00019395920009200723,
-       "loss": 0.0739,
-       "step": 170
-     },
-     {
-       "epoch": 0.5882352941176471,
-       "grad_norm": 0.158641055226326,
-       "learning_rate": 0.00019302045485588068,
-       "loss": 0.0582,
-       "step": 180
-     },
-     {
-       "epoch": 0.6209150326797386,
-       "grad_norm": 0.1919500231742859,
-       "learning_rate": 0.00019201655406981164,
-       "loss": 0.0741,
-       "step": 190
-     },
-     {
-       "epoch": 0.6535947712418301,
-       "grad_norm": 0.18826788663864136,
-       "learning_rate": 0.00019094820090933195,
-       "loss": 0.0618,
-       "step": 200
-     },
-     {
-       "epoch": 0.6535947712418301,
-       "eval_loss": 0.09742313623428345,
-       "eval_runtime": 17.1018,
-       "eval_samples_per_second": 4.269,
-       "eval_steps_per_second": 2.164,
-       "step": 200
-     },
-     {
-       "eval_ner_f1": 0.0,
-       "step": 200
-     },
-     {
-       "eval_ner_precision": 0.0,
-       "step": 200
-     },
-     {
-       "eval_ner_recall": 0.0,
-       "step": 200
-     },
-     {
-       "eval_ner_f1_person": 0.0,
-       "step": 200
-     },
-     {
-       "epoch": 0.6862745098039216,
-       "grad_norm": 0.2360253632068634,
-       "learning_rate": 0.00018981614369520405,
-       "loss": 0.0579,
-       "step": 210
-     },
-     {
-       "epoch": 0.7189542483660131,
-       "grad_norm": 0.1330314427614212,
-       "learning_rate": 0.00018862117536926496,
-       "loss": 0.0675,
-       "step": 220
-     },
-     {
-       "epoch": 0.7516339869281046,
-       "grad_norm": 0.16936901211738586,
-       "learning_rate": 0.0001873641329390154,
-       "loss": 0.0529,
-       "step": 230
-     },
-     {
-       "epoch": 0.7843137254901961,
-       "grad_norm": 0.08676150441169739,
-       "learning_rate": 0.00018604589689134372,
-       "loss": 0.0678,
-       "step": 240
-     },
-     {
-       "epoch": 0.8169934640522876,
-       "grad_norm": 0.07660745829343796,
-       "learning_rate": 0.00018466739057579462,
-       "loss": 0.0669,
-       "step": 250
-     },
-     {
-       "epoch": 0.8496732026143791,
-       "grad_norm": 0.12025874108076096,
-       "learning_rate": 0.00018322957955781526,
-       "loss": 0.0744,
-       "step": 260
-     },
-     {
-       "epoch": 0.8823529411764706,
-       "grad_norm": 0.2834235429763794,
-       "learning_rate": 0.00018173347094243146,
-       "loss": 0.0576,
-       "step": 270
-     },
-     {
-       "epoch": 0.9150326797385621,
-       "grad_norm": 0.11870580911636353,
-       "learning_rate": 0.0001801801126688278,
-       "loss": 0.0632,
-       "step": 280
-     },
-     {
-       "epoch": 0.9477124183006536,
-       "grad_norm": 0.16607636213302612,
-       "learning_rate": 0.00017857059277632563,
-       "loss": 0.0608,
-       "step": 290
-     },
-     {
-       "epoch": 0.9803921568627451,
-       "grad_norm": 0.11665856838226318,
-       "learning_rate": 0.0001769060386422733,
-       "loss": 0.038,
-       "step": 300
-     },
-     {
-       "epoch": 1.0130718954248366,
-       "grad_norm": 0.11913657933473587,
-       "learning_rate": 0.00017518761619238234,
-       "loss": 0.0434,
-       "step": 310
-     },
-     {
-       "epoch": 1.0457516339869282,
-       "grad_norm": 0.07901415973901749,
-       "learning_rate": 0.0001734165290840626,
-       "loss": 0.0486,
-       "step": 320
-     },
-     {
-       "epoch": 1.0784313725490196,
-       "grad_norm": 0.17951519787311554,
-       "learning_rate": 0.00017159401786332864,
-       "loss": 0.0296,
-       "step": 330
-     },
-     {
-       "epoch": 1.1111111111111112,
-       "grad_norm": 0.2891804575920105,
-       "learning_rate": 0.00016972135909586742,
-       "loss": 0.0337,
-       "step": 340
-     },
-     {
-       "epoch": 1.1437908496732025,
-       "grad_norm": 0.14063680171966553,
-       "learning_rate": 0.00016779986447287677,
-       "loss": 0.0414,
-       "step": 350
-     },
-     {
-       "epoch": 1.1764705882352942,
-       "grad_norm": 0.07126651704311371,
-       "learning_rate": 0.00016583087989229997,
-       "loss": 0.0301,
-       "step": 360
-     },
-     {
-       "epoch": 1.2091503267973855,
-       "grad_norm": 0.10456930100917816,
-       "learning_rate": 0.00016381578451610062,
-       "loss": 0.028,
-       "step": 370
-     },
-     {
-       "epoch": 1.2418300653594772,
-       "grad_norm": 0.03307747840881348,
-       "learning_rate": 0.00016175598980423797,
-       "loss": 0.0391,
-       "step": 380
-     },
-     {
-       "epoch": 1.2745098039215685,
-       "grad_norm": 0.23047687113285065,
-       "learning_rate": 0.00015965293852601944,
-       "loss": 0.0546,
-       "step": 390
-     },
-     {
-       "epoch": 1.3071895424836601,
-       "grad_norm": 0.0811915472149849,
-       "learning_rate": 0.00015750810374952226,
-       "loss": 0.0404,
-       "step": 400
-     },
-     {
-       "epoch": 1.3071895424836601,
-       "eval_loss": 0.08959854394197464,
-       "eval_runtime": 17.3406,
-       "eval_samples_per_second": 4.21,
-       "eval_steps_per_second": 2.134,
-       "step": 400
-     },
-     {
-       "eval_ner_f1": 0.5098039215686274,
-       "step": 400
-     },
-     {
-       "eval_ner_precision": 0.65,
-       "step": 400
-     },
-     {
-       "eval_ner_recall": 0.41935483870967744,
-       "step": 400
-     },
-     {
-       "eval_ner_f1_commodity": 0.0,
-       "step": 400
-     },
-     {
-       "eval_ner_f1_person": 0.5306122448979592,
-       "step": 400
-     },
-     {
-       "epoch": 1.3398692810457518,
-       "grad_norm": 0.04820709675550461,
-       "learning_rate": 0.00015532298780979336,
-       "loss": 0.0393,
-       "step": 410
-     },
-     {
-       "epoch": 1.3725490196078431,
-       "grad_norm": 0.1625761240720749,
-       "learning_rate": 0.0001530991212565484,
-       "loss": 0.0389,
-       "step": 420
-     },
-     {
-       "epoch": 1.4052287581699345,
-       "grad_norm": 0.12835846841335297,
-       "learning_rate": 0.00015083806178210892,
-       "loss": 0.0282,
-       "step": 430
-     },
-     {
-       "epoch": 1.4379084967320261,
-       "grad_norm": 0.15881434082984924,
-       "learning_rate": 0.00014854139313032726,
-       "loss": 0.0606,
-       "step": 440
-     },
-     {
-       "epoch": 1.4705882352941178,
-       "grad_norm": 0.24892982840538025,
-       "learning_rate": 0.00014621072398726356,
-       "loss": 0.0376,
-       "step": 450
-     },
-     {
-       "epoch": 1.5032679738562091,
-       "grad_norm": 0.034560319036245346,
-       "learning_rate": 0.00014384768685439273,
-       "loss": 0.0282,
-       "step": 460
-     },
-     {
-       "epoch": 1.5359477124183005,
-       "grad_norm": 0.1848367601633072,
-       "learning_rate": 0.0001414539369051298,
-       "loss": 0.0411,
-       "step": 470
-     },
-     {
-       "epoch": 1.5686274509803921,
-       "grad_norm": 0.3837202489376068,
-       "learning_rate": 0.0001390311508254747,
-       "loss": 0.0341,
-       "step": 480
-     },
-     {
-       "epoch": 1.6013071895424837,
-       "grad_norm": 0.10172578692436218,
-       "learning_rate": 0.0001365810256395891,
-       "loss": 0.0325,
-       "step": 490
-     },
-     {
-       "epoch": 1.6339869281045751,
-       "grad_norm": 0.18793442845344543,
-       "learning_rate": 0.000134105277521127,
-       "loss": 0.0357,
-       "step": 500
-     },
-     {
-       "epoch": 1.6666666666666665,
-       "grad_norm": 0.15017642080783844,
-       "learning_rate": 0.0001316056405911527,
-       "loss": 0.0232,
-       "step": 510
-     },
-     {
-       "epoch": 1.6993464052287581,
-       "grad_norm": 0.06392411887645721,
-       "learning_rate": 0.0001290838657034874,
-       "loss": 0.029,
-       "step": 520
-     },
-     {
-       "epoch": 1.7320261437908497,
-       "grad_norm": 0.0888790488243103,
-       "learning_rate": 0.00012654171921833534,
-       "loss": 0.0428,
-       "step": 530
-     },
-     {
-       "epoch": 1.7647058823529411,
-       "grad_norm": 0.06441206485033035,
-       "learning_rate": 0.00012398098176504872,
-       "loss": 0.0336,
-       "step": 540
-     },
-     {
-       "epoch": 1.7973856209150327,
-       "grad_norm": 0.16848251223564148,
-       "learning_rate": 0.00012140344699489797,
-       "loss": 0.0511,
-       "step": 550
-     },
-     {
-       "epoch": 1.8300653594771243,
-       "grad_norm": 0.07297755777835846,
-       "learning_rate": 0.00011881092032472073,
-       "loss": 0.0284,
-       "step": 560
-     },
-     {
-       "epoch": 1.8627450980392157,
-       "grad_norm": 0.09729588031768799,
-       "learning_rate": 0.00011620521767232988,
-       "loss": 0.0406,
-       "step": 570
-     },
-     {
-       "epoch": 1.8954248366013071,
-       "grad_norm": 0.038536667823791504,
-       "learning_rate": 0.00011358816418456624,
-       "loss": 0.0354,
-       "step": 580
-     },
-     {
-       "epoch": 1.9281045751633987,
-       "grad_norm": 0.20749039947986603,
-       "learning_rate": 0.00011096159295888646,
-       "loss": 0.0241,
-       "step": 590
-     },
-     {
-       "epoch": 1.9607843137254903,
-       "grad_norm": 0.13462506234645844,
-       "learning_rate": 0.00010832734375938269,
-       "loss": 0.0291,
-       "step": 600
-     },
-     {
-       "epoch": 1.9607843137254903,
-       "eval_loss": 0.09001053869724274,
-       "eval_runtime": 17.3619,
-       "eval_samples_per_second": 4.205,
-       "eval_steps_per_second": 2.131,
-       "step": 600
-     },
-     {
-       "eval_ner_f1": 0.4090909090909091,
-       "step": 600
-     },
-     {
-       "eval_ner_precision": 0.6923076923076923,
-       "step": 600
-     },
-     {
-       "eval_ner_recall": 0.2903225806451613,
-       "step": 600
-     },
-     {
-       "eval_ner_f1_person": 0.4090909090909091,
-       "step": 600
-     },
-     {
-       "epoch": 1.9934640522875817,
-       "grad_norm": 0.15671618282794952,
-       "learning_rate": 0.00010568726172813193,
-       "loss": 0.0257,
-       "step": 610
-     },
-     {
-       "epoch": 2.026143790849673,
-       "grad_norm": 0.03700106590986252,
-       "learning_rate": 0.00010304319609277888,
-       "loss": 0.0175,
-       "step": 620
-     },
-     {
-       "epoch": 2.0588235294117645,
-       "grad_norm": 0.05351804941892624,
-       "learning_rate": 0.00010039699887125678,
-       "loss": 0.0206,
-       "step": 630
-     },
-     {
-       "epoch": 2.0915032679738563,
-       "grad_norm": 0.17226466536521912,
-       "learning_rate": 9.77505235745541e-05,
-       "loss": 0.0148,
-       "step": 640
-     },
-     {
-       "epoch": 2.1241830065359477,
-       "grad_norm": 0.03601714223623276,
-       "learning_rate": 9.510562390843513e-05,
-       "loss": 0.0171,
-       "step": 650
-     },
-     {
-       "epoch": 2.156862745098039,
-       "grad_norm": 0.034734755754470825,
-       "learning_rate": 9.246415247502437e-05,
-       "loss": 0.0264,
-       "step": 660
-     },
-     {
-       "epoch": 2.189542483660131,
-       "grad_norm": 0.023934612050652504,
-       "learning_rate": 8.982795947516392e-05,
-       "loss": 0.0227,
-       "step": 670
-     },
-     {
-       "epoch": 2.2222222222222223,
-       "grad_norm": 0.07543577253818512,
-       "learning_rate": 8.719889141245256e-05,
-       "loss": 0.0183,
-       "step": 680
-     },
-     {
-       "epoch": 2.2549019607843137,
-       "grad_norm": 0.04695666953921318,
-       "learning_rate": 8.457878979987507e-05,
-       "loss": 0.0119,
-       "step": 690
-     },
-     {
-       "epoch": 2.287581699346405,
-       "grad_norm": 0.03636680543422699,
-       "learning_rate": 8.196948986992666e-05,
-       "loss": 0.0151,
-       "step": 700
-     },
-     {
-       "epoch": 2.3202614379084965,
-       "grad_norm": 0.28800758719444275,
-       "learning_rate": 7.937281928913688e-05,
582
- "loss": 0.0282,
583
- "step": 710
584
- },
585
- {
586
- "epoch": 2.3529411764705883,
587
- "grad_norm": 0.2814720869064331,
588
- "learning_rate": 7.67905968778928e-05,
589
- "loss": 0.0107,
590
- "step": 720
591
- },
592
- {
593
- "epoch": 2.3856209150326797,
594
- "grad_norm": 0.06004492938518524,
595
- "learning_rate": 7.42246313364587e-05,
596
- "loss": 0.0141,
597
- "step": 730
598
- },
599
- {
600
- "epoch": 2.418300653594771,
601
- "grad_norm": 0.48177215456962585,
602
- "learning_rate": 7.167671997808405e-05,
603
- "loss": 0.0148,
604
- "step": 740
605
- },
606
- {
607
- "epoch": 2.450980392156863,
608
- "grad_norm": 0.07847360521554947,
609
- "learning_rate": 6.914864747008762e-05,
610
- "loss": 0.0153,
611
- "step": 750
612
- },
613
- {
614
- "epoch": 2.4836601307189543,
615
- "grad_norm": 0.03728066757321358,
616
- "learning_rate": 6.664218458379933e-05,
617
- "loss": 0.0212,
618
- "step": 760
619
- },
620
- {
621
- "epoch": 2.5163398692810457,
622
- "grad_norm": 0.055430226027965546,
623
- "learning_rate": 6.415908695423534e-05,
624
- "loss": 0.0195,
625
- "step": 770
626
- },
627
- {
628
- "epoch": 2.549019607843137,
629
- "grad_norm": 0.3753442168235779,
630
- "learning_rate": 6.170109385037545e-05,
631
- "loss": 0.0143,
632
- "step": 780
633
- },
634
- {
635
- "epoch": 2.581699346405229,
636
- "grad_norm": 0.11030200123786926,
637
- "learning_rate": 5.926992695690378e-05,
638
- "loss": 0.0148,
639
- "step": 790
640
- },
641
- {
642
- "epoch": 2.6143790849673203,
643
- "grad_norm": 0.07732052356004715,
644
- "learning_rate": 5.68672891682664e-05,
645
- "loss": 0.0204,
646
- "step": 800
647
- },
648
- {
649
- "epoch": 2.6143790849673203,
650
- "eval_loss": 0.09645482152700424,
651
- "eval_runtime": 16.7055,
652
- "eval_samples_per_second": 4.37,
653
- "eval_steps_per_second": 2.215,
654
- "step": 800
655
- },
656
- {
657
- "eval_ner_f1": 0.2564102564102564,
658
- "step": 800
659
- },
660
- {
661
- "eval_ner_precision": 0.625,
662
- "step": 800
663
- },
664
- {
665
- "eval_ner_recall": 0.16129032258064516,
666
- "step": 800
667
- },
668
- {
669
- "eval_ner_f1_person": 0.2564102564102564,
670
- "step": 800
671
- },
672
- {
673
- "epoch": 2.6470588235294117,
674
- "grad_norm": 0.040318384766578674,
675
- "learning_rate": 5.449486339589043e-05,
676
- "loss": 0.0199,
677
- "step": 810
678
- },
679
- {
680
- "epoch": 2.6797385620915035,
681
- "grad_norm": 0.11425317078828812,
682
- "learning_rate": 5.215431138939999e-05,
683
- "loss": 0.0215,
684
- "step": 820
685
- },
686
- {
687
- "epoch": 2.712418300653595,
688
- "grad_norm": 0.07913585007190704,
689
- "learning_rate": 4.984727257265509e-05,
690
- "loss": 0.0094,
691
- "step": 830
692
- },
693
- {
694
- "epoch": 2.7450980392156863,
695
- "grad_norm": 0.11035473644733429,
696
- "learning_rate": 4.757536289542798e-05,
697
- "loss": 0.0183,
698
- "step": 840
699
- },
700
- {
701
- "epoch": 2.7777777777777777,
702
- "grad_norm": 0.04927445575594902,
703
- "learning_rate": 4.534017370152218e-05,
704
- "loss": 0.0181,
705
- "step": 850
706
- },
707
- {
708
- "epoch": 2.810457516339869,
709
- "grad_norm": 0.043610814958810806,
710
- "learning_rate": 4.314327061412656e-05,
711
- "loss": 0.0123,
712
- "step": 860
713
- },
714
- {
715
- "epoch": 2.843137254901961,
716
- "grad_norm": 0.038343094289302826,
717
- "learning_rate": 4.0986192439184864e-05,
718
- "loss": 0.0119,
719
- "step": 870
720
- },
721
- {
722
- "epoch": 2.8758169934640523,
723
- "grad_norm": 0.13254104554653168,
724
- "learning_rate": 3.88704500875498e-05,
725
- "loss": 0.0144,
726
- "step": 880
727
- },
728
- {
729
- "epoch": 2.9084967320261437,
730
- "grad_norm": 0.13977399468421936,
731
- "learning_rate": 3.679752551667541e-05,
732
- "loss": 0.0085,
733
- "step": 890
734
- },
735
- {
736
- "epoch": 2.9411764705882355,
737
- "grad_norm": 0.020304501056671143,
738
- "learning_rate": 3.4768870692590147e-05,
739
- "loss": 0.016,
740
- "step": 900
741
- },
742
- {
743
- "epoch": 2.973856209150327,
744
- "grad_norm": 0.06801599264144897,
745
- "learning_rate": 3.278590657287713e-05,
746
- "loss": 0.0111,
747
- "step": 910
748
- },
749
- {
750
- "epoch": 3.0065359477124183,
751
- "grad_norm": 0.06201297789812088,
752
- "learning_rate": 3.08500221113738e-05,
753
- "loss": 0.014,
754
- "step": 920
755
- },
756
- {
757
- "epoch": 3.0392156862745097,
758
- "grad_norm": 0.04338189586997032,
759
- "learning_rate": 2.8962573285288695e-05,
760
- "loss": 0.0088,
761
- "step": 930
762
- },
763
- {
764
- "epoch": 3.0718954248366015,
765
- "grad_norm": 0.009708147495985031,
766
- "learning_rate": 2.712488214541642e-05,
767
- "loss": 0.0057,
768
- "step": 940
769
- },
770
- {
771
- "epoch": 3.104575163398693,
772
- "grad_norm": 0.04377015307545662,
773
- "learning_rate": 2.5338235890115902e-05,
774
- "loss": 0.011,
775
- "step": 950
776
- },
777
- {
778
- "epoch": 3.1372549019607843,
779
- "grad_norm": 0.09793521463871002,
780
- "learning_rate": 2.360388596370122e-05,
781
- "loss": 0.0112,
782
- "step": 960
783
- },
784
- {
785
- "epoch": 3.1699346405228757,
786
- "grad_norm": 0.01356339082121849,
787
- "learning_rate": 2.1923047179875654e-05,
788
- "loss": 0.0044,
789
- "step": 970
790
- },
791
- {
792
- "epoch": 3.2026143790849675,
793
- "grad_norm": 0.02072557434439659,
794
- "learning_rate": 2.0296896870823766e-05,
795
- "loss": 0.0055,
796
- "step": 980
797
- },
798
- {
799
- "epoch": 3.235294117647059,
800
- "grad_norm": 0.09715542942285538,
801
- "learning_rate": 1.8726574062557012e-05,
802
- "loss": 0.0105,
803
- "step": 990
804
- },
805
- {
806
- "epoch": 3.2679738562091503,
807
- "grad_norm": 0.03253450244665146,
808
- "learning_rate": 1.721317867709057e-05,
809
- "loss": 0.0024,
810
- "step": 1000
811
- },
812
- {
813
- "epoch": 3.2679738562091503,
814
- "eval_loss": 0.1047215685248375,
815
- "eval_runtime": 16.8699,
816
- "eval_samples_per_second": 4.327,
817
- "eval_steps_per_second": 2.193,
818
- "step": 1000
819
- },
820
- {
821
- "eval_ner_f1": 0.05714285714285715,
822
- "step": 1000
823
- },
824
- {
825
- "eval_ner_precision": 0.25,
826
- "step": 1000
827
- },
828
- {
829
- "eval_ner_recall": 0.03225806451612903,
830
- "step": 1000
831
- },
832
- {
833
- "eval_ner_f1_person": 0.05714285714285715,
834
- "step": 1000
835
- },
836
- {
837
- "epoch": 3.3006535947712417,
838
- "grad_norm": 0.09732872247695923,
839
- "learning_rate": 1.5757770762010438e-05,
840
- "loss": 0.0089,
841
- "step": 1010
842
- },
843
- {
844
- "epoch": 3.3333333333333335,
845
- "grad_norm": 0.032012488692998886,
846
- "learning_rate": 1.4361369747970311e-05,
847
- "loss": 0.0077,
848
- "step": 1020
849
- },
850
- {
851
- "epoch": 3.366013071895425,
852
- "grad_norm": 0.08777438849210739,
853
- "learning_rate": 1.3024953734638168e-05,
854
- "loss": 0.0027,
855
- "step": 1030
856
- },
857
- {
858
- "epoch": 3.3986928104575163,
859
- "grad_norm": 0.028469018638134003,
860
- "learning_rate": 1.1749458805592983e-05,
861
- "loss": 0.0077,
862
- "step": 1040
863
- },
864
- {
865
- "epoch": 3.431372549019608,
866
- "grad_norm": 0.06986037641763687,
867
- "learning_rate": 1.0535778372651317e-05,
868
- "loss": 0.0049,
869
- "step": 1050
870
- },
871
- {
872
- "epoch": 3.4640522875816995,
873
- "grad_norm": 0.008913267403841019,
874
- "learning_rate": 9.384762550083037e-06,
875
- "loss": 0.0079,
876
- "step": 1060
877
- },
878
- {
879
- "epoch": 3.496732026143791,
880
- "grad_norm": 0.0693899616599083,
881
- "learning_rate": 8.297217559154535e-06,
882
- "loss": 0.0049,
883
- "step": 1070
884
- },
885
- {
886
- "epoch": 3.5294117647058822,
887
- "grad_norm": 0.13099117577075958,
888
- "learning_rate": 7.273905163416395e-06,
889
- "loss": 0.0053,
890
- "step": 1080
891
- },
892
- {
893
- "epoch": 3.5620915032679736,
894
- "grad_norm": 0.009806470945477486,
895
- "learning_rate": 6.315542135131381e-06,
896
- "loss": 0.004,
897
- "step": 1090
898
- },
899
- {
900
- "epoch": 3.5947712418300655,
901
- "grad_norm": 0.18246600031852722,
902
- "learning_rate": 5.422799753216023e-06,
903
- "loss": 0.0116,
904
- "step": 1100
905
- },
906
- {
907
- "epoch": 3.627450980392157,
908
- "grad_norm": 0.013247921131551266,
909
- "learning_rate": 4.596303333047891e-06,
910
- "loss": 0.0069,
911
- "step": 1110
912
- },
913
- {
914
- "epoch": 3.6601307189542482,
915
- "grad_norm": 0.03266795352101326,
916
- "learning_rate": 3.836631788467671e-06,
917
- "loss": 0.0053,
918
- "step": 1120
919
- },
920
- {
921
- "epoch": 3.69281045751634,
922
- "grad_norm": 0.0331471785902977,
923
- "learning_rate": 3.1443172262828223e-06,
924
- "loss": 0.0031,
925
- "step": 1130
926
- },
927
- {
928
- "epoch": 3.7254901960784315,
929
- "grad_norm": 0.03891368955373764,
930
- "learning_rate": 2.519844573556984e-06,
931
- "loss": 0.0084,
932
- "step": 1140
933
- },
934
- {
935
- "epoch": 3.758169934640523,
936
- "grad_norm": 0.055439382791519165,
937
- "learning_rate": 1.963651237946107e-06,
938
- "loss": 0.0076,
939
- "step": 1150
940
- },
941
- {
942
- "epoch": 3.7908496732026142,
943
- "grad_norm": 0.04643230885267258,
944
- "learning_rate": 1.4761268013191553e-06,
945
- "loss": 0.0052,
946
- "step": 1160
947
- },
948
- {
949
- "epoch": 3.8235294117647056,
950
- "grad_norm": 0.006929585710167885,
951
- "learning_rate": 1.0576127468781783e-06,
952
- "loss": 0.0082,
953
- "step": 1170
954
- },
955
- {
956
- "epoch": 3.8562091503267975,
957
- "grad_norm": 0.06278964877128601,
958
- "learning_rate": 7.084022199686513e-07,
959
- "loss": 0.0071,
960
- "step": 1180
961
- },
962
- {
963
- "epoch": 3.888888888888889,
964
- "grad_norm": 0.04118943214416504,
965
- "learning_rate": 4.2873982274781453e-07,
966
- "loss": 0.0144,
967
- "step": 1190
968
- },
969
- {
970
- "epoch": 3.9215686274509802,
971
- "grad_norm": 0.014042374677956104,
972
- "learning_rate": 2.1882144285477746e-07,
973
- "loss": 0.0029,
974
- "step": 1200
975
- },
976
- {
977
- "epoch": 3.9215686274509802,
978
- "eval_loss": 0.10614956170320511,
979
- "eval_runtime": 18.3168,
980
- "eval_samples_per_second": 3.985,
981
- "eval_steps_per_second": 2.02,
982
- "step": 1200
983
- },
984
- {
985
- "eval_ner_f1": 0.05714285714285715,
986
- "step": 1200
987
- },
988
- {
989
- "eval_ner_precision": 0.25,
990
- "step": 1200
991
- },
992
- {
993
- "eval_ner_recall": 0.03225806451612903,
994
- "step": 1200
995
- },
996
- {
997
- "eval_ner_f1_person": 0.05714285714285715,
998
- "step": 1200
999
- },
1000
- {
1001
- "epoch": 3.954248366013072,
1002
- "grad_norm": 0.16615094244480133,
1003
- "learning_rate": 7.87941162023076e-08,
1004
- "loss": 0.0037,
1005
- "step": 1210
1006
- },
1007
- {
1008
- "epoch": 3.9869281045751634,
1009
- "grad_norm": 0.019119175150990486,
1010
- "learning_rate": 8.755923986480952e-09,
1011
- "loss": 0.0078,
1012
- "step": 1220
1013
- }
1014
- ],
1015
- "logging_steps": 10,
1016
- "max_steps": 1224,
1017
- "num_input_tokens_seen": 0,
1018
- "num_train_epochs": 4,
1019
- "save_steps": 200,
1020
- "stateful_callbacks": {
1021
- "TrainerControl": {
1022
- "args": {
1023
- "should_epoch_stop": false,
1024
- "should_evaluate": false,
1025
- "should_log": false,
1026
- "should_save": true,
1027
- "should_training_stop": true
1028
- },
1029
- "attributes": {}
1030
- }
1031
- },
1032
- "total_flos": 2.033937038405929e+17,
1033
- "train_batch_size": 2,
1034
- "trial_name": null,
1035
- "trial_params": null
1036
- }
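The deleted log entries above record `eval_ner_precision`, `eval_ner_recall`, and `eval_ner_f1` as separate items at each eval step. As a sanity check on the logged values (a minimal sketch using only numbers copied from the entries above; the helper name `f1` is ours), the F1 figures match the harmonic mean of the logged precision and recall:

```python
# Sanity check: each logged eval_ner_f1 should equal the harmonic
# mean of the logged eval_ner_precision and eval_ner_recall.

def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, logged F1) copied from the eval entries above.
logged = [
    (0.6923076923076923, 0.2903225806451613, 0.4090909090909091),   # step 600
    (0.625, 0.16129032258064516, 0.2564102564102564),               # step 800
    (0.25, 0.03225806451612903, 0.05714285714285715),               # steps 1000/1200
]

for p, r, expected in logged:
    assert abs(f1(p, r) - expected) < 1e-12
```

Note that while training loss keeps falling (0.029 at step 520 down to ~0.004-0.008 by step 1200), `eval_loss` rises and `eval_ner_f1` drops from 0.41 to 0.057 over the same span, so the later checkpoints in this deleted run were overfitting.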
person_lora/checkpoint-1224/training_args.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:822f3da171fb89395f6d9447d0736b055ce3407cae8df6d7d4b3e34103c10c62
- size 5713
person_lora/checkpoint-1224/vocab.json DELETED
The diff for this file is too large to render. See raw diff