krajnish95 committed
Commit 262344c · verified · Parent: 87969e1

Upload fine-tuned SOQL model

README.md CHANGED
@@ -1,3 +1,207 @@
- ---
- license: apache-2.0
- ---
+ ---
+ base_model: deepseek-ai/deepseek-coder-1.3b-base
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - base_model:adapter:deepseek-ai/deepseek-coder-1.3b-base
+ - lora
+ - transformers
+ ---
+
+ # Model Card for the Fine-tuned SOQL Model
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+ A LoRA adapter for `deepseek-ai/deepseek-coder-1.3b-base`, fine-tuned to generate SOQL queries.
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** LoRA adapter (PEFT) for a causal language model
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** deepseek-ai/deepseek-coder-1.3b-base
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
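Until the card's official snippet is filled in, a minimal loading sketch with PEFT. The base model id comes from `adapter_config.json`; the adapter repo id `krajnish95/soql-model` is a placeholder, not confirmed by this commit:

```python
# Minimal sketch: apply this LoRA adapter to its base model with PEFT.
# "krajnish95/soql-model" is a hypothetical repo id -- substitute the real one.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "deepseek-ai/deepseek-coder-1.3b-base"  # from adapter_config.json
adapter_id = "krajnish95/soql-model"              # placeholder adapter repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

prompt = "-- Write a SOQL query that returns the names of all Accounts\n"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

This downloads both the base weights and the adapter, so it needs network access on first run.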
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+
+ ### Framework versions
+
+ - PEFT 0.18.0
adapter_config.json ADDED
@@ -0,0 +1,41 @@
+ {
+ "alora_invocation_tokens": null,
+ "alpha_pattern": {},
+ "arrow_config": null,
+ "auto_mapping": null,
+ "base_model_name_or_path": "deepseek-ai/deepseek-coder-1.3b-base",
+ "bias": "none",
+ "corda_config": null,
+ "ensure_weight_tying": false,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 32,
+ "lora_bias": false,
+ "lora_dropout": 0.1,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "peft_version": "0.18.0",
+ "qalora_group_size": 16,
+ "r": 16,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "q_proj",
+ "v_proj"
+ ],
+ "target_parameters": null,
+ "task_type": "CAUSAL_LM",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_qalora": false,
+ "use_rslora": false
+ }
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0aea056688ac952d67d234e97fd349e3b2ceaff98af405e3d5979feaeee0d733
+ size 12595704
optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c926b45cac5b5744899be00bc3d5e32c253e9a9ae956fdb1c8d5742217043d6d
+ size 25248459
rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:02d46e8cfb95e0d430bae929a69385fe72ef665c67a250d5a140a9db917d7010
+ size 14645
scaler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f38a0767f9f6090e9b15d65b7c1bb7ce6bb9e60f653012462b16e0441e508b06
+ size 1383
scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e4970cbda64f6a5c051b8154b09ecef6a3686c122d275959ec337948092f7a4f
+ size 1465
trainer_state.json ADDED
@@ -0,0 +1,1448 @@
+ {
+ "best_global_step": null,
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 3.0,
+ "eval_steps": 200,
+ "global_step": 5070,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {"epoch": 0.014795088030773782, "grad_norm": 0.08312089741230011, "learning_rate": 4.8e-05, "loss": 25.3001, "step": 25},
+ {"epoch": 0.029590176061547565, "grad_norm": 0.16134510934352875, "learning_rate": 9.8e-05, "loss": 25.2653, "step": 50},
+ {"epoch": 0.04438526409232135, "grad_norm": 0.20426703989505768, "learning_rate": 0.000148, "loss": 25.1317, "step": 75},
+ {"epoch": 0.05918035212309513, "grad_norm": 0.12948599457740784, "learning_rate": 0.00019800000000000002, "loss": 25.1001, "step": 100},
+ {"epoch": 0.07397544015386892, "grad_norm": 0.12529096007347107, "learning_rate": 0.00019903420523138834, "loss": 25.1769, "step": 125},
+ {"epoch": 0.0887705281846427, "grad_norm": 0.1009916439652443, "learning_rate": 0.00019802816901408452, "loss": 25.173, "step": 150},
+ {"epoch": 0.10356561621541648, "grad_norm": 0.11219477653503418, "learning_rate": 0.0001970221327967807, "loss": 25.0957, "step": 175},
+ {"epoch": 0.11836070424619026, "grad_norm": 0.08366899192333221, "learning_rate": 0.00019601609657947687, "loss": 25.069, "step": 200},
+ {"epoch": 0.13315579227696406, "grad_norm": 0.05875258892774582, "learning_rate": 0.00019501006036217304, "loss": 25.0362, "step": 225},
+ {"epoch": 0.14795088030773784, "grad_norm": 0.06766419112682343, "learning_rate": 0.00019400402414486922, "loss": 25.0112, "step": 250},
+ {"epoch": 0.16274596833851163, "grad_norm": 0.07124276459217072, "learning_rate": 0.0001929979879275654, "loss": 25.0194, "step": 275},
+ {"epoch": 0.1775410563692854, "grad_norm": 0.09075228869915009, "learning_rate": 0.0001919919517102616, "loss": 25.1531, "step": 300},
+ {"epoch": 0.19233614440005917, "grad_norm": 0.06596764922142029, "learning_rate": 0.00019098591549295774, "loss": 25.0991, "step": 325},
+ {"epoch": 0.20713123243083295, "grad_norm": 0.06218697875738144, "learning_rate": 0.00018997987927565392, "loss": 25.0787, "step": 350},
+ {"epoch": 0.22192632046160674, "grad_norm": 0.07952570170164108, "learning_rate": 0.00018897384305835012, "loss": 25.0317, "step": 375},
+ {"epoch": 0.23672140849238052, "grad_norm": 0.06121606379747391, "learning_rate": 0.0001879678068410463, "loss": 25.0975, "step": 400},
+ {"epoch": 0.25151649652315433, "grad_norm": 0.06957080215215683, "learning_rate": 0.00018696177062374247, "loss": 25.1054, "step": 425},
+ {"epoch": 0.2663115845539281, "grad_norm": 0.06280346214771271, "learning_rate": 0.00018595573440643862, "loss": 25.0019, "step": 450},
+ {"epoch": 0.2811066725847019, "grad_norm": 0.0642634779214859, "learning_rate": 0.00018494969818913482, "loss": 25.1581, "step": 475},
+ {"epoch": 0.2959017606154757, "grad_norm": 0.05680055171251297, "learning_rate": 0.000183943661971831, "loss": 25.0826, "step": 500},
+ {"epoch": 0.31069684864624947, "grad_norm": 0.038917265832424164, "learning_rate": 0.00018293762575452717, "loss": 25.1511, "step": 525},
+ {"epoch": 0.32549193667702325, "grad_norm": 0.047237247228622437, "learning_rate": 0.00018193158953722335, "loss": 25.022, "step": 550},
+ {"epoch": 0.34028702470779704, "grad_norm": 0.04916401579976082, "learning_rate": 0.00018092555331991953, "loss": 25.0202, "step": 575},
+ {"epoch": 0.3550821127385708, "grad_norm": 0.06772845983505249, "learning_rate": 0.0001799195171026157, "loss": 25.0699, "step": 600},
+ {"epoch": 0.3698772007693446, "grad_norm": 0.048369135707616806, "learning_rate": 0.00017891348088531188, "loss": 25.0411, "step": 625},
+ {"epoch": 0.38467228880011833, "grad_norm": 0.045925214886665344, "learning_rate": 0.00017790744466800805, "loss": 25.1226, "step": 650},
+ {"epoch": 0.3994673768308921, "grad_norm": 0.0681011825799942, "learning_rate": 0.00017690140845070425, "loss": 25.0724, "step": 675},
+ {"epoch": 0.4142624648616659, "grad_norm": 0.06605461239814758, "learning_rate": 0.0001758953722334004, "loss": 25.0152, "step": 700},
+ {"epoch": 0.4290575528924397, "grad_norm": 0.05168164521455765, "learning_rate": 0.00017488933601609658, "loss": 25.1138, "step": 725},
+ {"epoch": 0.44385264092321347, "grad_norm": 0.03383536636829376, "learning_rate": 0.00017388329979879275, "loss": 25.1285, "step": 750},
+ {"epoch": 0.45864772895398725, "grad_norm": 0.03839387744665146, "learning_rate": 0.00017287726358148896, "loss": 25.0419, "step": 775},
+ {"epoch": 0.47344281698476104, "grad_norm": 0.05215064063668251, "learning_rate": 0.00017187122736418513, "loss": 25.0599, "step": 800},
+ {"epoch": 0.4882379050155348, "grad_norm": 0.05809099227190018, "learning_rate": 0.00017086519114688128, "loss": 25.0682, "step": 825},
+ {"epoch": 0.5030329930463087, "grad_norm": 0.04687444493174553, "learning_rate": 0.00016985915492957746, "loss": 25.1285, "step": 850},
+ {"epoch": 0.5178280810770824, "grad_norm": 0.043396566063165665, "learning_rate": 0.00016885311871227366, "loss": 25.0015, "step": 875},
+ {"epoch": 0.5326231691078562, "grad_norm": 0.04217955097556114, "learning_rate": 0.00016784708249496983, "loss": 25.0274, "step": 900},
+ {"epoch": 0.54741825713863, "grad_norm": 0.04644334316253662, "learning_rate": 0.000166841046277666, "loss": 25.1121, "step": 925},
+ {"epoch": 0.5622133451694038, "grad_norm": 0.06119885668158531, "learning_rate": 0.00016583501006036218, "loss": 25.1759, "step": 950},
+ {"epoch": 0.5770084332001776, "grad_norm": 0.05505023151636124, "learning_rate": 0.00016482897384305836, "loss": 25.0348, "step": 975},
+ {"epoch": 0.5918035212309514, "grad_norm": 0.036642227321863174, "learning_rate": 0.00016382293762575454, "loss": 25.1238, "step": 1000},
+ {"epoch": 0.6065986092617252, "grad_norm": 0.020652858540415764, "learning_rate": 0.0001628169014084507, "loss": 25.042, "step": 1025},
+ {"epoch": 0.6213936972924989, "grad_norm": 0.03491384536027908, "learning_rate": 0.00016181086519114689, "loss": 25.03, "step": 1050},
+ {"epoch": 0.6361887853232727, "grad_norm": 0.04702286049723625, "learning_rate": 0.00016080482897384306, "loss": 25.055, "step": 1075},
+ {"epoch": 0.6509838733540465, "grad_norm": 0.04155293107032776, "learning_rate": 0.00015979879275653924, "loss": 25.0551, "step": 1100},
+ {"epoch": 0.6657789613848203, "grad_norm": 0.09428809583187103, "learning_rate": 0.0001587927565392354, "loss": 25.046, "step": 1125},
+ {"epoch": 0.6805740494155941, "grad_norm": 0.0833851769566536, "learning_rate": 0.00015778672032193162, "loss": 25.1491, "step": 1150},
+ {"epoch": 0.6953691374463679, "grad_norm": 0.05562705919146538, "learning_rate": 0.0001567806841046278, "loss": 25.1716, "step": 1175},
+ {"epoch": 0.7101642254771416, "grad_norm": 0.03471198305487633, "learning_rate": 0.00015577464788732394, "loss": 25.0191, "step": 1200},
+ {"epoch": 0.7249593135079154, "grad_norm": 0.0596119686961174, "learning_rate": 0.00015476861167002011, "loss": 25.0971, "step": 1225},
+ {"epoch": 0.7397544015386892, "grad_norm": 0.04158185049891472, "learning_rate": 0.00015376257545271632, "loss": 25.0796, "step": 1250},
+ {"epoch": 0.7545494895694629, "grad_norm": 0.04319076985120773, "learning_rate": 0.0001527565392354125, "loss": 25.0667, "step": 1275},
+ {"epoch": 0.7693445776002367, "grad_norm": 0.03822188079357147, "learning_rate": 0.00015175050301810867, "loss": 24.9891, "step": 1300},
+ {"epoch": 0.7841396656310105, "grad_norm": 0.04330127686262131, "learning_rate": 0.00015074446680080482, "loss": 25.0619, "step": 1325},
+ {"epoch": 0.7989347536617842, "grad_norm": 0.041904184967279434, "learning_rate": 0.00014973843058350102, "loss": 25.0861, "step": 1350},
+ {"epoch": 0.813729841692558, "grad_norm": 0.033138617873191833, "learning_rate": 0.0001487323943661972, "loss": 25.0589, "step": 1375},
+ {"epoch": 0.8285249297233318, "grad_norm": 0.041870083659887314, "learning_rate": 0.00014772635814889337, "loss": 25.0909, "step": 1400},
+ {"epoch": 0.8433200177541056, "grad_norm": 0.04157957062125206, "learning_rate": 0.00014672032193158955, "loss": 24.993, "step": 1425},
+ {"epoch": 0.8581151057848794, "grad_norm": 0.07715047895908356, "learning_rate": 0.00014571428571428572, "loss": 25.064, "step": 1450},
+ {"epoch": 0.8729101938156532, "grad_norm": 0.034074705094099045, "learning_rate": 0.0001447082494969819, "loss": 25.0995, "step": 1475},
+ {"epoch": 0.8877052818464269, "grad_norm": 0.030254002660512924, "learning_rate": 0.00014370221327967807, "loss": 25.059, "step": 1500},
+ {"epoch": 0.9025003698772007, "grad_norm": 0.031632065773010254, "learning_rate": 0.00014269617706237425, "loss": 25.0555, "step": 1525},
+ {"epoch": 0.9172954579079745, "grad_norm": 0.041883744299411774, "learning_rate": 0.00014169014084507045, "loss": 25.077, "step": 1550},
+ {"epoch": 0.9320905459387483, "grad_norm": 0.03643975779414177, "learning_rate": 0.0001406841046277666, "loss": 25.0556, "step": 1575},
+ {"epoch": 0.9468856339695221, "grad_norm": 0.0535462461411953, "learning_rate": 0.00013967806841046277, "loss": 25.0945, "step": 1600},
+ {"epoch": 0.9616807220002959, "grad_norm": 0.0349336676299572, "learning_rate": 0.00013867203219315898, "loss": 25.0823, "step": 1625},
+ {"epoch": 0.9764758100310696, "grad_norm": 0.041822321712970734, "learning_rate": 0.00013766599597585515, "loss": 25.0856, "step": 1650},
+ {"epoch": 0.9912708980618434, "grad_norm": 0.04077128693461418, "learning_rate": 0.0001366599597585513, "loss": 25.0666, "step": 1675},
+ {"epoch": 1.0059180352123096, "grad_norm": 0.019722955301404, "learning_rate": 0.00013565392354124748, "loss": 25.0682, "step": 1700},
+ {"epoch": 1.0207131232430833, "grad_norm": 0.03321698307991028, "learning_rate": 0.00013464788732394368, "loss": 25.1371, "step": 1725},
+ {"epoch": 1.0355082112738572, "grad_norm": 0.041920218616724014, "learning_rate": 0.00013364185110663985, "loss": 24.9934, "step": 1750},
+ {"epoch": 1.0503032993046308, "grad_norm": 0.03674553334712982, "learning_rate": 0.00013263581488933603, "loss": 25.1142, "step": 1775},
+ {"epoch": 1.0650983873354047, "grad_norm": 0.03412509709596634, "learning_rate": 0.00013162977867203218, "loss": 25.0697, "step": 1800},
+ {"epoch": 1.0798934753661784, "grad_norm": 0.039162006229162216, "learning_rate": 0.00013062374245472838, "loss": 25.0618, "step": 1825},
+ {"epoch": 1.0946885633969523, "grad_norm": 0.06998290121555328, "learning_rate": 0.00012961770623742456, "loss": 25.1045, "step": 1850},
+ {"epoch": 1.109483651427726, "grad_norm": 0.029092537239193916, "learning_rate": 0.00012861167002012073, "loss": 24.9204, "step": 1875},
+ {"epoch": 1.1242787394584999, "grad_norm": 0.027615416795015335, "learning_rate": 0.0001276056338028169, "loss": 25.066, "step": 1900},
+ {"epoch": 1.1390738274892735, "grad_norm": 0.029055271297693253, "learning_rate": 0.00012659959758551308, "loss": 25.0113, "step": 1925},
+ {"epoch": 1.1538689155200474, "grad_norm": 0.03415543958544731, "learning_rate": 0.00012559356136820926, "loss": 25.025, "step": 1950},
+ {"epoch": 1.1686640035508211, "grad_norm": 0.04656201973557472, "learning_rate": 0.00012458752515090543, "loss": 25.0861, "step": 1975},
+ {"epoch": 1.1834590915815948, "grad_norm": 0.03144572675228119, "learning_rate": 0.0001235814889336016, "loss": 25.025, "step": 2000},
+ {"epoch": 1.1982541796123687, "grad_norm": 0.036187801510095596, "learning_rate": 0.0001225754527162978, "loss": 25.1279, "step": 2025},
+ {"epoch": 1.2130492676431426, "grad_norm": 0.031620871275663376, "learning_rate": 0.00012156941649899396, "loss": 24.9931, "step": 2050},
+ {"epoch": 1.2278443556739163, "grad_norm": 0.034921254962682724, "learning_rate": 0.00012056338028169015, "loss": 25.0693, "step": 2075},
+ {"epoch": 1.24263944370469, "grad_norm": 0.03131638467311859, "learning_rate": 0.00011955734406438632, "loss": 25.0967, "step": 2100},
+ {"epoch": 1.2574345317354638, "grad_norm": 0.03819598630070686, "learning_rate": 0.0001185513078470825, "loss": 25.1187, "step": 2125},
+ {"epoch": 1.2722296197662377, "grad_norm": 0.02910611405968666, "learning_rate": 0.00011754527162977869, "loss": 25.1261, "step": 2150},
+ {"epoch": 1.2870247077970114, "grad_norm": 0.028234301134943962, "learning_rate": 0.00011653923541247485, "loss": 25.1246, "step": 2175},
+ {"epoch": 1.301819795827785, "grad_norm": 0.03955462947487831, "learning_rate": 0.00011553319919517103, "loss": 25.1732, "step": 2200},
+ {"epoch": 1.316614883858559, "grad_norm": 0.06937497854232788, "learning_rate": 0.00011452716297786721, "loss": 25.1451, "step": 2225},
+ {"epoch": 1.3314099718893329, "grad_norm": 0.025304747745394707, "learning_rate": 0.00011352112676056339, "loss": 25.0555, "step": 2250},
+ {"epoch": 1.3462050599201065, "grad_norm": 0.03724834695458412, "learning_rate": 0.00011251509054325957, "loss": 25.0753, "step": 2275},
+ {"epoch": 1.3610001479508802, "grad_norm": 0.033879607915878296, "learning_rate": 0.00011150905432595573, "loss": 25.0954, "step": 2300},
+ {"epoch": 1.375795235981654, "grad_norm": 0.024116437882184982, "learning_rate": 0.00011050301810865192, "loss": 25.0815, "step": 2325},
+ {"epoch": 1.390590324012428, "grad_norm": 0.0369427315890789, "learning_rate": 0.00010949698189134809, "loss": 25.1871, "step": 2350},
+ {"epoch": 1.4053854120432017, "grad_norm": 0.03194064274430275, "learning_rate": 0.00010849094567404428, "loss": 25.0658, "step": 2375},
+ {"epoch": 1.4201805000739753, "grad_norm": 0.028080854564905167, "learning_rate": 0.00010748490945674046, "loss": 25.072, "step": 2400},
+ {"epoch": 1.4349755881047492, "grad_norm": 0.028207939118146896, "learning_rate": 0.00010647887323943662, "loss": 25.0783, "step": 2425},
+ {"epoch": 1.4497706761355231, "grad_norm": 0.028028592467308044, "learning_rate": 0.0001054728370221328, "loss": 25.0861, "step": 2450},
+ {"epoch": 1.4645657641662968, "grad_norm": 0.03247331827878952, "learning_rate": 0.00010446680080482898, "loss": 25.096, "step": 2475},
+ {"epoch": 1.4793608521970705, "grad_norm": 0.031846486032009125, "learning_rate": 0.00010346076458752516, "loss": 25.1463, "step": 2500},
+ {"epoch": 1.4941559402278444, "grad_norm": 0.02977578528225422, "learning_rate": 0.00010245472837022135, "loss": 25.0884, "step": 2525},
+ {"epoch": 1.5089510282586183, "grad_norm": 0.028014151379466057, "learning_rate": 0.0001014486921529175, "loss": 25.1056, "step": 2550},
+ {"epoch": 1.523746116289392, "grad_norm": 0.03780132159590721, "learning_rate": 0.00010044265593561368, "loss": 25.0724, "step": 2575},
+ {"epoch": 1.5385412043201656, "grad_norm": 0.0535607747733593, "learning_rate": 9.943661971830986e-05, "loss": 25.078, "step": 2600},
+ {"epoch": 1.5533362923509395, "grad_norm": 0.03189392760396004, "learning_rate": 9.843058350100605e-05, "loss": 25.0673, "step": 2625},
+ {"epoch": 1.5681313803817134, "grad_norm": 0.049738895148038864, "learning_rate": 9.742454728370221e-05, "loss": 24.9634, "step": 2650},
+ {"epoch": 1.582926468412487, "grad_norm": 0.02845073491334915, "learning_rate": 9.64185110663984e-05, "loss": 25.0932, "step": 2675},
+ {"epoch": 1.5977215564432607, "grad_norm": 0.023111697286367416, "learning_rate": 9.541247484909458e-05, "loss": 25.0889, "step": 2700},
+ {"epoch": 1.6125166444740346, "grad_norm": 0.029083825647830963, "learning_rate": 9.440643863179075e-05, "loss": 25.0686, "step": 2725},
+ {"epoch": 1.6273117325048085, "grad_norm": 0.02137155272066593, "learning_rate": 9.340040241448693e-05, "loss": 25.0946, "step": 2750},
+ {"epoch": 1.6421068205355822, "grad_norm": 0.0255599282681942, "learning_rate": 9.23943661971831e-05, "loss": 24.9854, "step": 2775},
+ {"epoch": 1.6569019085663559, "grad_norm": 0.029308602213859558, "learning_rate": 9.138832997987928e-05, "loss": 25.082, "step": 2800},
+ {"epoch": 1.6716969965971298, "grad_norm": 0.03403579071164131, "learning_rate": 9.038229376257545e-05, "loss": 25.0498, "step": 2825},
+ {"epoch": 1.6864920846279037, "grad_norm": 0.03986509144306183, "learning_rate": 8.937625754527164e-05, "loss": 25.018, "step": 2850},
+ {"epoch": 1.7012871726586773, "grad_norm": 0.03191569447517395, "learning_rate": 8.83702213279678e-05, "loss": 24.9913, "step": 2875},
+ {"epoch": 1.716082260689451, "grad_norm": 0.04105932265520096, "learning_rate": 8.736418511066399e-05, "loss": 25.0711, "step": 2900},
+ {"epoch": 1.730877348720225, "grad_norm": 0.025144483894109726, "learning_rate": 8.635814889336017e-05, "loss": 25.1367, "step": 2925},
+ {"epoch": 1.7456724367509988, "grad_norm": 0.043975986540317535, "learning_rate": 8.535211267605634e-05, "loss": 25.0362, "step": 2950},
+ {"epoch": 1.7604675247817725, "grad_norm": 0.03802431747317314, "learning_rate": 8.434607645875252e-05, "loss": 25.0193, "step": 2975},
+ {"epoch": 1.7752626128125462, "grad_norm": 0.04078783094882965, "learning_rate": 8.33400402414487e-05, "loss": 25.1258, "step": 3000},
+ {"epoch": 1.79005770084332, "grad_norm": 0.03382691368460655, "learning_rate": 8.233400402414487e-05, "loss": 25.1644, "step": 3025},
+ {"epoch": 1.804852788874094,
861
+ "grad_norm": 0.038282644003629684,
862
+ "learning_rate": 8.132796780684106e-05,
863
+ "loss": 25.0844,
864
+ "step": 3050
865
+ },
866
+ {
867
+ "epoch": 1.8196478769048676,
868
+ "grad_norm": 0.035348743200302124,
869
+ "learning_rate": 8.032193158953722e-05,
870
+ "loss": 25.0073,
871
+ "step": 3075
872
+ },
873
+ {
874
+ "epoch": 1.8344429649356413,
875
+ "grad_norm": 0.02938913181424141,
876
+ "learning_rate": 7.931589537223341e-05,
877
+ "loss": 25.1201,
878
+ "step": 3100
879
+ },
880
+ {
881
+ "epoch": 1.8492380529664152,
882
+ "grad_norm": 0.07416849583387375,
883
+ "learning_rate": 7.830985915492957e-05,
884
+ "loss": 25.0517,
885
+ "step": 3125
886
+ },
887
+ {
888
+ "epoch": 1.864033140997189,
889
+ "grad_norm": 0.05911872908473015,
890
+ "learning_rate": 7.730382293762576e-05,
891
+ "loss": 25.085,
892
+ "step": 3150
893
+ },
894
+ {
895
+ "epoch": 1.8788282290279628,
896
+ "grad_norm": 0.03909540921449661,
897
+ "learning_rate": 7.629778672032194e-05,
898
+ "loss": 25.0817,
899
+ "step": 3175
900
+ },
901
+ {
902
+ "epoch": 1.8936233170587364,
903
+ "grad_norm": 0.03437948599457741,
904
+ "learning_rate": 7.529175050301811e-05,
905
+ "loss": 25.0958,
906
+ "step": 3200
907
+ },
908
+ {
909
+ "epoch": 1.9084184050895103,
910
+ "grad_norm": 0.03295779228210449,
911
+ "learning_rate": 7.428571428571429e-05,
912
+ "loss": 25.075,
913
+ "step": 3225
914
+ },
915
+ {
916
+ "epoch": 1.9232134931202842,
917
+ "grad_norm": 0.028107503429055214,
918
+ "learning_rate": 7.327967806841046e-05,
919
+ "loss": 25.0909,
920
+ "step": 3250
921
+ },
922
+ {
923
+ "epoch": 1.938008581151058,
924
+ "grad_norm": 0.03533456474542618,
925
+ "learning_rate": 7.227364185110664e-05,
926
+ "loss": 25.0143,
927
+ "step": 3275
928
+ },
929
+ {
930
+ "epoch": 1.9528036691818316,
931
+ "grad_norm": 0.030729498714208603,
932
+ "learning_rate": 7.126760563380283e-05,
933
+ "loss": 25.0024,
934
+ "step": 3300
935
+ },
936
+ {
937
+ "epoch": 1.9675987572126055,
938
+ "grad_norm": 0.02123214863240719,
939
+ "learning_rate": 7.0261569416499e-05,
940
+ "loss": 25.1211,
941
+ "step": 3325
942
+ },
943
+ {
944
+ "epoch": 1.9823938452433794,
945
+ "grad_norm": 0.03742063418030739,
946
+ "learning_rate": 6.925553319919518e-05,
947
+ "loss": 25.1436,
948
+ "step": 3350
949
+ },
950
+ {
951
+ "epoch": 1.997188933274153,
952
+ "grad_norm": 0.03187941387295723,
953
+ "learning_rate": 6.824949698189135e-05,
954
+ "loss": 24.9736,
955
+ "step": 3375
956
+ },
957
+ {
958
+ "epoch": 2.011836070424619,
959
+ "grad_norm": 0.032343629747629166,
960
+ "learning_rate": 6.724346076458753e-05,
961
+ "loss": 25.0978,
962
+ "step": 3400
963
+ },
964
+ {
965
+ "epoch": 2.026631158455393,
966
+ "grad_norm": 0.032404378056526184,
967
+ "learning_rate": 6.62374245472837e-05,
968
+ "loss": 25.0795,
969
+ "step": 3425
970
+ },
971
+ {
972
+ "epoch": 2.0414262464861666,
973
+ "grad_norm": 0.025061512365937233,
974
+ "learning_rate": 6.523138832997988e-05,
975
+ "loss": 25.0731,
976
+ "step": 3450
977
+ },
978
+ {
979
+ "epoch": 2.0562213345169402,
980
+ "grad_norm": 0.03708725795149803,
981
+ "learning_rate": 6.422535211267607e-05,
982
+ "loss": 25.1702,
983
+ "step": 3475
984
+ },
985
+ {
986
+ "epoch": 2.0710164225477143,
987
+ "grad_norm": 0.030988432466983795,
988
+ "learning_rate": 6.321931589537223e-05,
989
+ "loss": 25.0596,
990
+ "step": 3500
991
+ },
992
+ {
993
+ "epoch": 2.085811510578488,
994
+ "grad_norm": 0.02645169198513031,
995
+ "learning_rate": 6.221327967806842e-05,
996
+ "loss": 25.078,
997
+ "step": 3525
998
+ },
999
+ {
1000
+ "epoch": 2.1006065986092617,
1001
+ "grad_norm": 0.032431282103061676,
1002
+ "learning_rate": 6.120724346076458e-05,
1003
+ "loss": 25.0198,
1004
+ "step": 3550
1005
+ },
1006
+ {
1007
+ "epoch": 2.1154016866400354,
1008
+ "grad_norm": 0.02121451124548912,
1009
+ "learning_rate": 6.0201207243460764e-05,
1010
+ "loss": 25.103,
1011
+ "step": 3575
1012
+ },
1013
+ {
1014
+ "epoch": 2.1301967746708095,
1015
+ "grad_norm": 0.02786272205412388,
1016
+ "learning_rate": 5.9195171026156946e-05,
1017
+ "loss": 25.1066,
1018
+ "step": 3600
1019
+ },
1020
+ {
1021
+ "epoch": 2.144991862701583,
1022
+ "grad_norm": 0.031123634427785873,
1023
+ "learning_rate": 5.818913480885312e-05,
1024
+ "loss": 25.0548,
1025
+ "step": 3625
1026
+ },
1027
+ {
1028
+ "epoch": 2.159786950732357,
1029
+ "grad_norm": 0.02966328151524067,
1030
+ "learning_rate": 5.71830985915493e-05,
1031
+ "loss": 25.072,
1032
+ "step": 3650
1033
+ },
1034
+ {
1035
+ "epoch": 2.1745820387631305,
1036
+ "grad_norm": 0.03140266612172127,
1037
+ "learning_rate": 5.617706237424547e-05,
1038
+ "loss": 25.0909,
1039
+ "step": 3675
1040
+ },
1041
+ {
1042
+ "epoch": 2.1893771267939046,
1043
+ "grad_norm": 0.02315523661673069,
1044
+ "learning_rate": 5.5171026156941655e-05,
1045
+ "loss": 25.0799,
1046
+ "step": 3700
1047
+ },
1048
+ {
1049
+ "epoch": 2.2041722148246783,
1050
+ "grad_norm": 0.030268024653196335,
1051
+ "learning_rate": 5.416498993963784e-05,
1052
+ "loss": 25.031,
1053
+ "step": 3725
1054
+ },
1055
+ {
1056
+ "epoch": 2.218967302855452,
1057
+ "grad_norm": 0.028935110196471214,
1058
+ "learning_rate": 5.3158953722334005e-05,
1059
+ "loss": 25.0775,
1060
+ "step": 3750
1061
+ },
1062
+ {
1063
+ "epoch": 2.2337623908862256,
1064
+ "grad_norm": 0.025782838463783264,
1065
+ "learning_rate": 5.215291750503019e-05,
1066
+ "loss": 25.0107,
1067
+ "step": 3775
1068
+ },
1069
+ {
1070
+ "epoch": 2.2485574789169998,
1071
+ "grad_norm": 0.032656069844961166,
1072
+ "learning_rate": 5.1146881287726356e-05,
1073
+ "loss": 25.0786,
1074
+ "step": 3800
1075
+ },
1076
+ {
1077
+ "epoch": 2.2633525669477734,
1078
+ "grad_norm": 0.02676152065396309,
1079
+ "learning_rate": 5.014084507042254e-05,
1080
+ "loss": 25.1132,
1081
+ "step": 3825
1082
+ },
1083
+ {
1084
+ "epoch": 2.278147654978547,
1085
+ "grad_norm": 0.05472303926944733,
1086
+ "learning_rate": 4.9134808853118714e-05,
1087
+ "loss": 25.0132,
1088
+ "step": 3850
1089
+ },
1090
+ {
1091
+ "epoch": 2.2929427430093208,
1092
+ "grad_norm": 0.028085239231586456,
1093
+ "learning_rate": 4.812877263581489e-05,
1094
+ "loss": 25.0521,
1095
+ "step": 3875
1096
+ },
1097
+ {
1098
+ "epoch": 2.307737831040095,
1099
+ "grad_norm": 0.030685044825077057,
1100
+ "learning_rate": 4.712273641851107e-05,
1101
+ "loss": 25.1104,
1102
+ "step": 3900
1103
+ },
1104
+ {
1105
+ "epoch": 2.3225329190708686,
1106
+ "grad_norm": 0.03963826224207878,
1107
+ "learning_rate": 4.611670020120725e-05,
1108
+ "loss": 25.0079,
1109
+ "step": 3925
1110
+ },
1111
+ {
1112
+ "epoch": 2.3373280071016422,
1113
+ "grad_norm": 0.025965016335248947,
1114
+ "learning_rate": 4.511066398390342e-05,
1115
+ "loss": 25.0729,
1116
+ "step": 3950
1117
+ },
1118
+ {
1119
+ "epoch": 2.352123095132416,
1120
+ "grad_norm": 0.0241608377546072,
1121
+ "learning_rate": 4.41046277665996e-05,
1122
+ "loss": 25.0157,
1123
+ "step": 3975
1124
+ },
1125
+ {
1126
+ "epoch": 2.3669181831631896,
1127
+ "grad_norm": 0.028290821239352226,
1128
+ "learning_rate": 4.3098591549295774e-05,
1129
+ "loss": 25.102,
1130
+ "step": 4000
1131
+ },
1132
+ {
1133
+ "epoch": 2.3817132711939637,
1134
+ "grad_norm": 0.0340087004005909,
1135
+ "learning_rate": 4.2092555331991956e-05,
1136
+ "loss": 25.028,
1137
+ "step": 4025
1138
+ },
1139
+ {
1140
+ "epoch": 2.3965083592247374,
1141
+ "grad_norm": 0.04680619016289711,
1142
+ "learning_rate": 4.108651911468813e-05,
1143
+ "loss": 25.1001,
1144
+ "step": 4050
1145
+ },
1146
+ {
1147
+ "epoch": 2.411303447255511,
1148
+ "grad_norm": 0.0324757918715477,
1149
+ "learning_rate": 4.008048289738431e-05,
1150
+ "loss": 25.0479,
1151
+ "step": 4075
1152
+ },
1153
+ {
1154
+ "epoch": 2.426098535286285,
1155
+ "grad_norm": 0.0365682914853096,
1156
+ "learning_rate": 3.907444668008048e-05,
1157
+ "loss": 25.0715,
1158
+ "step": 4100
1159
+ },
1160
+ {
1161
+ "epoch": 2.440893623317059,
1162
+ "grad_norm": 0.030593266710639,
1163
+ "learning_rate": 3.806841046277666e-05,
1164
+ "loss": 25.11,
1165
+ "step": 4125
1166
+ },
1167
+ {
1168
+ "epoch": 2.4556887113478325,
1169
+ "grad_norm": 0.031157072633504868,
1170
+ "learning_rate": 3.706237424547283e-05,
1171
+ "loss": 25.0853,
1172
+ "step": 4150
1173
+ },
1174
+ {
1175
+ "epoch": 2.470483799378606,
1176
+ "grad_norm": 0.03608427569270134,
1177
+ "learning_rate": 3.6056338028169015e-05,
1178
+ "loss": 24.9746,
1179
+ "step": 4175
1180
+ },
1181
+ {
1182
+ "epoch": 2.48527888740938,
1183
+ "grad_norm": 0.025035865604877472,
1184
+ "learning_rate": 3.505030181086519e-05,
1185
+ "loss": 25.0303,
1186
+ "step": 4200
1187
+ },
1188
+ {
1189
+ "epoch": 2.500073975440154,
1190
+ "grad_norm": 0.031874652951955795,
1191
+ "learning_rate": 3.4044265593561366e-05,
1192
+ "loss": 25.1169,
1193
+ "step": 4225
1194
+ },
1195
+ {
1196
+ "epoch": 2.5148690634709276,
1197
+ "grad_norm": 0.02959771454334259,
1198
+ "learning_rate": 3.303822937625755e-05,
1199
+ "loss": 25.0229,
1200
+ "step": 4250
1201
+ },
1202
+ {
1203
+ "epoch": 2.5296641515017013,
1204
+ "grad_norm": 0.029369182884693146,
1205
+ "learning_rate": 3.2032193158953724e-05,
1206
+ "loss": 25.1294,
1207
+ "step": 4275
1208
+ },
1209
+ {
1210
+ "epoch": 2.5444592395324754,
1211
+ "grad_norm": 0.03211820125579834,
1212
+ "learning_rate": 3.1026156941649906e-05,
1213
+ "loss": 25.0991,
1214
+ "step": 4300
1215
+ },
1216
+ {
1217
+ "epoch": 2.559254327563249,
1218
+ "grad_norm": 0.0338427871465683,
1219
+ "learning_rate": 3.002012072434608e-05,
1220
+ "loss": 25.0448,
1221
+ "step": 4325
1222
+ },
1223
+ {
1224
+ "epoch": 2.574049415594023,
1225
+ "grad_norm": 0.03451085835695267,
1226
+ "learning_rate": 2.9014084507042254e-05,
1227
+ "loss": 25.0453,
1228
+ "step": 4350
1229
+ },
1230
+ {
1231
+ "epoch": 2.5888445036247965,
1232
+ "grad_norm": 0.02821294404566288,
1233
+ "learning_rate": 2.8008048289738433e-05,
1234
+ "loss": 25.0382,
1235
+ "step": 4375
1236
+ },
1237
+ {
1238
+ "epoch": 2.60363959165557,
1239
+ "grad_norm": 0.032822128385305405,
1240
+ "learning_rate": 2.7002012072434608e-05,
1241
+ "loss": 25.0809,
1242
+ "step": 4400
1243
+ },
1244
+ {
1245
+ "epoch": 2.6184346796863442,
1246
+ "grad_norm": 0.03818770870566368,
1247
+ "learning_rate": 2.5995975855130787e-05,
1248
+ "loss": 25.067,
1249
+ "step": 4425
1250
+ },
1251
+ {
1252
+ "epoch": 2.633229767717118,
1253
+ "grad_norm": 0.03749888017773628,
1254
+ "learning_rate": 2.4989939637826962e-05,
1255
+ "loss": 25.091,
1256
+ "step": 4450
1257
+ },
1258
+ {
1259
+ "epoch": 2.6480248557478916,
1260
+ "grad_norm": 0.027703339233994484,
1261
+ "learning_rate": 2.398390342052314e-05,
1262
+ "loss": 25.1411,
1263
+ "step": 4475
1264
+ },
1265
+ {
1266
+ "epoch": 2.6628199437786657,
1267
+ "grad_norm": 0.03599558025598526,
1268
+ "learning_rate": 2.2977867203219317e-05,
1269
+ "loss": 25.0994,
1270
+ "step": 4500
1271
+ },
1272
+ {
1273
+ "epoch": 2.6776150318094394,
1274
+ "grad_norm": 0.030554693192243576,
1275
+ "learning_rate": 2.1971830985915496e-05,
1276
+ "loss": 25.0371,
1277
+ "step": 4525
1278
+ },
1279
+ {
1280
+ "epoch": 2.692410119840213,
1281
+ "grad_norm": 0.02407955750823021,
1282
+ "learning_rate": 2.096579476861167e-05,
1283
+ "loss": 25.0952,
1284
+ "step": 4550
1285
+ },
1286
+ {
1287
+ "epoch": 2.7072052078709867,
1288
+ "grad_norm": 0.03127657622098923,
1289
+ "learning_rate": 1.9959758551307846e-05,
1290
+ "loss": 25.1274,
1291
+ "step": 4575
1292
+ },
1293
+ {
1294
+ "epoch": 2.7220002959017604,
1295
+ "grad_norm": 0.03968256711959839,
1296
+ "learning_rate": 1.8953722334004025e-05,
1297
+ "loss": 24.9342,
1298
+ "step": 4600
1299
+ },
1300
+ {
1301
+ "epoch": 2.7367953839325345,
1302
+ "grad_norm": 0.034262653440237045,
1303
+ "learning_rate": 1.79476861167002e-05,
1304
+ "loss": 25.1047,
1305
+ "step": 4625
1306
+ },
1307
+ {
1308
+ "epoch": 2.751590471963308,
1309
+ "grad_norm": 0.022854868322610855,
1310
+ "learning_rate": 1.6941649899396376e-05,
1311
+ "loss": 25.1082,
1312
+ "step": 4650
1313
+ },
1314
+ {
1315
+ "epoch": 2.766385559994082,
1316
+ "grad_norm": 0.03031456097960472,
1317
+ "learning_rate": 1.5935613682092555e-05,
1318
+ "loss": 25.0574,
1319
+ "step": 4675
1320
+ },
1321
+ {
1322
+ "epoch": 2.781180648024856,
1323
+ "grad_norm": 0.033804699778556824,
1324
+ "learning_rate": 1.4929577464788732e-05,
1325
+ "loss": 25.0682,
1326
+ "step": 4700
1327
+ },
1328
+ {
1329
+ "epoch": 2.7959757360556297,
1330
+ "grad_norm": 0.03536156564950943,
1331
+ "learning_rate": 1.3923541247484911e-05,
1332
+ "loss": 25.0752,
1333
+ "step": 4725
1334
+ },
1335
+ {
1336
+ "epoch": 2.8107708240864033,
1337
+ "grad_norm": 0.02477777935564518,
1338
+ "learning_rate": 1.2917505030181087e-05,
1339
+ "loss": 25.0747,
1340
+ "step": 4750
1341
+ },
1342
+ {
1343
+ "epoch": 2.825565912117177,
1344
+ "grad_norm": 0.03472886234521866,
1345
+ "learning_rate": 1.1911468812877265e-05,
1346
+ "loss": 25.058,
1347
+ "step": 4775
1348
+ },
1349
+ {
1350
+ "epoch": 2.8403610001479507,
1351
+ "grad_norm": 0.03604018688201904,
1352
+ "learning_rate": 1.0905432595573441e-05,
1353
+ "loss": 25.1504,
1354
+ "step": 4800
1355
+ },
1356
+ {
1357
+ "epoch": 2.855156088178725,
1358
+ "grad_norm": 0.03106354922056198,
1359
+ "learning_rate": 9.899396378269618e-06,
1360
+ "loss": 25.1244,
1361
+ "step": 4825
1362
+ },
1363
+ {
1364
+ "epoch": 2.8699511762094985,
1365
+ "grad_norm": 0.033049870282411575,
1366
+ "learning_rate": 8.893360160965795e-06,
1367
+ "loss": 25.1288,
1368
+ "step": 4850
1369
+ },
1370
+ {
1371
+ "epoch": 2.884746264240272,
1372
+ "grad_norm": 0.03024250827729702,
1373
+ "learning_rate": 7.887323943661972e-06,
1374
+ "loss": 25.0513,
1375
+ "step": 4875
1376
+ },
1377
+ {
1378
+ "epoch": 2.8995413522710463,
1379
+ "grad_norm": 0.032848477363586426,
1380
+ "learning_rate": 6.881287726358149e-06,
1381
+ "loss": 25.0365,
1382
+ "step": 4900
1383
+ },
1384
+ {
1385
+ "epoch": 2.91433644030182,
1386
+ "grad_norm": 0.03320826590061188,
1387
+ "learning_rate": 5.875251509054326e-06,
1388
+ "loss": 25.1055,
1389
+ "step": 4925
1390
+ },
1391
+ {
1392
+ "epoch": 2.9291315283325936,
1393
+ "grad_norm": 0.033013731241226196,
1394
+ "learning_rate": 4.869215291750504e-06,
1395
+ "loss": 25.1015,
1396
+ "step": 4950
1397
+ },
1398
+ {
1399
+ "epoch": 2.9439266163633673,
1400
+ "grad_norm": 0.026161963120102882,
1401
+ "learning_rate": 3.86317907444668e-06,
1402
+ "loss": 25.0681,
1403
+ "step": 4975
1404
+ },
1405
+ {
1406
+ "epoch": 2.958721704394141,
1407
+ "grad_norm": 0.031280238181352615,
1408
+ "learning_rate": 2.8571428571428573e-06,
1409
+ "loss": 25.0965,
1410
+ "step": 5000
1411
+ },
1412
+ {
1413
+ "epoch": 2.973516792424915,
1414
+ "grad_norm": 0.02889222465455532,
1415
+ "learning_rate": 1.8511066398390342e-06,
1416
+ "loss": 25.0899,
1417
+ "step": 5025
1418
+ },
1419
+ {
1420
+ "epoch": 2.9883118804556887,
1421
+ "grad_norm": 0.04580514132976532,
1422
+ "learning_rate": 8.450704225352112e-07,
1423
+ "loss": 25.1615,
1424
+ "step": 5050
1425
+ }
1426
+ ],
1427
+ "logging_steps": 25,
1428
+ "max_steps": 5070,
1429
+ "num_input_tokens_seen": 0,
1430
+ "num_train_epochs": 3,
1431
+ "save_steps": 200,
1432
+ "stateful_callbacks": {
1433
+ "TrainerControl": {
1434
+ "args": {
1435
+ "should_epoch_stop": false,
1436
+ "should_evaluate": false,
1437
+ "should_log": false,
1438
+ "should_save": true,
1439
+ "should_training_stop": true
1440
+ },
1441
+ "attributes": {}
1442
+ }
1443
+ },
1444
+ "total_flos": 1.599080012584059e+17,
1445
+ "train_batch_size": 2,
1446
+ "trial_name": null,
1447
+ "trial_params": null
1448
+ }
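Every entry in the `log_history` array above shares the same schema (`epoch`, `grad_norm`, `learning_rate`, `loss`, `step`). A minimal sketch of how a state file like this can be inspected (the two entries below are copied verbatim from the JSON above; the aggregation itself is illustrative and not part of this repo):

```python
import json

# Two log_history entries copied from the trainer_state.json diff above.
state = json.loads("""
{
  "log_history": [
    {"epoch": 1.4793608521970705, "grad_norm": 0.031846486032009125,
     "learning_rate": 0.00010346076458752516, "loss": 25.1463, "step": 2500},
    {"epoch": 2.9883118804556887, "grad_norm": 0.04580514132976532,
     "learning_rate": 8.450704225352112e-07, "loss": 25.1615, "step": 5050}
  ]
}
""")

# Map logged step -> training loss; in the full file the loss hovers
# around 25.0-25.2 for every logged step between 2500 and 5050.
losses = {entry["step"]: entry["loss"] for entry in state["log_history"]}
print(losses)  # {2500: 25.1463, 5050: 25.1615}
```

For the real checkpoint you would read the whole file with `json.load(open("trainer_state.json"))` instead of the embedded snippet.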
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d8b208aca29f450560d7805c3e0d7e559c8b70bc5ec3efc74926a45ae644966a
+ size 5777
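`training_args.bin` is stored as a Git LFS pointer rather than the binary itself: three `key value` lines (`version`, `oid`, `size`) following the spec URL carried in the pointer's first line. A small sketch of reading such a pointer (the text is copied from the pointer above):

```python
# A Git LFS pointer file is plain text: one "key value" pair per line.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:d8b208aca29f450560d7805c3e0d7e559c8b70bc5ec3efc74926a45ae644966a
size 5777"""

# Split each line on the first space to recover the key/value pairs.
pointer = dict(line.split(" ", 1) for line in pointer_text.splitlines())
print(pointer["size"])  # 5777
```

Cloning with `git lfs` installed replaces the pointer with the actual 5777-byte file.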