Seonghaa committed on
Commit 3330fd3 · verified · 1 Parent(s): 001fd94

🚀 Upload KoELECTRA emotion classification model

checkpoint-16135/config.json ADDED
@@ -0,0 +1,47 @@
+ {
+ "architectures": [
+ "ElectraForSequenceClassification"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "classifier_dropout": null,
+ "embedding_size": 768,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "id2label": {
+ "0": "\uae30\uc068",
+ "1": "\ub2f9\ud669",
+ "2": "\ubd84\ub178",
+ "3": "\ubd88\uc548",
+ "4": "\uc0c1\ucc98",
+ "5": "\uc2ac\ud514"
+ },
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "label2id": {
+ "\uae30\uc068": 0,
+ "\ub2f9\ud669": 1,
+ "\ubd84\ub178": 2,
+ "\ubd88\uc548": 3,
+ "\uc0c1\ucc98": 4,
+ "\uc2ac\ud514": 5
+ },
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 512,
+ "model_type": "electra",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "pad_token_id": 0,
+ "position_embedding_type": "absolute",
+ "problem_type": "single_label_classification",
+ "summary_activation": "gelu",
+ "summary_last_dropout": 0.1,
+ "summary_type": "first",
+ "summary_use_proj": true,
+ "tokenizer_class": "BertTokenizer",
+ "torch_dtype": "float32",
+ "transformers_version": "4.52.2",
+ "type_vocab_size": 2,
+ "use_cache": true,
+ "vocab_size": 54343
+ }
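The `id2label` values in config.json above are JSON-escaped Korean emotion names (기쁨/joy, 당황/embarrassment, 분노/anger, 불안/anxiety, 상처/hurt, 슬픔/sadness). A minimal standard-library sketch showing how those escapes decode when the config is parsed:

```python
import json

# The id2label block from config.json above, with the \uXXXX escapes kept
# literal (doubled backslashes) so that json.loads performs the decoding.
config_snippet = """
{
  "id2label": {
    "0": "\\uae30\\uc068",
    "1": "\\ub2f9\\ud669",
    "2": "\\ubd84\\ub178",
    "3": "\\ubd88\\uc548",
    "4": "\\uc0c1\\ucc98",
    "5": "\\uc2ac\\ud514"
  }
}
"""
id2label = json.loads(config_snippet)["id2label"]
print(id2label)
# {'0': '기쁨', '1': '당황', '2': '분노', '3': '불안', '4': '상처', '5': '슬픔'}
```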
checkpoint-16135/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:149e0f950d5f1601cc510fbd6b354de2f1e128f9f787c2d32ed18a6ec9b5f743
+ size 511149672
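The binary files in this commit are stored as Git LFS pointer files in the three-line `key value` format shown above. A small standard-library sketch parsing the model.safetensors pointer:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# Pointer contents copied from the model.safetensors diff above.
pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:149e0f950d5f1601cc510fbd6b354de2f1e128f9f787c2d32ed18a6ec9b5f743\n"
    "size 511149672\n"
)
info = parse_lfs_pointer(pointer)
print(info["size"])  # 511149672 (bytes, ~511 MB of fp32 weights)
```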
checkpoint-16135/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3388c9d7394db970a92bd97bdf4ddc045eb9ca89edde4582ae3102e6b5abbd32
+ size 1022417532
checkpoint-16135/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f8e73afc24b71ca38f1ca196f4e24ac8a6c21f769e4f9c669aeed936401370b7
+ size 14168
checkpoint-16135/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bf55eaf975de0a96c6214e9e47886d4be4d2ee565319d1980a9a5f976ed4bdc3
+ size 1056
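Summing the `size` fields of the four LFS pointers gives the binary payload of this checkpoint; note that optimizer.pt is roughly twice the fp32 weight file, consistent with Adam keeping two moment tensors per parameter. A quick sanity check:

```python
# Byte sizes copied from the four LFS pointer files in this commit.
sizes = {
    "model.safetensors": 511_149_672,
    "optimizer.pt": 1_022_417_532,
    "rng_state.pth": 14_168,
    "scheduler.pt": 1_056,
}
total = sum(sizes.values())
print(f"{total:,} bytes (~{total / 1e9:.2f} GB)")
# 1,533,582,428 bytes (~1.53 GB)
```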
checkpoint-16135/trainer_state.json ADDED
@@ -0,0 +1,2347 @@
+ {
+ "best_global_step": 16135,
+ "best_metric": 0.7960859271865419,
+ "best_model_checkpoint": "/content/drive/MyDrive/\uac10\uc815\ubd84\ub958/data/emotion_model/checkpoint-16135",
+ "epoch": 5.0,
+ "eval_steps": 500,
+ "global_step": 16135,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.015494267121165169,
+ "grad_norm": 1.9321871995925903,
+ "learning_rate": 4.5553145336225596e-07,
+ "loss": 1.7919,
+ "step": 50
+ },
+ {
+ "epoch": 0.030988534242330338,
+ "grad_norm": 1.7712626457214355,
+ "learning_rate": 9.203594669972111e-07,
+ "loss": 1.7865,
+ "step": 100
+ },
+ {
+ "epoch": 0.04648280136349551,
+ "grad_norm": 1.9185744524002075,
+ "learning_rate": 1.385187480632166e-06,
+ "loss": 1.7864,
+ "step": 150
+ },
+ {
+ "epoch": 0.061977068484660676,
+ "grad_norm": 1.9046003818511963,
+ "learning_rate": 1.8500154942671213e-06,
+ "loss": 1.7864,
+ "step": 200
+ },
+ {
+ "epoch": 0.07747133560582585,
+ "grad_norm": 2.0914034843444824,
+ "learning_rate": 2.3148435079020763e-06,
+ "loss": 1.7813,
+ "step": 250
+ },
+ {
+ "epoch": 0.09296560272699102,
+ "grad_norm": 2.088219165802002,
+ "learning_rate": 2.7796715215370313e-06,
+ "loss": 1.7773,
+ "step": 300
+ },
+ {
+ "epoch": 0.10845986984815618,
+ "grad_norm": 2.1377577781677246,
+ "learning_rate": 3.2444995351719864e-06,
+ "loss": 1.7587,
+ "step": 350
+ },
+ {
+ "epoch": 0.12395413696932135,
+ "grad_norm": 2.2140750885009766,
+ "learning_rate": 3.7093275488069414e-06,
+ "loss": 1.7388,
+ "step": 400
+ },
+ {
+ "epoch": 0.13944840409048653,
+ "grad_norm": 2.1278295516967773,
+ "learning_rate": 4.174155562441896e-06,
+ "loss": 1.6861,
+ "step": 450
+ },
+ {
+ "epoch": 0.1549426712116517,
+ "grad_norm": 4.734658241271973,
+ "learning_rate": 4.638983576076852e-06,
+ "loss": 1.6267,
+ "step": 500
+ },
+ {
+ "epoch": 0.17043693833281687,
+ "grad_norm": 4.140384197235107,
+ "learning_rate": 5.103811589711806e-06,
+ "loss": 1.5732,
+ "step": 550
+ },
+ {
+ "epoch": 0.18593120545398203,
+ "grad_norm": 2.705493688583374,
+ "learning_rate": 5.568639603346762e-06,
+ "loss": 1.5438,
+ "step": 600
+ },
+ {
+ "epoch": 0.2014254725751472,
+ "grad_norm": 5.3791422843933105,
+ "learning_rate": 6.0334676169817164e-06,
+ "loss": 1.4644,
+ "step": 650
+ },
+ {
+ "epoch": 0.21691973969631237,
+ "grad_norm": 2.8989250659942627,
+ "learning_rate": 6.498295630616672e-06,
+ "loss": 1.4743,
+ "step": 700
+ },
+ {
+ "epoch": 0.23241400681747754,
+ "grad_norm": 3.40291166305542,
+ "learning_rate": 6.963123644251627e-06,
+ "loss": 1.4187,
+ "step": 750
+ },
+ {
+ "epoch": 0.2479082739386427,
+ "grad_norm": 5.748068809509277,
+ "learning_rate": 7.427951657886583e-06,
+ "loss": 1.3533,
+ "step": 800
+ },
+ {
+ "epoch": 0.26340254105980787,
+ "grad_norm": 5.777422904968262,
+ "learning_rate": 7.892779671521537e-06,
+ "loss": 1.3099,
+ "step": 850
+ },
+ {
+ "epoch": 0.27889680818097307,
+ "grad_norm": 4.8046112060546875,
+ "learning_rate": 8.357607685156493e-06,
+ "loss": 1.2723,
+ "step": 900
+ },
+ {
+ "epoch": 0.2943910753021382,
+ "grad_norm": 5.549858570098877,
+ "learning_rate": 8.822435698791447e-06,
+ "loss": 1.2325,
+ "step": 950
+ },
+ {
+ "epoch": 0.3098853424233034,
+ "grad_norm": 6.851742744445801,
+ "learning_rate": 9.287263712426402e-06,
+ "loss": 1.1884,
+ "step": 1000
+ },
+ {
+ "epoch": 0.32537960954446854,
+ "grad_norm": 5.2497735023498535,
+ "learning_rate": 9.752091726061357e-06,
+ "loss": 1.2224,
+ "step": 1050
+ },
+ {
+ "epoch": 0.34087387666563373,
+ "grad_norm": 7.023674488067627,
+ "learning_rate": 1.0216919739696313e-05,
+ "loss": 1.177,
+ "step": 1100
+ },
+ {
+ "epoch": 0.3563681437867989,
+ "grad_norm": 4.888996124267578,
+ "learning_rate": 1.0681747753331269e-05,
+ "loss": 1.1577,
+ "step": 1150
+ },
+ {
+ "epoch": 0.37186241090796407,
+ "grad_norm": 6.2133660316467285,
+ "learning_rate": 1.1146575766966222e-05,
+ "loss": 1.1726,
+ "step": 1200
+ },
+ {
+ "epoch": 0.3873566780291292,
+ "grad_norm": 6.936697483062744,
+ "learning_rate": 1.1611403780601178e-05,
+ "loss": 1.1001,
+ "step": 1250
+ },
+ {
+ "epoch": 0.4028509451502944,
+ "grad_norm": 8.526293754577637,
+ "learning_rate": 1.2076231794236133e-05,
+ "loss": 1.0752,
+ "step": 1300
+ },
+ {
+ "epoch": 0.41834521227145954,
+ "grad_norm": 5.5933756828308105,
+ "learning_rate": 1.254105980787109e-05,
+ "loss": 1.0809,
+ "step": 1350
+ },
+ {
+ "epoch": 0.43383947939262474,
+ "grad_norm": 6.998812675476074,
+ "learning_rate": 1.3005887821506042e-05,
+ "loss": 1.0566,
+ "step": 1400
+ },
+ {
+ "epoch": 0.4493337465137899,
+ "grad_norm": 7.077617645263672,
+ "learning_rate": 1.3470715835140998e-05,
+ "loss": 1.0993,
+ "step": 1450
+ },
+ {
+ "epoch": 0.4648280136349551,
+ "grad_norm": 8.715201377868652,
+ "learning_rate": 1.3935543848775953e-05,
+ "loss": 1.0375,
+ "step": 1500
+ },
+ {
+ "epoch": 0.4803222807561202,
+ "grad_norm": 6.017217636108398,
+ "learning_rate": 1.440037186241091e-05,
+ "loss": 1.0388,
+ "step": 1550
+ },
+ {
+ "epoch": 0.4958165478772854,
+ "grad_norm": 5.349973201751709,
+ "learning_rate": 1.4865199876045862e-05,
+ "loss": 1.0354,
+ "step": 1600
+ },
+ {
+ "epoch": 0.5113108149984505,
+ "grad_norm": 12.728338241577148,
+ "learning_rate": 1.533002788968082e-05,
+ "loss": 1.0314,
+ "step": 1650
+ },
+ {
+ "epoch": 0.5268050821196157,
+ "grad_norm": 5.962468147277832,
+ "learning_rate": 1.5794855903315773e-05,
+ "loss": 1.03,
+ "step": 1700
+ },
+ {
+ "epoch": 0.5422993492407809,
+ "grad_norm": 5.971400260925293,
+ "learning_rate": 1.6259683916950726e-05,
+ "loss": 1.0541,
+ "step": 1750
+ },
+ {
+ "epoch": 0.5577936163619461,
+ "grad_norm": 6.260463714599609,
+ "learning_rate": 1.6724511930585682e-05,
+ "loss": 0.9831,
+ "step": 1800
+ },
+ {
+ "epoch": 0.5732878834831112,
+ "grad_norm": 7.8115010261535645,
+ "learning_rate": 1.718933994422064e-05,
+ "loss": 0.966,
+ "step": 1850
+ },
+ {
+ "epoch": 0.5887821506042764,
+ "grad_norm": 5.005403995513916,
+ "learning_rate": 1.7654167957855595e-05,
+ "loss": 0.9592,
+ "step": 1900
+ },
+ {
+ "epoch": 0.6042764177254416,
+ "grad_norm": 7.7732157707214355,
+ "learning_rate": 1.8118995971490548e-05,
+ "loss": 0.9766,
+ "step": 1950
+ },
+ {
+ "epoch": 0.6197706848466068,
+ "grad_norm": 7.265392303466797,
+ "learning_rate": 1.8583823985125504e-05,
+ "loss": 1.0171,
+ "step": 2000
+ },
+ {
+ "epoch": 0.6352649519677719,
+ "grad_norm": 15.946109771728516,
+ "learning_rate": 1.904865199876046e-05,
+ "loss": 0.9824,
+ "step": 2050
+ },
+ {
+ "epoch": 0.6507592190889371,
+ "grad_norm": 7.261445999145508,
+ "learning_rate": 1.9513480012395417e-05,
+ "loss": 0.9699,
+ "step": 2100
+ },
+ {
+ "epoch": 0.6662534862101023,
+ "grad_norm": 8.201744079589844,
+ "learning_rate": 1.997830802603037e-05,
+ "loss": 0.9957,
+ "step": 2150
+ },
+ {
+ "epoch": 0.6817477533312675,
+ "grad_norm": 6.183067798614502,
+ "learning_rate": 2.0443136039665322e-05,
+ "loss": 0.8554,
+ "step": 2200
+ },
+ {
+ "epoch": 0.6972420204524326,
+ "grad_norm": 7.481590270996094,
+ "learning_rate": 2.090796405330028e-05,
+ "loss": 0.9929,
+ "step": 2250
+ },
+ {
+ "epoch": 0.7127362875735977,
+ "grad_norm": 7.3274030685424805,
+ "learning_rate": 2.1372792066935235e-05,
+ "loss": 0.9438,
+ "step": 2300
+ },
+ {
+ "epoch": 0.7282305546947629,
+ "grad_norm": 11.69247055053711,
+ "learning_rate": 2.183762008057019e-05,
+ "loss": 0.9767,
+ "step": 2350
+ },
+ {
+ "epoch": 0.7437248218159281,
+ "grad_norm": 7.929721832275391,
+ "learning_rate": 2.2302448094205144e-05,
+ "loss": 1.0036,
+ "step": 2400
+ },
+ {
+ "epoch": 0.7592190889370932,
+ "grad_norm": 9.753717422485352,
+ "learning_rate": 2.27672761078401e-05,
+ "loss": 0.9726,
+ "step": 2450
+ },
+ {
+ "epoch": 0.7747133560582584,
+ "grad_norm": 7.797086715698242,
+ "learning_rate": 2.3232104121475057e-05,
+ "loss": 0.9322,
+ "step": 2500
+ },
+ {
+ "epoch": 0.7902076231794236,
+ "grad_norm": 6.927332878112793,
+ "learning_rate": 2.369693213511001e-05,
+ "loss": 0.9378,
+ "step": 2550
+ },
+ {
+ "epoch": 0.8057018903005888,
+ "grad_norm": 3.726092576980591,
+ "learning_rate": 2.4161760148744962e-05,
+ "loss": 0.958,
+ "step": 2600
+ },
+ {
+ "epoch": 0.821196157421754,
+ "grad_norm": 5.661774635314941,
+ "learning_rate": 2.462658816237992e-05,
+ "loss": 0.9651,
+ "step": 2650
+ },
+ {
+ "epoch": 0.8366904245429191,
+ "grad_norm": 6.513345718383789,
+ "learning_rate": 2.5091416176014875e-05,
+ "loss": 0.971,
+ "step": 2700
+ },
+ {
+ "epoch": 0.8521846916640843,
+ "grad_norm": 6.713255405426025,
+ "learning_rate": 2.555624418964983e-05,
+ "loss": 0.8616,
+ "step": 2750
+ },
+ {
+ "epoch": 0.8676789587852495,
+ "grad_norm": 8.527266502380371,
+ "learning_rate": 2.6021072203284784e-05,
+ "loss": 0.9413,
+ "step": 2800
+ },
+ {
+ "epoch": 0.8831732259064147,
+ "grad_norm": 6.599502086639404,
+ "learning_rate": 2.648590021691974e-05,
+ "loss": 0.9985,
+ "step": 2850
+ },
+ {
+ "epoch": 0.8986674930275798,
+ "grad_norm": 4.0680155754089355,
+ "learning_rate": 2.6950728230554697e-05,
+ "loss": 0.9415,
+ "step": 2900
+ },
+ {
+ "epoch": 0.914161760148745,
+ "grad_norm": 5.083493232727051,
+ "learning_rate": 2.741555624418965e-05,
+ "loss": 0.9805,
+ "step": 2950
+ },
+ {
+ "epoch": 0.9296560272699101,
+ "grad_norm": 4.0469069480896,
+ "learning_rate": 2.7880384257824606e-05,
+ "loss": 0.9547,
+ "step": 3000
+ },
+ {
+ "epoch": 0.9451502943910753,
+ "grad_norm": 6.075752258300781,
+ "learning_rate": 2.834521227145956e-05,
+ "loss": 0.9623,
+ "step": 3050
+ },
+ {
+ "epoch": 0.9606445615122404,
+ "grad_norm": 6.5252299308776855,
+ "learning_rate": 2.8810040285094515e-05,
+ "loss": 0.9838,
+ "step": 3100
+ },
+ {
+ "epoch": 0.9761388286334056,
+ "grad_norm": 6.530562877655029,
+ "learning_rate": 2.927486829872947e-05,
+ "loss": 0.911,
+ "step": 3150
+ },
+ {
+ "epoch": 0.9916330957545708,
+ "grad_norm": 8.217161178588867,
+ "learning_rate": 2.9739696312364428e-05,
+ "loss": 0.9457,
+ "step": 3200
+ },
+ {
+ "epoch": 1.0,
+ "eval_accuracy": 0.7087787983737389,
+ "eval_f1": 0.7069046924545164,
+ "eval_loss": 0.8573769330978394,
+ "eval_runtime": 25.5149,
+ "eval_samples_per_second": 260.279,
+ "eval_steps_per_second": 16.304,
+ "step": 3227
+ },
+ {
+ "epoch": 1.007127362875736,
+ "grad_norm": 4.367455005645752,
+ "learning_rate": 2.997727507488896e-05,
+ "loss": 0.9143,
+ "step": 3250
+ },
+ {
+ "epoch": 1.022621629996901,
+ "grad_norm": 4.636725902557373,
+ "learning_rate": 2.9925627517818407e-05,
+ "loss": 0.9304,
+ "step": 3300
+ },
+ {
+ "epoch": 1.0381158971180664,
+ "grad_norm": 4.251437664031982,
+ "learning_rate": 2.9873979960747857e-05,
+ "loss": 0.8549,
+ "step": 3350
+ },
+ {
+ "epoch": 1.0536101642392315,
+ "grad_norm": 6.648655414581299,
+ "learning_rate": 2.9822332403677307e-05,
+ "loss": 0.8698,
+ "step": 3400
+ },
+ {
+ "epoch": 1.0691044313603966,
+ "grad_norm": 7.102205276489258,
+ "learning_rate": 2.9770684846606757e-05,
+ "loss": 0.8597,
+ "step": 3450
+ },
+ {
+ "epoch": 1.0845986984815619,
+ "grad_norm": 10.821270942687988,
+ "learning_rate": 2.9719037289536206e-05,
+ "loss": 0.8457,
+ "step": 3500
+ },
+ {
+ "epoch": 1.100092965602727,
+ "grad_norm": 6.111588001251221,
+ "learning_rate": 2.9667389732465652e-05,
+ "loss": 0.8973,
+ "step": 3550
+ },
+ {
+ "epoch": 1.1155872327238923,
+ "grad_norm": 9.016953468322754,
+ "learning_rate": 2.9615742175395106e-05,
+ "loss": 0.8228,
+ "step": 3600
+ },
+ {
+ "epoch": 1.1310814998450573,
+ "grad_norm": 7.717069625854492,
+ "learning_rate": 2.9564094618324555e-05,
+ "loss": 0.8103,
+ "step": 3650
+ },
+ {
+ "epoch": 1.1465757669662224,
+ "grad_norm": 7.848579406738281,
+ "learning_rate": 2.9512447061254005e-05,
+ "loss": 0.8097,
+ "step": 3700
+ },
+ {
+ "epoch": 1.1620700340873877,
+ "grad_norm": 4.738124847412109,
+ "learning_rate": 2.9460799504183455e-05,
+ "loss": 0.9105,
+ "step": 3750
+ },
+ {
+ "epoch": 1.1775643012085528,
+ "grad_norm": 5.289875507354736,
+ "learning_rate": 2.94091519471129e-05,
+ "loss": 0.8876,
+ "step": 3800
+ },
+ {
+ "epoch": 1.1930585683297181,
+ "grad_norm": 6.445308685302734,
+ "learning_rate": 2.935750439004235e-05,
+ "loss": 0.8377,
+ "step": 3850
+ },
+ {
+ "epoch": 1.2085528354508832,
+ "grad_norm": 4.725327968597412,
+ "learning_rate": 2.93058568329718e-05,
+ "loss": 0.9332,
+ "step": 3900
+ },
+ {
+ "epoch": 1.2240471025720483,
+ "grad_norm": 53.85081481933594,
+ "learning_rate": 2.925420927590125e-05,
+ "loss": 0.8882,
+ "step": 3950
+ },
+ {
+ "epoch": 1.2395413696932136,
+ "grad_norm": 5.677978515625,
+ "learning_rate": 2.9202561718830703e-05,
+ "loss": 0.8481,
+ "step": 4000
+ },
+ {
+ "epoch": 1.2550356368143787,
+ "grad_norm": 3.941765785217285,
+ "learning_rate": 2.915091416176015e-05,
+ "loss": 0.835,
+ "step": 4050
+ },
+ {
+ "epoch": 1.2705299039355438,
+ "grad_norm": 8.099725723266602,
+ "learning_rate": 2.90992666046896e-05,
+ "loss": 0.8322,
+ "step": 4100
+ },
+ {
+ "epoch": 1.286024171056709,
+ "grad_norm": 6.59591007232666,
+ "learning_rate": 2.904761904761905e-05,
+ "loss": 0.8809,
+ "step": 4150
+ },
+ {
+ "epoch": 1.3015184381778742,
+ "grad_norm": 7.200226306915283,
+ "learning_rate": 2.8995971490548498e-05,
+ "loss": 0.8609,
+ "step": 4200
+ },
+ {
+ "epoch": 1.3170127052990392,
+ "grad_norm": 4.902937412261963,
+ "learning_rate": 2.8944323933477948e-05,
+ "loss": 0.8633,
+ "step": 4250
+ },
+ {
+ "epoch": 1.3325069724202045,
+ "grad_norm": 5.792146682739258,
+ "learning_rate": 2.8892676376407394e-05,
+ "loss": 0.8684,
+ "step": 4300
+ },
+ {
+ "epoch": 1.3480012395413696,
+ "grad_norm": 4.636809349060059,
+ "learning_rate": 2.8841028819336844e-05,
+ "loss": 0.8245,
+ "step": 4350
+ },
+ {
+ "epoch": 1.363495506662535,
+ "grad_norm": 5.28842306137085,
+ "learning_rate": 2.8789381262266297e-05,
+ "loss": 0.7859,
+ "step": 4400
+ },
+ {
+ "epoch": 1.3789897737837,
+ "grad_norm": 4.259128570556641,
+ "learning_rate": 2.8737733705195747e-05,
+ "loss": 0.8228,
+ "step": 4450
+ },
+ {
+ "epoch": 1.394484040904865,
+ "grad_norm": 7.914375305175781,
+ "learning_rate": 2.8686086148125196e-05,
+ "loss": 0.9448,
+ "step": 4500
+ },
+ {
+ "epoch": 1.4099783080260304,
+ "grad_norm": 7.636547088623047,
+ "learning_rate": 2.8634438591054643e-05,
+ "loss": 0.8781,
+ "step": 4550
+ },
+ {
+ "epoch": 1.4254725751471955,
+ "grad_norm": 8.681707382202148,
+ "learning_rate": 2.8582791033984092e-05,
+ "loss": 0.8367,
+ "step": 4600
+ },
+ {
+ "epoch": 1.4409668422683608,
+ "grad_norm": 7.864759922027588,
+ "learning_rate": 2.8531143476913542e-05,
+ "loss": 0.9088,
+ "step": 4650
+ },
+ {
+ "epoch": 1.4564611093895259,
+ "grad_norm": 4.892348289489746,
+ "learning_rate": 2.8479495919842992e-05,
+ "loss": 0.8993,
+ "step": 4700
+ },
+ {
+ "epoch": 1.471955376510691,
+ "grad_norm": 23.208873748779297,
+ "learning_rate": 2.842784836277244e-05,
+ "loss": 0.8542,
+ "step": 4750
+ },
+ {
+ "epoch": 1.4874496436318563,
+ "grad_norm": 8.983469009399414,
+ "learning_rate": 2.837620080570189e-05,
+ "loss": 0.949,
+ "step": 4800
+ },
+ {
+ "epoch": 1.5029439107530214,
+ "grad_norm": 10.706644058227539,
+ "learning_rate": 2.832455324863134e-05,
+ "loss": 0.9008,
+ "step": 4850
+ },
+ {
+ "epoch": 1.5184381778741867,
+ "grad_norm": 4.685935020446777,
+ "learning_rate": 2.827290569156079e-05,
+ "loss": 0.8613,
+ "step": 4900
+ },
+ {
+ "epoch": 1.5339324449953518,
+ "grad_norm": 5.286406993865967,
+ "learning_rate": 2.822125813449024e-05,
+ "loss": 0.8929,
+ "step": 4950
+ },
+ {
+ "epoch": 1.5494267121165168,
+ "grad_norm": 4.907707691192627,
+ "learning_rate": 2.816961057741969e-05,
+ "loss": 0.8321,
+ "step": 5000
+ },
+ {
+ "epoch": 1.564920979237682,
+ "grad_norm": 6.398087501525879,
+ "learning_rate": 2.8117963020349136e-05,
+ "loss": 0.8626,
+ "step": 5050
+ },
+ {
+ "epoch": 1.5804152463588472,
+ "grad_norm": 5.323617458343506,
+ "learning_rate": 2.8066315463278586e-05,
+ "loss": 0.8324,
+ "step": 5100
+ },
+ {
+ "epoch": 1.5959095134800125,
+ "grad_norm": 4.136271953582764,
+ "learning_rate": 2.8014667906208035e-05,
+ "loss": 0.879,
+ "step": 5150
+ },
+ {
+ "epoch": 1.6114037806011776,
+ "grad_norm": 6.873619556427002,
+ "learning_rate": 2.796302034913749e-05,
+ "loss": 0.9043,
+ "step": 5200
+ },
+ {
+ "epoch": 1.6268980477223427,
+ "grad_norm": 7.138693809509277,
+ "learning_rate": 2.7911372792066938e-05,
+ "loss": 0.8183,
+ "step": 5250
+ },
+ {
+ "epoch": 1.6423923148435078,
+ "grad_norm": 6.483767032623291,
+ "learning_rate": 2.7859725234996384e-05,
+ "loss": 0.8867,
+ "step": 5300
+ },
+ {
+ "epoch": 1.657886581964673,
+ "grad_norm": 3.2249104976654053,
+ "learning_rate": 2.7808077677925834e-05,
+ "loss": 0.8097,
+ "step": 5350
+ },
+ {
+ "epoch": 1.6733808490858384,
+ "grad_norm": 6.961575984954834,
+ "learning_rate": 2.7756430120855284e-05,
+ "loss": 0.8364,
+ "step": 5400
+ },
+ {
+ "epoch": 1.6888751162070035,
+ "grad_norm": 7.0920000076293945,
+ "learning_rate": 2.7704782563784733e-05,
+ "loss": 0.8283,
+ "step": 5450
+ },
+ {
+ "epoch": 1.7043693833281686,
+ "grad_norm": 5.436604976654053,
+ "learning_rate": 2.7653135006714183e-05,
+ "loss": 0.8786,
+ "step": 5500
+ },
+ {
+ "epoch": 1.7198636504493336,
+ "grad_norm": 4.0141282081604,
+ "learning_rate": 2.760148744964363e-05,
+ "loss": 0.8452,
+ "step": 5550
+ },
+ {
+ "epoch": 1.735357917570499,
+ "grad_norm": 5.783074378967285,
+ "learning_rate": 2.7549839892573083e-05,
+ "loss": 0.8168,
+ "step": 5600
+ },
+ {
+ "epoch": 1.750852184691664,
+ "grad_norm": 7.773756504058838,
+ "learning_rate": 2.7498192335502532e-05,
+ "loss": 0.8598,
+ "step": 5650
+ },
+ {
+ "epoch": 1.7663464518128293,
+ "grad_norm": 5.375339984893799,
+ "learning_rate": 2.7446544778431982e-05,
+ "loss": 0.8366,
+ "step": 5700
+ },
+ {
+ "epoch": 1.7818407189339944,
+ "grad_norm": 4.240859031677246,
+ "learning_rate": 2.739489722136143e-05,
+ "loss": 0.8136,
+ "step": 5750
+ },
+ {
+ "epoch": 1.7973349860551595,
+ "grad_norm": 6.107599258422852,
+ "learning_rate": 2.734324966429088e-05,
+ "loss": 0.8074,
+ "step": 5800
+ },
+ {
+ "epoch": 1.8128292531763246,
+ "grad_norm": 6.027589797973633,
+ "learning_rate": 2.7291602107220328e-05,
+ "loss": 0.7808,
+ "step": 5850
+ },
+ {
+ "epoch": 1.82832352029749,
+ "grad_norm": 4.829204559326172,
+ "learning_rate": 2.7239954550149777e-05,
+ "loss": 0.8473,
+ "step": 5900
+ },
+ {
+ "epoch": 1.8438177874186552,
+ "grad_norm": 5.385358810424805,
+ "learning_rate": 2.7188306993079227e-05,
+ "loss": 0.844,
+ "step": 5950
+ },
+ {
+ "epoch": 1.8593120545398203,
+ "grad_norm": 5.991063594818115,
+ "learning_rate": 2.713665943600868e-05,
+ "loss": 0.8667,
+ "step": 6000
+ },
+ {
+ "epoch": 1.8748063216609854,
+ "grad_norm": 4.269604682922363,
+ "learning_rate": 2.708501187893813e-05,
+ "loss": 0.8987,
+ "step": 6050
+ },
+ {
+ "epoch": 1.8903005887821505,
+ "grad_norm": 6.90878438949585,
+ "learning_rate": 2.7033364321867576e-05,
+ "loss": 0.8517,
+ "step": 6100
+ },
+ {
+ "epoch": 1.9057948559033158,
+ "grad_norm": 8.742233276367188,
+ "learning_rate": 2.6981716764797026e-05,
+ "loss": 0.8729,
+ "step": 6150
+ },
+ {
+ "epoch": 1.921289123024481,
+ "grad_norm": 9.10084342956543,
+ "learning_rate": 2.6930069207726475e-05,
+ "loss": 0.8803,
+ "step": 6200
+ },
+ {
+ "epoch": 1.9367833901456462,
+ "grad_norm": 4.210537433624268,
+ "learning_rate": 2.6878421650655925e-05,
+ "loss": 0.7938,
+ "step": 6250
+ },
+ {
+ "epoch": 1.9522776572668112,
+ "grad_norm": 6.604791641235352,
+ "learning_rate": 2.6826774093585375e-05,
+ "loss": 0.7958,
+ "step": 6300
+ },
+ {
+ "epoch": 1.9677719243879763,
+ "grad_norm": 6.213857173919678,
+ "learning_rate": 2.677512653651482e-05,
+ "loss": 0.8463,
+ "step": 6350
+ },
+ {
+ "epoch": 1.9832661915091416,
+ "grad_norm": 4.303800582885742,
+ "learning_rate": 2.6723478979444274e-05,
+ "loss": 0.7909,
+ "step": 6400
+ },
+ {
+ "epoch": 1.998760458630307,
+ "grad_norm": 4.933095932006836,
+ "learning_rate": 2.6671831422373724e-05,
+ "loss": 0.7888,
+ "step": 6450
+ },
+ {
+ "epoch": 2.0,
+ "eval_accuracy": 0.7590724288510766,
+ "eval_f1": 0.7587245577707713,
+ "eval_loss": 0.7010347247123718,
+ "eval_runtime": 25.4199,
+ "eval_samples_per_second": 261.252,
+ "eval_steps_per_second": 16.365,
+ "step": 6454
+ },
+ {
+ "epoch": 2.014254725751472,
+ "grad_norm": 4.0570244789123535,
+ "learning_rate": 2.6620183865303173e-05,
+ "loss": 0.7236,
+ "step": 6500
+ },
+ {
+ "epoch": 2.029748992872637,
+ "grad_norm": 5.307652473449707,
+ "learning_rate": 2.6568536308232623e-05,
+ "loss": 0.7213,
+ "step": 6550
+ },
+ {
+ "epoch": 2.045243259993802,
+ "grad_norm": 5.398072719573975,
+ "learning_rate": 2.651688875116207e-05,
+ "loss": 0.6839,
+ "step": 6600
+ },
+ {
+ "epoch": 2.0607375271149673,
+ "grad_norm": 5.296418190002441,
+ "learning_rate": 2.646524119409152e-05,
+ "loss": 0.6856,
+ "step": 6650
+ },
+ {
+ "epoch": 2.076231794236133,
+ "grad_norm": 4.173377990722656,
+ "learning_rate": 2.641359363702097e-05,
+ "loss": 0.7109,
+ "step": 6700
+ },
+ {
+ "epoch": 2.091726061357298,
+ "grad_norm": 5.590676784515381,
+ "learning_rate": 2.636194607995042e-05,
+ "loss": 0.6814,
+ "step": 6750
+ },
+ {
+ "epoch": 2.107220328478463,
+ "grad_norm": 8.112780570983887,
+ "learning_rate": 2.631029852287987e-05,
+ "loss": 0.7302,
+ "step": 6800
+ },
+ {
+ "epoch": 2.122714595599628,
+ "grad_norm": 6.514364242553711,
+ "learning_rate": 2.6258650965809318e-05,
+ "loss": 0.6917,
+ "step": 6850
+ },
+ {
+ "epoch": 2.138208862720793,
+ "grad_norm": 8.156841278076172,
+ "learning_rate": 2.6207003408738767e-05,
+ "loss": 0.6568,
+ "step": 6900
+ },
+ {
+ "epoch": 2.1537031298419587,
+ "grad_norm": 7.641481876373291,
+ "learning_rate": 2.6155355851668217e-05,
+ "loss": 0.6132,
+ "step": 6950
+ },
+ {
+ "epoch": 2.1691973969631237,
+ "grad_norm": 6.33613395690918,
+ "learning_rate": 2.6103708294597667e-05,
+ "loss": 0.6393,
+ "step": 7000
+ },
+ {
+ "epoch": 2.184691664084289,
+ "grad_norm": 4.2916436195373535,
+ "learning_rate": 2.6052060737527116e-05,
+ "loss": 0.6709,
+ "step": 7050
+ },
+ {
+ "epoch": 2.200185931205454,
+ "grad_norm": 4.763488292694092,
+ "learning_rate": 2.6000413180456563e-05,
+ "loss": 0.6919,
+ "step": 7100
+ },
+ {
+ "epoch": 2.215680198326619,
+ "grad_norm": 8.614394187927246,
+ "learning_rate": 2.5948765623386012e-05,
+ "loss": 0.6501,
+ "step": 7150
+ },
+ {
+ "epoch": 2.2311744654477845,
+ "grad_norm": 9.684426307678223,
+ "learning_rate": 2.5897118066315465e-05,
+ "loss": 0.6947,
+ "step": 7200
+ },
+ {
+ "epoch": 2.2466687325689496,
+ "grad_norm": 6.210818767547607,
+ "learning_rate": 2.5845470509244915e-05,
+ "loss": 0.6873,
+ "step": 7250
+ },
+ {
+ "epoch": 2.2621629996901147,
+ "grad_norm": 6.774372577667236,
+ "learning_rate": 2.5793822952174365e-05,
+ "loss": 0.7195,
+ "step": 7300
+ },
+ {
+ "epoch": 2.27765726681128,
+ "epoch": 2.27765726681128,
1056
+ "grad_norm": 6.014688491821289,
1057
+ "learning_rate": 2.574217539510381e-05,
1058
+ "loss": 0.6298,
1059
+ "step": 7350
1060
+ },
1061
+ {
1062
+ "epoch": 2.293151533932445,
1063
+ "grad_norm": 14.994784355163574,
1064
+ "learning_rate": 2.569052783803326e-05,
1065
+ "loss": 0.7403,
1066
+ "step": 7400
1067
+ },
1068
+ {
1069
+ "epoch": 2.3086458010536104,
1070
+ "grad_norm": 6.315488815307617,
1071
+ "learning_rate": 2.563888028096271e-05,
1072
+ "loss": 0.6679,
1073
+ "step": 7450
1074
+ },
1075
+ {
1076
+ "epoch": 2.3241400681747755,
1077
+ "grad_norm": 8.482314109802246,
1078
+ "learning_rate": 2.558723272389216e-05,
1079
+ "loss": 0.7173,
1080
+ "step": 7500
1081
+ },
1082
+ {
1083
+ "epoch": 2.3396343352959406,
1084
+ "grad_norm": 10.161298751831055,
1085
+ "learning_rate": 2.553558516682161e-05,
1086
+ "loss": 0.732,
1087
+ "step": 7550
1088
+ },
1089
+ {
1090
+ "epoch": 2.3551286024171056,
1091
+ "grad_norm": 6.758267402648926,
1092
+ "learning_rate": 2.548393760975106e-05,
1093
+ "loss": 0.6192,
1094
+ "step": 7600
1095
+ },
1096
+ {
1097
+ "epoch": 2.3706228695382707,
1098
+ "grad_norm": 4.528532981872559,
1099
+ "learning_rate": 2.543229005268051e-05,
1100
+ "loss": 0.7614,
1101
+ "step": 7650
1102
+ },
1103
+ {
1104
+ "epoch": 2.3861171366594363,
1105
+ "grad_norm": 6.397975921630859,
1106
+ "learning_rate": 2.538064249560996e-05,
1107
+ "loss": 0.6951,
1108
+ "step": 7700
1109
+ },
1110
+ {
1111
+ "epoch": 2.4016114037806013,
1112
+ "grad_norm": 5.440258979797363,
1113
+ "learning_rate": 2.532899493853941e-05,
1114
+ "loss": 0.7188,
1115
+ "step": 7750
1116
+ },
1117
+ {
1118
+ "epoch": 2.4171056709017664,
1119
+ "grad_norm": 2.4531173706054688,
1120
+ "learning_rate": 2.5277347381468858e-05,
1121
+ "loss": 0.6347,
1122
+ "step": 7800
1123
+ },
1124
+ {
1125
+ "epoch": 2.4325999380229315,
+ "grad_norm": 15.269991874694824,
+ "learning_rate": 2.5225699824398304e-05,
+ "loss": 0.6601,
+ "step": 7850
+ },
+ {
+ "epoch": 2.4480942051440966,
+ "grad_norm": 6.438554286956787,
+ "learning_rate": 2.5174052267327754e-05,
+ "loss": 0.698,
+ "step": 7900
+ },
+ {
+ "epoch": 2.4635884722652617,
+ "grad_norm": 8.922213554382324,
+ "learning_rate": 2.5122404710257204e-05,
+ "loss": 0.6958,
+ "step": 7950
+ },
+ {
+ "epoch": 2.479082739386427,
+ "grad_norm": 6.724533557891846,
+ "learning_rate": 2.5070757153186657e-05,
+ "loss": 0.7131,
+ "step": 8000
+ },
+ {
+ "epoch": 2.4945770065075923,
+ "grad_norm": 5.617169380187988,
+ "learning_rate": 2.5019109596116107e-05,
+ "loss": 0.7711,
+ "step": 8050
+ },
+ {
+ "epoch": 2.5100712736287574,
+ "grad_norm": 6.441185474395752,
+ "learning_rate": 2.4967462039045553e-05,
+ "loss": 0.6612,
+ "step": 8100
+ },
+ {
+ "epoch": 2.5255655407499225,
+ "grad_norm": 6.033916473388672,
+ "learning_rate": 2.4915814481975003e-05,
+ "loss": 0.698,
+ "step": 8150
+ },
+ {
+ "epoch": 2.5410598078710875,
+ "grad_norm": 6.174665451049805,
+ "learning_rate": 2.4864166924904452e-05,
+ "loss": 0.6968,
+ "step": 8200
+ },
+ {
+ "epoch": 2.5565540749922526,
+ "grad_norm": 20.01167869567871,
+ "learning_rate": 2.4812519367833902e-05,
+ "loss": 0.6456,
+ "step": 8250
+ },
+ {
+ "epoch": 2.572048342113418,
+ "grad_norm": 10.404682159423828,
+ "learning_rate": 2.476087181076335e-05,
+ "loss": 0.6808,
+ "step": 8300
+ },
+ {
+ "epoch": 2.5875426092345832,
+ "grad_norm": 5.160488128662109,
+ "learning_rate": 2.47092242536928e-05,
+ "loss": 0.6913,
+ "step": 8350
+ },
+ {
+ "epoch": 2.6030368763557483,
+ "grad_norm": 6.452591896057129,
+ "learning_rate": 2.465757669662225e-05,
+ "loss": 0.6594,
+ "step": 8400
+ },
+ {
+ "epoch": 2.6185311434769134,
+ "grad_norm": 12.436300277709961,
+ "learning_rate": 2.46059291395517e-05,
+ "loss": 0.7255,
+ "step": 8450
+ },
+ {
+ "epoch": 2.6340254105980785,
+ "grad_norm": 6.132791042327881,
+ "learning_rate": 2.455428158248115e-05,
+ "loss": 0.6753,
+ "step": 8500
+ },
+ {
+ "epoch": 2.649519677719244,
+ "grad_norm": 10.712909698486328,
+ "learning_rate": 2.45026340254106e-05,
+ "loss": 0.6445,
+ "step": 8550
+ },
+ {
+ "epoch": 2.665013944840409,
+ "grad_norm": 12.122429847717285,
+ "learning_rate": 2.445098646834005e-05,
+ "loss": 0.6424,
+ "step": 8600
+ },
+ {
+ "epoch": 2.680508211961574,
+ "grad_norm": 8.575897216796875,
+ "learning_rate": 2.4399338911269496e-05,
+ "loss": 0.7242,
+ "step": 8650
+ },
+ {
+ "epoch": 2.6960024790827393,
+ "grad_norm": 8.740906715393066,
+ "learning_rate": 2.4347691354198946e-05,
+ "loss": 0.6949,
+ "step": 8700
+ },
+ {
+ "epoch": 2.7114967462039044,
+ "grad_norm": 4.871994972229004,
+ "learning_rate": 2.4296043797128395e-05,
+ "loss": 0.787,
+ "step": 8750
+ },
+ {
+ "epoch": 2.72699101332507,
+ "grad_norm": 6.642944812774658,
+ "learning_rate": 2.424439624005785e-05,
+ "loss": 0.6925,
+ "step": 8800
+ },
+ {
+ "epoch": 2.742485280446235,
+ "grad_norm": 12.149236679077148,
+ "learning_rate": 2.4192748682987298e-05,
+ "loss": 0.6972,
+ "step": 8850
+ },
+ {
+ "epoch": 2.7579795475674,
+ "grad_norm": 8.100613594055176,
+ "learning_rate": 2.4141101125916744e-05,
+ "loss": 0.7358,
+ "step": 8900
+ },
+ {
+ "epoch": 2.773473814688565,
+ "grad_norm": 12.28987979888916,
+ "learning_rate": 2.4089453568846194e-05,
+ "loss": 0.7176,
+ "step": 8950
+ },
+ {
+ "epoch": 2.78896808180973,
+ "grad_norm": 9.355488777160645,
+ "learning_rate": 2.4037806011775644e-05,
+ "loss": 0.6856,
+ "step": 9000
+ },
+ {
+ "epoch": 2.8044623489308957,
+ "grad_norm": 11.875406265258789,
+ "learning_rate": 2.3986158454705093e-05,
+ "loss": 0.6501,
+ "step": 9050
+ },
+ {
+ "epoch": 2.819956616052061,
+ "grad_norm": 8.061235427856445,
+ "learning_rate": 2.3934510897634543e-05,
+ "loss": 0.6823,
+ "step": 9100
+ },
+ {
+ "epoch": 2.835450883173226,
+ "grad_norm": 7.949320316314697,
+ "learning_rate": 2.388286334056399e-05,
+ "loss": 0.6764,
+ "step": 9150
+ },
+ {
+ "epoch": 2.850945150294391,
+ "grad_norm": 5.9249587059021,
+ "learning_rate": 2.3831215783493442e-05,
+ "loss": 0.6511,
+ "step": 9200
+ },
+ {
+ "epoch": 2.866439417415556,
+ "grad_norm": 8.400185585021973,
+ "learning_rate": 2.3779568226422892e-05,
+ "loss": 0.6515,
+ "step": 9250
+ },
+ {
+ "epoch": 2.8819336845367216,
+ "grad_norm": 11.487894058227539,
+ "learning_rate": 2.3727920669352342e-05,
+ "loss": 0.6719,
+ "step": 9300
+ },
+ {
+ "epoch": 2.8974279516578867,
+ "grad_norm": 8.317901611328125,
+ "learning_rate": 2.367627311228179e-05,
+ "loss": 0.6697,
+ "step": 9350
+ },
+ {
+ "epoch": 2.9129222187790518,
+ "grad_norm": 9.878332138061523,
+ "learning_rate": 2.3624625555211238e-05,
+ "loss": 0.6801,
+ "step": 9400
+ },
+ {
+ "epoch": 2.928416485900217,
+ "grad_norm": 8.855628967285156,
+ "learning_rate": 2.3572977998140687e-05,
+ "loss": 0.6445,
+ "step": 9450
+ },
+ {
+ "epoch": 2.943910753021382,
+ "grad_norm": 5.350094318389893,
+ "learning_rate": 2.3521330441070137e-05,
+ "loss": 0.6891,
+ "step": 9500
+ },
+ {
+ "epoch": 2.9594050201425475,
+ "grad_norm": 8.540812492370605,
+ "learning_rate": 2.3469682883999587e-05,
+ "loss": 0.6556,
+ "step": 9550
+ },
+ {
+ "epoch": 2.9748992872637126,
+ "grad_norm": 4.337664604187012,
+ "learning_rate": 2.341803532692904e-05,
+ "loss": 0.7013,
+ "step": 9600
+ },
+ {
+ "epoch": 2.9903935543848776,
+ "grad_norm": 7.002617359161377,
+ "learning_rate": 2.3366387769858486e-05,
+ "loss": 0.6518,
+ "step": 9650
+ },
+ {
+ "epoch": 3.0,
+ "eval_accuracy": 0.777292576419214,
+ "eval_f1": 0.775972842719567,
+ "eval_loss": 0.69657963514328,
+ "eval_runtime": 25.4856,
+ "eval_samples_per_second": 260.579,
+ "eval_steps_per_second": 16.323,
+ "step": 9681
+ },
+ {
+ "epoch": 3.0058878215060427,
+ "grad_norm": 9.228548049926758,
+ "learning_rate": 2.3314740212787936e-05,
+ "loss": 0.5385,
+ "step": 9700
+ },
+ {
+ "epoch": 3.021382088627208,
+ "grad_norm": 4.332932472229004,
+ "learning_rate": 2.3263092655717385e-05,
+ "loss": 0.5513,
+ "step": 9750
+ },
+ {
+ "epoch": 3.036876355748373,
+ "grad_norm": 6.478864669799805,
+ "learning_rate": 2.3211445098646835e-05,
+ "loss": 0.4542,
+ "step": 9800
+ },
+ {
+ "epoch": 3.0523706228695384,
+ "grad_norm": 14.028499603271484,
+ "learning_rate": 2.3159797541576285e-05,
+ "loss": 0.4549,
+ "step": 9850
+ },
+ {
+ "epoch": 3.0678648899907035,
+ "grad_norm": 5.590787887573242,
+ "learning_rate": 2.310814998450573e-05,
+ "loss": 0.4624,
+ "step": 9900
+ },
+ {
+ "epoch": 3.0833591571118686,
+ "grad_norm": 5.623167514801025,
+ "learning_rate": 2.305650242743518e-05,
+ "loss": 0.4479,
+ "step": 9950
+ },
+ {
+ "epoch": 3.0988534242330337,
+ "grad_norm": 10.343826293945312,
+ "learning_rate": 2.3004854870364634e-05,
+ "loss": 0.5079,
+ "step": 10000
+ },
+ {
+ "epoch": 3.1143476913541988,
+ "grad_norm": 2.780686616897583,
+ "learning_rate": 2.2953207313294084e-05,
+ "loss": 0.43,
+ "step": 10050
+ },
+ {
+ "epoch": 3.1298419584753643,
+ "grad_norm": 10.917914390563965,
+ "learning_rate": 2.2901559756223533e-05,
+ "loss": 0.4932,
+ "step": 10100
+ },
+ {
+ "epoch": 3.1453362255965294,
+ "grad_norm": 14.870561599731445,
+ "learning_rate": 2.284991219915298e-05,
+ "loss": 0.483,
+ "step": 10150
+ },
+ {
+ "epoch": 3.1608304927176945,
+ "grad_norm": 15.64564323425293,
+ "learning_rate": 2.279826464208243e-05,
+ "loss": 0.5047,
+ "step": 10200
+ },
+ {
+ "epoch": 3.1763247598388595,
+ "grad_norm": 8.148391723632812,
+ "learning_rate": 2.274661708501188e-05,
+ "loss": 0.498,
+ "step": 10250
+ },
+ {
+ "epoch": 3.1918190269600246,
+ "grad_norm": 9.916448593139648,
+ "learning_rate": 2.269496952794133e-05,
+ "loss": 0.4896,
+ "step": 10300
+ },
+ {
+ "epoch": 3.20731329408119,
+ "grad_norm": 10.014134407043457,
+ "learning_rate": 2.2643321970870778e-05,
+ "loss": 0.4572,
+ "step": 10350
+ },
+ {
+ "epoch": 3.2228075612023552,
+ "grad_norm": 9.647527694702148,
+ "learning_rate": 2.2591674413800228e-05,
+ "loss": 0.4965,
+ "step": 10400
+ },
+ {
+ "epoch": 3.2383018283235203,
+ "grad_norm": 11.77087116241455,
+ "learning_rate": 2.2540026856729678e-05,
+ "loss": 0.512,
+ "step": 10450
+ },
+ {
+ "epoch": 3.2537960954446854,
+ "grad_norm": 3.3613386154174805,
+ "learning_rate": 2.2488379299659127e-05,
+ "loss": 0.5522,
+ "step": 10500
+ },
+ {
+ "epoch": 3.2692903625658505,
+ "grad_norm": 17.92693519592285,
+ "learning_rate": 2.2436731742588577e-05,
+ "loss": 0.4915,
+ "step": 10550
+ },
+ {
+ "epoch": 3.2847846296870156,
+ "grad_norm": 8.389365196228027,
+ "learning_rate": 2.2385084185518027e-05,
+ "loss": 0.5343,
+ "step": 10600
+ },
+ {
+ "epoch": 3.300278896808181,
+ "grad_norm": 9.849445343017578,
+ "learning_rate": 2.2333436628447473e-05,
+ "loss": 0.4925,
+ "step": 10650
+ },
+ {
+ "epoch": 3.315773163929346,
+ "grad_norm": 7.494227886199951,
+ "learning_rate": 2.2281789071376923e-05,
+ "loss": 0.5242,
+ "step": 10700
+ },
+ {
+ "epoch": 3.3312674310505113,
+ "grad_norm": 12.774617195129395,
+ "learning_rate": 2.2230141514306372e-05,
+ "loss": 0.522,
+ "step": 10750
+ },
+ {
+ "epoch": 3.3467616981716763,
+ "grad_norm": 4.167229175567627,
+ "learning_rate": 2.2178493957235822e-05,
+ "loss": 0.4852,
+ "step": 10800
+ },
+ {
+ "epoch": 3.3622559652928414,
+ "grad_norm": 7.823596000671387,
+ "learning_rate": 2.2126846400165275e-05,
+ "loss": 0.521,
+ "step": 10850
+ },
+ {
+ "epoch": 3.377750232414007,
+ "grad_norm": 9.712186813354492,
+ "learning_rate": 2.2075198843094725e-05,
+ "loss": 0.4931,
+ "step": 10900
+ },
+ {
+ "epoch": 3.393244499535172,
+ "grad_norm": 9.726935386657715,
+ "learning_rate": 2.202355128602417e-05,
+ "loss": 0.531,
+ "step": 10950
+ },
+ {
+ "epoch": 3.408738766656337,
+ "grad_norm": 8.613348007202148,
+ "learning_rate": 2.197190372895362e-05,
+ "loss": 0.4902,
+ "step": 11000
+ },
+ {
+ "epoch": 3.424233033777502,
+ "grad_norm": 17.698650360107422,
+ "learning_rate": 2.192025617188307e-05,
+ "loss": 0.4967,
+ "step": 11050
+ },
+ {
+ "epoch": 3.4397273008986673,
+ "grad_norm": 13.304680824279785,
+ "learning_rate": 2.186860861481252e-05,
+ "loss": 0.4998,
+ "step": 11100
+ },
+ {
+ "epoch": 3.455221568019833,
+ "grad_norm": 9.090615272521973,
+ "learning_rate": 2.181696105774197e-05,
+ "loss": 0.4797,
+ "step": 11150
+ },
+ {
+ "epoch": 3.470715835140998,
+ "grad_norm": 6.544071197509766,
+ "learning_rate": 2.176531350067142e-05,
+ "loss": 0.5405,
+ "step": 11200
+ },
+ {
+ "epoch": 3.486210102262163,
+ "grad_norm": 10.908158302307129,
+ "learning_rate": 2.171366594360087e-05,
+ "loss": 0.4663,
+ "step": 11250
+ },
+ {
+ "epoch": 3.501704369383328,
+ "grad_norm": 9.044700622558594,
+ "learning_rate": 2.166201838653032e-05,
+ "loss": 0.4755,
+ "step": 11300
+ },
+ {
+ "epoch": 3.517198636504493,
+ "grad_norm": 7.633232116699219,
+ "learning_rate": 2.161037082945977e-05,
+ "loss": 0.4182,
+ "step": 11350
+ },
+ {
+ "epoch": 3.5326929036256587,
+ "grad_norm": 5.32473087310791,
+ "learning_rate": 2.1558723272389218e-05,
+ "loss": 0.4987,
+ "step": 11400
+ },
+ {
+ "epoch": 3.5481871707468238,
+ "grad_norm": 9.8456392288208,
+ "learning_rate": 2.1507075715318664e-05,
+ "loss": 0.587,
+ "step": 11450
+ },
+ {
+ "epoch": 3.563681437867989,
+ "grad_norm": 12.52115535736084,
+ "learning_rate": 2.1455428158248114e-05,
+ "loss": 0.5331,
+ "step": 11500
+ },
+ {
+ "epoch": 3.579175704989154,
+ "grad_norm": 18.225566864013672,
+ "learning_rate": 2.1403780601177564e-05,
+ "loss": 0.4794,
+ "step": 11550
+ },
+ {
+ "epoch": 3.594669972110319,
+ "grad_norm": 8.749368667602539,
+ "learning_rate": 2.1352133044107013e-05,
+ "loss": 0.4967,
+ "step": 11600
+ },
+ {
+ "epoch": 3.6101642392314846,
+ "grad_norm": 8.760223388671875,
+ "learning_rate": 2.1300485487036466e-05,
+ "loss": 0.4963,
+ "step": 11650
+ },
+ {
+ "epoch": 3.6256585063526496,
+ "grad_norm": 15.518270492553711,
+ "learning_rate": 2.1248837929965913e-05,
+ "loss": 0.4456,
+ "step": 11700
+ },
+ {
+ "epoch": 3.6411527734738147,
+ "grad_norm": 9.451664924621582,
+ "learning_rate": 2.1197190372895362e-05,
+ "loss": 0.5235,
+ "step": 11750
+ },
+ {
+ "epoch": 3.65664704059498,
+ "grad_norm": 17.736055374145508,
+ "learning_rate": 2.1145542815824812e-05,
+ "loss": 0.4876,
+ "step": 11800
+ },
+ {
+ "epoch": 3.672141307716145,
+ "grad_norm": 24.323490142822266,
+ "learning_rate": 2.1093895258754262e-05,
+ "loss": 0.483,
+ "step": 11850
+ },
+ {
+ "epoch": 3.6876355748373104,
+ "grad_norm": 15.389254570007324,
+ "learning_rate": 2.104224770168371e-05,
+ "loss": 0.5127,
+ "step": 11900
+ },
+ {
+ "epoch": 3.7031298419584755,
+ "grad_norm": 11.283272743225098,
+ "learning_rate": 2.0990600144613158e-05,
+ "loss": 0.5282,
+ "step": 11950
+ },
+ {
+ "epoch": 3.7186241090796406,
+ "grad_norm": 11.002310752868652,
+ "learning_rate": 2.0938952587542607e-05,
+ "loss": 0.5459,
+ "step": 12000
+ },
+ {
+ "epoch": 3.7341183762008057,
+ "grad_norm": 6.972140312194824,
+ "learning_rate": 2.088730503047206e-05,
+ "loss": 0.536,
+ "step": 12050
+ },
+ {
+ "epoch": 3.7496126433219708,
+ "grad_norm": 4.202858924865723,
+ "learning_rate": 2.083565747340151e-05,
+ "loss": 0.5736,
+ "step": 12100
+ },
+ {
+ "epoch": 3.7651069104431363,
+ "grad_norm": 15.748515129089355,
+ "learning_rate": 2.078400991633096e-05,
+ "loss": 0.4715,
+ "step": 12150
+ },
+ {
+ "epoch": 3.7806011775643014,
+ "grad_norm": 6.696774482727051,
+ "learning_rate": 2.0732362359260406e-05,
+ "loss": 0.5545,
+ "step": 12200
+ },
+ {
+ "epoch": 3.7960954446854664,
+ "grad_norm": 7.366288661956787,
+ "learning_rate": 2.0680714802189856e-05,
+ "loss": 0.5736,
+ "step": 12250
+ },
+ {
+ "epoch": 3.8115897118066315,
+ "grad_norm": 13.58438777923584,
+ "learning_rate": 2.0629067245119306e-05,
+ "loss": 0.4255,
+ "step": 12300
+ },
+ {
+ "epoch": 3.8270839789277966,
+ "grad_norm": 9.109688758850098,
+ "learning_rate": 2.0577419688048755e-05,
+ "loss": 0.4565,
+ "step": 12350
+ },
+ {
+ "epoch": 3.842578246048962,
+ "grad_norm": 11.448044776916504,
+ "learning_rate": 2.0525772130978205e-05,
+ "loss": 0.5117,
+ "step": 12400
+ },
+ {
+ "epoch": 3.858072513170127,
+ "grad_norm": 6.876945495605469,
+ "learning_rate": 2.0474124573907655e-05,
+ "loss": 0.5543,
+ "step": 12450
+ },
+ {
+ "epoch": 3.8735667802912923,
+ "grad_norm": 11.25009536743164,
+ "learning_rate": 2.0422477016837104e-05,
+ "loss": 0.456,
+ "step": 12500
+ },
+ {
+ "epoch": 3.8890610474124574,
+ "grad_norm": 13.992502212524414,
+ "learning_rate": 2.0370829459766554e-05,
+ "loss": 0.4907,
+ "step": 12550
+ },
+ {
+ "epoch": 3.9045553145336225,
+ "grad_norm": 11.92656421661377,
+ "learning_rate": 2.0319181902696004e-05,
+ "loss": 0.4841,
+ "step": 12600
+ },
+ {
+ "epoch": 3.9200495816547876,
+ "grad_norm": 7.212582111358643,
+ "learning_rate": 2.0267534345625453e-05,
+ "loss": 0.5529,
+ "step": 12650
+ },
+ {
+ "epoch": 3.9355438487759526,
+ "grad_norm": 14.616645812988281,
+ "learning_rate": 2.02158867885549e-05,
+ "loss": 0.5366,
+ "step": 12700
+ },
+ {
+ "epoch": 3.951038115897118,
+ "grad_norm": 9.052292823791504,
+ "learning_rate": 2.016423923148435e-05,
+ "loss": 0.5459,
+ "step": 12750
+ },
+ {
+ "epoch": 3.9665323830182833,
+ "grad_norm": 18.27539825439453,
+ "learning_rate": 2.01125916744138e-05,
+ "loss": 0.5631,
+ "step": 12800
+ },
+ {
+ "epoch": 3.9820266501394483,
+ "grad_norm": 12.429372787475586,
+ "learning_rate": 2.0060944117343252e-05,
+ "loss": 0.4885,
+ "step": 12850
+ },
+ {
+ "epoch": 3.9975209172606134,
+ "grad_norm": 4.481673240661621,
+ "learning_rate": 2.00092965602727e-05,
+ "loss": 0.4565,
+ "step": 12900
+ },
+ {
+ "epoch": 4.0,
+ "eval_accuracy": 0.7863273603372986,
+ "eval_f1": 0.7874334496964743,
+ "eval_loss": 0.7491569519042969,
+ "eval_runtime": 25.5609,
+ "eval_samples_per_second": 259.811,
+ "eval_steps_per_second": 16.275,
+ "step": 12908
+ },
+ {
+ "epoch": 4.0130151843817785,
+ "grad_norm": 21.53974151611328,
+ "learning_rate": 1.9957649003202148e-05,
+ "loss": 0.2695,
+ "step": 12950
+ },
+ {
+ "epoch": 4.028509451502944,
+ "grad_norm": 9.07942008972168,
+ "learning_rate": 1.9906001446131598e-05,
+ "loss": 0.3282,
+ "step": 13000
+ },
+ {
+ "epoch": 4.044003718624109,
+ "grad_norm": 16.323549270629883,
+ "learning_rate": 1.9854353889061047e-05,
+ "loss": 0.3079,
+ "step": 13050
+ },
+ {
+ "epoch": 4.059497985745274,
+ "grad_norm": 6.679697036743164,
+ "learning_rate": 1.9802706331990497e-05,
+ "loss": 0.3889,
+ "step": 13100
+ },
+ {
+ "epoch": 4.07499225286644,
+ "grad_norm": 17.357574462890625,
+ "learning_rate": 1.9751058774919947e-05,
+ "loss": 0.3336,
+ "step": 13150
+ },
+ {
+ "epoch": 4.090486519987604,
+ "grad_norm": 5.116195201873779,
+ "learning_rate": 1.9699411217849393e-05,
+ "loss": 0.3102,
+ "step": 13200
+ },
+ {
+ "epoch": 4.10598078710877,
+ "grad_norm": 29.05538558959961,
+ "learning_rate": 1.9647763660778846e-05,
+ "loss": 0.2902,
+ "step": 13250
+ },
+ {
+ "epoch": 4.1214750542299345,
+ "grad_norm": 6.254473686218262,
+ "learning_rate": 1.9596116103708296e-05,
+ "loss": 0.3816,
+ "step": 13300
+ },
+ {
+ "epoch": 4.1369693213511,
+ "grad_norm": 11.854185104370117,
+ "learning_rate": 1.9544468546637745e-05,
+ "loss": 0.3455,
+ "step": 13350
+ },
+ {
+ "epoch": 4.152463588472266,
+ "grad_norm": 16.399444580078125,
+ "learning_rate": 1.9492820989567195e-05,
+ "loss": 0.3713,
+ "step": 13400
+ },
+ {
+ "epoch": 4.16795785559343,
+ "grad_norm": 18.26226234436035,
+ "learning_rate": 1.9441173432496645e-05,
+ "loss": 0.2957,
+ "step": 13450
+ },
+ {
+ "epoch": 4.183452122714596,
+ "grad_norm": 6.590181350708008,
+ "learning_rate": 1.938952587542609e-05,
+ "loss": 0.2905,
+ "step": 13500
+ },
+ {
+ "epoch": 4.19894638983576,
+ "grad_norm": 5.3814849853515625,
+ "learning_rate": 1.933787831835554e-05,
+ "loss": 0.3782,
+ "step": 13550
+ },
+ {
+ "epoch": 4.214440656956926,
+ "grad_norm": 8.641956329345703,
+ "learning_rate": 1.928623076128499e-05,
+ "loss": 0.3211,
+ "step": 13600
+ },
+ {
+ "epoch": 4.2299349240780915,
+ "grad_norm": 14.346405982971191,
+ "learning_rate": 1.9234583204214443e-05,
+ "loss": 0.3274,
+ "step": 13650
+ },
+ {
+ "epoch": 4.245429191199256,
+ "grad_norm": 15.577725410461426,
+ "learning_rate": 1.9182935647143893e-05,
+ "loss": 0.3568,
+ "step": 13700
+ },
+ {
+ "epoch": 4.260923458320422,
+ "grad_norm": 9.855398178100586,
+ "learning_rate": 1.913128809007334e-05,
+ "loss": 0.3008,
+ "step": 13750
+ },
+ {
+ "epoch": 4.276417725441586,
+ "grad_norm": 15.720294952392578,
+ "learning_rate": 1.907964053300279e-05,
+ "loss": 0.2979,
+ "step": 13800
+ },
+ {
+ "epoch": 4.291911992562752,
+ "grad_norm": 13.976778030395508,
+ "learning_rate": 1.902799297593224e-05,
+ "loss": 0.3389,
+ "step": 13850
+ },
+ {
+ "epoch": 4.307406259683917,
+ "grad_norm": 19.255727767944336,
+ "learning_rate": 1.897634541886169e-05,
+ "loss": 0.3636,
+ "step": 13900
+ },
+ {
+ "epoch": 4.322900526805082,
+ "grad_norm": 10.70836353302002,
+ "learning_rate": 1.8924697861791138e-05,
+ "loss": 0.3455,
+ "step": 13950
+ },
+ {
+ "epoch": 4.3383947939262475,
+ "grad_norm": 0.9212763905525208,
+ "learning_rate": 1.8873050304720584e-05,
+ "loss": 0.3703,
+ "step": 14000
+ },
+ {
+ "epoch": 4.353889061047412,
+ "grad_norm": 10.232623100280762,
+ "learning_rate": 1.8821402747650037e-05,
+ "loss": 0.3247,
+ "step": 14050
+ },
+ {
+ "epoch": 4.369383328168578,
+ "grad_norm": 11.130922317504883,
+ "learning_rate": 1.8769755190579487e-05,
+ "loss": 0.3107,
+ "step": 14100
+ },
+ {
+ "epoch": 4.384877595289743,
+ "grad_norm": 10.536752700805664,
+ "learning_rate": 1.8718107633508937e-05,
+ "loss": 0.3614,
+ "step": 14150
+ },
+ {
+ "epoch": 4.400371862410908,
+ "grad_norm": 15.330968856811523,
+ "learning_rate": 1.8666460076438386e-05,
+ "loss": 0.3984,
+ "step": 14200
+ },
+ {
+ "epoch": 4.415866129532073,
+ "grad_norm": 7.436588764190674,
+ "learning_rate": 1.8614812519367833e-05,
+ "loss": 0.3257,
+ "step": 14250
+ },
+ {
+ "epoch": 4.431360396653238,
+ "grad_norm": 7.192384243011475,
+ "learning_rate": 1.8563164962297282e-05,
+ "loss": 0.3254,
+ "step": 14300
+ },
+ {
+ "epoch": 4.4468546637744035,
+ "grad_norm": 7.792993545532227,
+ "learning_rate": 1.8511517405226732e-05,
+ "loss": 0.3392,
+ "step": 14350
+ },
+ {
+ "epoch": 4.462348930895569,
+ "grad_norm": 12.411416053771973,
+ "learning_rate": 1.8459869848156182e-05,
+ "loss": 0.3383,
+ "step": 14400
+ },
+ {
+ "epoch": 4.477843198016734,
+ "grad_norm": 17.897613525390625,
+ "learning_rate": 1.8408222291085635e-05,
+ "loss": 0.3392,
+ "step": 14450
+ },
+ {
+ "epoch": 4.493337465137899,
+ "grad_norm": 23.59228515625,
+ "learning_rate": 1.835657473401508e-05,
+ "loss": 0.3055,
+ "step": 14500
+ },
+ {
+ "epoch": 4.508831732259064,
+ "grad_norm": 13.722383499145508,
+ "learning_rate": 1.830492717694453e-05,
+ "loss": 0.3997,
+ "step": 14550
+ },
+ {
+ "epoch": 4.524325999380229,
+ "grad_norm": 17.811538696289062,
+ "learning_rate": 1.825327961987398e-05,
+ "loss": 0.266,
+ "step": 14600
+ },
+ {
+ "epoch": 4.539820266501394,
+ "grad_norm": 10.993431091308594,
+ "learning_rate": 1.820163206280343e-05,
+ "loss": 0.2634,
+ "step": 14650
+ },
+ {
+ "epoch": 4.55531453362256,
+ "grad_norm": 5.25628137588501,
+ "learning_rate": 1.814998450573288e-05,
+ "loss": 0.3563,
+ "step": 14700
+ },
+ {
+ "epoch": 4.570808800743725,
+ "grad_norm": 16.91241455078125,
+ "learning_rate": 1.8098336948662326e-05,
+ "loss": 0.3298,
+ "step": 14750
+ },
+ {
+ "epoch": 4.58630306786489,
+ "grad_norm": 27.083995819091797,
+ "learning_rate": 1.8046689391591776e-05,
+ "loss": 0.36,
+ "step": 14800
+ },
+ {
+ "epoch": 4.601797334986055,
+ "grad_norm": 19.726198196411133,
+ "learning_rate": 1.799504183452123e-05,
+ "loss": 0.3224,
+ "step": 14850
+ },
+ {
+ "epoch": 4.617291602107221,
+ "grad_norm": 6.92859411239624,
+ "learning_rate": 1.794339427745068e-05,
+ "loss": 0.335,
+ "step": 14900
+ },
+ {
+ "epoch": 4.632785869228385,
+ "grad_norm": 15.97644329071045,
+ "learning_rate": 1.7891746720380128e-05,
+ "loss": 0.3303,
+ "step": 14950
+ },
+ {
+ "epoch": 4.648280136349551,
+ "grad_norm": 24.399837493896484,
+ "learning_rate": 1.7840099163309575e-05,
+ "loss": 0.3492,
+ "step": 15000
+ },
+ {
+ "epoch": 4.663774403470716,
+ "grad_norm": 10.855368614196777,
+ "learning_rate": 1.7788451606239024e-05,
+ "loss": 0.3278,
+ "step": 15050
+ },
+ {
+ "epoch": 4.679268670591881,
+ "grad_norm": 20.869380950927734,
+ "learning_rate": 1.7736804049168474e-05,
+ "loss": 0.2913,
+ "step": 15100
+ },
+ {
+ "epoch": 4.694762937713046,
+ "grad_norm": 6.862913131713867,
+ "learning_rate": 1.7685156492097924e-05,
+ "loss": 0.3133,
+ "step": 15150
+ },
+ {
+ "epoch": 4.710257204834211,
+ "grad_norm": 19.621482849121094,
+ "learning_rate": 1.7633508935027373e-05,
+ "loss": 0.3456,
+ "step": 15200
+ },
+ {
+ "epoch": 4.725751471955377,
+ "grad_norm": 19.79738998413086,
+ "learning_rate": 1.7581861377956823e-05,
+ "loss": 0.3562,
+ "step": 15250
+ },
+ {
+ "epoch": 4.7412457390765415,
+ "grad_norm": 3.2352957725524902,
+ "learning_rate": 1.7530213820886273e-05,
+ "loss": 0.3565,
+ "step": 15300
+ },
+ {
+ "epoch": 4.756740006197707,
+ "grad_norm": 10.959282875061035,
+ "learning_rate": 1.7478566263815722e-05,
+ "loss": 0.3472,
+ "step": 15350
+ },
+ {
+ "epoch": 4.7722342733188725,
+ "grad_norm": 3.22469162940979,
+ "learning_rate": 1.7426918706745172e-05,
+ "loss": 0.3367,
+ "step": 15400
+ },
+ {
+ "epoch": 4.787728540440037,
+ "grad_norm": 7.619373798370361,
+ "learning_rate": 1.737527114967462e-05,
+ "loss": 0.319,
+ "step": 15450
+ },
+ {
+ "epoch": 4.803222807561203,
+ "grad_norm": 24.706689834594727,
+ "learning_rate": 1.7323623592604068e-05,
+ "loss": 0.3939,
+ "step": 15500
+ },
+ {
+ "epoch": 4.818717074682367,
+ "grad_norm": 15.918986320495605,
+ "learning_rate": 1.7271976035533518e-05,
+ "loss": 0.3618,
+ "step": 15550
+ },
+ {
+ "epoch": 4.834211341803533,
+ "grad_norm": 14.518546104431152,
+ "learning_rate": 1.7220328478462967e-05,
+ "loss": 0.4082,
+ "step": 15600
+ },
+ {
+ "epoch": 4.8497056089246975,
+ "grad_norm": 6.084866046905518,
+ "learning_rate": 1.716868092139242e-05,
+ "loss": 0.3594,
+ "step": 15650
+ },
+ {
+ "epoch": 4.865199876045863,
+ "grad_norm": 18.435983657836914,
+ "learning_rate": 1.711703336432187e-05,
+ "loss": 0.3182,
+ "step": 15700
+ },
+ {
+ "epoch": 4.8806941431670285,
+ "grad_norm": 14.745248794555664,
+ "learning_rate": 1.7065385807251316e-05,
+ "loss": 0.3375,
+ "step": 15750
+ },
+ {
+ "epoch": 4.896188410288193,
+ "grad_norm": 11.518832206726074,
+ "learning_rate": 1.7013738250180766e-05,
+ "loss": 0.3371,
+ "step": 15800
+ },
+ {
+ "epoch": 4.911682677409359,
2266
+ "grad_norm": 17.58115005493164,
2267
+ "learning_rate": 1.6962090693110216e-05,
2268
+ "loss": 0.3851,
2269
+ "step": 15850
2270
+ },
2271
+ {
2272
+ "epoch": 4.927176944530523,
2273
+ "grad_norm": 16.769134521484375,
2274
+ "learning_rate": 1.6910443136039665e-05,
2275
+ "loss": 0.3009,
2276
+ "step": 15900
2277
+ },
2278
+ {
2279
+ "epoch": 4.942671211651689,
2280
+ "grad_norm": 21.518749237060547,
2281
+ "learning_rate": 1.6858795578969115e-05,
2282
+ "loss": 0.3155,
2283
+ "step": 15950
2284
+ },
2285
+ {
2286
+ "epoch": 4.958165478772854,
2287
+ "grad_norm": 11.044340133666992,
2288
+ "learning_rate": 1.6807148021898565e-05,
2289
+ "loss": 0.3648,
2290
+ "step": 16000
2291
+ },
2292
+ {
2293
+ "epoch": 4.973659745894019,
2294
+ "grad_norm": 3.9900588989257812,
2295
+ "learning_rate": 1.6755500464828014e-05,
2296
+ "loss": 0.3383,
2297
+ "step": 16050
2298
+ },
2299
+ {
2300
+ "epoch": 4.989154013015185,
2301
+ "grad_norm": 16.869041442871094,
2302
+ "learning_rate": 1.6703852907757464e-05,
2303
+ "loss": 0.3471,
2304
+ "step": 16100
2305
+ },
2306
+ {
2307
+ "epoch": 5.0,
2308
+ "eval_accuracy": 0.7964162023791598,
2309
+ "eval_f1": 0.7960859271865419,
2310
+ "eval_loss": 0.8064730167388916,
2311
+ "eval_runtime": 25.5199,
2312
+ "eval_samples_per_second": 260.228,
2313
+ "eval_steps_per_second": 16.301,
2314
+ "step": 16135
2315
+ }
2316
+ ],
2317
+ "logging_steps": 50,
2318
+ "max_steps": 32270,
2319
+ "num_input_tokens_seen": 0,
2320
+ "num_train_epochs": 10,
2321
+ "save_steps": 500,
2322
+ "stateful_callbacks": {
2323
+ "EarlyStoppingCallback": {
2324
+ "args": {
2325
+ "early_stopping_patience": 2,
2326
+ "early_stopping_threshold": 0.0
2327
+ },
2328
+ "attributes": {
2329
+ "early_stopping_patience_counter": 0
2330
+ }
2331
+ },
2332
+ "TrainerControl": {
2333
+ "args": {
2334
+ "should_epoch_stop": false,
2335
+ "should_evaluate": false,
2336
+ "should_log": false,
2337
+ "should_save": true,
2338
+ "should_training_stop": false
2339
+ },
2340
+ "attributes": {}
2341
+ }
2342
+ },
2343
+ "total_flos": 3.39622791611904e+16,
2344
+ "train_batch_size": 16,
2345
+ "trial_name": null,
2346
+ "trial_params": null
2347
+ }
checkpoint-16135/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:10df86f20d44dbad5bcbd0936a460173513fcbc2b606339a5e32ef490cc53c86
+ size 5216
checkpoint-22589/config.json ADDED
@@ -0,0 +1,47 @@
+ {
+ "architectures": [
+ "ElectraForSequenceClassification"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "classifier_dropout": null,
+ "embedding_size": 768,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "id2label": {
+ "0": "\uae30\uc068",
+ "1": "\ub2f9\ud669",
+ "2": "\ubd84\ub178",
+ "3": "\ubd88\uc548",
+ "4": "\uc0c1\ucc98",
+ "5": "\uc2ac\ud514"
+ },
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "label2id": {
+ "\uae30\uc068": 0,
+ "\ub2f9\ud669": 1,
+ "\ubd84\ub178": 2,
+ "\ubd88\uc548": 3,
+ "\uc0c1\ucc98": 4,
+ "\uc2ac\ud514": 5
+ },
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 512,
+ "model_type": "electra",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "pad_token_id": 0,
+ "position_embedding_type": "absolute",
+ "problem_type": "single_label_classification",
+ "summary_activation": "gelu",
+ "summary_last_dropout": 0.1,
+ "summary_type": "first",
+ "summary_use_proj": true,
+ "tokenizer_class": "BertTokenizer",
+ "torch_dtype": "float32",
+ "transformers_version": "4.52.2",
+ "type_vocab_size": 2,
+ "use_cache": true,
+ "vocab_size": 54343
+ }
checkpoint-22589/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bc7d0b1c232326da9579b888927ef52d71de1e5d37d09bfa1537cf9037d89bae
+ size 511149672
checkpoint-22589/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:256ef19d7fa444f94d2250f96a4e1eba05fb3af3461f6f03c0df8847276466fa
+ size 1022417532
checkpoint-22589/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:78dab76e293cee2463acfd238eca1fc2a3f094fac6f39dea09231aa019a64dbe
+ size 14168
checkpoint-22589/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7d6e678691ccf4ea8c4b3a9194d4b565a345b929928cce1da7d65cae2362dc0c
+ size 1056
checkpoint-22589/trainer_state.json ADDED
@@ -0,0 +1,3270 @@
+ {
+ "best_global_step": 16135,
+ "best_metric": 0.7960859271865419,
+ "best_model_checkpoint": "/content/drive/MyDrive/\uac10\uc815\ubd84\ub958/data/emotion_model/checkpoint-16135",
+ "epoch": 7.0,
+ "eval_steps": 500,
+ "global_step": 22589,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.015494267121165169,
+ "grad_norm": 1.9321871995925903,
+ "learning_rate": 4.5553145336225596e-07,
+ "loss": 1.7919,
+ "step": 50
+ },
+ {
+ "epoch": 0.030988534242330338,
+ "grad_norm": 1.7712626457214355,
+ "learning_rate": 9.203594669972111e-07,
+ "loss": 1.7865,
+ "step": 100
+ },
+ {
+ "epoch": 0.04648280136349551,
+ "grad_norm": 1.9185744524002075,
+ "learning_rate": 1.385187480632166e-06,
+ "loss": 1.7864,
+ "step": 150
+ },
+ {
+ "epoch": 0.061977068484660676,
+ "grad_norm": 1.9046003818511963,
+ "learning_rate": 1.8500154942671213e-06,
+ "loss": 1.7864,
+ "step": 200
+ },
+ {
+ "epoch": 0.07747133560582585,
+ "grad_norm": 2.0914034843444824,
+ "learning_rate": 2.3148435079020763e-06,
+ "loss": 1.7813,
+ "step": 250
+ },
+ {
+ "epoch": 0.09296560272699102,
+ "grad_norm": 2.088219165802002,
+ "learning_rate": 2.7796715215370313e-06,
+ "loss": 1.7773,
+ "step": 300
+ },
+ {
+ "epoch": 0.10845986984815618,
+ "grad_norm": 2.1377577781677246,
+ "learning_rate": 3.2444995351719864e-06,
+ "loss": 1.7587,
+ "step": 350
+ },
+ {
+ "epoch": 0.12395413696932135,
+ "grad_norm": 2.2140750885009766,
+ "learning_rate": 3.7093275488069414e-06,
+ "loss": 1.7388,
+ "step": 400
+ },
+ {
+ "epoch": 0.13944840409048653,
+ "grad_norm": 2.1278295516967773,
+ "learning_rate": 4.174155562441896e-06,
+ "loss": 1.6861,
+ "step": 450
+ },
+ {
+ "epoch": 0.1549426712116517,
+ "grad_norm": 4.734658241271973,
+ "learning_rate": 4.638983576076852e-06,
+ "loss": 1.6267,
+ "step": 500
+ },
+ {
+ "epoch": 0.17043693833281687,
+ "grad_norm": 4.140384197235107,
+ "learning_rate": 5.103811589711806e-06,
+ "loss": 1.5732,
+ "step": 550
+ },
+ {
+ "epoch": 0.18593120545398203,
+ "grad_norm": 2.705493688583374,
+ "learning_rate": 5.568639603346762e-06,
+ "loss": 1.5438,
+ "step": 600
+ },
+ {
+ "epoch": 0.2014254725751472,
+ "grad_norm": 5.3791422843933105,
+ "learning_rate": 6.0334676169817164e-06,
+ "loss": 1.4644,
+ "step": 650
+ },
+ {
+ "epoch": 0.21691973969631237,
+ "grad_norm": 2.8989250659942627,
+ "learning_rate": 6.498295630616672e-06,
+ "loss": 1.4743,
+ "step": 700
+ },
+ {
+ "epoch": 0.23241400681747754,
+ "grad_norm": 3.40291166305542,
+ "learning_rate": 6.963123644251627e-06,
+ "loss": 1.4187,
+ "step": 750
+ },
+ {
+ "epoch": 0.2479082739386427,
+ "grad_norm": 5.748068809509277,
+ "learning_rate": 7.427951657886583e-06,
+ "loss": 1.3533,
+ "step": 800
+ },
+ {
+ "epoch": 0.26340254105980787,
+ "grad_norm": 5.777422904968262,
+ "learning_rate": 7.892779671521537e-06,
+ "loss": 1.3099,
+ "step": 850
+ },
+ {
+ "epoch": 0.27889680818097307,
+ "grad_norm": 4.8046112060546875,
+ "learning_rate": 8.357607685156493e-06,
+ "loss": 1.2723,
+ "step": 900
+ },
+ {
+ "epoch": 0.2943910753021382,
+ "grad_norm": 5.549858570098877,
+ "learning_rate": 8.822435698791447e-06,
+ "loss": 1.2325,
+ "step": 950
+ },
+ {
+ "epoch": 0.3098853424233034,
+ "grad_norm": 6.851742744445801,
+ "learning_rate": 9.287263712426402e-06,
+ "loss": 1.1884,
+ "step": 1000
+ },
+ {
+ "epoch": 0.32537960954446854,
+ "grad_norm": 5.2497735023498535,
+ "learning_rate": 9.752091726061357e-06,
+ "loss": 1.2224,
+ "step": 1050
+ },
+ {
+ "epoch": 0.34087387666563373,
+ "grad_norm": 7.023674488067627,
+ "learning_rate": 1.0216919739696313e-05,
+ "loss": 1.177,
+ "step": 1100
+ },
+ {
+ "epoch": 0.3563681437867989,
+ "grad_norm": 4.888996124267578,
+ "learning_rate": 1.0681747753331269e-05,
+ "loss": 1.1577,
+ "step": 1150
+ },
+ {
+ "epoch": 0.37186241090796407,
+ "grad_norm": 6.2133660316467285,
+ "learning_rate": 1.1146575766966222e-05,
+ "loss": 1.1726,
+ "step": 1200
+ },
+ {
+ "epoch": 0.3873566780291292,
+ "grad_norm": 6.936697483062744,
+ "learning_rate": 1.1611403780601178e-05,
+ "loss": 1.1001,
+ "step": 1250
+ },
+ {
+ "epoch": 0.4028509451502944,
+ "grad_norm": 8.526293754577637,
+ "learning_rate": 1.2076231794236133e-05,
+ "loss": 1.0752,
+ "step": 1300
+ },
+ {
+ "epoch": 0.41834521227145954,
+ "grad_norm": 5.5933756828308105,
+ "learning_rate": 1.254105980787109e-05,
+ "loss": 1.0809,
+ "step": 1350
+ },
+ {
+ "epoch": 0.43383947939262474,
+ "grad_norm": 6.998812675476074,
+ "learning_rate": 1.3005887821506042e-05,
+ "loss": 1.0566,
+ "step": 1400
+ },
+ {
+ "epoch": 0.4493337465137899,
+ "grad_norm": 7.077617645263672,
+ "learning_rate": 1.3470715835140998e-05,
+ "loss": 1.0993,
+ "step": 1450
+ },
+ {
+ "epoch": 0.4648280136349551,
+ "grad_norm": 8.715201377868652,
+ "learning_rate": 1.3935543848775953e-05,
+ "loss": 1.0375,
+ "step": 1500
+ },
+ {
+ "epoch": 0.4803222807561202,
+ "grad_norm": 6.017217636108398,
+ "learning_rate": 1.440037186241091e-05,
+ "loss": 1.0388,
+ "step": 1550
+ },
+ {
+ "epoch": 0.4958165478772854,
+ "grad_norm": 5.349973201751709,
+ "learning_rate": 1.4865199876045862e-05,
+ "loss": 1.0354,
+ "step": 1600
+ },
+ {
+ "epoch": 0.5113108149984505,
+ "grad_norm": 12.728338241577148,
+ "learning_rate": 1.533002788968082e-05,
+ "loss": 1.0314,
+ "step": 1650
+ },
+ {
+ "epoch": 0.5268050821196157,
+ "grad_norm": 5.962468147277832,
+ "learning_rate": 1.5794855903315773e-05,
+ "loss": 1.03,
+ "step": 1700
+ },
+ {
+ "epoch": 0.5422993492407809,
+ "grad_norm": 5.971400260925293,
+ "learning_rate": 1.6259683916950726e-05,
+ "loss": 1.0541,
+ "step": 1750
+ },
+ {
+ "epoch": 0.5577936163619461,
+ "grad_norm": 6.260463714599609,
+ "learning_rate": 1.6724511930585682e-05,
+ "loss": 0.9831,
+ "step": 1800
+ },
+ {
+ "epoch": 0.5732878834831112,
+ "grad_norm": 7.8115010261535645,
+ "learning_rate": 1.718933994422064e-05,
+ "loss": 0.966,
+ "step": 1850
+ },
+ {
+ "epoch": 0.5887821506042764,
+ "grad_norm": 5.005403995513916,
+ "learning_rate": 1.7654167957855595e-05,
+ "loss": 0.9592,
+ "step": 1900
+ },
+ {
+ "epoch": 0.6042764177254416,
+ "grad_norm": 7.7732157707214355,
+ "learning_rate": 1.8118995971490548e-05,
+ "loss": 0.9766,
+ "step": 1950
+ },
+ {
+ "epoch": 0.6197706848466068,
+ "grad_norm": 7.265392303466797,
+ "learning_rate": 1.8583823985125504e-05,
+ "loss": 1.0171,
+ "step": 2000
+ },
+ {
+ "epoch": 0.6352649519677719,
+ "grad_norm": 15.946109771728516,
+ "learning_rate": 1.904865199876046e-05,
+ "loss": 0.9824,
+ "step": 2050
+ },
+ {
+ "epoch": 0.6507592190889371,
+ "grad_norm": 7.261445999145508,
+ "learning_rate": 1.9513480012395417e-05,
+ "loss": 0.9699,
+ "step": 2100
+ },
+ {
+ "epoch": 0.6662534862101023,
+ "grad_norm": 8.201744079589844,
+ "learning_rate": 1.997830802603037e-05,
+ "loss": 0.9957,
+ "step": 2150
+ },
+ {
+ "epoch": 0.6817477533312675,
+ "grad_norm": 6.183067798614502,
+ "learning_rate": 2.0443136039665322e-05,
+ "loss": 0.8554,
+ "step": 2200
+ },
+ {
+ "epoch": 0.6972420204524326,
+ "grad_norm": 7.481590270996094,
+ "learning_rate": 2.090796405330028e-05,
+ "loss": 0.9929,
+ "step": 2250
+ },
+ {
+ "epoch": 0.7127362875735977,
+ "grad_norm": 7.3274030685424805,
+ "learning_rate": 2.1372792066935235e-05,
+ "loss": 0.9438,
+ "step": 2300
+ },
+ {
+ "epoch": 0.7282305546947629,
+ "grad_norm": 11.69247055053711,
+ "learning_rate": 2.183762008057019e-05,
+ "loss": 0.9767,
+ "step": 2350
+ },
+ {
+ "epoch": 0.7437248218159281,
+ "grad_norm": 7.929721832275391,
+ "learning_rate": 2.2302448094205144e-05,
+ "loss": 1.0036,
+ "step": 2400
+ },
+ {
+ "epoch": 0.7592190889370932,
+ "grad_norm": 9.753717422485352,
+ "learning_rate": 2.27672761078401e-05,
+ "loss": 0.9726,
+ "step": 2450
+ },
+ {
+ "epoch": 0.7747133560582584,
+ "grad_norm": 7.797086715698242,
+ "learning_rate": 2.3232104121475057e-05,
+ "loss": 0.9322,
+ "step": 2500
+ },
+ {
+ "epoch": 0.7902076231794236,
+ "grad_norm": 6.927332878112793,
+ "learning_rate": 2.369693213511001e-05,
+ "loss": 0.9378,
+ "step": 2550
+ },
+ {
+ "epoch": 0.8057018903005888,
+ "grad_norm": 3.726092576980591,
+ "learning_rate": 2.4161760148744962e-05,
+ "loss": 0.958,
+ "step": 2600
+ },
+ {
+ "epoch": 0.821196157421754,
+ "grad_norm": 5.661774635314941,
+ "learning_rate": 2.462658816237992e-05,
+ "loss": 0.9651,
+ "step": 2650
+ },
+ {
+ "epoch": 0.8366904245429191,
+ "grad_norm": 6.513345718383789,
+ "learning_rate": 2.5091416176014875e-05,
+ "loss": 0.971,
+ "step": 2700
+ },
+ {
+ "epoch": 0.8521846916640843,
+ "grad_norm": 6.713255405426025,
+ "learning_rate": 2.555624418964983e-05,
+ "loss": 0.8616,
+ "step": 2750
+ },
+ {
+ "epoch": 0.8676789587852495,
+ "grad_norm": 8.527266502380371,
+ "learning_rate": 2.6021072203284784e-05,
+ "loss": 0.9413,
+ "step": 2800
+ },
+ {
+ "epoch": 0.8831732259064147,
+ "grad_norm": 6.599502086639404,
+ "learning_rate": 2.648590021691974e-05,
+ "loss": 0.9985,
+ "step": 2850
+ },
+ {
+ "epoch": 0.8986674930275798,
+ "grad_norm": 4.0680155754089355,
+ "learning_rate": 2.6950728230554697e-05,
+ "loss": 0.9415,
+ "step": 2900
+ },
+ {
+ "epoch": 0.914161760148745,
+ "grad_norm": 5.083493232727051,
+ "learning_rate": 2.741555624418965e-05,
+ "loss": 0.9805,
+ "step": 2950
+ },
+ {
+ "epoch": 0.9296560272699101,
+ "grad_norm": 4.0469069480896,
+ "learning_rate": 2.7880384257824606e-05,
+ "loss": 0.9547,
+ "step": 3000
+ },
+ {
+ "epoch": 0.9451502943910753,
+ "grad_norm": 6.075752258300781,
+ "learning_rate": 2.834521227145956e-05,
+ "loss": 0.9623,
+ "step": 3050
+ },
+ {
+ "epoch": 0.9606445615122404,
+ "grad_norm": 6.5252299308776855,
+ "learning_rate": 2.8810040285094515e-05,
+ "loss": 0.9838,
+ "step": 3100
+ },
+ {
+ "epoch": 0.9761388286334056,
+ "grad_norm": 6.530562877655029,
+ "learning_rate": 2.927486829872947e-05,
+ "loss": 0.911,
+ "step": 3150
+ },
+ {
+ "epoch": 0.9916330957545708,
+ "grad_norm": 8.217161178588867,
+ "learning_rate": 2.9739696312364428e-05,
+ "loss": 0.9457,
+ "step": 3200
+ },
+ {
+ "epoch": 1.0,
+ "eval_accuracy": 0.7087787983737389,
+ "eval_f1": 0.7069046924545164,
+ "eval_loss": 0.8573769330978394,
+ "eval_runtime": 25.5149,
+ "eval_samples_per_second": 260.279,
+ "eval_steps_per_second": 16.304,
+ "step": 3227
+ },
+ {
+ "epoch": 1.007127362875736,
+ "grad_norm": 4.367455005645752,
+ "learning_rate": 2.997727507488896e-05,
+ "loss": 0.9143,
+ "step": 3250
+ },
+ {
+ "epoch": 1.022621629996901,
+ "grad_norm": 4.636725902557373,
+ "learning_rate": 2.9925627517818407e-05,
+ "loss": 0.9304,
+ "step": 3300
+ },
+ {
+ "epoch": 1.0381158971180664,
+ "grad_norm": 4.251437664031982,
+ "learning_rate": 2.9873979960747857e-05,
+ "loss": 0.8549,
+ "step": 3350
+ },
+ {
+ "epoch": 1.0536101642392315,
+ "grad_norm": 6.648655414581299,
+ "learning_rate": 2.9822332403677307e-05,
+ "loss": 0.8698,
+ "step": 3400
+ },
+ {
+ "epoch": 1.0691044313603966,
+ "grad_norm": 7.102205276489258,
+ "learning_rate": 2.9770684846606757e-05,
+ "loss": 0.8597,
+ "step": 3450
+ },
+ {
+ "epoch": 1.0845986984815619,
+ "grad_norm": 10.821270942687988,
+ "learning_rate": 2.9719037289536206e-05,
+ "loss": 0.8457,
+ "step": 3500
+ },
+ {
+ "epoch": 1.100092965602727,
+ "grad_norm": 6.111588001251221,
+ "learning_rate": 2.9667389732465652e-05,
+ "loss": 0.8973,
+ "step": 3550
+ },
+ {
+ "epoch": 1.1155872327238923,
+ "grad_norm": 9.016953468322754,
+ "learning_rate": 2.9615742175395106e-05,
+ "loss": 0.8228,
+ "step": 3600
+ },
+ {
+ "epoch": 1.1310814998450573,
+ "grad_norm": 7.717069625854492,
+ "learning_rate": 2.9564094618324555e-05,
+ "loss": 0.8103,
+ "step": 3650
+ },
+ {
+ "epoch": 1.1465757669662224,
+ "grad_norm": 7.848579406738281,
+ "learning_rate": 2.9512447061254005e-05,
+ "loss": 0.8097,
+ "step": 3700
+ },
+ {
+ "epoch": 1.1620700340873877,
+ "grad_norm": 4.738124847412109,
+ "learning_rate": 2.9460799504183455e-05,
+ "loss": 0.9105,
+ "step": 3750
+ },
+ {
+ "epoch": 1.1775643012085528,
+ "grad_norm": 5.289875507354736,
+ "learning_rate": 2.94091519471129e-05,
+ "loss": 0.8876,
+ "step": 3800
+ },
+ {
+ "epoch": 1.1930585683297181,
+ "grad_norm": 6.445308685302734,
+ "learning_rate": 2.935750439004235e-05,
+ "loss": 0.8377,
+ "step": 3850
+ },
+ {
+ "epoch": 1.2085528354508832,
+ "grad_norm": 4.725327968597412,
+ "learning_rate": 2.93058568329718e-05,
+ "loss": 0.9332,
+ "step": 3900
+ },
+ {
+ "epoch": 1.2240471025720483,
+ "grad_norm": 53.85081481933594,
+ "learning_rate": 2.925420927590125e-05,
+ "loss": 0.8882,
+ "step": 3950
+ },
+ {
+ "epoch": 1.2395413696932136,
+ "grad_norm": 5.677978515625,
+ "learning_rate": 2.9202561718830703e-05,
+ "loss": 0.8481,
+ "step": 4000
+ },
+ {
+ "epoch": 1.2550356368143787,
+ "grad_norm": 3.941765785217285,
+ "learning_rate": 2.915091416176015e-05,
+ "loss": 0.835,
+ "step": 4050
+ },
+ {
+ "epoch": 1.2705299039355438,
+ "grad_norm": 8.099725723266602,
+ "learning_rate": 2.90992666046896e-05,
+ "loss": 0.8322,
+ "step": 4100
+ },
+ {
+ "epoch": 1.286024171056709,
+ "grad_norm": 6.59591007232666,
+ "learning_rate": 2.904761904761905e-05,
+ "loss": 0.8809,
+ "step": 4150
+ },
+ {
+ "epoch": 1.3015184381778742,
+ "grad_norm": 7.200226306915283,
+ "learning_rate": 2.8995971490548498e-05,
+ "loss": 0.8609,
+ "step": 4200
+ },
+ {
+ "epoch": 1.3170127052990392,
+ "grad_norm": 4.902937412261963,
+ "learning_rate": 2.8944323933477948e-05,
+ "loss": 0.8633,
+ "step": 4250
+ },
+ {
+ "epoch": 1.3325069724202045,
+ "grad_norm": 5.792146682739258,
+ "learning_rate": 2.8892676376407394e-05,
+ "loss": 0.8684,
+ "step": 4300
+ },
+ {
+ "epoch": 1.3480012395413696,
+ "grad_norm": 4.636809349060059,
+ "learning_rate": 2.8841028819336844e-05,
+ "loss": 0.8245,
+ "step": 4350
+ },
+ {
+ "epoch": 1.363495506662535,
+ "grad_norm": 5.28842306137085,
+ "learning_rate": 2.8789381262266297e-05,
+ "loss": 0.7859,
+ "step": 4400
+ },
+ {
+ "epoch": 1.3789897737837,
+ "grad_norm": 4.259128570556641,
+ "learning_rate": 2.8737733705195747e-05,
+ "loss": 0.8228,
+ "step": 4450
+ },
+ {
+ "epoch": 1.394484040904865,
+ "grad_norm": 7.914375305175781,
+ "learning_rate": 2.8686086148125196e-05,
+ "loss": 0.9448,
+ "step": 4500
+ },
+ {
+ "epoch": 1.4099783080260304,
+ "grad_norm": 7.636547088623047,
+ "learning_rate": 2.8634438591054643e-05,
+ "loss": 0.8781,
+ "step": 4550
+ },
+ {
+ "epoch": 1.4254725751471955,
+ "grad_norm": 8.681707382202148,
+ "learning_rate": 2.8582791033984092e-05,
+ "loss": 0.8367,
+ "step": 4600
+ },
+ {
+ "epoch": 1.4409668422683608,
+ "grad_norm": 7.864759922027588,
+ "learning_rate": 2.8531143476913542e-05,
+ "loss": 0.9088,
+ "step": 4650
+ },
+ {
+ "epoch": 1.4564611093895259,
+ "grad_norm": 4.892348289489746,
+ "learning_rate": 2.8479495919842992e-05,
+ "loss": 0.8993,
+ "step": 4700
+ },
+ {
+ "epoch": 1.471955376510691,
+ "grad_norm": 23.208873748779297,
+ "learning_rate": 2.842784836277244e-05,
+ "loss": 0.8542,
+ "step": 4750
+ },
+ {
+ "epoch": 1.4874496436318563,
+ "grad_norm": 8.983469009399414,
+ "learning_rate": 2.837620080570189e-05,
+ "loss": 0.949,
+ "step": 4800
+ },
+ {
+ "epoch": 1.5029439107530214,
+ "grad_norm": 10.706644058227539,
+ "learning_rate": 2.832455324863134e-05,
+ "loss": 0.9008,
+ "step": 4850
+ },
+ {
+ "epoch": 1.5184381778741867,
+ "grad_norm": 4.685935020446777,
+ "learning_rate": 2.827290569156079e-05,
+ "loss": 0.8613,
+ "step": 4900
+ },
+ {
+ "epoch": 1.5339324449953518,
+ "grad_norm": 5.286406993865967,
+ "learning_rate": 2.822125813449024e-05,
+ "loss": 0.8929,
+ "step": 4950
+ },
+ {
+ "epoch": 1.5494267121165168,
+ "grad_norm": 4.907707691192627,
+ "learning_rate": 2.816961057741969e-05,
+ "loss": 0.8321,
+ "step": 5000
+ },
+ {
+ "epoch": 1.564920979237682,
+ "grad_norm": 6.398087501525879,
+ "learning_rate": 2.8117963020349136e-05,
+ "loss": 0.8626,
+ "step": 5050
+ },
+ {
+ "epoch": 1.5804152463588472,
+ "grad_norm": 5.323617458343506,
+ "learning_rate": 2.8066315463278586e-05,
+ "loss": 0.8324,
+ "step": 5100
+ },
+ {
+ "epoch": 1.5959095134800125,
+ "grad_norm": 4.136271953582764,
+ "learning_rate": 2.8014667906208035e-05,
+ "loss": 0.879,
+ "step": 5150
+ },
+ {
+ "epoch": 1.6114037806011776,
+ "grad_norm": 6.873619556427002,
+ "learning_rate": 2.796302034913749e-05,
+ "loss": 0.9043,
+ "step": 5200
+ },
+ {
+ "epoch": 1.6268980477223427,
+ "grad_norm": 7.138693809509277,
+ "learning_rate": 2.7911372792066938e-05,
+ "loss": 0.8183,
+ "step": 5250
+ },
+ {
+ "epoch": 1.6423923148435078,
+ "grad_norm": 6.483767032623291,
+ "learning_rate": 2.7859725234996384e-05,
+ "loss": 0.8867,
+ "step": 5300
+ },
+ {
+ "epoch": 1.657886581964673,
+ "grad_norm": 3.2249104976654053,
+ "learning_rate": 2.7808077677925834e-05,
+ "loss": 0.8097,
+ "step": 5350
+ },
+ {
+ "epoch": 1.6733808490858384,
+ "grad_norm": 6.961575984954834,
+ "learning_rate": 2.7756430120855284e-05,
+ "loss": 0.8364,
+ "step": 5400
+ },
+ {
+ "epoch": 1.6888751162070035,
+ "grad_norm": 7.0920000076293945,
+ "learning_rate": 2.7704782563784733e-05,
+ "loss": 0.8283,
+ "step": 5450
+ },
+ {
+ "epoch": 1.7043693833281686,
+ "grad_norm": 5.436604976654053,
+ "learning_rate": 2.7653135006714183e-05,
+ "loss": 0.8786,
+ "step": 5500
+ },
+ {
+ "epoch": 1.7198636504493336,
+ "grad_norm": 4.0141282081604,
+ "learning_rate": 2.760148744964363e-05,
+ "loss": 0.8452,
+ "step": 5550
+ },
+ {
+ "epoch": 1.735357917570499,
+ "grad_norm": 5.783074378967285,
+ "learning_rate": 2.7549839892573083e-05,
+ "loss": 0.8168,
+ "step": 5600
+ },
+ {
+ "epoch": 1.750852184691664,
+ "grad_norm": 7.773756504058838,
+ "learning_rate": 2.7498192335502532e-05,
+ "loss": 0.8598,
+ "step": 5650
+ },
+ {
+ "epoch": 1.7663464518128293,
+ "grad_norm": 5.375339984893799,
+ "learning_rate": 2.7446544778431982e-05,
+ "loss": 0.8366,
818
+ "step": 5700
819
+ },
820
+ {
821
+ "epoch": 1.7818407189339944,
822
+ "grad_norm": 4.240859031677246,
823
+ "learning_rate": 2.739489722136143e-05,
824
+ "loss": 0.8136,
825
+ "step": 5750
826
+ },
827
+ {
828
+ "epoch": 1.7973349860551595,
829
+ "grad_norm": 6.107599258422852,
830
+ "learning_rate": 2.734324966429088e-05,
831
+ "loss": 0.8074,
832
+ "step": 5800
833
+ },
834
+ {
835
+ "epoch": 1.8128292531763246,
836
+ "grad_norm": 6.027589797973633,
837
+ "learning_rate": 2.7291602107220328e-05,
838
+ "loss": 0.7808,
839
+ "step": 5850
840
+ },
841
+ {
842
+ "epoch": 1.82832352029749,
843
+ "grad_norm": 4.829204559326172,
844
+ "learning_rate": 2.7239954550149777e-05,
845
+ "loss": 0.8473,
846
+ "step": 5900
847
+ },
848
+ {
849
+ "epoch": 1.8438177874186552,
850
+ "grad_norm": 5.385358810424805,
851
+ "learning_rate": 2.7188306993079227e-05,
852
+ "loss": 0.844,
853
+ "step": 5950
854
+ },
855
+ {
856
+ "epoch": 1.8593120545398203,
857
+ "grad_norm": 5.991063594818115,
858
+ "learning_rate": 2.713665943600868e-05,
859
+ "loss": 0.8667,
860
+ "step": 6000
861
+ },
862
+ {
863
+ "epoch": 1.8748063216609854,
864
+ "grad_norm": 4.269604682922363,
865
+ "learning_rate": 2.708501187893813e-05,
866
+ "loss": 0.8987,
867
+ "step": 6050
868
+ },
869
+ {
870
+ "epoch": 1.8903005887821505,
871
+ "grad_norm": 6.90878438949585,
872
+ "learning_rate": 2.7033364321867576e-05,
873
+ "loss": 0.8517,
874
+ "step": 6100
875
+ },
876
+ {
877
+ "epoch": 1.9057948559033158,
878
+ "grad_norm": 8.742233276367188,
879
+ "learning_rate": 2.6981716764797026e-05,
880
+ "loss": 0.8729,
881
+ "step": 6150
882
+ },
883
+ {
884
+ "epoch": 1.921289123024481,
885
+ "grad_norm": 9.10084342956543,
886
+ "learning_rate": 2.6930069207726475e-05,
887
+ "loss": 0.8803,
888
+ "step": 6200
889
+ },
890
+ {
891
+ "epoch": 1.9367833901456462,
892
+ "grad_norm": 4.210537433624268,
893
+ "learning_rate": 2.6878421650655925e-05,
894
+ "loss": 0.7938,
895
+ "step": 6250
896
+ },
897
+ {
898
+ "epoch": 1.9522776572668112,
899
+ "grad_norm": 6.604791641235352,
900
+ "learning_rate": 2.6826774093585375e-05,
901
+ "loss": 0.7958,
902
+ "step": 6300
903
+ },
904
+ {
905
+ "epoch": 1.9677719243879763,
906
+ "grad_norm": 6.213857173919678,
907
+ "learning_rate": 2.677512653651482e-05,
908
+ "loss": 0.8463,
909
+ "step": 6350
910
+ },
911
+ {
912
+ "epoch": 1.9832661915091416,
913
+ "grad_norm": 4.303800582885742,
914
+ "learning_rate": 2.6723478979444274e-05,
915
+ "loss": 0.7909,
916
+ "step": 6400
917
+ },
918
+ {
919
+ "epoch": 1.998760458630307,
920
+ "grad_norm": 4.933095932006836,
921
+ "learning_rate": 2.6671831422373724e-05,
922
+ "loss": 0.7888,
923
+ "step": 6450
924
+ },
925
+ {
926
+ "epoch": 2.0,
927
+ "eval_accuracy": 0.7590724288510766,
928
+ "eval_f1": 0.7587245577707713,
929
+ "eval_loss": 0.7010347247123718,
930
+ "eval_runtime": 25.4199,
931
+ "eval_samples_per_second": 261.252,
932
+ "eval_steps_per_second": 16.365,
933
+ "step": 6454
934
+ },
935
+ {
936
+ "epoch": 2.014254725751472,
937
+ "grad_norm": 4.0570244789123535,
938
+ "learning_rate": 2.6620183865303173e-05,
939
+ "loss": 0.7236,
940
+ "step": 6500
941
+ },
942
+ {
943
+ "epoch": 2.029748992872637,
944
+ "grad_norm": 5.307652473449707,
945
+ "learning_rate": 2.6568536308232623e-05,
946
+ "loss": 0.7213,
947
+ "step": 6550
948
+ },
949
+ {
950
+ "epoch": 2.045243259993802,
951
+ "grad_norm": 5.398072719573975,
952
+ "learning_rate": 2.651688875116207e-05,
953
+ "loss": 0.6839,
954
+ "step": 6600
955
+ },
956
+ {
957
+ "epoch": 2.0607375271149673,
958
+ "grad_norm": 5.296418190002441,
959
+ "learning_rate": 2.646524119409152e-05,
960
+ "loss": 0.6856,
961
+ "step": 6650
962
+ },
963
+ {
964
+ "epoch": 2.076231794236133,
965
+ "grad_norm": 4.173377990722656,
966
+ "learning_rate": 2.641359363702097e-05,
967
+ "loss": 0.7109,
968
+ "step": 6700
969
+ },
970
+ {
971
+ "epoch": 2.091726061357298,
972
+ "grad_norm": 5.590676784515381,
973
+ "learning_rate": 2.636194607995042e-05,
974
+ "loss": 0.6814,
975
+ "step": 6750
976
+ },
977
+ {
978
+ "epoch": 2.107220328478463,
979
+ "grad_norm": 8.112780570983887,
980
+ "learning_rate": 2.631029852287987e-05,
981
+ "loss": 0.7302,
982
+ "step": 6800
983
+ },
984
+ {
985
+ "epoch": 2.122714595599628,
986
+ "grad_norm": 6.514364242553711,
987
+ "learning_rate": 2.6258650965809318e-05,
988
+ "loss": 0.6917,
989
+ "step": 6850
990
+ },
991
+ {
992
+ "epoch": 2.138208862720793,
993
+ "grad_norm": 8.156841278076172,
994
+ "learning_rate": 2.6207003408738767e-05,
995
+ "loss": 0.6568,
996
+ "step": 6900
997
+ },
998
+ {
999
+ "epoch": 2.1537031298419587,
1000
+ "grad_norm": 7.641481876373291,
1001
+ "learning_rate": 2.6155355851668217e-05,
1002
+ "loss": 0.6132,
1003
+ "step": 6950
1004
+ },
1005
+ {
1006
+ "epoch": 2.1691973969631237,
1007
+ "grad_norm": 6.33613395690918,
1008
+ "learning_rate": 2.6103708294597667e-05,
1009
+ "loss": 0.6393,
1010
+ "step": 7000
1011
+ },
1012
+ {
1013
+ "epoch": 2.184691664084289,
1014
+ "grad_norm": 4.2916436195373535,
1015
+ "learning_rate": 2.6052060737527116e-05,
1016
+ "loss": 0.6709,
1017
+ "step": 7050
1018
+ },
1019
+ {
1020
+ "epoch": 2.200185931205454,
1021
+ "grad_norm": 4.763488292694092,
1022
+ "learning_rate": 2.6000413180456563e-05,
1023
+ "loss": 0.6919,
1024
+ "step": 7100
1025
+ },
1026
+ {
1027
+ "epoch": 2.215680198326619,
1028
+ "grad_norm": 8.614394187927246,
1029
+ "learning_rate": 2.5948765623386012e-05,
1030
+ "loss": 0.6501,
1031
+ "step": 7150
1032
+ },
1033
+ {
1034
+ "epoch": 2.2311744654477845,
1035
+ "grad_norm": 9.684426307678223,
1036
+ "learning_rate": 2.5897118066315465e-05,
1037
+ "loss": 0.6947,
1038
+ "step": 7200
1039
+ },
1040
+ {
1041
+ "epoch": 2.2466687325689496,
1042
+ "grad_norm": 6.210818767547607,
1043
+ "learning_rate": 2.5845470509244915e-05,
1044
+ "loss": 0.6873,
1045
+ "step": 7250
1046
+ },
1047
+ {
1048
+ "epoch": 2.2621629996901147,
1049
+ "grad_norm": 6.774372577667236,
1050
+ "learning_rate": 2.5793822952174365e-05,
1051
+ "loss": 0.7195,
1052
+ "step": 7300
1053
+ },
1054
+ {
1055
+ "epoch": 2.27765726681128,
1056
+ "grad_norm": 6.014688491821289,
1057
+ "learning_rate": 2.574217539510381e-05,
1058
+ "loss": 0.6298,
1059
+ "step": 7350
1060
+ },
1061
+ {
1062
+ "epoch": 2.293151533932445,
1063
+ "grad_norm": 14.994784355163574,
1064
+ "learning_rate": 2.569052783803326e-05,
1065
+ "loss": 0.7403,
1066
+ "step": 7400
1067
+ },
1068
+ {
1069
+ "epoch": 2.3086458010536104,
1070
+ "grad_norm": 6.315488815307617,
1071
+ "learning_rate": 2.563888028096271e-05,
1072
+ "loss": 0.6679,
1073
+ "step": 7450
1074
+ },
1075
+ {
1076
+ "epoch": 2.3241400681747755,
1077
+ "grad_norm": 8.482314109802246,
1078
+ "learning_rate": 2.558723272389216e-05,
1079
+ "loss": 0.7173,
1080
+ "step": 7500
1081
+ },
1082
+ {
1083
+ "epoch": 2.3396343352959406,
1084
+ "grad_norm": 10.161298751831055,
1085
+ "learning_rate": 2.553558516682161e-05,
1086
+ "loss": 0.732,
1087
+ "step": 7550
1088
+ },
1089
+ {
1090
+ "epoch": 2.3551286024171056,
1091
+ "grad_norm": 6.758267402648926,
1092
+ "learning_rate": 2.548393760975106e-05,
1093
+ "loss": 0.6192,
1094
+ "step": 7600
1095
+ },
1096
+ {
1097
+ "epoch": 2.3706228695382707,
1098
+ "grad_norm": 4.528532981872559,
1099
+ "learning_rate": 2.543229005268051e-05,
1100
+ "loss": 0.7614,
1101
+ "step": 7650
1102
+ },
1103
+ {
1104
+ "epoch": 2.3861171366594363,
1105
+ "grad_norm": 6.397975921630859,
1106
+ "learning_rate": 2.538064249560996e-05,
1107
+ "loss": 0.6951,
1108
+ "step": 7700
1109
+ },
1110
+ {
1111
+ "epoch": 2.4016114037806013,
1112
+ "grad_norm": 5.440258979797363,
1113
+ "learning_rate": 2.532899493853941e-05,
1114
+ "loss": 0.7188,
1115
+ "step": 7750
1116
+ },
1117
+ {
1118
+ "epoch": 2.4171056709017664,
1119
+ "grad_norm": 2.4531173706054688,
1120
+ "learning_rate": 2.5277347381468858e-05,
1121
+ "loss": 0.6347,
1122
+ "step": 7800
1123
+ },
1124
+ {
1125
+ "epoch": 2.4325999380229315,
1126
+ "grad_norm": 15.269991874694824,
1127
+ "learning_rate": 2.5225699824398304e-05,
1128
+ "loss": 0.6601,
1129
+ "step": 7850
1130
+ },
1131
+ {
1132
+ "epoch": 2.4480942051440966,
1133
+ "grad_norm": 6.438554286956787,
1134
+ "learning_rate": 2.5174052267327754e-05,
1135
+ "loss": 0.698,
1136
+ "step": 7900
1137
+ },
1138
+ {
1139
+ "epoch": 2.4635884722652617,
1140
+ "grad_norm": 8.922213554382324,
1141
+ "learning_rate": 2.5122404710257204e-05,
1142
+ "loss": 0.6958,
1143
+ "step": 7950
1144
+ },
1145
+ {
1146
+ "epoch": 2.479082739386427,
1147
+ "grad_norm": 6.724533557891846,
1148
+ "learning_rate": 2.5070757153186657e-05,
1149
+ "loss": 0.7131,
1150
+ "step": 8000
1151
+ },
1152
+ {
1153
+ "epoch": 2.4945770065075923,
1154
+ "grad_norm": 5.617169380187988,
1155
+ "learning_rate": 2.5019109596116107e-05,
1156
+ "loss": 0.7711,
1157
+ "step": 8050
1158
+ },
1159
+ {
1160
+ "epoch": 2.5100712736287574,
1161
+ "grad_norm": 6.441185474395752,
1162
+ "learning_rate": 2.4967462039045553e-05,
1163
+ "loss": 0.6612,
1164
+ "step": 8100
1165
+ },
1166
+ {
1167
+ "epoch": 2.5255655407499225,
1168
+ "grad_norm": 6.033916473388672,
1169
+ "learning_rate": 2.4915814481975003e-05,
1170
+ "loss": 0.698,
1171
+ "step": 8150
1172
+ },
1173
+ {
1174
+ "epoch": 2.5410598078710875,
1175
+ "grad_norm": 6.174665451049805,
1176
+ "learning_rate": 2.4864166924904452e-05,
1177
+ "loss": 0.6968,
1178
+ "step": 8200
1179
+ },
1180
+ {
1181
+ "epoch": 2.5565540749922526,
1182
+ "grad_norm": 20.01167869567871,
1183
+ "learning_rate": 2.4812519367833902e-05,
1184
+ "loss": 0.6456,
1185
+ "step": 8250
1186
+ },
1187
+ {
1188
+ "epoch": 2.572048342113418,
1189
+ "grad_norm": 10.404682159423828,
1190
+ "learning_rate": 2.476087181076335e-05,
1191
+ "loss": 0.6808,
1192
+ "step": 8300
1193
+ },
1194
+ {
1195
+ "epoch": 2.5875426092345832,
1196
+ "grad_norm": 5.160488128662109,
1197
+ "learning_rate": 2.47092242536928e-05,
1198
+ "loss": 0.6913,
1199
+ "step": 8350
1200
+ },
1201
+ {
1202
+ "epoch": 2.6030368763557483,
1203
+ "grad_norm": 6.452591896057129,
1204
+ "learning_rate": 2.465757669662225e-05,
1205
+ "loss": 0.6594,
1206
+ "step": 8400
1207
+ },
1208
+ {
1209
+ "epoch": 2.6185311434769134,
1210
+ "grad_norm": 12.436300277709961,
1211
+ "learning_rate": 2.46059291395517e-05,
1212
+ "loss": 0.7255,
1213
+ "step": 8450
1214
+ },
1215
+ {
1216
+ "epoch": 2.6340254105980785,
1217
+ "grad_norm": 6.132791042327881,
1218
+ "learning_rate": 2.455428158248115e-05,
1219
+ "loss": 0.6753,
1220
+ "step": 8500
1221
+ },
1222
+ {
1223
+ "epoch": 2.649519677719244,
1224
+ "grad_norm": 10.712909698486328,
1225
+ "learning_rate": 2.45026340254106e-05,
1226
+ "loss": 0.6445,
1227
+ "step": 8550
1228
+ },
1229
+ {
1230
+ "epoch": 2.665013944840409,
1231
+ "grad_norm": 12.122429847717285,
1232
+ "learning_rate": 2.445098646834005e-05,
1233
+ "loss": 0.6424,
1234
+ "step": 8600
1235
+ },
1236
+ {
1237
+ "epoch": 2.680508211961574,
1238
+ "grad_norm": 8.575897216796875,
1239
+ "learning_rate": 2.4399338911269496e-05,
1240
+ "loss": 0.7242,
1241
+ "step": 8650
1242
+ },
1243
+ {
1244
+ "epoch": 2.6960024790827393,
1245
+ "grad_norm": 8.740906715393066,
1246
+ "learning_rate": 2.4347691354198946e-05,
1247
+ "loss": 0.6949,
1248
+ "step": 8700
1249
+ },
1250
+ {
1251
+ "epoch": 2.7114967462039044,
1252
+ "grad_norm": 4.871994972229004,
1253
+ "learning_rate": 2.4296043797128395e-05,
1254
+ "loss": 0.787,
1255
+ "step": 8750
1256
+ },
1257
+ {
1258
+ "epoch": 2.72699101332507,
1259
+ "grad_norm": 6.642944812774658,
1260
+ "learning_rate": 2.424439624005785e-05,
1261
+ "loss": 0.6925,
1262
+ "step": 8800
1263
+ },
1264
+ {
1265
+ "epoch": 2.742485280446235,
1266
+ "grad_norm": 12.149236679077148,
1267
+ "learning_rate": 2.4192748682987298e-05,
1268
+ "loss": 0.6972,
1269
+ "step": 8850
1270
+ },
1271
+ {
1272
+ "epoch": 2.7579795475674,
1273
+ "grad_norm": 8.100613594055176,
1274
+ "learning_rate": 2.4141101125916744e-05,
1275
+ "loss": 0.7358,
1276
+ "step": 8900
1277
+ },
1278
+ {
1279
+ "epoch": 2.773473814688565,
1280
+ "grad_norm": 12.28987979888916,
1281
+ "learning_rate": 2.4089453568846194e-05,
1282
+ "loss": 0.7176,
1283
+ "step": 8950
1284
+ },
1285
+ {
1286
+ "epoch": 2.78896808180973,
1287
+ "grad_norm": 9.355488777160645,
1288
+ "learning_rate": 2.4037806011775644e-05,
1289
+ "loss": 0.6856,
1290
+ "step": 9000
1291
+ },
1292
+ {
1293
+ "epoch": 2.8044623489308957,
1294
+ "grad_norm": 11.875406265258789,
1295
+ "learning_rate": 2.3986158454705093e-05,
1296
+ "loss": 0.6501,
1297
+ "step": 9050
1298
+ },
1299
+ {
1300
+ "epoch": 2.819956616052061,
1301
+ "grad_norm": 8.061235427856445,
1302
+ "learning_rate": 2.3934510897634543e-05,
1303
+ "loss": 0.6823,
1304
+ "step": 9100
1305
+ },
1306
+ {
1307
+ "epoch": 2.835450883173226,
1308
+ "grad_norm": 7.949320316314697,
1309
+ "learning_rate": 2.388286334056399e-05,
1310
+ "loss": 0.6764,
1311
+ "step": 9150
1312
+ },
1313
+ {
1314
+ "epoch": 2.850945150294391,
1315
+ "grad_norm": 5.9249587059021,
1316
+ "learning_rate": 2.3831215783493442e-05,
1317
+ "loss": 0.6511,
1318
+ "step": 9200
1319
+ },
1320
+ {
1321
+ "epoch": 2.866439417415556,
1322
+ "grad_norm": 8.400185585021973,
1323
+ "learning_rate": 2.3779568226422892e-05,
1324
+ "loss": 0.6515,
1325
+ "step": 9250
1326
+ },
1327
+ {
1328
+ "epoch": 2.8819336845367216,
1329
+ "grad_norm": 11.487894058227539,
1330
+ "learning_rate": 2.3727920669352342e-05,
1331
+ "loss": 0.6719,
1332
+ "step": 9300
1333
+ },
1334
+ {
1335
+ "epoch": 2.8974279516578867,
1336
+ "grad_norm": 8.317901611328125,
1337
+ "learning_rate": 2.367627311228179e-05,
1338
+ "loss": 0.6697,
1339
+ "step": 9350
1340
+ },
1341
+ {
1342
+ "epoch": 2.9129222187790518,
1343
+ "grad_norm": 9.878332138061523,
1344
+ "learning_rate": 2.3624625555211238e-05,
1345
+ "loss": 0.6801,
1346
+ "step": 9400
1347
+ },
1348
+ {
1349
+ "epoch": 2.928416485900217,
1350
+ "grad_norm": 8.855628967285156,
1351
+ "learning_rate": 2.3572977998140687e-05,
1352
+ "loss": 0.6445,
1353
+ "step": 9450
1354
+ },
1355
+ {
1356
+ "epoch": 2.943910753021382,
1357
+ "grad_norm": 5.350094318389893,
1358
+ "learning_rate": 2.3521330441070137e-05,
1359
+ "loss": 0.6891,
1360
+ "step": 9500
1361
+ },
1362
+ {
1363
+ "epoch": 2.9594050201425475,
1364
+ "grad_norm": 8.540812492370605,
1365
+ "learning_rate": 2.3469682883999587e-05,
1366
+ "loss": 0.6556,
1367
+ "step": 9550
1368
+ },
1369
+ {
1370
+ "epoch": 2.9748992872637126,
1371
+ "grad_norm": 4.337664604187012,
1372
+ "learning_rate": 2.341803532692904e-05,
1373
+ "loss": 0.7013,
1374
+ "step": 9600
1375
+ },
1376
+ {
1377
+ "epoch": 2.9903935543848776,
1378
+ "grad_norm": 7.002617359161377,
1379
+ "learning_rate": 2.3366387769858486e-05,
1380
+ "loss": 0.6518,
1381
+ "step": 9650
1382
+ },
1383
+ {
1384
+ "epoch": 3.0,
1385
+ "eval_accuracy": 0.777292576419214,
1386
+ "eval_f1": 0.775972842719567,
1387
+ "eval_loss": 0.69657963514328,
1388
+ "eval_runtime": 25.4856,
1389
+ "eval_samples_per_second": 260.579,
1390
+ "eval_steps_per_second": 16.323,
1391
+ "step": 9681
1392
+ },
1393
+ {
1394
+ "epoch": 3.0058878215060427,
1395
+ "grad_norm": 9.228548049926758,
1396
+ "learning_rate": 2.3314740212787936e-05,
1397
+ "loss": 0.5385,
1398
+ "step": 9700
1399
+ },
1400
+ {
1401
+ "epoch": 3.021382088627208,
1402
+ "grad_norm": 4.332932472229004,
1403
+ "learning_rate": 2.3263092655717385e-05,
1404
+ "loss": 0.5513,
1405
+ "step": 9750
1406
+ },
1407
+ {
1408
+ "epoch": 3.036876355748373,
1409
+ "grad_norm": 6.478864669799805,
1410
+ "learning_rate": 2.3211445098646835e-05,
1411
+ "loss": 0.4542,
1412
+ "step": 9800
1413
+ },
1414
+ {
1415
+ "epoch": 3.0523706228695384,
1416
+ "grad_norm": 14.028499603271484,
1417
+ "learning_rate": 2.3159797541576285e-05,
1418
+ "loss": 0.4549,
1419
+ "step": 9850
1420
+ },
1421
+ {
1422
+ "epoch": 3.0678648899907035,
1423
+ "grad_norm": 5.590787887573242,
1424
+ "learning_rate": 2.310814998450573e-05,
1425
+ "loss": 0.4624,
1426
+ "step": 9900
1427
+ },
1428
+ {
1429
+ "epoch": 3.0833591571118686,
1430
+ "grad_norm": 5.623167514801025,
1431
+ "learning_rate": 2.305650242743518e-05,
1432
+ "loss": 0.4479,
1433
+ "step": 9950
1434
+ },
1435
+ {
1436
+ "epoch": 3.0988534242330337,
1437
+ "grad_norm": 10.343826293945312,
1438
+ "learning_rate": 2.3004854870364634e-05,
1439
+ "loss": 0.5079,
1440
+ "step": 10000
1441
+ },
1442
+ {
1443
+ "epoch": 3.1143476913541988,
1444
+ "grad_norm": 2.780686616897583,
1445
+ "learning_rate": 2.2953207313294084e-05,
1446
+ "loss": 0.43,
1447
+ "step": 10050
1448
+ },
1449
+ {
1450
+ "epoch": 3.1298419584753643,
1451
+ "grad_norm": 10.917914390563965,
1452
+ "learning_rate": 2.2901559756223533e-05,
1453
+ "loss": 0.4932,
1454
+ "step": 10100
1455
+ },
1456
+ {
1457
+ "epoch": 3.1453362255965294,
1458
+ "grad_norm": 14.870561599731445,
1459
+ "learning_rate": 2.284991219915298e-05,
1460
+ "loss": 0.483,
1461
+ "step": 10150
1462
+ },
1463
+ {
1464
+ "epoch": 3.1608304927176945,
1465
+ "grad_norm": 15.64564323425293,
1466
+ "learning_rate": 2.279826464208243e-05,
1467
+ "loss": 0.5047,
1468
+ "step": 10200
1469
+ },
1470
+ {
1471
+ "epoch": 3.1763247598388595,
1472
+ "grad_norm": 8.148391723632812,
1473
+ "learning_rate": 2.274661708501188e-05,
1474
+ "loss": 0.498,
1475
+ "step": 10250
1476
+ },
1477
+ {
1478
+ "epoch": 3.1918190269600246,
1479
+ "grad_norm": 9.916448593139648,
1480
+ "learning_rate": 2.269496952794133e-05,
1481
+ "loss": 0.4896,
1482
+ "step": 10300
1483
+ },
1484
+ {
1485
+ "epoch": 3.20731329408119,
1486
+ "grad_norm": 10.014134407043457,
1487
+ "learning_rate": 2.2643321970870778e-05,
1488
+ "loss": 0.4572,
1489
+ "step": 10350
1490
+ },
1491
+ {
1492
+ "epoch": 3.2228075612023552,
1493
+ "grad_norm": 9.647527694702148,
1494
+ "learning_rate": 2.2591674413800228e-05,
1495
+ "loss": 0.4965,
1496
+ "step": 10400
1497
+ },
1498
+ {
1499
+ "epoch": 3.2383018283235203,
1500
+ "grad_norm": 11.77087116241455,
1501
+ "learning_rate": 2.2540026856729678e-05,
1502
+ "loss": 0.512,
1503
+ "step": 10450
1504
+ },
1505
+ {
1506
+ "epoch": 3.2537960954446854,
1507
+ "grad_norm": 3.3613386154174805,
1508
+ "learning_rate": 2.2488379299659127e-05,
1509
+ "loss": 0.5522,
1510
+ "step": 10500
1511
+ },
1512
+ {
1513
+ "epoch": 3.2692903625658505,
1514
+ "grad_norm": 17.92693519592285,
1515
+ "learning_rate": 2.2436731742588577e-05,
1516
+ "loss": 0.4915,
1517
+ "step": 10550
1518
+ },
1519
+ {
1520
+ "epoch": 3.2847846296870156,
1521
+ "grad_norm": 8.389365196228027,
1522
+ "learning_rate": 2.2385084185518027e-05,
1523
+ "loss": 0.5343,
1524
+ "step": 10600
1525
+ },
1526
+ {
1527
+ "epoch": 3.300278896808181,
1528
+ "grad_norm": 9.849445343017578,
1529
+ "learning_rate": 2.2333436628447473e-05,
1530
+ "loss": 0.4925,
1531
+ "step": 10650
1532
+ },
1533
+ {
1534
+ "epoch": 3.315773163929346,
1535
+ "grad_norm": 7.494227886199951,
1536
+ "learning_rate": 2.2281789071376923e-05,
1537
+ "loss": 0.5242,
1538
+ "step": 10700
1539
+ },
1540
+ {
1541
+ "epoch": 3.3312674310505113,
1542
+ "grad_norm": 12.774617195129395,
1543
+ "learning_rate": 2.2230141514306372e-05,
1544
+ "loss": 0.522,
1545
+ "step": 10750
1546
+ },
1547
+ {
1548
+ "epoch": 3.3467616981716763,
1549
+ "grad_norm": 4.167229175567627,
1550
+ "learning_rate": 2.2178493957235822e-05,
1551
+ "loss": 0.4852,
1552
+ "step": 10800
1553
+ },
1554
+ {
1555
+ "epoch": 3.3622559652928414,
1556
+ "grad_norm": 7.823596000671387,
1557
+ "learning_rate": 2.2126846400165275e-05,
1558
+ "loss": 0.521,
1559
+ "step": 10850
1560
+ },
1561
+ {
1562
+ "epoch": 3.377750232414007,
1563
+ "grad_norm": 9.712186813354492,
1564
+ "learning_rate": 2.2075198843094725e-05,
1565
+ "loss": 0.4931,
1566
+ "step": 10900
1567
+ },
1568
+ {
1569
+ "epoch": 3.393244499535172,
1570
+ "grad_norm": 9.726935386657715,
1571
+ "learning_rate": 2.202355128602417e-05,
1572
+ "loss": 0.531,
1573
+ "step": 10950
1574
+ },
1575
+ {
1576
+ "epoch": 3.408738766656337,
1577
+ "grad_norm": 8.613348007202148,
1578
+ "learning_rate": 2.197190372895362e-05,
1579
+ "loss": 0.4902,
1580
+ "step": 11000
1581
+ },
1582
+ {
1583
+ "epoch": 3.424233033777502,
1584
+ "grad_norm": 17.698650360107422,
1585
+ "learning_rate": 2.192025617188307e-05,
1586
+ "loss": 0.4967,
1587
+ "step": 11050
1588
+ },
1589
+ {
1590
+ "epoch": 3.4397273008986673,
1591
+ "grad_norm": 13.304680824279785,
1592
+ "learning_rate": 2.186860861481252e-05,
1593
+ "loss": 0.4998,
1594
+ "step": 11100
1595
+ },
1596
+ {
1597
+ "epoch": 3.455221568019833,
1598
+ "grad_norm": 9.090615272521973,
1599
+ "learning_rate": 2.181696105774197e-05,
1600
+ "loss": 0.4797,
1601
+ "step": 11150
1602
+ },
1603
+ {
1604
+ "epoch": 3.470715835140998,
1605
+ "grad_norm": 6.544071197509766,
1606
+ "learning_rate": 2.176531350067142e-05,
1607
+ "loss": 0.5405,
1608
+ "step": 11200
1609
+ },
1610
+ {
1611
+ "epoch": 3.486210102262163,
1612
+ "grad_norm": 10.908158302307129,
1613
+ "learning_rate": 2.171366594360087e-05,
1614
+ "loss": 0.4663,
1615
+ "step": 11250
1616
+ },
1617
+ {
1618
+ "epoch": 3.501704369383328,
1619
+ "grad_norm": 9.044700622558594,
1620
+ "learning_rate": 2.166201838653032e-05,
1621
+ "loss": 0.4755,
1622
+ "step": 11300
1623
+ },
1624
+ {
1625
+ "epoch": 3.517198636504493,
1626
+ "grad_norm": 7.633232116699219,
1627
+ "learning_rate": 2.161037082945977e-05,
1628
+ "loss": 0.4182,
1629
+ "step": 11350
1630
+ },
1631
+ {
1632
+ "epoch": 3.5326929036256587,
1633
+ "grad_norm": 5.32473087310791,
1634
+ "learning_rate": 2.1558723272389218e-05,
1635
+ "loss": 0.4987,
1636
+ "step": 11400
1637
+ },
1638
+ {
1639
+ "epoch": 3.5481871707468238,
1640
+ "grad_norm": 9.8456392288208,
1641
+ "learning_rate": 2.1507075715318664e-05,
1642
+ "loss": 0.587,
1643
+ "step": 11450
1644
+ },
1645
+ {
1646
+ "epoch": 3.563681437867989,
1647
+ "grad_norm": 12.52115535736084,
1648
+ "learning_rate": 2.1455428158248114e-05,
1649
+ "loss": 0.5331,
1650
+ "step": 11500
1651
+ },
1652
+ {
1653
+ "epoch": 3.579175704989154,
1654
+ "grad_norm": 18.225566864013672,
1655
+ "learning_rate": 2.1403780601177564e-05,
1656
+ "loss": 0.4794,
1657
+ "step": 11550
1658
+ },
1659
+ {
1660
+ "epoch": 3.594669972110319,
1661
+ "grad_norm": 8.749368667602539,
1662
+ "learning_rate": 2.1352133044107013e-05,
1663
+ "loss": 0.4967,
1664
+ "step": 11600
1665
+ },
1666
+ {
1667
+ "epoch": 3.6101642392314846,
1668
+ "grad_norm": 8.760223388671875,
1669
+ "learning_rate": 2.1300485487036466e-05,
1670
+ "loss": 0.4963,
1671
+ "step": 11650
1672
+ },
1673
+ {
1674
+ "epoch": 3.6256585063526496,
1675
+ "grad_norm": 15.518270492553711,
1676
+ "learning_rate": 2.1248837929965913e-05,
1677
+ "loss": 0.4456,
1678
+ "step": 11700
1679
+ },
1680
+ {
1681
+ "epoch": 3.6411527734738147,
1682
+ "grad_norm": 9.451664924621582,
1683
+ "learning_rate": 2.1197190372895362e-05,
1684
+ "loss": 0.5235,
1685
+ "step": 11750
1686
+ },
1687
+ {
1688
+ "epoch": 3.65664704059498,
1689
+ "grad_norm": 17.736055374145508,
1690
+ "learning_rate": 2.1145542815824812e-05,
1691
+ "loss": 0.4876,
1692
+ "step": 11800
1693
+ },
1694
+ {
1695
+ "epoch": 3.672141307716145,
1696
+ "grad_norm": 24.323490142822266,
1697
+ "learning_rate": 2.1093895258754262e-05,
1698
+ "loss": 0.483,
1699
+ "step": 11850
1700
+ },
1701
+ {
1702
+ "epoch": 3.6876355748373104,
1703
+ "grad_norm": 15.389254570007324,
1704
+ "learning_rate": 2.104224770168371e-05,
1705
+ "loss": 0.5127,
1706
+ "step": 11900
1707
+ },
1708
+ {
1709
+ "epoch": 3.7031298419584755,
1710
+ "grad_norm": 11.283272743225098,
1711
+ "learning_rate": 2.0990600144613158e-05,
1712
+ "loss": 0.5282,
1713
+ "step": 11950
1714
+ },
1715
+ {
1716
+ "epoch": 3.7186241090796406,
1717
+ "grad_norm": 11.002310752868652,
1718
+ "learning_rate": 2.0938952587542607e-05,
1719
+ "loss": 0.5459,
1720
+ "step": 12000
1721
+ },
1722
+ {
1723
+ "epoch": 3.7341183762008057,
1724
+ "grad_norm": 6.972140312194824,
1725
+ "learning_rate": 2.088730503047206e-05,
1726
+ "loss": 0.536,
1727
+ "step": 12050
1728
+ },
1729
+ {
1730
+ "epoch": 3.7496126433219708,
1731
+ "grad_norm": 4.202858924865723,
1732
+ "learning_rate": 2.083565747340151e-05,
1733
+ "loss": 0.5736,
1734
+ "step": 12100
1735
+ },
1736
+ {
1737
+ "epoch": 3.7651069104431363,
1738
+ "grad_norm": 15.748515129089355,
1739
+ "learning_rate": 2.078400991633096e-05,
1740
+ "loss": 0.4715,
1741
+ "step": 12150
1742
+ },
1743
+ {
1744
+ "epoch": 3.7806011775643014,
1745
+ "grad_norm": 6.696774482727051,
1746
+ "learning_rate": 2.0732362359260406e-05,
1747
+ "loss": 0.5545,
1748
+ "step": 12200
1749
+ },
1750
+ {
1751
+ "epoch": 3.7960954446854664,
1752
+ "grad_norm": 7.366288661956787,
1753
+ "learning_rate": 2.0680714802189856e-05,
1754
+ "loss": 0.5736,
1755
+ "step": 12250
1756
+ },
1757
+ {
1758
+ "epoch": 3.8115897118066315,
1759
+ "grad_norm": 13.58438777923584,
1760
+ "learning_rate": 2.0629067245119306e-05,
1761
+ "loss": 0.4255,
1762
+ "step": 12300
1763
+ },
1764
+ {
1765
+ "epoch": 3.8270839789277966,
1766
+ "grad_norm": 9.109688758850098,
1767
+ "learning_rate": 2.0577419688048755e-05,
1768
+ "loss": 0.4565,
1769
+ "step": 12350
1770
+ },
1771
+ {
1772
+ "epoch": 3.842578246048962,
1773
+ "grad_norm": 11.448044776916504,
1774
+ "learning_rate": 2.0525772130978205e-05,
1775
+ "loss": 0.5117,
1776
+ "step": 12400
1777
+ },
1778
+ {
1779
+ "epoch": 3.858072513170127,
1780
+ "grad_norm": 6.876945495605469,
1781
+ "learning_rate": 2.0474124573907655e-05,
1782
+ "loss": 0.5543,
1783
+ "step": 12450
1784
+ },
1785
+ {
1786
+ "epoch": 3.8735667802912923,
1787
+ "grad_norm": 11.25009536743164,
1788
+ "learning_rate": 2.0422477016837104e-05,
1789
+ "loss": 0.456,
1790
+ "step": 12500
1791
+ },
1792
+ {
1793
+ "epoch": 3.8890610474124574,
1794
+ "grad_norm": 13.992502212524414,
1795
+ "learning_rate": 2.0370829459766554e-05,
1796
+ "loss": 0.4907,
1797
+ "step": 12550
1798
+ },
1799
+ {
1800
+ "epoch": 3.9045553145336225,
1801
+ "grad_norm": 11.92656421661377,
1802
+ "learning_rate": 2.0319181902696004e-05,
1803
+ "loss": 0.4841,
1804
+ "step": 12600
1805
+ },
1806
+ {
1807
+ "epoch": 3.9200495816547876,
1808
+ "grad_norm": 7.212582111358643,
1809
+ "learning_rate": 2.0267534345625453e-05,
1810
+ "loss": 0.5529,
1811
+ "step": 12650
1812
+ },
1813
+ {
1814
+ "epoch": 3.9355438487759526,
1815
+ "grad_norm": 14.616645812988281,
1816
+ "learning_rate": 2.02158867885549e-05,
1817
+ "loss": 0.5366,
1818
+ "step": 12700
1819
+ },
1820
+ {
1821
+ "epoch": 3.951038115897118,
1822
+ "grad_norm": 9.052292823791504,
1823
+ "learning_rate": 2.016423923148435e-05,
1824
+ "loss": 0.5459,
1825
+ "step": 12750
1826
+ },
1827
+ {
1828
+ "epoch": 3.9665323830182833,
1829
+ "grad_norm": 18.27539825439453,
1830
+ "learning_rate": 2.01125916744138e-05,
1831
+ "loss": 0.5631,
1832
+ "step": 12800
1833
+ },
1834
+ {
1835
+ "epoch": 3.9820266501394483,
1836
+ "grad_norm": 12.429372787475586,
1837
+ "learning_rate": 2.0060944117343252e-05,
1838
+ "loss": 0.4885,
1839
+ "step": 12850
1840
+ },
1841
+ {
1842
+ "epoch": 3.9975209172606134,
1843
+ "grad_norm": 4.481673240661621,
1844
+ "learning_rate": 2.00092965602727e-05,
1845
+ "loss": 0.4565,
1846
+ "step": 12900
1847
+ },
1848
+ {
1849
+ "epoch": 4.0,
1850
+ "eval_accuracy": 0.7863273603372986,
1851
+ "eval_f1": 0.7874334496964743,
1852
+ "eval_loss": 0.7491569519042969,
1853
+ "eval_runtime": 25.5609,
1854
+ "eval_samples_per_second": 259.811,
1855
+ "eval_steps_per_second": 16.275,
1856
+ "step": 12908
1857
+ },
1858
+ {
1859
+ "epoch": 4.0130151843817785,
1860
+ "grad_norm": 21.53974151611328,
1861
+ "learning_rate": 1.9957649003202148e-05,
1862
+ "loss": 0.2695,
1863
+ "step": 12950
1864
+ },
1865
+ {
1866
+ "epoch": 4.028509451502944,
1867
+ "grad_norm": 9.07942008972168,
1868
+ "learning_rate": 1.9906001446131598e-05,
1869
+ "loss": 0.3282,
1870
+ "step": 13000
1871
+ },
1872
+ {
1873
+ "epoch": 4.044003718624109,
1874
+ "grad_norm": 16.323549270629883,
1875
+ "learning_rate": 1.9854353889061047e-05,
1876
+ "loss": 0.3079,
1877
+ "step": 13050
1878
+ },
1879
+ {
1880
+ "epoch": 4.059497985745274,
1881
+ "grad_norm": 6.679697036743164,
1882
+ "learning_rate": 1.9802706331990497e-05,
1883
+ "loss": 0.3889,
1884
+ "step": 13100
1885
+ },
1886
+ {
1887
+ "epoch": 4.07499225286644,
1888
+ "grad_norm": 17.357574462890625,
1889
+ "learning_rate": 1.9751058774919947e-05,
1890
+ "loss": 0.3336,
1891
+ "step": 13150
1892
+ },
1893
+ {
1894
+ "epoch": 4.090486519987604,
1895
+ "grad_norm": 5.116195201873779,
1896
+ "learning_rate": 1.9699411217849393e-05,
1897
+ "loss": 0.3102,
1898
+ "step": 13200
1899
+ },
1900
+ {
1901
+ "epoch": 4.10598078710877,
1902
+ "grad_norm": 29.05538558959961,
1903
+ "learning_rate": 1.9647763660778846e-05,
1904
+ "loss": 0.2902,
1905
+ "step": 13250
1906
+ },
1907
+ {
1908
+ "epoch": 4.1214750542299345,
1909
+ "grad_norm": 6.254473686218262,
1910
+ "learning_rate": 1.9596116103708296e-05,
1911
+ "loss": 0.3816,
1912
+ "step": 13300
1913
+ },
1914
+ {
1915
+ "epoch": 4.1369693213511,
1916
+ "grad_norm": 11.854185104370117,
1917
+ "learning_rate": 1.9544468546637745e-05,
1918
+ "loss": 0.3455,
1919
+ "step": 13350
1920
+ },
1921
+ {
1922
+ "epoch": 4.152463588472266,
1923
+ "grad_norm": 16.399444580078125,
1924
+ "learning_rate": 1.9492820989567195e-05,
1925
+ "loss": 0.3713,
1926
+ "step": 13400
1927
+ },
1928
+ {
1929
+ "epoch": 4.16795785559343,
1930
+ "grad_norm": 18.26226234436035,
1931
+ "learning_rate": 1.9441173432496645e-05,
1932
+ "loss": 0.2957,
1933
+ "step": 13450
1934
+ },
1935
+ {
1936
+ "epoch": 4.183452122714596,
1937
+ "grad_norm": 6.590181350708008,
1938
+ "learning_rate": 1.938952587542609e-05,
1939
+ "loss": 0.2905,
1940
+ "step": 13500
1941
+ },
1942
+ {
1943
+ "epoch": 4.19894638983576,
1944
+ "grad_norm": 5.3814849853515625,
1945
+ "learning_rate": 1.933787831835554e-05,
1946
+ "loss": 0.3782,
1947
+ "step": 13550
1948
+ },
1949
+ {
1950
+ "epoch": 4.214440656956926,
1951
+ "grad_norm": 8.641956329345703,
1952
+ "learning_rate": 1.928623076128499e-05,
1953
+ "loss": 0.3211,
1954
+ "step": 13600
1955
+ },
1956
+ {
1957
+ "epoch": 4.2299349240780915,
1958
+ "grad_norm": 14.346405982971191,
1959
+ "learning_rate": 1.9234583204214443e-05,
1960
+ "loss": 0.3274,
1961
+ "step": 13650
1962
+ },
1963
+ {
1964
+ "epoch": 4.245429191199256,
1965
+ "grad_norm": 15.577725410461426,
1966
+ "learning_rate": 1.9182935647143893e-05,
1967
+ "loss": 0.3568,
1968
+ "step": 13700
1969
+ },
1970
+ {
1971
+ "epoch": 4.260923458320422,
1972
+ "grad_norm": 9.855398178100586,
1973
+ "learning_rate": 1.913128809007334e-05,
1974
+ "loss": 0.3008,
1975
+ "step": 13750
1976
+ },
1977
+ {
1978
+ "epoch": 4.276417725441586,
1979
+ "grad_norm": 15.720294952392578,
1980
+ "learning_rate": 1.907964053300279e-05,
1981
+ "loss": 0.2979,
1982
+ "step": 13800
1983
+ },
1984
+ {
1985
+ "epoch": 4.291911992562752,
1986
+ "grad_norm": 13.976778030395508,
1987
+ "learning_rate": 1.902799297593224e-05,
1988
+ "loss": 0.3389,
1989
+ "step": 13850
1990
+ },
1991
+ {
1992
+ "epoch": 4.307406259683917,
1993
+ "grad_norm": 19.255727767944336,
1994
+ "learning_rate": 1.897634541886169e-05,
1995
+ "loss": 0.3636,
1996
+ "step": 13900
1997
+ },
1998
+ {
1999
+ "epoch": 4.322900526805082,
2000
+ "grad_norm": 10.70836353302002,
2001
+ "learning_rate": 1.8924697861791138e-05,
2002
+ "loss": 0.3455,
2003
+ "step": 13950
2004
+ },
2005
+ {
2006
+ "epoch": 4.3383947939262475,
2007
+ "grad_norm": 0.9212763905525208,
2008
+ "learning_rate": 1.8873050304720584e-05,
2009
+ "loss": 0.3703,
2010
+ "step": 14000
2011
+ },
2012
+ {
2013
+ "epoch": 4.353889061047412,
2014
+ "grad_norm": 10.232623100280762,
2015
+ "learning_rate": 1.8821402747650037e-05,
2016
+ "loss": 0.3247,
2017
+ "step": 14050
2018
+ },
2019
+ {
2020
+ "epoch": 4.369383328168578,
2021
+ "grad_norm": 11.130922317504883,
2022
+ "learning_rate": 1.8769755190579487e-05,
2023
+ "loss": 0.3107,
2024
+ "step": 14100
2025
+ },
2026
+ {
2027
+ "epoch": 4.384877595289743,
2028
+ "grad_norm": 10.536752700805664,
2029
+ "learning_rate": 1.8718107633508937e-05,
2030
+ "loss": 0.3614,
2031
+ "step": 14150
2032
+ },
2033
+ {
2034
+ "epoch": 4.400371862410908,
2035
+ "grad_norm": 15.330968856811523,
2036
+ "learning_rate": 1.8666460076438386e-05,
2037
+ "loss": 0.3984,
2038
+ "step": 14200
2039
+ },
2040
+ {
2041
+ "epoch": 4.415866129532073,
2042
+ "grad_norm": 7.436588764190674,
2043
+ "learning_rate": 1.8614812519367833e-05,
2044
+ "loss": 0.3257,
2045
+ "step": 14250
2046
+ },
2047
+ {
2048
+ "epoch": 4.431360396653238,
2049
+ "grad_norm": 7.192384243011475,
2050
+ "learning_rate": 1.8563164962297282e-05,
2051
+ "loss": 0.3254,
2052
+ "step": 14300
2053
+ },
2054
+ {
2055
+ "epoch": 4.4468546637744035,
2056
+ "grad_norm": 7.792993545532227,
2057
+ "learning_rate": 1.8511517405226732e-05,
2058
+ "loss": 0.3392,
2059
+ "step": 14350
2060
+ },
2061
+ {
2062
+ "epoch": 4.462348930895569,
2063
+ "grad_norm": 12.411416053771973,
2064
+ "learning_rate": 1.8459869848156182e-05,
2065
+ "loss": 0.3383,
2066
+ "step": 14400
2067
+ },
2068
+ {
2069
+ "epoch": 4.477843198016734,
2070
+ "grad_norm": 17.897613525390625,
2071
+ "learning_rate": 1.8408222291085635e-05,
2072
+ "loss": 0.3392,
2073
+ "step": 14450
2074
+ },
2075
+ {
2076
+ "epoch": 4.493337465137899,
2077
+ "grad_norm": 23.59228515625,
2078
+ "learning_rate": 1.835657473401508e-05,
2079
+ "loss": 0.3055,
2080
+ "step": 14500
2081
+ },
2082
+ {
2083
+ "epoch": 4.508831732259064,
2084
+ "grad_norm": 13.722383499145508,
2085
+ "learning_rate": 1.830492717694453e-05,
2086
+ "loss": 0.3997,
2087
+ "step": 14550
2088
+ },
2089
+ {
2090
+ "epoch": 4.524325999380229,
2091
+ "grad_norm": 17.811538696289062,
2092
+ "learning_rate": 1.825327961987398e-05,
2093
+ "loss": 0.266,
2094
+ "step": 14600
2095
+ },
2096
+ {
2097
+ "epoch": 4.539820266501394,
2098
+ "grad_norm": 10.993431091308594,
2099
+ "learning_rate": 1.820163206280343e-05,
2100
+ "loss": 0.2634,
2101
+ "step": 14650
2102
+ },
2103
+ {
2104
+ "epoch": 4.55531453362256,
2105
+ "grad_norm": 5.25628137588501,
2106
+ "learning_rate": 1.814998450573288e-05,
2107
+ "loss": 0.3563,
2108
+ "step": 14700
2109
+ },
2110
+ {
2111
+ "epoch": 4.570808800743725,
2112
+ "grad_norm": 16.91241455078125,
2113
+ "learning_rate": 1.8098336948662326e-05,
2114
+ "loss": 0.3298,
2115
+ "step": 14750
2116
+ },
2117
+ {
2118
+ "epoch": 4.58630306786489,
2119
+ "grad_norm": 27.083995819091797,
2120
+ "learning_rate": 1.8046689391591776e-05,
2121
+ "loss": 0.36,
2122
+ "step": 14800
2123
+ },
2124
+ {
2125
+ "epoch": 4.601797334986055,
2126
+ "grad_norm": 19.726198196411133,
2127
+ "learning_rate": 1.799504183452123e-05,
2128
+ "loss": 0.3224,
2129
+ "step": 14850
2130
+ },
2131
+ {
2132
+ "epoch": 4.617291602107221,
2133
+ "grad_norm": 6.92859411239624,
2134
+ "learning_rate": 1.794339427745068e-05,
2135
+ "loss": 0.335,
2136
+ "step": 14900
2137
+ },
2138
+ {
2139
+ "epoch": 4.632785869228385,
2140
+ "grad_norm": 15.97644329071045,
2141
+ "learning_rate": 1.7891746720380128e-05,
2142
+ "loss": 0.3303,
2143
+ "step": 14950
2144
+ },
2145
+ {
2146
+ "epoch": 4.648280136349551,
2147
+ "grad_norm": 24.399837493896484,
2148
+ "learning_rate": 1.7840099163309575e-05,
2149
+ "loss": 0.3492,
2150
+ "step": 15000
2151
+ },
2152
+ {
2153
+ "epoch": 4.663774403470716,
2154
+ "grad_norm": 10.855368614196777,
2155
+ "learning_rate": 1.7788451606239024e-05,
2156
+ "loss": 0.3278,
2157
+ "step": 15050
2158
+ },
2159
+ {
2160
+ "epoch": 4.679268670591881,
2161
+ "grad_norm": 20.869380950927734,
2162
+ "learning_rate": 1.7736804049168474e-05,
2163
+ "loss": 0.2913,
2164
+ "step": 15100
2165
+ },
2166
+ {
2167
+ "epoch": 4.694762937713046,
2168
+ "grad_norm": 6.862913131713867,
2169
+ "learning_rate": 1.7685156492097924e-05,
2170
+ "loss": 0.3133,
2171
+ "step": 15150
2172
+ },
2173
+ {
2174
+ "epoch": 4.710257204834211,
2175
+ "grad_norm": 19.621482849121094,
2176
+ "learning_rate": 1.7633508935027373e-05,
2177
+ "loss": 0.3456,
2178
+ "step": 15200
2179
+ },
2180
+ {
2181
+ "epoch": 4.725751471955377,
2182
+ "grad_norm": 19.79738998413086,
2183
+ "learning_rate": 1.7581861377956823e-05,
2184
+ "loss": 0.3562,
2185
+ "step": 15250
2186
+ },
2187
+ {
2188
+ "epoch": 4.7412457390765415,
2189
+ "grad_norm": 3.2352957725524902,
2190
+ "learning_rate": 1.7530213820886273e-05,
2191
+ "loss": 0.3565,
2192
+ "step": 15300
2193
+ },
2194
+ {
2195
+ "epoch": 4.756740006197707,
2196
+ "grad_norm": 10.959282875061035,
2197
+ "learning_rate": 1.7478566263815722e-05,
2198
+ "loss": 0.3472,
2199
+ "step": 15350
2200
+ },
2201
+ {
2202
+ "epoch": 4.7722342733188725,
2203
+ "grad_norm": 3.22469162940979,
2204
+ "learning_rate": 1.7426918706745172e-05,
2205
+ "loss": 0.3367,
2206
+ "step": 15400
2207
+ },
2208
+ {
2209
+ "epoch": 4.787728540440037,
2210
+ "grad_norm": 7.619373798370361,
2211
+ "learning_rate": 1.737527114967462e-05,
2212
+ "loss": 0.319,
2213
+ "step": 15450
2214
+ },
2215
+ {
2216
+ "epoch": 4.803222807561203,
2217
+ "grad_norm": 24.706689834594727,
2218
+ "learning_rate": 1.7323623592604068e-05,
2219
+ "loss": 0.3939,
2220
+ "step": 15500
2221
+ },
2222
+ {
2223
+ "epoch": 4.818717074682367,
2224
+ "grad_norm": 15.918986320495605,
2225
+ "learning_rate": 1.7271976035533518e-05,
2226
+ "loss": 0.3618,
2227
+ "step": 15550
2228
+ },
2229
+ {
2230
+ "epoch": 4.834211341803533,
2231
+ "grad_norm": 14.518546104431152,
2232
+ "learning_rate": 1.7220328478462967e-05,
2233
+ "loss": 0.4082,
2234
+ "step": 15600
2235
+ },
2236
+ {
2237
+ "epoch": 4.8497056089246975,
2238
+ "grad_norm": 6.084866046905518,
2239
+ "learning_rate": 1.716868092139242e-05,
2240
+ "loss": 0.3594,
2241
+ "step": 15650
2242
+ },
2243
+ {
2244
+ "epoch": 4.865199876045863,
2245
+ "grad_norm": 18.435983657836914,
2246
+ "learning_rate": 1.711703336432187e-05,
2247
+ "loss": 0.3182,
2248
+ "step": 15700
2249
+ },
2250
+ {
2251
+ "epoch": 4.8806941431670285,
2252
+ "grad_norm": 14.745248794555664,
2253
+ "learning_rate": 1.7065385807251316e-05,
2254
+ "loss": 0.3375,
2255
+ "step": 15750
2256
+ },
2257
+ {
2258
+ "epoch": 4.896188410288193,
2259
+ "grad_norm": 11.518832206726074,
2260
+ "learning_rate": 1.7013738250180766e-05,
2261
+ "loss": 0.3371,
2262
+ "step": 15800
2263
+ },
2264
+ {
2265
+ "epoch": 4.911682677409359,
2266
+ "grad_norm": 17.58115005493164,
2267
+ "learning_rate": 1.6962090693110216e-05,
2268
+ "loss": 0.3851,
2269
+ "step": 15850
2270
+ },
2271
+ {
2272
+ "epoch": 4.927176944530523,
2273
+ "grad_norm": 16.769134521484375,
2274
+ "learning_rate": 1.6910443136039665e-05,
2275
+ "loss": 0.3009,
2276
+ "step": 15900
2277
+ },
2278
+ {
2279
+ "epoch": 4.942671211651689,
2280
+ "grad_norm": 21.518749237060547,
2281
+ "learning_rate": 1.6858795578969115e-05,
2282
+ "loss": 0.3155,
2283
+ "step": 15950
2284
+ },
2285
+ {
2286
+ "epoch": 4.958165478772854,
2287
+ "grad_norm": 11.044340133666992,
2288
+ "learning_rate": 1.6807148021898565e-05,
2289
+ "loss": 0.3648,
2290
+ "step": 16000
2291
+ },
2292
+ {
2293
+ "epoch": 4.973659745894019,
2294
+ "grad_norm": 3.9900588989257812,
2295
+ "learning_rate": 1.6755500464828014e-05,
2296
+ "loss": 0.3383,
2297
+ "step": 16050
2298
+ },
2299
+ {
2300
+ "epoch": 4.989154013015185,
2301
+ "grad_norm": 16.869041442871094,
2302
+ "learning_rate": 1.6703852907757464e-05,
2303
+ "loss": 0.3471,
2304
+ "step": 16100
2305
+ },
2306
+ {
2307
+ "epoch": 5.0,
2308
+ "eval_accuracy": 0.7964162023791598,
2309
+ "eval_f1": 0.7960859271865419,
2310
+ "eval_loss": 0.8064730167388916,
2311
+ "eval_runtime": 25.5199,
2312
+ "eval_samples_per_second": 260.228,
2313
+ "eval_steps_per_second": 16.301,
2314
+ "step": 16135
2315
+ },
2316
+ {
2317
+ "epoch": 5.004648280136349,
2318
+ "grad_norm": 6.430090427398682,
2319
+ "learning_rate": 1.6652205350686914e-05,
2320
+ "loss": 0.2737,
2321
+ "step": 16150
2322
+ },
2323
+ {
2324
+ "epoch": 5.020142547257515,
2325
+ "grad_norm": 5.049603462219238,
2326
+ "learning_rate": 1.6600557793616363e-05,
2327
+ "loss": 0.2016,
2328
+ "step": 16200
2329
+ },
2330
+ {
2331
+ "epoch": 5.03563681437868,
2332
+ "grad_norm": 19.016332626342773,
2333
+ "learning_rate": 1.6548910236545813e-05,
2334
+ "loss": 0.256,
2335
+ "step": 16250
2336
+ },
2337
+ {
2338
+ "epoch": 5.051131081499845,
2339
+ "grad_norm": 26.316680908203125,
2340
+ "learning_rate": 1.649726267947526e-05,
2341
+ "loss": 0.2052,
2342
+ "step": 16300
2343
+ },
2344
+ {
2345
+ "epoch": 5.06662534862101,
2346
+ "grad_norm": 3.5623199939727783,
2347
+ "learning_rate": 1.644561512240471e-05,
2348
+ "loss": 0.1912,
2349
+ "step": 16350
2350
+ },
2351
+ {
2352
+ "epoch": 5.082119615742175,
2353
+ "grad_norm": 17.87260627746582,
2354
+ "learning_rate": 1.639396756533416e-05,
2355
+ "loss": 0.1743,
2356
+ "step": 16400
2357
+ },
2358
+ {
2359
+ "epoch": 5.097613882863341,
2360
+ "grad_norm": 25.36100959777832,
2361
+ "learning_rate": 1.6342320008263612e-05,
2362
+ "loss": 0.2196,
2363
+ "step": 16450
2364
+ },
2365
+ {
2366
+ "epoch": 5.113108149984506,
2367
+ "grad_norm": 0.5424543023109436,
2368
+ "learning_rate": 1.629067245119306e-05,
2369
+ "loss": 0.2298,
2370
+ "step": 16500
2371
+ },
2372
+ {
2373
+ "epoch": 5.128602417105671,
2374
+ "grad_norm": 11.180691719055176,
2375
+ "learning_rate": 1.6239024894122508e-05,
2376
+ "loss": 0.2107,
2377
+ "step": 16550
2378
+ },
2379
+ {
2380
+ "epoch": 5.144096684226836,
2381
+ "grad_norm": 20.951330184936523,
2382
+ "learning_rate": 1.6187377337051957e-05,
2383
+ "loss": 0.2708,
2384
+ "step": 16600
2385
+ },
2386
+ {
2387
+ "epoch": 5.159590951348001,
2388
+ "grad_norm": 11.13031005859375,
2389
+ "learning_rate": 1.6135729779981407e-05,
2390
+ "loss": 0.1913,
2391
+ "step": 16650
2392
+ },
2393
+ {
2394
+ "epoch": 5.1750852184691665,
2395
+ "grad_norm": 0.4118750989437103,
2396
+ "learning_rate": 1.6084082222910857e-05,
2397
+ "loss": 0.2151,
2398
+ "step": 16700
2399
+ },
2400
+ {
2401
+ "epoch": 5.190579485590332,
2402
+ "grad_norm": 37.649024963378906,
2403
+ "learning_rate": 1.6032434665840307e-05,
2404
+ "loss": 0.2281,
2405
+ "step": 16750
2406
+ },
2407
+ {
2408
+ "epoch": 5.206073752711497,
2409
+ "grad_norm": 13.980716705322266,
2410
+ "learning_rate": 1.5980787108769753e-05,
2411
+ "loss": 0.241,
2412
+ "step": 16800
2413
+ },
2414
+ {
2415
+ "epoch": 5.221568019832662,
2416
+ "grad_norm": 34.06684112548828,
2417
+ "learning_rate": 1.5929139551699206e-05,
2418
+ "loss": 0.3001,
2419
+ "step": 16850
2420
+ },
2421
+ {
2422
+ "epoch": 5.237062286953827,
2423
+ "grad_norm": 26.94174575805664,
2424
+ "learning_rate": 1.5877491994628656e-05,
2425
+ "loss": 0.2121,
2426
+ "step": 16900
2427
+ },
2428
+ {
2429
+ "epoch": 5.252556554074992,
2430
+ "grad_norm": 30.45203399658203,
2431
+ "learning_rate": 1.5825844437558105e-05,
2432
+ "loss": 0.2403,
2433
+ "step": 16950
2434
+ },
2435
+ {
2436
+ "epoch": 5.268050821196157,
2437
+ "grad_norm": 10.348464965820312,
2438
+ "learning_rate": 1.5774196880487555e-05,
2439
+ "loss": 0.2228,
2440
+ "step": 17000
2441
+ },
2442
+ {
2443
+ "epoch": 5.2835450883173225,
2444
+ "grad_norm": 35.869544982910156,
2445
+ "learning_rate": 1.5722549323417e-05,
2446
+ "loss": 0.2325,
2447
+ "step": 17050
2448
+ },
2449
+ {
2450
+ "epoch": 5.299039355438488,
2451
+ "grad_norm": 11.559053421020508,
2452
+ "learning_rate": 1.567090176634645e-05,
2453
+ "loss": 0.1989,
2454
+ "step": 17100
2455
+ },
2456
+ {
2457
+ "epoch": 5.314533622559653,
2458
+ "grad_norm": 14.056360244750977,
2459
+ "learning_rate": 1.56192542092759e-05,
2460
+ "loss": 0.242,
2461
+ "step": 17150
2462
+ },
2463
+ {
2464
+ "epoch": 5.330027889680818,
2465
+ "grad_norm": 9.175251960754395,
2466
+ "learning_rate": 1.556760665220535e-05,
2467
+ "loss": 0.2942,
2468
+ "step": 17200
2469
+ },
2470
+ {
2471
+ "epoch": 5.345522156801984,
2472
+ "grad_norm": 5.0493483543396,
2473
+ "learning_rate": 1.5515959095134803e-05,
2474
+ "loss": 0.217,
2475
+ "step": 17250
2476
+ },
2477
+ {
2478
+ "epoch": 5.361016423923148,
2479
+ "grad_norm": 14.81401252746582,
2480
+ "learning_rate": 1.546431153806425e-05,
2481
+ "loss": 0.2217,
2482
+ "step": 17300
2483
+ },
2484
+ {
2485
+ "epoch": 5.376510691044314,
2486
+ "grad_norm": 11.582941055297852,
2487
+ "learning_rate": 1.54126639809937e-05,
2488
+ "loss": 0.2175,
2489
+ "step": 17350
2490
+ },
2491
+ {
2492
+ "epoch": 5.3920049581654785,
2493
+ "grad_norm": 16.075231552124023,
2494
+ "learning_rate": 1.536101642392315e-05,
2495
+ "loss": 0.2031,
2496
+ "step": 17400
2497
+ },
2498
+ {
2499
+ "epoch": 5.407499225286644,
2500
+ "grad_norm": 4.986071586608887,
2501
+ "learning_rate": 1.53093688668526e-05,
2502
+ "loss": 0.2476,
2503
+ "step": 17450
2504
+ },
2505
+ {
2506
+ "epoch": 5.422993492407809,
2507
+ "grad_norm": 12.152420997619629,
2508
+ "learning_rate": 1.5257721309782048e-05,
2509
+ "loss": 0.2839,
2510
+ "step": 17500
2511
+ },
2512
+ {
2513
+ "epoch": 5.438487759528974,
2514
+ "grad_norm": 34.93170928955078,
2515
+ "learning_rate": 1.5206073752711496e-05,
2516
+ "loss": 0.2486,
2517
+ "step": 17550
2518
+ },
2519
+ {
2520
+ "epoch": 5.45398202665014,
2521
+ "grad_norm": 19.539745330810547,
2522
+ "learning_rate": 1.5154426195640946e-05,
2523
+ "loss": 0.2799,
2524
+ "step": 17600
2525
+ },
2526
+ {
2527
+ "epoch": 5.469476293771304,
2528
+ "grad_norm": 0.6790868639945984,
2529
+ "learning_rate": 1.5102778638570396e-05,
2530
+ "loss": 0.255,
2531
+ "step": 17650
2532
+ },
2533
+ {
2534
+ "epoch": 5.48497056089247,
2535
+ "grad_norm": 10.917562484741211,
2536
+ "learning_rate": 1.5051131081499845e-05,
2537
+ "loss": 0.2371,
2538
+ "step": 17700
2539
+ },
2540
+ {
2541
+ "epoch": 5.5004648280136355,
2542
+ "grad_norm": 4.378004550933838,
2543
+ "learning_rate": 1.4999483524429295e-05,
2544
+ "loss": 0.1697,
2545
+ "step": 17750
2546
+ },
2547
+ {
2548
+ "epoch": 5.5159590951348,
2549
+ "grad_norm": 31.345949172973633,
2550
+ "learning_rate": 1.4947835967358745e-05,
2551
+ "loss": 0.2571,
2552
+ "step": 17800
2553
+ },
2554
+ {
2555
+ "epoch": 5.531453362255966,
2556
+ "grad_norm": 4.986736297607422,
2557
+ "learning_rate": 1.4896188410288194e-05,
2558
+ "loss": 0.2366,
2559
+ "step": 17850
2560
+ },
2561
+ {
2562
+ "epoch": 5.54694762937713,
2563
+ "grad_norm": 47.6024169921875,
2564
+ "learning_rate": 1.4844540853217642e-05,
2565
+ "loss": 0.1891,
2566
+ "step": 17900
2567
+ },
2568
+ {
2569
+ "epoch": 5.562441896498296,
2570
+ "grad_norm": 30.087663650512695,
2571
+ "learning_rate": 1.4792893296147094e-05,
2572
+ "loss": 0.2772,
2573
+ "step": 17950
2574
+ },
2575
+ {
2576
+ "epoch": 5.57793616361946,
2577
+ "grad_norm": 0.9499745965003967,
2578
+ "learning_rate": 1.4741245739076542e-05,
2579
+ "loss": 0.2005,
2580
+ "step": 18000
2581
+ },
2582
+ {
2583
+ "epoch": 5.593430430740626,
2584
+ "grad_norm": 7.3200578689575195,
2585
+ "learning_rate": 1.4689598182005991e-05,
2586
+ "loss": 0.2846,
2587
+ "step": 18050
2588
+ },
2589
+ {
2590
+ "epoch": 5.6089246978617915,
2591
+ "grad_norm": 47.46051025390625,
2592
+ "learning_rate": 1.4637950624935441e-05,
2593
+ "loss": 0.2339,
2594
+ "step": 18100
2595
+ },
2596
+ {
2597
+ "epoch": 5.624418964982956,
2598
+ "grad_norm": 0.13382229208946228,
2599
+ "learning_rate": 1.458630306786489e-05,
2600
+ "loss": 0.2502,
2601
+ "step": 18150
2602
+ },
2603
+ {
2604
+ "epoch": 5.639913232104122,
2605
+ "grad_norm": 36.12116622924805,
2606
+ "learning_rate": 1.453465551079434e-05,
2607
+ "loss": 0.2231,
2608
+ "step": 18200
2609
+ },
2610
+ {
2611
+ "epoch": 5.655407499225286,
2612
+ "grad_norm": 0.8564043045043945,
2613
+ "learning_rate": 1.4483007953723788e-05,
2614
+ "loss": 0.2678,
2615
+ "step": 18250
2616
+ },
2617
+ {
2618
+ "epoch": 5.670901766346452,
2619
+ "grad_norm": 5.724666118621826,
2620
+ "learning_rate": 1.4431360396653238e-05,
2621
+ "loss": 0.2279,
2622
+ "step": 18300
2623
+ },
2624
+ {
2625
+ "epoch": 5.686396033467617,
2626
+ "grad_norm": 31.30683708190918,
2627
+ "learning_rate": 1.437971283958269e-05,
2628
+ "loss": 0.2131,
2629
+ "step": 18350
2630
+ },
2631
+ {
2632
+ "epoch": 5.701890300588782,
2633
+ "grad_norm": 1.6962803602218628,
2634
+ "learning_rate": 1.4328065282512137e-05,
2635
+ "loss": 0.2052,
2636
+ "step": 18400
2637
+ },
2638
+ {
2639
+ "epoch": 5.7173845677099475,
2640
+ "grad_norm": 15.251322746276855,
2641
+ "learning_rate": 1.4276417725441587e-05,
2642
+ "loss": 0.2358,
2643
+ "step": 18450
2644
+ },
2645
+ {
2646
+ "epoch": 5.732878834831112,
2647
+ "grad_norm": 7.961274147033691,
2648
+ "learning_rate": 1.4224770168371035e-05,
2649
+ "loss": 0.2551,
2650
+ "step": 18500
2651
+ },
2652
+ {
2653
+ "epoch": 5.748373101952278,
2654
+ "grad_norm": 1.336027979850769,
2655
+ "learning_rate": 1.4173122611300486e-05,
2656
+ "loss": 0.2878,
2657
+ "step": 18550
2658
+ },
2659
+ {
2660
+ "epoch": 5.763867369073443,
2661
+ "grad_norm": 9.183156967163086,
2662
+ "learning_rate": 1.4121475054229936e-05,
2663
+ "loss": 0.2737,
2664
+ "step": 18600
2665
+ },
2666
+ {
2667
+ "epoch": 5.779361636194608,
2668
+ "grad_norm": 15.864567756652832,
2669
+ "learning_rate": 1.4069827497159384e-05,
2670
+ "loss": 0.1805,
2671
+ "step": 18650
2672
+ },
2673
+ {
2674
+ "epoch": 5.794855903315773,
2675
+ "grad_norm": 23.6713924407959,
2676
+ "learning_rate": 1.4018179940088834e-05,
2677
+ "loss": 0.2226,
2678
+ "step": 18700
2679
+ },
2680
+ {
2681
+ "epoch": 5.810350170436938,
2682
+ "grad_norm": 44.18366241455078,
2683
+ "learning_rate": 1.3966532383018283e-05,
2684
+ "loss": 0.2275,
2685
+ "step": 18750
2686
+ },
2687
+ {
2688
+ "epoch": 5.8258444375581036,
2689
+ "grad_norm": 2.445971965789795,
2690
+ "learning_rate": 1.3914884825947733e-05,
2691
+ "loss": 0.2182,
2692
+ "step": 18800
2693
+ },
2694
+ {
2695
+ "epoch": 5.841338704679268,
2696
+ "grad_norm": 19.132349014282227,
2697
+ "learning_rate": 1.3863237268877183e-05,
2698
+ "loss": 0.1776,
2699
+ "step": 18850
2700
+ },
2701
+ {
2702
+ "epoch": 5.856832971800434,
2703
+ "grad_norm": 14.281071662902832,
2704
+ "learning_rate": 1.381158971180663e-05,
2705
+ "loss": 0.2388,
2706
+ "step": 18900
2707
+ },
2708
+ {
2709
+ "epoch": 5.872327238921599,
2710
+ "grad_norm": 9.464559555053711,
2711
+ "learning_rate": 1.3759942154736082e-05,
2712
+ "loss": 0.2955,
2713
+ "step": 18950
2714
+ },
2715
+ {
2716
+ "epoch": 5.887821506042764,
2717
+ "grad_norm": 9.839083671569824,
2718
+ "learning_rate": 1.370829459766553e-05,
2719
+ "loss": 0.2653,
2720
+ "step": 19000
2721
+ },
2722
+ {
2723
+ "epoch": 5.903315773163929,
2724
+ "grad_norm": 21.549816131591797,
2725
+ "learning_rate": 1.365664704059498e-05,
2726
+ "loss": 0.1973,
2727
+ "step": 19050
2728
+ },
2729
+ {
2730
+ "epoch": 5.918810040285095,
2731
+ "grad_norm": 17.4051456451416,
2732
+ "learning_rate": 1.360499948352443e-05,
2733
+ "loss": 0.2111,
2734
+ "step": 19100
2735
+ },
2736
+ {
2737
+ "epoch": 5.93430430740626,
2738
+ "grad_norm": 12.166346549987793,
2739
+ "learning_rate": 1.355335192645388e-05,
2740
+ "loss": 0.2649,
2741
+ "step": 19150
2742
+ },
2743
+ {
2744
+ "epoch": 5.949798574527425,
2745
+ "grad_norm": 2.162867784500122,
2746
+ "learning_rate": 1.3501704369383329e-05,
2747
+ "loss": 0.1966,
2748
+ "step": 19200
2749
+ },
2750
+ {
2751
+ "epoch": 5.96529284164859,
2752
+ "grad_norm": 19.864362716674805,
2753
+ "learning_rate": 1.3450056812312779e-05,
2754
+ "loss": 0.2891,
2755
+ "step": 19250
2756
+ },
2757
+ {
2758
+ "epoch": 5.980787108769755,
2759
+ "grad_norm": 6.605401992797852,
2760
+ "learning_rate": 1.3398409255242227e-05,
2761
+ "loss": 0.2163,
2762
+ "step": 19300
2763
+ },
2764
+ {
2765
+ "epoch": 5.99628137589092,
2766
+ "grad_norm": 0.3219479024410248,
2767
+ "learning_rate": 1.3346761698171678e-05,
2768
+ "loss": 0.2835,
2769
+ "step": 19350
2770
+ },
2771
+ {
2772
+ "epoch": 6.0,
2773
+ "eval_accuracy": 0.7876825779250113,
2774
+ "eval_f1": 0.7883424547160977,
2775
+ "eval_loss": 0.9514418840408325,
2776
+ "eval_runtime": 25.521,
2777
+ "eval_samples_per_second": 260.217,
2778
+ "eval_steps_per_second": 16.3,
2779
+ "step": 19362
2780
+ },
2781
+ {
2782
+ "epoch": 6.0117756430120854,
2783
+ "grad_norm": 24.46638298034668,
2784
+ "learning_rate": 1.3295114141101126e-05,
2785
+ "loss": 0.1478,
2786
+ "step": 19400
2787
+ },
2788
+ {
2789
+ "epoch": 6.027269910133251,
2790
+ "grad_norm": 22.210508346557617,
2791
+ "learning_rate": 1.3243466584030576e-05,
2792
+ "loss": 0.1154,
2793
+ "step": 19450
2794
+ },
2795
+ {
2796
+ "epoch": 6.042764177254416,
2797
+ "grad_norm": 1.0985016822814941,
2798
+ "learning_rate": 1.3191819026960025e-05,
2799
+ "loss": 0.1032,
2800
+ "step": 19500
2801
+ },
2802
+ {
2803
+ "epoch": 6.058258444375581,
2804
+ "grad_norm": 18.337181091308594,
2805
+ "learning_rate": 1.3140171469889475e-05,
2806
+ "loss": 0.1289,
2807
+ "step": 19550
2808
+ },
2809
+ {
2810
+ "epoch": 6.073752711496746,
2811
+ "grad_norm": 16.683774948120117,
2812
+ "learning_rate": 1.3088523912818925e-05,
2813
+ "loss": 0.1279,
2814
+ "step": 19600
2815
+ },
2816
+ {
2817
+ "epoch": 6.089246978617911,
2818
+ "grad_norm": 27.718416213989258,
2819
+ "learning_rate": 1.3036876355748373e-05,
2820
+ "loss": 0.144,
2821
+ "step": 19650
2822
+ },
2823
+ {
2824
+ "epoch": 6.104741245739077,
2825
+ "grad_norm": 33.05785369873047,
2826
+ "learning_rate": 1.2985228798677822e-05,
2827
+ "loss": 0.1584,
2828
+ "step": 19700
2829
+ },
2830
+ {
2831
+ "epoch": 6.1202355128602415,
2832
+ "grad_norm": 0.15720854699611664,
2833
+ "learning_rate": 1.2933581241607274e-05,
2834
+ "loss": 0.1348,
2835
+ "step": 19750
2836
+ },
2837
+ {
2838
+ "epoch": 6.135729779981407,
2839
+ "grad_norm": 0.5664418935775757,
2840
+ "learning_rate": 1.2881933684536722e-05,
2841
+ "loss": 0.1434,
2842
+ "step": 19800
2843
+ },
2844
+ {
2845
+ "epoch": 6.151224047102572,
2846
+ "grad_norm": 0.36847808957099915,
2847
+ "learning_rate": 1.2830286127466171e-05,
2848
+ "loss": 0.2238,
2849
+ "step": 19850
2850
+ },
2851
+ {
2852
+ "epoch": 6.166718314223737,
2853
+ "grad_norm": 18.649120330810547,
2854
+ "learning_rate": 1.277863857039562e-05,
2855
+ "loss": 0.187,
2856
+ "step": 19900
2857
+ },
2858
+ {
2859
+ "epoch": 6.182212581344903,
2860
+ "grad_norm": 0.17821761965751648,
2861
+ "learning_rate": 1.272699101332507e-05,
2862
+ "loss": 0.1547,
2863
+ "step": 19950
2864
+ },
2865
+ {
2866
+ "epoch": 6.197706848466067,
2867
+ "grad_norm": 8.831562995910645,
2868
+ "learning_rate": 1.267534345625452e-05,
2869
+ "loss": 0.1539,
2870
+ "step": 20000
2871
+ },
2872
+ {
2873
+ "epoch": 6.213201115587233,
2874
+ "grad_norm": 1.2256168127059937,
2875
+ "learning_rate": 1.2623695899183968e-05,
2876
+ "loss": 0.136,
2877
+ "step": 20050
2878
+ },
2879
+ {
2880
+ "epoch": 6.2286953827083975,
2881
+ "grad_norm": 53.65285873413086,
2882
+ "learning_rate": 1.2572048342113418e-05,
2883
+ "loss": 0.1483,
2884
+ "step": 20100
2885
+ },
2886
+ {
2887
+ "epoch": 6.244189649829563,
2888
+ "grad_norm": 26.486663818359375,
2889
+ "learning_rate": 1.2520400785042868e-05,
2890
+ "loss": 0.1845,
2891
+ "step": 20150
2892
+ },
2893
+ {
2894
+ "epoch": 6.259683916950729,
2895
+ "grad_norm": 5.685501575469971,
2896
+ "learning_rate": 1.2468753227972317e-05,
2897
+ "loss": 0.1549,
2898
+ "step": 20200
2899
+ },
2900
+ {
2901
+ "epoch": 6.275178184071893,
2902
+ "grad_norm": 38.161312103271484,
2903
+ "learning_rate": 1.2417105670901767e-05,
2904
+ "loss": 0.154,
2905
+ "step": 20250
2906
+ },
2907
+ {
2908
+ "epoch": 6.290672451193059,
2909
+ "grad_norm": 8.335810661315918,
2910
+ "learning_rate": 1.2365458113831215e-05,
2911
+ "loss": 0.17,
2912
+ "step": 20300
2913
+ },
2914
+ {
2915
+ "epoch": 6.306166718314223,
2916
+ "grad_norm": 0.48542681336402893,
2917
+ "learning_rate": 1.2313810556760666e-05,
2918
+ "loss": 0.1399,
2919
+ "step": 20350
2920
+ },
2921
+ {
2922
+ "epoch": 6.321660985435389,
2923
+ "grad_norm": 58.425865173339844,
2924
+ "learning_rate": 1.2262162999690116e-05,
2925
+ "loss": 0.2096,
2926
+ "step": 20400
2927
+ },
2928
+ {
2929
+ "epoch": 6.337155252556554,
2930
+ "grad_norm": 1.698177456855774,
2931
+ "learning_rate": 1.2210515442619564e-05,
2932
+ "loss": 0.1646,
2933
+ "step": 20450
2934
+ },
2935
+ {
2936
+ "epoch": 6.352649519677719,
2937
+ "grad_norm": 9.479565620422363,
2938
+ "learning_rate": 1.2158867885549014e-05,
2939
+ "loss": 0.1261,
2940
+ "step": 20500
2941
+ },
2942
+ {
2943
+ "epoch": 6.368143786798885,
2944
+ "grad_norm": 29.26807975769043,
2945
+ "learning_rate": 1.2107220328478463e-05,
2946
+ "loss": 0.1258,
2947
+ "step": 20550
2948
+ },
2949
+ {
2950
+ "epoch": 6.383638053920049,
2951
+ "grad_norm": 17.38056182861328,
2952
+ "learning_rate": 1.2055572771407913e-05,
2953
+ "loss": 0.1663,
2954
+ "step": 20600
2955
+ },
2956
+ {
2957
+ "epoch": 6.399132321041215,
2958
+ "grad_norm": 0.11625684797763824,
2959
+ "learning_rate": 1.2003925214337363e-05,
2960
+ "loss": 0.1632,
2961
+ "step": 20650
2962
+ },
2963
+ {
2964
+ "epoch": 6.41462658816238,
2965
+ "grad_norm": 50.295257568359375,
2966
+ "learning_rate": 1.195227765726681e-05,
2967
+ "loss": 0.1805,
2968
+ "step": 20700
2969
+ },
2970
+ {
2971
+ "epoch": 6.430120855283545,
2972
+ "grad_norm": 8.677082061767578,
2973
+ "learning_rate": 1.1900630100196262e-05,
2974
+ "loss": 0.1145,
2975
+ "step": 20750
2976
+ },
2977
+ {
2978
+ "epoch": 6.4456151224047105,
2979
+ "grad_norm": 25.766963958740234,
2980
+ "learning_rate": 1.184898254312571e-05,
2981
+ "loss": 0.179,
2982
+ "step": 20800
2983
+ },
2984
+ {
2985
+ "epoch": 6.461109389525875,
2986
+ "grad_norm": 9.217619895935059,
2987
+ "learning_rate": 1.179733498605516e-05,
2988
+ "loss": 0.1661,
2989
+ "step": 20850
2990
+ },
2991
+ {
2992
+ "epoch": 6.476603656647041,
2993
+ "grad_norm": 26.130367279052734,
2994
+ "learning_rate": 1.174568742898461e-05,
2995
+ "loss": 0.1842,
2996
+ "step": 20900
2997
+ },
2998
+ {
2999
+ "epoch": 6.492097923768206,
3000
+ "grad_norm": 0.5569332838058472,
3001
+ "learning_rate": 1.169403987191406e-05,
3002
+ "loss": 0.1311,
3003
+ "step": 20950
3004
+ },
3005
+ {
3006
+ "epoch": 6.507592190889371,
3007
+ "grad_norm": 24.217166900634766,
3008
+ "learning_rate": 1.1642392314843509e-05,
3009
+ "loss": 0.1402,
3010
+ "step": 21000
3011
+ },
3012
+ {
3013
+ "epoch": 6.523086458010536,
3014
+ "grad_norm": 0.9757774472236633,
3015
+ "learning_rate": 1.1590744757772957e-05,
3016
+ "loss": 0.151,
3017
+ "step": 21050
3018
+ },
3019
+ {
3020
+ "epoch": 6.538580725131701,
3021
+ "grad_norm": 1.751021146774292,
3022
+ "learning_rate": 1.1539097200702407e-05,
3023
+ "loss": 0.2169,
3024
+ "step": 21100
3025
+ },
3026
+ {
3027
+ "epoch": 6.5540749922528665,
3028
+ "grad_norm": 1.1032994985580444,
3029
+ "learning_rate": 1.1487449643631858e-05,
3030
+ "loss": 0.1998,
3031
+ "step": 21150
3032
+ },
3033
+ {
3034
+ "epoch": 6.569569259374031,
3035
+ "grad_norm": 7.012996196746826,
3036
+ "learning_rate": 1.1435802086561306e-05,
3037
+ "loss": 0.1461,
3038
+ "step": 21200
3039
+ },
3040
+ {
3041
+ "epoch": 6.585063526495197,
3042
+ "grad_norm": 1.7933677434921265,
3043
+ "learning_rate": 1.1384154529490756e-05,
3044
+ "loss": 0.2137,
3045
+ "step": 21250
3046
+ },
3047
+ {
3048
+ "epoch": 6.600557793616362,
3049
+ "grad_norm": 0.28755664825439453,
3050
+ "learning_rate": 1.1332506972420204e-05,
3051
+ "loss": 0.1666,
3052
+ "step": 21300
3053
+ },
3054
+ {
3055
+ "epoch": 6.616052060737527,
3056
+ "grad_norm": 69.47583770751953,
3057
+ "learning_rate": 1.1280859415349655e-05,
3058
+ "loss": 0.1056,
3059
+ "step": 21350
3060
+ },
3061
+ {
3062
+ "epoch": 6.631546327858692,
3063
+ "grad_norm": 59.0959587097168,
3064
+ "learning_rate": 1.1229211858279105e-05,
3065
+ "loss": 0.2245,
3066
+ "step": 21400
3067
+ },
3068
+ {
3069
+ "epoch": 6.647040594979858,
3070
+ "grad_norm": 9.836857795715332,
3071
+ "learning_rate": 1.1177564301208553e-05,
3072
+ "loss": 0.1524,
3073
+ "step": 21450
3074
+ },
3075
+ {
3076
+ "epoch": 6.6625348621010225,
3077
+ "grad_norm": 26.670324325561523,
3078
+ "learning_rate": 1.1125916744138002e-05,
3079
+ "loss": 0.1188,
3080
+ "step": 21500
3081
+ },
3082
+ {
3083
+ "epoch": 6.678029129222188,
3084
+ "grad_norm": 62.01319122314453,
3085
+ "learning_rate": 1.1074269187067452e-05,
3086
+ "loss": 0.1745,
3087
+ "step": 21550
3088
+ },
3089
+ {
3090
+ "epoch": 6.693523396343353,
+ "grad_norm": 0.7209033966064453,
+ "learning_rate": 1.1022621629996902e-05,
+ "loss": 0.1847,
+ "step": 21600
+ },
+ {
+ "epoch": 6.709017663464518,
+ "grad_norm": 62.61027145385742,
+ "learning_rate": 1.0970974072926351e-05,
+ "loss": 0.1954,
+ "step": 21650
+ },
+ {
+ "epoch": 6.724511930585683,
+ "grad_norm": 27.388124465942383,
+ "learning_rate": 1.09193265158558e-05,
+ "loss": 0.1611,
+ "step": 21700
+ },
+ {
+ "epoch": 6.740006197706848,
+ "grad_norm": 0.4555682837963104,
+ "learning_rate": 1.086767895878525e-05,
+ "loss": 0.2174,
+ "step": 21750
+ },
+ {
+ "epoch": 6.755500464828014,
+ "grad_norm": 3.9981863498687744,
+ "learning_rate": 1.08160314017147e-05,
+ "loss": 0.1812,
+ "step": 21800
+ },
+ {
+ "epoch": 6.770994731949179,
+ "grad_norm": 34.40354919433594,
+ "learning_rate": 1.0764383844644148e-05,
+ "loss": 0.1399,
+ "step": 21850
+ },
+ {
+ "epoch": 6.786488999070344,
+ "grad_norm": 14.363764762878418,
+ "learning_rate": 1.0712736287573598e-05,
+ "loss": 0.1776,
+ "step": 21900
+ },
+ {
+ "epoch": 6.80198326619151,
+ "grad_norm": 30.52642822265625,
+ "learning_rate": 1.0661088730503048e-05,
+ "loss": 0.1588,
+ "step": 21950
+ },
+ {
+ "epoch": 6.817477533312674,
+ "grad_norm": 14.077266693115234,
+ "learning_rate": 1.0609441173432497e-05,
+ "loss": 0.1954,
+ "step": 22000
+ },
+ {
+ "epoch": 6.83297180043384,
+ "grad_norm": 18.22222328186035,
+ "learning_rate": 1.0557793616361947e-05,
+ "loss": 0.1691,
+ "step": 22050
+ },
+ {
+ "epoch": 6.848466067555004,
+ "grad_norm": 3.799182176589966,
+ "learning_rate": 1.0506146059291395e-05,
+ "loss": 0.1695,
+ "step": 22100
+ },
+ {
+ "epoch": 6.86396033467617,
+ "grad_norm": 22.45465087890625,
+ "learning_rate": 1.0454498502220846e-05,
+ "loss": 0.1558,
+ "step": 22150
+ },
+ {
+ "epoch": 6.879454601797335,
+ "grad_norm": 42.03032302856445,
+ "learning_rate": 1.0402850945150294e-05,
+ "loss": 0.1663,
+ "step": 22200
+ },
+ {
+ "epoch": 6.8949488689185,
+ "grad_norm": 0.39429527521133423,
+ "learning_rate": 1.0351203388079744e-05,
+ "loss": 0.1494,
+ "step": 22250
+ },
+ {
+ "epoch": 6.910443136039666,
+ "grad_norm": 18.078174591064453,
+ "learning_rate": 1.0299555831009194e-05,
+ "loss": 0.1776,
+ "step": 22300
+ },
+ {
+ "epoch": 6.92593740316083,
+ "grad_norm": 0.8698641061782837,
+ "learning_rate": 1.0247908273938643e-05,
+ "loss": 0.1575,
+ "step": 22350
+ },
+ {
+ "epoch": 6.941431670281996,
+ "grad_norm": 12.034942626953125,
+ "learning_rate": 1.0196260716868093e-05,
+ "loss": 0.15,
+ "step": 22400
+ },
+ {
+ "epoch": 6.9569259374031605,
+ "grad_norm": 18.04237937927246,
+ "learning_rate": 1.0144613159797541e-05,
+ "loss": 0.1581,
+ "step": 22450
+ },
+ {
+ "epoch": 6.972420204524326,
+ "grad_norm": 11.662799835205078,
+ "learning_rate": 1.009296560272699e-05,
+ "loss": 0.1876,
+ "step": 22500
+ },
+ {
+ "epoch": 6.9879144716454915,
+ "grad_norm": 13.070639610290527,
+ "learning_rate": 1.0041318045656442e-05,
+ "loss": 0.2079,
+ "step": 22550
+ },
+ {
+ "epoch": 7.0,
+ "eval_accuracy": 0.7888872157807559,
+ "eval_f1": 0.7894450189158516,
+ "eval_loss": 1.1703307628631592,
+ "eval_runtime": 25.562,
+ "eval_samples_per_second": 259.799,
+ "eval_steps_per_second": 16.274,
+ "step": 22589
+ }
+ ],
+ "logging_steps": 50,
+ "max_steps": 32270,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 10,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "EarlyStoppingCallback": {
+ "args": {
+ "early_stopping_patience": 2,
+ "early_stopping_threshold": 0.0
+ },
+ "attributes": {
+ "early_stopping_patience_counter": 2
+ }
+ },
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 4.754719082566656e+16,
+ "train_batch_size": 16,
+ "trial_name": null,
+ "trial_params": null
+ }
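The trainer state above shows why training stopped at epoch 7 of 10: `EarlyStoppingCallback` ran with `early_stopping_patience: 2` and `early_stopping_threshold: 0.0`, the patience counter reached 2, and `should_training_stop` flipped to `true`. A minimal sketch of that patience logic (the metric values below are hypothetical, not from this run, and this is not the actual `transformers` internals):

```python
# Sketch of early-stopping patience, mirroring the EarlyStoppingCallback
# args above (patience=2, threshold=0.0). Assumes a higher-is-better
# metric such as eval_f1.
def early_stop_epoch(metrics, patience=2, threshold=0.0):
    """Return the 1-based epoch after which training stops, or None."""
    best = float("-inf")
    counter = 0
    for epoch, value in enumerate(metrics, start=1):
        if value > best + threshold:   # improvement resets the counter
            best = value
            counter = 0
        else:
            counter += 1               # no improvement this evaluation
            if counter >= patience:    # patience exhausted -> stop
                return epoch
    return None

# Hypothetical eval_f1 curve: improves through epoch 5, then plateaus
# for two consecutive evaluations, so training halts after epoch 7.
scores = [0.70, 0.75, 0.78, 0.79, 0.791, 0.790, 0.789]
print(early_stop_epoch(scores))  # -> 7
```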
checkpoint-22589/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:10df86f20d44dbad5bcbd0936a460173513fcbc2b606339a5e32ef490cc53c86
+ size 5216
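The three-line files in this commit (`training_args.bin`, `model.safetensors`, and the checkpoint copies) are Git LFS pointer files: the binary itself lives in LFS storage, while the repository tracks only a spec-version line, a `sha256` oid, and a byte size. A small sketch of parsing one such pointer, using the `training_args.bin` entry above:

```python
# Parse a Git LFS pointer file into its key/value fields.
# Pointer text copied verbatim from the training_args.bin entry above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:10df86f20d44dbad5bcbd0936a460173513fcbc2b606339a5e32ef490cc53c86
size 5216
"""

def parse_lfs_pointer(text):
    """Split each 'key value' line of an LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "hash_algo": algo,
            "digest": digest, "size": int(fields["size"])}

info = parse_lfs_pointer(pointer)
print(info["size"])       # -> 5216
print(info["hash_algo"])  # -> sha256
```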
config.json ADDED
@@ -0,0 +1,47 @@
+ {
+ "architectures": [
+ "ElectraForSequenceClassification"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "classifier_dropout": null,
+ "embedding_size": 768,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "id2label": {
+ "0": "\uae30\uc068",
+ "1": "\ub2f9\ud669",
+ "2": "\ubd84\ub178",
+ "3": "\ubd88\uc548",
+ "4": "\uc0c1\ucc98",
+ "5": "\uc2ac\ud514"
+ },
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "label2id": {
+ "\uae30\uc068": 0,
+ "\ub2f9\ud669": 1,
+ "\ubd84\ub178": 2,
+ "\ubd88\uc548": 3,
+ "\uc0c1\ucc98": 4,
+ "\uc2ac\ud514": 5
+ },
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 512,
+ "model_type": "electra",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "pad_token_id": 0,
+ "position_embedding_type": "absolute",
+ "problem_type": "single_label_classification",
+ "summary_activation": "gelu",
+ "summary_last_dropout": 0.1,
+ "summary_type": "first",
+ "summary_use_proj": true,
+ "tokenizer_class": "BertTokenizer",
+ "torch_dtype": "float32",
+ "transformers_version": "4.52.2",
+ "type_vocab_size": 2,
+ "use_cache": true,
+ "vocab_size": 54343
+ }
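The `id2label` entries in config.json are `\uXXXX`-escaped JSON strings; decoding them shows the six Korean emotion classes this model predicts (roughly: joy, embarrassment, anger, anxiety, hurt, sadness). A quick check using the mapping verbatim from the config above:

```python
import json

# The id2label block from config.json above, verbatim; json.loads
# decodes the \uXXXX escapes into the Korean emotion labels.
id2label = json.loads("""{
  "0": "\\uae30\\uc068",
  "1": "\\ub2f9\\ud669",
  "2": "\\ubd84\\ub178",
  "3": "\\ubd88\\uc548",
  "4": "\\uc0c1\\ucc98",
  "5": "\\uc2ac\\ud514"
}""")
print(id2label)
# {'0': '기쁨', '1': '당황', '2': '분노', '3': '불안', '4': '상처', '5': '슬픔'}
```

At inference time, `ElectraForSequenceClassification` uses this same mapping to turn the argmax class index into a readable label, so the config is the single source of truth for class order.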
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:149e0f950d5f1601cc510fbd6b354de2f1e128f9f787c2d32ed18a6ec9b5f743
+ size 511149672
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+ "cls_token": "[CLS]",
+ "mask_token": "[MASK]",
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "unk_token": "[UNK]"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "4": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "[CLS]",
+ "do_basic_tokenize": true,
+ "do_lower_case": false,
+ "extra_special_tokens": {},
+ "mask_token": "[MASK]",
+ "model_max_length": 512,
+ "never_split": null,
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "BertTokenizer",
+ "unk_token": "[UNK]"
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:10df86f20d44dbad5bcbd0936a460173513fcbc2b606339a5e32ef490cc53c86
+ size 5216
vocab.txt ADDED
The diff for this file is too large to render. See raw diff