Leonel-Maia committed on
Commit 83d274b · verified · 1 Parent(s): 3e1e4cd

End of training

Files changed (5)
  1. README.md +16 -4
  2. all_results.json +15 -0
  3. eval_results.json +9 -0
  4. train_results.json +9 -0
  5. trainer_state.json +1691 -0
README.md CHANGED
@@ -4,11 +4,23 @@ license: apache-2.0
 base_model: Leonel-Maia/fongbe-whisper-small
 tags:
 - generated_from_trainer
+datasets:
+- Leonel-Maia/ewe_dataset_splitted
 metrics:
 - wer
 model-index:
 - name: whisper-small-transfer
-  results: []
+  results:
+  - task:
+      name: Automatic Speech Recognition
+      type: automatic-speech-recognition
+    dataset:
+      name: Leonel-Maia/ewe_dataset_splitted
+      type: Leonel-Maia/ewe_dataset_splitted
+    metrics:
+    - name: Wer
+      type: wer
+      value: 0.21356341934578732
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -16,10 +28,10 @@ should probably proofread and complete it, then remove this comment. -->
 
 # whisper-small-transfer
 
-This model is a fine-tuned version of [Leonel-Maia/fongbe-whisper-small](https://huggingface.co/Leonel-Maia/fongbe-whisper-small) on an unknown dataset.
+This model is a fine-tuned version of [Leonel-Maia/fongbe-whisper-small](https://huggingface.co/Leonel-Maia/fongbe-whisper-small) on the Leonel-Maia/ewe_dataset_splitted dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.2720
-- Wer: 0.2185
+- Loss: 0.2392
+- Wer: 0.2136
 
 ## Model description
 
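The Wer values above are word error rates. As a minimal, illustrative sketch (not the Trainer's own implementation, which relies on an evaluation library such as `evaluate`/`jiwer`), WER is the word-level Levenshtein distance between hypothesis and reference, divided by the number of reference words:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance (substitutions,
    insertions, deletions) divided by the reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words:
print(wer("the cat sat on the mat", "the cat sit on mat"))  # 0.333...
```

A reported Wer of 0.2136 therefore means roughly one word-level error per five reference words on the evaluation set.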
all_results.json ADDED
@@ -0,0 +1,15 @@
+{
+  "epoch": 6.641930618401207,
+  "eval_loss": 0.23916485905647278,
+  "eval_runtime": 2446.3074,
+  "eval_samples": 3315,
+  "eval_samples_per_second": 1.355,
+  "eval_steps_per_second": 0.339,
+  "eval_wer": 0.21356341934578732,
+  "total_flos": 5.08256607043584e+19,
+  "train_loss": 0.25465128779411317,
+  "train_runtime": 113391.8356,
+  "train_samples": 26517,
+  "train_samples_per_second": 14.031,
+  "train_steps_per_second": 0.438
+}
eval_results.json ADDED
@@ -0,0 +1,9 @@
+{
+  "epoch": 6.641930618401207,
+  "eval_loss": 0.23916485905647278,
+  "eval_runtime": 2446.3074,
+  "eval_samples": 3315,
+  "eval_samples_per_second": 1.355,
+  "eval_steps_per_second": 0.339,
+  "eval_wer": 0.21356341934578732
+}
train_results.json ADDED
@@ -0,0 +1,9 @@
+{
+  "epoch": 6.641930618401207,
+  "total_flos": 5.08256607043584e+19,
+  "train_loss": 0.25465128779411317,
+  "train_runtime": 113391.8356,
+  "train_samples": 26517,
+  "train_samples_per_second": 14.031,
+  "train_steps_per_second": 0.438
+}
trainer_state.json ADDED
@@ -0,0 +1,1691 @@
+{
+  "best_global_step": 3000,
+  "best_metric": 0.23916485905647278,
+  "best_model_checkpoint": "./whisper-small-transfer/checkpoint-3000",
+  "epoch": 6.641930618401207,
+  "eval_steps": 500,
+  "global_step": 5500,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.030165912518853696,
+      "grad_norm": 26.211389541625977,
+      "learning_rate": 4.800000000000001e-07,
+      "loss": 4.3528,
+      "step": 25
+    },
+    {
+      "epoch": 0.06033182503770739,
+      "grad_norm": 16.204137802124023,
+      "learning_rate": 9.800000000000001e-07,
+      "loss": 3.4167,
+      "step": 50
+    },
+    {
+      "epoch": 0.09049773755656108,
+      "grad_norm": 10.840243339538574,
+      "learning_rate": 1.48e-06,
+      "loss": 2.3923,
+      "step": 75
+    },
+    {
+      "epoch": 0.12066365007541478,
+      "grad_norm": 7.455395698547363,
+      "learning_rate": 1.98e-06,
+      "loss": 1.7404,
+      "step": 100
+    },
+    {
+      "epoch": 0.15082956259426847,
+      "grad_norm": 5.6613593101501465,
+      "learning_rate": 2.4800000000000004e-06,
+      "loss": 1.3135,
+      "step": 125
+    },
+    {
+      "epoch": 0.18099547511312217,
+      "grad_norm": 4.958397388458252,
+      "learning_rate": 2.9800000000000003e-06,
+      "loss": 1.0967,
+      "step": 150
+    },
+    {
+      "epoch": 0.21116138763197587,
+      "grad_norm": 4.388453006744385,
+      "learning_rate": 3.48e-06,
+      "loss": 0.9164,
+      "step": 175
+    },
+    {
+      "epoch": 0.24132730015082957,
+      "grad_norm": 4.023995399475098,
+      "learning_rate": 3.980000000000001e-06,
+      "loss": 0.8129,
+      "step": 200
+    },
+    {
+      "epoch": 0.27149321266968324,
+      "grad_norm": 3.9679815769195557,
+      "learning_rate": 4.48e-06,
+      "loss": 0.7049,
+      "step": 225
+    },
+    {
+      "epoch": 0.30165912518853694,
+      "grad_norm": 4.201240062713623,
+      "learning_rate": 4.980000000000001e-06,
+      "loss": 0.6658,
+      "step": 250
+    },
+    {
+      "epoch": 0.33182503770739064,
+      "grad_norm": 4.384641647338867,
+      "learning_rate": 5.480000000000001e-06,
+      "loss": 0.6323,
+      "step": 275
+    },
+    {
+      "epoch": 0.36199095022624433,
+      "grad_norm": 3.6530189514160156,
+      "learning_rate": 5.98e-06,
+      "loss": 0.5616,
+      "step": 300
+    },
+    {
+      "epoch": 0.39215686274509803,
+      "grad_norm": 4.075361728668213,
+      "learning_rate": 6.480000000000001e-06,
+      "loss": 0.5227,
+      "step": 325
+    },
+    {
+      "epoch": 0.42232277526395173,
+      "grad_norm": 4.275745868682861,
+      "learning_rate": 6.98e-06,
+      "loss": 0.5387,
+      "step": 350
+    },
+    {
+      "epoch": 0.45248868778280543,
+      "grad_norm": 3.494539737701416,
+      "learning_rate": 7.48e-06,
+      "loss": 0.5019,
+      "step": 375
+    },
+    {
+      "epoch": 0.48265460030165913,
+      "grad_norm": 3.5438411235809326,
+      "learning_rate": 7.980000000000002e-06,
+      "loss": 0.4708,
+      "step": 400
+    },
+    {
+      "epoch": 0.5128205128205128,
+      "grad_norm": 3.6554458141326904,
+      "learning_rate": 8.48e-06,
+      "loss": 0.441,
+      "step": 425
+    },
+    {
+      "epoch": 0.5429864253393665,
+      "grad_norm": 3.1188902854919434,
+      "learning_rate": 8.98e-06,
+      "loss": 0.4363,
+      "step": 450
+    },
+    {
+      "epoch": 0.5731523378582202,
+      "grad_norm": 3.156195878982544,
+      "learning_rate": 9.48e-06,
+      "loss": 0.4245,
+      "step": 475
+    },
+    {
+      "epoch": 0.6033182503770739,
+      "grad_norm": 3.006938934326172,
+      "learning_rate": 9.980000000000001e-06,
+      "loss": 0.3978,
+      "step": 500
+    },
+    {
+      "epoch": 0.6033182503770739,
+      "eval_loss": 0.3847188353538513,
+      "eval_runtime": 2461.5392,
+      "eval_samples_per_second": 1.347,
+      "eval_steps_per_second": 0.337,
+      "eval_wer": 0.34837741876301787,
+      "step": 500
+    },
+    {
+      "epoch": 0.6334841628959276,
+      "grad_norm": 3.300306797027588,
+      "learning_rate": 9.99511996746645e-06,
+      "loss": 0.4099,
+      "step": 525
+    },
+    {
+      "epoch": 0.6636500754147813,
+      "grad_norm": 2.924450635910034,
+      "learning_rate": 9.990036600244002e-06,
+      "loss": 0.3736,
+      "step": 550
+    },
+    {
+      "epoch": 0.693815987933635,
+      "grad_norm": 3.143441677093506,
+      "learning_rate": 9.984953233021555e-06,
+      "loss": 0.3785,
+      "step": 575
+    },
+    {
+      "epoch": 0.7239819004524887,
+      "grad_norm": 3.0799312591552734,
+      "learning_rate": 9.979869865799107e-06,
+      "loss": 0.3559,
+      "step": 600
+    },
+    {
+      "epoch": 0.7541478129713424,
+      "grad_norm": 2.933881998062134,
+      "learning_rate": 9.974786498576659e-06,
+      "loss": 0.3554,
+      "step": 625
+    },
+    {
+      "epoch": 0.7843137254901961,
+      "grad_norm": 2.6328113079071045,
+      "learning_rate": 9.96970313135421e-06,
+      "loss": 0.3393,
+      "step": 650
+    },
+    {
+      "epoch": 0.8144796380090498,
+      "grad_norm": 2.767172336578369,
+      "learning_rate": 9.964619764131762e-06,
+      "loss": 0.3539,
+      "step": 675
+    },
+    {
+      "epoch": 0.8446455505279035,
+      "grad_norm": 3.040672779083252,
+      "learning_rate": 9.959536396909314e-06,
+      "loss": 0.3347,
+      "step": 700
+    },
+    {
+      "epoch": 0.8748114630467572,
+      "grad_norm": 2.708042860031128,
+      "learning_rate": 9.954453029686866e-06,
+      "loss": 0.3208,
+      "step": 725
+    },
+    {
+      "epoch": 0.9049773755656109,
+      "grad_norm": 2.599907875061035,
+      "learning_rate": 9.949369662464417e-06,
+      "loss": 0.3314,
+      "step": 750
+    },
+    {
+      "epoch": 0.9351432880844646,
+      "grad_norm": 2.7246906757354736,
+      "learning_rate": 9.944286295241969e-06,
+      "loss": 0.3212,
+      "step": 775
+    },
+    {
+      "epoch": 0.9653092006033183,
+      "grad_norm": 2.789820909500122,
+      "learning_rate": 9.93920292801952e-06,
+      "loss": 0.3197,
+      "step": 800
+    },
+    {
+      "epoch": 0.995475113122172,
+      "grad_norm": 2.7562570571899414,
+      "learning_rate": 9.934119560797073e-06,
+      "loss": 0.3075,
+      "step": 825
+    },
+    {
+      "epoch": 1.0265460030165912,
+      "grad_norm": 2.808879852294922,
+      "learning_rate": 9.929036193574624e-06,
+      "loss": 0.2848,
+      "step": 850
+    },
+    {
+      "epoch": 1.056711915535445,
+      "grad_norm": 2.3943305015563965,
+      "learning_rate": 9.923952826352176e-06,
+      "loss": 0.2741,
+      "step": 875
+    },
+    {
+      "epoch": 1.0868778280542986,
+      "grad_norm": 2.515653133392334,
+      "learning_rate": 9.91886945912973e-06,
+      "loss": 0.2812,
+      "step": 900
+    },
+    {
+      "epoch": 1.1170437405731524,
+      "grad_norm": 2.7121636867523193,
+      "learning_rate": 9.913786091907281e-06,
+      "loss": 0.2805,
+      "step": 925
+    },
+    {
+      "epoch": 1.147209653092006,
+      "grad_norm": 2.8646414279937744,
+      "learning_rate": 9.908702724684833e-06,
+      "loss": 0.2726,
+      "step": 950
+    },
+    {
+      "epoch": 1.1773755656108598,
+      "grad_norm": 2.6251399517059326,
+      "learning_rate": 9.903619357462384e-06,
+      "loss": 0.2747,
+      "step": 975
+    },
+    {
+      "epoch": 1.2075414781297134,
+      "grad_norm": 2.397496461868286,
+      "learning_rate": 9.898535990239936e-06,
+      "loss": 0.249,
+      "step": 1000
+    },
+    {
+      "epoch": 1.2075414781297134,
+      "eval_loss": 0.28903013467788696,
+      "eval_runtime": 2466.5989,
+      "eval_samples_per_second": 1.344,
+      "eval_steps_per_second": 0.336,
+      "eval_wer": 0.25850141915153085,
+      "step": 1000
+    },
+    {
+      "epoch": 1.2377073906485672,
+      "grad_norm": 2.6788296699523926,
+      "learning_rate": 9.893452623017488e-06,
+      "loss": 0.2737,
+      "step": 1025
+    },
+    {
+      "epoch": 1.2678733031674208,
+      "grad_norm": 2.5809574127197266,
+      "learning_rate": 9.88836925579504e-06,
+      "loss": 0.2609,
+      "step": 1050
+    },
+    {
+      "epoch": 1.2980392156862746,
+      "grad_norm": 2.419163227081299,
+      "learning_rate": 9.883285888572591e-06,
+      "loss": 0.2673,
+      "step": 1075
+    },
+    {
+      "epoch": 1.3282051282051281,
+      "grad_norm": 2.5561363697052,
+      "learning_rate": 9.878202521350143e-06,
+      "loss": 0.2535,
+      "step": 1100
+    },
+    {
+      "epoch": 1.358371040723982,
+      "grad_norm": 2.6312174797058105,
+      "learning_rate": 9.873119154127695e-06,
+      "loss": 0.2693,
+      "step": 1125
+    },
+    {
+      "epoch": 1.3885369532428355,
+      "grad_norm": 2.4085021018981934,
+      "learning_rate": 9.868035786905246e-06,
+      "loss": 0.2598,
+      "step": 1150
+    },
+    {
+      "epoch": 1.4187028657616894,
+      "grad_norm": 2.352027654647827,
+      "learning_rate": 9.862952419682798e-06,
+      "loss": 0.2713,
+      "step": 1175
+    },
+    {
+      "epoch": 1.448868778280543,
+      "grad_norm": 2.478139877319336,
+      "learning_rate": 9.85786905246035e-06,
+      "loss": 0.2783,
+      "step": 1200
+    },
+    {
+      "epoch": 1.4790346907993968,
+      "grad_norm": 2.542982816696167,
+      "learning_rate": 9.852785685237901e-06,
+      "loss": 0.2445,
+      "step": 1225
+    },
+    {
+      "epoch": 1.5092006033182503,
+      "grad_norm": 2.4919915199279785,
+      "learning_rate": 9.847702318015455e-06,
+      "loss": 0.2654,
+      "step": 1250
+    },
+    {
+      "epoch": 1.539366515837104,
+      "grad_norm": 2.59543776512146,
+      "learning_rate": 9.842618950793007e-06,
+      "loss": 0.2477,
+      "step": 1275
+    },
+    {
+      "epoch": 1.5695324283559577,
+      "grad_norm": 2.4258153438568115,
+      "learning_rate": 9.837535583570558e-06,
+      "loss": 0.2544,
+      "step": 1300
+    },
+    {
+      "epoch": 1.5996983408748116,
+      "grad_norm": 2.4069764614105225,
+      "learning_rate": 9.83245221634811e-06,
+      "loss": 0.2383,
+      "step": 1325
+    },
+    {
+      "epoch": 1.6298642533936651,
+      "grad_norm": 2.3214974403381348,
+      "learning_rate": 9.827368849125662e-06,
+      "loss": 0.2652,
+      "step": 1350
+    },
+    {
+      "epoch": 1.6600301659125187,
+      "grad_norm": 2.626494884490967,
+      "learning_rate": 9.822285481903213e-06,
+      "loss": 0.2501,
+      "step": 1375
+    },
+    {
+      "epoch": 1.6901960784313725,
+      "grad_norm": 2.589961290359497,
+      "learning_rate": 9.817202114680765e-06,
+      "loss": 0.2465,
+      "step": 1400
+    },
+    {
+      "epoch": 1.7203619909502263,
+      "grad_norm": 2.246035099029541,
+      "learning_rate": 9.812118747458317e-06,
+      "loss": 0.2399,
+      "step": 1425
+    },
+    {
+      "epoch": 1.75052790346908,
+      "grad_norm": 2.430635452270508,
+      "learning_rate": 9.807035380235868e-06,
+      "loss": 0.2421,
+      "step": 1450
+    },
+    {
+      "epoch": 1.7806938159879335,
+      "grad_norm": 2.1688945293426514,
+      "learning_rate": 9.80195201301342e-06,
+      "loss": 0.2433,
+      "step": 1475
+    },
+    {
+      "epoch": 1.8108597285067873,
+      "grad_norm": 2.5152928829193115,
+      "learning_rate": 9.796868645790972e-06,
+      "loss": 0.2481,
+      "step": 1500
+    },
+    {
+      "epoch": 1.8108597285067873,
+      "eval_loss": 0.2585034966468811,
+      "eval_runtime": 2401.9254,
+      "eval_samples_per_second": 1.38,
+      "eval_steps_per_second": 0.345,
+      "eval_wer": 0.23873042596129979,
+      "step": 1500
+    },
+    {
+      "epoch": 1.8410256410256411,
+      "grad_norm": 2.3025524616241455,
+      "learning_rate": 9.791785278568524e-06,
+      "loss": 0.2371,
+      "step": 1525
+    },
+    {
+      "epoch": 1.8711915535444947,
+      "grad_norm": 2.131627082824707,
+      "learning_rate": 9.786701911346075e-06,
+      "loss": 0.2294,
+      "step": 1550
+    },
+    {
+      "epoch": 1.9013574660633483,
+      "grad_norm": 2.1070120334625244,
+      "learning_rate": 9.781618544123627e-06,
+      "loss": 0.2593,
+      "step": 1575
+    },
+    {
+      "epoch": 1.9315233785822021,
+      "grad_norm": 2.209496021270752,
+      "learning_rate": 9.77653517690118e-06,
+      "loss": 0.2415,
+      "step": 1600
+    },
+    {
+      "epoch": 1.961689291101056,
+      "grad_norm": 2.3032071590423584,
+      "learning_rate": 9.771451809678732e-06,
+      "loss": 0.2419,
+      "step": 1625
+    },
+    {
+      "epoch": 1.9918552036199095,
+      "grad_norm": 2.4967117309570312,
+      "learning_rate": 9.766368442456284e-06,
+      "loss": 0.2474,
+      "step": 1650
+    },
+    {
+      "epoch": 2.0229260935143287,
+      "grad_norm": 2.2161521911621094,
+      "learning_rate": 9.761285075233836e-06,
+      "loss": 0.2136,
+      "step": 1675
+    },
+    {
+      "epoch": 2.0530920060331823,
+      "grad_norm": 2.219045400619507,
+      "learning_rate": 9.756201708011387e-06,
+      "loss": 0.1956,
+      "step": 1700
+    },
+    {
+      "epoch": 2.0832579185520363,
+      "grad_norm": 2.001629590988159,
+      "learning_rate": 9.75111834078894e-06,
+      "loss": 0.1886,
+      "step": 1725
+    },
+    {
+      "epoch": 2.11342383107089,
+      "grad_norm": 2.072310209274292,
+      "learning_rate": 9.746034973566492e-06,
+      "loss": 0.2012,
+      "step": 1750
+    },
+    {
+      "epoch": 2.1435897435897435,
+      "grad_norm": 2.7330238819122314,
+      "learning_rate": 9.740951606344044e-06,
+      "loss": 0.2154,
+      "step": 1775
+    },
+    {
+      "epoch": 2.173755656108597,
+      "grad_norm": 2.282186985015869,
+      "learning_rate": 9.735868239121596e-06,
+      "loss": 0.2028,
+      "step": 1800
+    },
+    {
+      "epoch": 2.203921568627451,
+      "grad_norm": 2.1114888191223145,
+      "learning_rate": 9.730784871899147e-06,
+      "loss": 0.2026,
+      "step": 1825
+    },
+    {
+      "epoch": 2.2340874811463047,
+      "grad_norm": 2.320906400680542,
+      "learning_rate": 9.725701504676699e-06,
+      "loss": 0.211,
+      "step": 1850
+    },
+    {
+      "epoch": 2.2642533936651583,
+      "grad_norm": 2.435915231704712,
+      "learning_rate": 9.72061813745425e-06,
+      "loss": 0.1983,
+      "step": 1875
+    },
+    {
+      "epoch": 2.294419306184012,
+      "grad_norm": 2.0894196033477783,
+      "learning_rate": 9.715534770231803e-06,
+      "loss": 0.1852,
+      "step": 1900
+    },
+    {
+      "epoch": 2.324585218702866,
+      "grad_norm": 1.7594900131225586,
+      "learning_rate": 9.710451403009354e-06,
+      "loss": 0.1864,
+      "step": 1925
+    },
+    {
+      "epoch": 2.3547511312217195,
+      "grad_norm": 2.0977182388305664,
+      "learning_rate": 9.705368035786906e-06,
+      "loss": 0.1942,
+      "step": 1950
+    },
+    {
+      "epoch": 2.384917043740573,
+      "grad_norm": 2.048466205596924,
+      "learning_rate": 9.700284668564458e-06,
+      "loss": 0.2083,
+      "step": 1975
+    },
+    {
+      "epoch": 2.4150829562594267,
+      "grad_norm": 2.275794506072998,
+      "learning_rate": 9.69520130134201e-06,
+      "loss": 0.1996,
+      "step": 2000
+    },
+    {
+      "epoch": 2.4150829562594267,
+      "eval_loss": 0.24703706800937653,
+      "eval_runtime": 2360.2803,
+      "eval_samples_per_second": 1.404,
+      "eval_steps_per_second": 0.351,
+      "eval_wer": 0.2232870355381444,
+      "step": 2000
+    },
+    {
+      "epoch": 2.4452488687782807,
+      "grad_norm": 2.1452808380126953,
+      "learning_rate": 9.690117934119561e-06,
+      "loss": 0.1945,
+      "step": 2025
+    },
+    {
+      "epoch": 2.4754147812971343,
+      "grad_norm": 2.2425966262817383,
+      "learning_rate": 9.685034566897113e-06,
+      "loss": 0.1885,
+      "step": 2050
+    },
+    {
+      "epoch": 2.505580693815988,
+      "grad_norm": 2.123422861099243,
+      "learning_rate": 9.679951199674666e-06,
+      "loss": 0.1817,
+      "step": 2075
+    },
+    {
+      "epoch": 2.5357466063348415,
+      "grad_norm": 1.747390866279602,
+      "learning_rate": 9.674867832452218e-06,
+      "loss": 0.1999,
+      "step": 2100
+    },
+    {
+      "epoch": 2.565912518853695,
+      "grad_norm": 1.9593695402145386,
+      "learning_rate": 9.66978446522977e-06,
+      "loss": 0.188,
+      "step": 2125
+    },
+    {
+      "epoch": 2.596078431372549,
+      "grad_norm": 2.0385262966156006,
+      "learning_rate": 9.664701098007321e-06,
+      "loss": 0.2004,
+      "step": 2150
+    },
+    {
+      "epoch": 2.6262443438914027,
+      "grad_norm": 1.9144082069396973,
+      "learning_rate": 9.659617730784873e-06,
+      "loss": 0.2062,
+      "step": 2175
+    },
+    {
+      "epoch": 2.6564102564102563,
+      "grad_norm": 2.052720308303833,
+      "learning_rate": 9.654534363562425e-06,
+      "loss": 0.1839,
+      "step": 2200
+    },
+    {
+      "epoch": 2.6865761689291103,
+      "grad_norm": 1.9951667785644531,
+      "learning_rate": 9.649450996339976e-06,
+      "loss": 0.1798,
+      "step": 2225
+    },
+    {
+      "epoch": 2.716742081447964,
+      "grad_norm": 2.5916664600372314,
+      "learning_rate": 9.644367629117528e-06,
+      "loss": 0.2058,
+      "step": 2250
+    },
+    {
+      "epoch": 2.7469079939668175,
+      "grad_norm": 2.3864409923553467,
+      "learning_rate": 9.63928426189508e-06,
+      "loss": 0.1993,
+      "step": 2275
+    },
+    {
+      "epoch": 2.777073906485671,
+      "grad_norm": 2.1544036865234375,
+      "learning_rate": 9.634200894672631e-06,
+      "loss": 0.1916,
+      "step": 2300
+    },
+    {
+      "epoch": 2.8072398190045247,
+      "grad_norm": 2.513857364654541,
+      "learning_rate": 9.629117527450183e-06,
+      "loss": 0.1979,
+      "step": 2325
+    },
+    {
+      "epoch": 2.8374057315233787,
+      "grad_norm": 2.702158212661743,
+      "learning_rate": 9.624034160227735e-06,
+      "loss": 0.1976,
+      "step": 2350
+    },
+    {
+      "epoch": 2.8675716440422323,
+      "grad_norm": 1.9018394947052002,
+      "learning_rate": 9.618950793005287e-06,
+      "loss": 0.2007,
+      "step": 2375
+    },
+    {
+      "epoch": 2.897737556561086,
+      "grad_norm": 2.1460797786712646,
+      "learning_rate": 9.613867425782838e-06,
+      "loss": 0.191,
+      "step": 2400
+    },
+    {
+      "epoch": 2.92790346907994,
+      "grad_norm": 2.0922977924346924,
+      "learning_rate": 9.608784058560392e-06,
+      "loss": 0.1956,
+      "step": 2425
+    },
+    {
+      "epoch": 2.9580693815987935,
+      "grad_norm": 2.323761463165283,
+      "learning_rate": 9.603700691337943e-06,
+      "loss": 0.1927,
+      "step": 2450
+    },
+    {
+      "epoch": 2.988235294117647,
+      "grad_norm": 2.0030786991119385,
+      "learning_rate": 9.598617324115495e-06,
+      "loss": 0.1923,
+      "step": 2475
+    },
+    {
+      "epoch": 3.0193061840120663,
+      "grad_norm": 1.9809656143188477,
+      "learning_rate": 9.593533956893047e-06,
+      "loss": 0.1669,
+      "step": 2500
+    },
+    {
+      "epoch": 3.0193061840120663,
+      "eval_loss": 0.241033673286438,
+      "eval_runtime": 2297.6931,
+      "eval_samples_per_second": 1.443,
+      "eval_steps_per_second": 0.361,
+      "eval_wer": 0.2156894486353482,
+      "step": 2500
+    },
+    {
+      "epoch": 3.04947209653092,
+      "grad_norm": 1.9472345113754272,
+      "learning_rate": 9.588450589670599e-06,
+      "loss": 0.1578,
+      "step": 2525
+    },
+    {
+      "epoch": 3.079638009049774,
+      "grad_norm": 2.215965509414673,
+      "learning_rate": 9.58336722244815e-06,
+      "loss": 0.1564,
+      "step": 2550
+    },
+    {
+      "epoch": 3.1098039215686275,
+      "grad_norm": 1.8719356060028076,
+      "learning_rate": 9.578283855225702e-06,
+      "loss": 0.1646,
+      "step": 2575
+    },
+    {
+      "epoch": 3.139969834087481,
+      "grad_norm": 1.7623282670974731,
+      "learning_rate": 9.573200488003254e-06,
+      "loss": 0.149,
+      "step": 2600
+    },
+    {
+      "epoch": 3.1701357466063347,
+      "grad_norm": 1.8403270244598389,
+      "learning_rate": 9.568117120780805e-06,
+      "loss": 0.1543,
+      "step": 2625
+    },
+    {
+      "epoch": 3.2003016591251887,
+      "grad_norm": 2.092792510986328,
+      "learning_rate": 9.563033753558359e-06,
+      "loss": 0.1626,
+      "step": 2650
+    },
+    {
+      "epoch": 3.2304675716440423,
+      "grad_norm": 1.7507753372192383,
+      "learning_rate": 9.55795038633591e-06,
+      "loss": 0.1516,
+      "step": 2675
+    },
+    {
+      "epoch": 3.260633484162896,
+      "grad_norm": 1.7713559865951538,
+      "learning_rate": 9.552867019113462e-06,
+      "loss": 0.1565,
+      "step": 2700
+    },
+    {
+      "epoch": 3.2907993966817495,
+      "grad_norm": 2.3640658855438232,
+      "learning_rate": 9.547783651891014e-06,
+      "loss": 0.1583,
+      "step": 2725
+    },
+    {
+      "epoch": 3.3209653092006035,
+      "grad_norm": 1.9633855819702148,
+      "learning_rate": 9.542700284668566e-06,
+      "loss": 0.1612,
+      "step": 2750
+    },
+    {
+      "epoch": 3.351131221719457,
+      "grad_norm": 1.8450348377227783,
+      "learning_rate": 9.537616917446117e-06,
+      "loss": 0.1573,
+      "step": 2775
+    },
+    {
+      "epoch": 3.3812971342383107,
+      "grad_norm": 1.913599967956543,
+      "learning_rate": 9.532533550223669e-06,
+      "loss": 0.1623,
+      "step": 2800
+    },
+    {
+      "epoch": 3.4114630467571643,
+      "grad_norm": 2.1844587326049805,
+      "learning_rate": 9.52745018300122e-06,
+      "loss": 0.1562,
+      "step": 2825
+    },
+    {
+      "epoch": 3.4416289592760183,
+      "grad_norm": 1.827317237854004,
+      "learning_rate": 9.522366815778772e-06,
+      "loss": 0.1598,
+      "step": 2850
+    },
+    {
+      "epoch": 3.471794871794872,
+      "grad_norm": 1.7931983470916748,
+      "learning_rate": 9.517283448556324e-06,
+      "loss": 0.1557,
+      "step": 2875
+    },
+    {
+      "epoch": 3.5019607843137255,
+      "grad_norm": 1.85126793384552,
+      "learning_rate": 9.512200081333877e-06,
+      "loss": 0.1588,
+      "step": 2900
+    },
+    {
+      "epoch": 3.532126696832579,
+      "grad_norm": 2.0126168727874756,
+      "learning_rate": 9.50711671411143e-06,
+      "loss": 0.161,
+      "step": 2925
+    },
+    {
+      "epoch": 3.5622926093514327,
+      "grad_norm": 1.982439637184143,
+      "learning_rate": 9.502033346888981e-06,
+      "loss": 0.1477,
+      "step": 2950
+    },
+    {
+      "epoch": 3.5924585218702867,
+      "grad_norm": 2.315864086151123,
+      "learning_rate": 9.496949979666533e-06,
+      "loss": 0.157,
+      "step": 2975
+    },
+    {
+      "epoch": 3.6226244343891403,
+      "grad_norm": 1.939032793045044,
+      "learning_rate": 9.491866612444084e-06,
+      "loss": 0.1535,
+      "step": 3000
+    },
+    {
+      "epoch": 3.6226244343891403,
+      "eval_loss": 0.23916485905647278,
+      "eval_runtime": 2282.7909,
+      "eval_samples_per_second": 1.452,
+      "eval_steps_per_second": 0.363,
+      "eval_wer": 0.21356341934578732,
+      "step": 3000
+    },
+    {
+      "epoch": 3.652790346907994,
+      "grad_norm": 1.8340437412261963,
+      "learning_rate": 9.486783245221636e-06,
+      "loss": 0.1555,
+      "step": 3025
+    },
+    {
+      "epoch": 3.682956259426848,
+      "grad_norm": 2.115091562271118,
+      "learning_rate": 9.481699877999188e-06,
+      "loss": 0.1589,
+      "step": 3050
+    },
+    {
+      "epoch": 3.7131221719457015,
+      "grad_norm": 2.074758768081665,
+      "learning_rate": 9.47661651077674e-06,
+      "loss": 0.1506,
+      "step": 3075
+    },
+    {
+      "epoch": 3.743288084464555,
+      "grad_norm": 2.0023365020751953,
+      "learning_rate": 9.471533143554291e-06,
+      "loss": 0.1537,
+      "step": 3100
+    },
+    {
+      "epoch": 3.7734539969834087,
+      "grad_norm": 1.883928656578064,
+      "learning_rate": 9.466449776331843e-06,
+      "loss": 0.1598,
+      "step": 3125
+    },
+    {
+      "epoch": 3.8036199095022623,
+      "grad_norm": 2.637164354324341,
+      "learning_rate": 9.461366409109394e-06,
+      "loss": 0.1725,
+      "step": 3150
+    },
+    {
+      "epoch": 3.8337858220211163,
+      "grad_norm": 2.0999221801757812,
+      "learning_rate": 9.456283041886946e-06,
+      "loss": 0.1591,
+      "step": 3175
+    },
+    {
+      "epoch": 3.86395173453997,
+      "grad_norm": 2.033339500427246,
+      "learning_rate": 9.451199674664498e-06,
+      "loss": 0.1622,
+      "step": 3200
+    },
+    {
+      "epoch": 3.8941176470588235,
+      "grad_norm": 2.1740732192993164,
+      "learning_rate": 9.44611630744205e-06,
+      "loss": 0.1606,
+      "step": 3225
+    },
+    {
+      "epoch": 3.9242835595776775,
+      "grad_norm": 2.3770601749420166,
+      "learning_rate": 9.441032940219603e-06,
+      "loss": 0.1593,
+      "step": 3250
+    },
+    {
+      "epoch": 3.954449472096531,
+      "grad_norm": 1.9333163499832153,
+      "learning_rate": 9.435949572997155e-06,
+      "loss": 0.1579,
+      "step": 3275
+    },
+    {
+      "epoch": 3.9846153846153847,
+      "grad_norm": 2.231935501098633,
+      "learning_rate": 9.430866205774706e-06,
+      "loss": 0.165,
+      "step": 3300
+    },
+    {
+      "epoch": 4.015686274509804,
+      "grad_norm": 1.8732832670211792,
+      "learning_rate": 9.425782838552258e-06,
+      "loss": 0.1401,
+      "step": 3325
+    },
+    {
+      "epoch": 4.0458521870286575,
+      "grad_norm": 1.96370530128479,
+      "learning_rate": 9.42069947132981e-06,
+      "loss": 0.1241,
+      "step": 3350
+    },
+    {
+      "epoch": 4.076018099547511,
+      "grad_norm": 2.0420496463775635,
+      "learning_rate": 9.415616104107362e-06,
+      "loss": 0.1198,
+      "step": 3375
+    },
+    {
+      "epoch": 4.106184012066365,
+      "grad_norm": 1.689680576324463,
+      "learning_rate": 9.410532736884913e-06,
+      "loss": 0.1225,
+      "step": 3400
+    },
+    {
+      "epoch": 4.136349924585219,
+      "grad_norm": 2.194132089614868,
+      "learning_rate": 9.405449369662465e-06,
+      "loss": 0.1333,
+      "step": 3425
+    },
+    {
+      "epoch": 4.166515837104073,
+      "grad_norm": 1.8354153633117676,
+      "learning_rate": 9.400366002440017e-06,
+      "loss": 0.119,
+      "step": 3450
+    },
+    {
+      "epoch": 4.196681749622926,
+      "grad_norm": 1.5466539859771729,
+      "learning_rate": 9.395282635217568e-06,
+      "loss": 0.1245,
+      "step": 3475
+    },
+    {
+      "epoch": 4.22684766214178,
+      "grad_norm": 2.1299631595611572,
+      "learning_rate": 9.39019926799512e-06,
+      "loss": 0.1272,
+      "step": 3500
+    },
+    {
+      "epoch": 4.22684766214178,
+      "eval_loss": 0.24591179192066193,
+      "eval_runtime": 2249.976,
+      "eval_samples_per_second": 1.473,
+      "eval_steps_per_second": 0.368,
+      "eval_wer": 0.21497717486321105,
+      "step": 3500
+    },
+    {
+      "epoch": 4.2570135746606335,
+      "grad_norm": 2.316577911376953,
+      "learning_rate": 9.385115900772672e-06,
+      "loss": 0.1274,
+      "step": 3525
+    },
+    {
+      "epoch": 4.287179487179487,
+      "grad_norm": 1.9104264974594116,
+      "learning_rate": 9.380032533550223e-06,
+      "loss": 0.1231,
+      "step": 3550
+    },
+    {
+      "epoch": 4.317345399698341,
+      "grad_norm": 2.457646369934082,
+      "learning_rate": 9.374949166327775e-06,
+      "loss": 0.1342,
+      "step": 3575
1075
+ },
1076
+ {
1077
+ "epoch": 4.347511312217194,
1078
+ "grad_norm": 2.089953660964966,
1079
+ "learning_rate": 9.369865799105327e-06,
1080
+ "loss": 0.1186,
1081
+ "step": 3600
1082
+ },
1083
+ {
1084
+ "epoch": 4.377677224736049,
1085
+ "grad_norm": 2.036520004272461,
1086
+ "learning_rate": 9.36478243188288e-06,
1087
+ "loss": 0.1309,
1088
+ "step": 3625
1089
+ },
1090
+ {
1091
+ "epoch": 4.407843137254902,
1092
+ "grad_norm": 1.8452465534210205,
1093
+ "learning_rate": 9.359699064660432e-06,
1094
+ "loss": 0.1337,
1095
+ "step": 3650
1096
+ },
1097
+ {
1098
+ "epoch": 4.438009049773756,
1099
+ "grad_norm": 2.151616096496582,
1100
+ "learning_rate": 9.354615697437984e-06,
1101
+ "loss": 0.1238,
1102
+ "step": 3675
1103
+ },
1104
+ {
1105
+ "epoch": 4.4681749622926095,
1106
+ "grad_norm": 2.0973825454711914,
1107
+ "learning_rate": 9.349532330215535e-06,
1108
+ "loss": 0.1329,
1109
+ "step": 3700
1110
+ },
1111
+ {
1112
+ "epoch": 4.498340874811463,
1113
+ "grad_norm": 2.1247801780700684,
1114
+ "learning_rate": 9.344448962993089e-06,
1115
+ "loss": 0.1289,
1116
+ "step": 3725
1117
+ },
1118
+ {
1119
+ "epoch": 4.528506787330317,
1120
+ "grad_norm": 2.375617027282715,
1121
+ "learning_rate": 9.33936559577064e-06,
1122
+ "loss": 0.1306,
1123
+ "step": 3750
1124
+ },
1125
+ {
1126
+ "epoch": 4.55867269984917,
1127
+ "grad_norm": 1.6577335596084595,
1128
+ "learning_rate": 9.334282228548192e-06,
1129
+ "loss": 0.1225,
1130
+ "step": 3775
1131
+ },
1132
+ {
1133
+ "epoch": 4.588838612368024,
1134
+ "grad_norm": 1.7088857889175415,
1135
+ "learning_rate": 9.329198861325744e-06,
1136
+ "loss": 0.1199,
1137
+ "step": 3800
1138
+ },
1139
+ {
1140
+ "epoch": 4.619004524886877,
1141
+ "grad_norm": 2.1655218601226807,
1142
+ "learning_rate": 9.324115494103296e-06,
1143
+ "loss": 0.1214,
1144
+ "step": 3825
1145
+ },
1146
+ {
1147
+ "epoch": 4.649170437405732,
1148
+ "grad_norm": 1.614488959312439,
1149
+ "learning_rate": 9.319032126880847e-06,
1150
+ "loss": 0.1129,
1151
+ "step": 3850
1152
+ },
1153
+ {
1154
+ "epoch": 4.6793363499245855,
1155
+ "grad_norm": 2.1687254905700684,
1156
+ "learning_rate": 9.313948759658399e-06,
1157
+ "loss": 0.1329,
1158
+ "step": 3875
1159
+ },
1160
+ {
1161
+ "epoch": 4.709502262443439,
1162
+ "grad_norm": 1.8908534049987793,
1163
+ "learning_rate": 9.30886539243595e-06,
1164
+ "loss": 0.1273,
1165
+ "step": 3900
1166
+ },
1167
+ {
1168
+ "epoch": 4.739668174962293,
1169
+ "grad_norm": 1.9584071636199951,
1170
+ "learning_rate": 9.303782025213502e-06,
1171
+ "loss": 0.1234,
1172
+ "step": 3925
1173
+ },
1174
+ {
1175
+ "epoch": 4.769834087481146,
1176
+ "grad_norm": 1.8992727994918823,
1177
+ "learning_rate": 9.298698657991054e-06,
1178
+ "loss": 0.1202,
1179
+ "step": 3950
1180
+ },
1181
+ {
1182
+ "epoch": 4.8,
1183
+ "grad_norm": 2.2295639514923096,
1184
+ "learning_rate": 9.293615290768606e-06,
1185
+ "loss": 0.12,
1186
+ "step": 3975
1187
+ },
1188
+ {
1189
+ "epoch": 4.830165912518853,
1190
+ "grad_norm": 1.7892181873321533,
1191
+ "learning_rate": 9.288531923546157e-06,
1192
+ "loss": 0.1226,
1193
+ "step": 4000
1194
+ },
1195
+ {
1196
+ "epoch": 4.830165912518853,
1197
+ "eval_loss": 0.24282999336719513,
1198
+ "eval_runtime": 2246.6955,
1199
+ "eval_samples_per_second": 1.476,
1200
+ "eval_steps_per_second": 0.369,
1201
+ "eval_wer": 0.21186907113024897,
1202
+ "step": 4000
1203
+ },
1204
+ {
+ "epoch": 4.860331825037708,
+ "grad_norm": 1.6192424297332764,
+ "learning_rate": 9.28344855632371e-06,
+ "loss": 0.1275,
+ "step": 4025
+ },
+ {
+ "epoch": 4.8904977375565615,
+ "grad_norm": 2.2194361686706543,
+ "learning_rate": 9.278365189101261e-06,
+ "loss": 0.1344,
+ "step": 4050
+ },
+ {
+ "epoch": 4.920663650075415,
+ "grad_norm": 1.8603276014328003,
+ "learning_rate": 9.273281821878813e-06,
+ "loss": 0.1271,
+ "step": 4075
+ },
+ {
+ "epoch": 4.950829562594269,
+ "grad_norm": 2.0720746517181396,
+ "learning_rate": 9.268198454656366e-06,
+ "loss": 0.1348,
+ "step": 4100
+ },
+ {
+ "epoch": 4.980995475113122,
+ "grad_norm": 2.0119757652282715,
+ "learning_rate": 9.263115087433918e-06,
+ "loss": 0.1246,
+ "step": 4125
+ },
+ {
+ "epoch": 5.012066365007541,
+ "grad_norm": 1.3096059560775757,
+ "learning_rate": 9.25803172021147e-06,
+ "loss": 0.1155,
+ "step": 4150
+ },
+ {
+ "epoch": 5.042232277526395,
+ "grad_norm": 1.489650011062622,
+ "learning_rate": 9.252948352989021e-06,
+ "loss": 0.0973,
+ "step": 4175
+ },
+ {
+ "epoch": 5.072398190045249,
+ "grad_norm": 2.003122568130493,
+ "learning_rate": 9.247864985766573e-06,
+ "loss": 0.0949,
+ "step": 4200
+ },
+ {
+ "epoch": 5.102564102564102,
+ "grad_norm": 1.655964970588684,
+ "learning_rate": 9.242781618544125e-06,
+ "loss": 0.0907,
+ "step": 4225
+ },
+ {
+ "epoch": 5.132730015082957,
+ "grad_norm": 2.134763717651367,
+ "learning_rate": 9.237698251321676e-06,
+ "loss": 0.0955,
+ "step": 4250
+ },
+ {
+ "epoch": 5.16289592760181,
+ "grad_norm": 2.039686918258667,
+ "learning_rate": 9.232614884099228e-06,
+ "loss": 0.0959,
+ "step": 4275
+ },
+ {
+ "epoch": 5.193061840120664,
+ "grad_norm": 2.1623027324676514,
+ "learning_rate": 9.22753151687678e-06,
+ "loss": 0.0971,
+ "step": 4300
+ },
+ {
+ "epoch": 5.223227752639517,
+ "grad_norm": 2.3452537059783936,
+ "learning_rate": 9.222448149654331e-06,
+ "loss": 0.0882,
+ "step": 4325
+ },
+ {
+ "epoch": 5.253393665158371,
+ "grad_norm": 1.7960082292556763,
+ "learning_rate": 9.217364782431883e-06,
+ "loss": 0.0936,
+ "step": 4350
+ },
+ {
+ "epoch": 5.283559577677225,
+ "grad_norm": 1.9322994947433472,
+ "learning_rate": 9.212281415209435e-06,
+ "loss": 0.099,
+ "step": 4375
+ },
+ {
+ "epoch": 5.313725490196078,
+ "grad_norm": 2.040149688720703,
+ "learning_rate": 9.207198047986986e-06,
+ "loss": 0.0974,
+ "step": 4400
+ },
+ {
+ "epoch": 5.343891402714932,
+ "grad_norm": 2.1404929161071777,
+ "learning_rate": 9.202114680764538e-06,
+ "loss": 0.0886,
+ "step": 4425
+ },
+ {
+ "epoch": 5.374057315233786,
+ "grad_norm": 1.9556715488433838,
+ "learning_rate": 9.197031313542092e-06,
+ "loss": 0.1003,
+ "step": 4450
+ },
+ {
+ "epoch": 5.40422322775264,
+ "grad_norm": 2.070523500442505,
+ "learning_rate": 9.191947946319643e-06,
+ "loss": 0.0972,
+ "step": 4475
+ },
+ {
+ "epoch": 5.4343891402714934,
+ "grad_norm": 1.8254801034927368,
+ "learning_rate": 9.186864579097195e-06,
+ "loss": 0.0939,
+ "step": 4500
+ },
+ {
+ "epoch": 5.4343891402714934,
+ "eval_loss": 0.2541460692882538,
+ "eval_runtime": 2247.1988,
+ "eval_samples_per_second": 1.475,
+ "eval_steps_per_second": 0.369,
+ "eval_wer": 0.2141785648762694,
+ "step": 4500
+ },
+ {
+ "epoch": 5.464555052790347,
+ "grad_norm": 2.025676965713501,
+ "learning_rate": 9.181781211874747e-06,
+ "loss": 0.0941,
+ "step": 4525
+ },
+ {
+ "epoch": 5.494720965309201,
+ "grad_norm": 2.0886032581329346,
+ "learning_rate": 9.176697844652298e-06,
+ "loss": 0.0976,
+ "step": 4550
+ },
+ {
+ "epoch": 5.524886877828054,
+ "grad_norm": 2.048823118209839,
+ "learning_rate": 9.17161447742985e-06,
+ "loss": 0.1008,
+ "step": 4575
+ },
+ {
+ "epoch": 5.555052790346908,
+ "grad_norm": 2.661174774169922,
+ "learning_rate": 9.166531110207402e-06,
+ "loss": 0.0971,
+ "step": 4600
+ },
+ {
+ "epoch": 5.585218702865761,
+ "grad_norm": 2.3181405067443848,
+ "learning_rate": 9.161447742984953e-06,
+ "loss": 0.1005,
+ "step": 4625
+ },
+ {
+ "epoch": 5.615384615384615,
+ "grad_norm": 1.9508439302444458,
+ "learning_rate": 9.156364375762505e-06,
+ "loss": 0.098,
+ "step": 4650
+ },
+ {
+ "epoch": 5.6455505279034695,
+ "grad_norm": 2.297891139984131,
+ "learning_rate": 9.151281008540057e-06,
+ "loss": 0.0994,
+ "step": 4675
+ },
+ {
+ "epoch": 5.675716440422323,
+ "grad_norm": 1.9096143245697021,
+ "learning_rate": 9.146197641317609e-06,
+ "loss": 0.1021,
+ "step": 4700
+ },
+ {
+ "epoch": 5.705882352941177,
+ "grad_norm": 2.3850667476654053,
+ "learning_rate": 9.14111427409516e-06,
+ "loss": 0.0988,
+ "step": 4725
+ },
+ {
+ "epoch": 5.73604826546003,
+ "grad_norm": 1.954728126525879,
+ "learning_rate": 9.136030906872714e-06,
+ "loss": 0.1021,
+ "step": 4750
+ },
+ {
+ "epoch": 5.766214177978884,
+ "grad_norm": 2.3807568550109863,
+ "learning_rate": 9.130947539650265e-06,
+ "loss": 0.0891,
+ "step": 4775
+ },
+ {
+ "epoch": 5.796380090497737,
+ "grad_norm": 2.0435431003570557,
+ "learning_rate": 9.125864172427817e-06,
+ "loss": 0.1055,
+ "step": 4800
+ },
+ {
+ "epoch": 5.826546003016591,
+ "grad_norm": 2.7044458389282227,
+ "learning_rate": 9.120780805205369e-06,
+ "loss": 0.0979,
+ "step": 4825
+ },
+ {
+ "epoch": 5.856711915535445,
+ "grad_norm": 1.9672693014144897,
+ "learning_rate": 9.11569743798292e-06,
+ "loss": 0.1003,
+ "step": 4850
+ },
+ {
+ "epoch": 5.886877828054299,
+ "grad_norm": 1.9601274728775024,
+ "learning_rate": 9.110614070760472e-06,
+ "loss": 0.0987,
+ "step": 4875
+ },
+ {
+ "epoch": 5.917043740573153,
+ "grad_norm": 2.0287139415740967,
+ "learning_rate": 9.105530703538024e-06,
+ "loss": 0.0963,
+ "step": 4900
+ },
+ {
+ "epoch": 5.947209653092006,
+ "grad_norm": 1.9941165447235107,
+ "learning_rate": 9.100447336315577e-06,
+ "loss": 0.1005,
+ "step": 4925
+ },
+ {
+ "epoch": 5.97737556561086,
+ "grad_norm": 1.8804293870925903,
+ "learning_rate": 9.095363969093129e-06,
+ "loss": 0.1101,
+ "step": 4950
+ },
+ {
+ "epoch": 6.008446455505279,
+ "grad_norm": 1.3988499641418457,
+ "learning_rate": 9.09028060187068e-06,
+ "loss": 0.0912,
+ "step": 4975
+ },
+ {
+ "epoch": 6.038612368024133,
+ "grad_norm": 1.9083130359649658,
+ "learning_rate": 9.085197234648232e-06,
+ "loss": 0.0665,
+ "step": 5000
+ },
+ {
+ "epoch": 6.038612368024133,
+ "eval_loss": 0.2640438675880432,
+ "eval_runtime": 2326.5234,
+ "eval_samples_per_second": 1.425,
+ "eval_steps_per_second": 0.356,
+ "eval_wer": 0.21559232039369314,
+ "step": 5000
+ },
+ {
+ "epoch": 6.068778280542986,
+ "grad_norm": 1.7908351421356201,
+ "learning_rate": 9.080113867425784e-06,
+ "loss": 0.068,
+ "step": 5025
+ },
+ {
+ "epoch": 6.09894419306184,
+ "grad_norm": 2.0591347217559814,
+ "learning_rate": 9.075030500203336e-06,
+ "loss": 0.0761,
+ "step": 5050
+ },
+ {
+ "epoch": 6.129110105580694,
+ "grad_norm": 2.0180952548980713,
+ "learning_rate": 9.069947132980888e-06,
+ "loss": 0.0651,
+ "step": 5075
+ },
+ {
+ "epoch": 6.159276018099548,
+ "grad_norm": 1.842887282371521,
+ "learning_rate": 9.06486376575844e-06,
+ "loss": 0.0712,
+ "step": 5100
+ },
+ {
+ "epoch": 6.189441930618401,
+ "grad_norm": 1.8542691469192505,
+ "learning_rate": 9.059780398535991e-06,
+ "loss": 0.0685,
+ "step": 5125
+ },
+ {
+ "epoch": 6.219607843137255,
+ "grad_norm": 1.998128890991211,
+ "learning_rate": 9.054697031313543e-06,
+ "loss": 0.0663,
+ "step": 5150
+ },
+ {
+ "epoch": 6.249773755656109,
+ "grad_norm": 2.074847459793091,
+ "learning_rate": 9.049613664091094e-06,
+ "loss": 0.0691,
+ "step": 5175
+ },
+ {
+ "epoch": 6.279939668174962,
+ "grad_norm": 2.1815929412841797,
+ "learning_rate": 9.044530296868646e-06,
+ "loss": 0.0646,
+ "step": 5200
+ },
+ {
+ "epoch": 6.310105580693816,
+ "grad_norm": 1.8308629989624023,
+ "learning_rate": 9.039446929646198e-06,
+ "loss": 0.079,
+ "step": 5225
+ },
+ {
+ "epoch": 6.340271493212669,
+ "grad_norm": 1.6758077144622803,
+ "learning_rate": 9.03436356242375e-06,
+ "loss": 0.0747,
+ "step": 5250
+ },
+ {
+ "epoch": 6.370437405731524,
+ "grad_norm": 1.7715494632720947,
+ "learning_rate": 9.029280195201303e-06,
+ "loss": 0.0699,
+ "step": 5275
+ },
+ {
+ "epoch": 6.400603318250377,
+ "grad_norm": 1.9695020914077759,
+ "learning_rate": 9.024196827978855e-06,
+ "loss": 0.0706,
+ "step": 5300
+ },
+ {
+ "epoch": 6.430769230769231,
+ "grad_norm": 2.174950122833252,
+ "learning_rate": 9.019113460756406e-06,
+ "loss": 0.0735,
+ "step": 5325
+ },
+ {
+ "epoch": 6.460935143288085,
+ "grad_norm": 1.3423354625701904,
+ "learning_rate": 9.014030093533958e-06,
+ "loss": 0.0687,
+ "step": 5350
+ },
+ {
+ "epoch": 6.491101055806938,
+ "grad_norm": 1.8988721370697021,
+ "learning_rate": 9.00894672631151e-06,
+ "loss": 0.0684,
+ "step": 5375
+ },
+ {
+ "epoch": 6.521266968325792,
+ "grad_norm": 2.1522958278656006,
+ "learning_rate": 9.003863359089061e-06,
+ "loss": 0.0703,
+ "step": 5400
+ },
+ {
+ "epoch": 6.551432880844645,
+ "grad_norm": 2.2434656620025635,
+ "learning_rate": 8.998779991866613e-06,
+ "loss": 0.0725,
+ "step": 5425
+ },
+ {
+ "epoch": 6.581598793363499,
+ "grad_norm": 1.9670661687850952,
+ "learning_rate": 8.993696624644165e-06,
+ "loss": 0.073,
+ "step": 5450
+ },
+ {
+ "epoch": 6.6117647058823525,
+ "grad_norm": 1.9538111686706543,
+ "learning_rate": 8.988613257421716e-06,
+ "loss": 0.0755,
+ "step": 5475
+ },
+ {
+ "epoch": 6.641930618401207,
+ "grad_norm": 1.7768446207046509,
+ "learning_rate": 8.983529890199268e-06,
+ "loss": 0.0717,
+ "step": 5500
+ },
+ {
+ "epoch": 6.641930618401207,
+ "eval_loss": 0.27195391058921814,
+ "eval_runtime": 2447.6492,
+ "eval_samples_per_second": 1.354,
+ "eval_steps_per_second": 0.339,
+ "eval_wer": 0.2185061676433451,
+ "step": 5500
+ },
+ {
+ "epoch": 6.641930618401207,
+ "step": 5500,
+ "total_flos": 5.08256607043584e+19,
+ "train_loss": 0.25465128779411317,
+ "train_runtime": 113391.8356,
+ "train_samples_per_second": 14.031,
+ "train_steps_per_second": 0.438
+ }
+ ],
+ "logging_steps": 25,
+ "max_steps": 49680,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 60,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "EarlyStoppingCallback": {
+ "args": {
+ "early_stopping_patience": 5,
+ "early_stopping_threshold": 0.0
+ },
+ "attributes": {
+ "early_stopping_patience_counter": 5
+ }
+ },
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 5.08256607043584e+19,
+ "train_batch_size": 4,
+ "trial_name": null,
+ "trial_params": null
+ }
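The `stateful_callbacks` section records why the run stopped at step 5500 of a scheduled 49680: `EarlyStoppingCallback` counted 5 consecutive evaluations without improvement past its threshold, matching its patience of 5, so `should_training_stop` was set. A minimal sketch (assuming the standard Hugging Face `trainer_state.json` layout, where evaluation records in `log_history` carry an `eval_wer` key) for locating the evaluation with the lowest WER; `best_checkpoint` is an illustrative helper, not part of any library:

```python
def best_checkpoint(state):
    """Return (step, eval_wer) for the log entry with the lowest WER."""
    # Only evaluation records carry "eval_wer"; training records carry "loss".
    evals = [e for e in state["log_history"] if "eval_wer" in e]
    best = min(evals, key=lambda e: e["eval_wer"])
    return best["step"], best["eval_wer"]

# In practice: state = json.load(open("trainer_state.json"))
# Abridged sample mirroring the evaluations logged above:
state = {"log_history": [
    {"step": 3500, "eval_wer": 0.21497717486321105},
    {"step": 4000, "eval_wer": 0.21186907113024897},
    {"step": 4500, "eval_wer": 0.2141785648762694},
    {"step": 5000, "eval_wer": 0.21559232039369314},
    {"step": 5500, "eval_wer": 0.2185061676433451},
]}
step, wer = best_checkpoint(state)
print(f"best WER {wer:.4f} at step {step}")  # → best WER 0.2119 at step 4000
```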