kavanmevada committed on
Commit 0ff0482 · verified · 1 Parent(s): 0f34969

Training in progress, step 1600, checkpoint

last-checkpoint/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:12dc5f8412845a83883a6fa6cc59850584c80baf1d6c400d03d3f2f7cd1baa8c
+ oid sha256:170ec0688a7be1f7f2a5aa23ea0e5a60abb655c988cf1f0505588635b428e5aa
  size 936544523
last-checkpoint/scheduler.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:95af9069615595c1d805bf41e41dc5596819245ab57943f15ea075e07c7e657d
+ oid sha256:c18ab5d0441c18232854c58bbfbc8248c2a6b3ba6d26d20fbc7784a942883b62
  size 1465
last-checkpoint/trainer_state.json CHANGED
@@ -2,9 +2,9 @@
  "best_global_step": null,
  "best_metric": null,
  "best_model_checkpoint": null,
- "epoch": 0.005690166015038931,
  "eval_steps": 500,
- "global_step": 1280,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
@@ -8968,6 +8968,2246 @@
  "learning_rate": 2.8428855621867327e-07,
  "loss": 3.2265,
  "step": 1280
  }
  ],
  "logging_steps": 1,
@@ -8987,7 +11227,7 @@
  "attributes": {}
  }
  },
- "total_flos": 5.06718094098432e+16,
  "train_batch_size": 1,
  "trial_name": null,
  "trial_params": null
 
  "best_global_step": null,
  "best_metric": null,
  "best_model_checkpoint": null,
+ "epoch": 0.007112707518798664,
  "eval_steps": 500,
+ "global_step": 1600,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
 
  "learning_rate": 2.8428855621867327e-07,
  "loss": 3.2265,
  "step": 1280
+ },
+ {
+ "epoch": 0.00569461145723818,
+ "grad_norm": 9.125,
+ "learning_rate": 2.8451083030484895e-07,
+ "loss": 3.1246,
+ "step": 1281
+ },
+ {
+ "epoch": 0.005699056899437429,
+ "grad_norm": 8.875,
+ "learning_rate": 2.847331043910246e-07,
+ "loss": 3.1274,
+ "step": 1282
+ },
+ {
+ "epoch": 0.005703502341636678,
+ "grad_norm": 11.9375,
+ "learning_rate": 2.8495537847720025e-07,
+ "loss": 2.8214,
+ "step": 1283
+ },
+ {
+ "epoch": 0.005707947783835927,
+ "grad_norm": 9.6875,
+ "learning_rate": 2.8517765256337593e-07,
+ "loss": 3.0174,
+ "step": 1284
+ },
+ {
+ "epoch": 0.005712393226035177,
+ "grad_norm": 8.625,
+ "learning_rate": 2.853999266495516e-07,
+ "loss": 3.0763,
+ "step": 1285
+ },
+ {
+ "epoch": 0.005716838668234426,
+ "grad_norm": 8.25,
+ "learning_rate": 2.856222007357273e-07,
+ "loss": 3.2222,
+ "step": 1286
+ },
+ {
+ "epoch": 0.005721284110433675,
+ "grad_norm": 12.5625,
+ "learning_rate": 2.858444748219029e-07,
+ "loss": 2.7932,
+ "step": 1287
+ },
+ {
+ "epoch": 0.005725729552632924,
+ "grad_norm": 10.75,
+ "learning_rate": 2.860667489080786e-07,
+ "loss": 2.8552,
+ "step": 1288
+ },
+ {
+ "epoch": 0.0057301749948321734,
+ "grad_norm": 11.1875,
+ "learning_rate": 2.8628902299425427e-07,
+ "loss": 2.8399,
+ "step": 1289
+ },
+ {
+ "epoch": 0.0057346204370314225,
+ "grad_norm": 9.1875,
+ "learning_rate": 2.865112970804299e-07,
+ "loss": 3.0231,
+ "step": 1290
+ },
+ {
+ "epoch": 0.005739065879230672,
+ "grad_norm": 10.5625,
+ "learning_rate": 2.8673357116660557e-07,
+ "loss": 2.9536,
+ "step": 1291
+ },
+ {
+ "epoch": 0.005743511321429921,
+ "grad_norm": 9.25,
+ "learning_rate": 2.869558452527812e-07,
+ "loss": 3.0822,
+ "step": 1292
+ },
+ {
+ "epoch": 0.00574795676362917,
+ "grad_norm": 11.9375,
+ "learning_rate": 2.8717811933895687e-07,
+ "loss": 2.8038,
+ "step": 1293
+ },
+ {
+ "epoch": 0.00575240220582842,
+ "grad_norm": 9.5,
+ "learning_rate": 2.8740039342513255e-07,
+ "loss": 3.0759,
+ "step": 1294
+ },
+ {
+ "epoch": 0.005756847648027669,
+ "grad_norm": 9.3125,
+ "learning_rate": 2.876226675113082e-07,
+ "loss": 3.1649,
+ "step": 1295
+ },
+ {
+ "epoch": 0.005761293090226918,
+ "grad_norm": 11.5625,
+ "learning_rate": 2.8784494159748385e-07,
+ "loss": 2.8971,
+ "step": 1296
+ },
+ {
+ "epoch": 0.005765738532426167,
+ "grad_norm": 9.5625,
+ "learning_rate": 2.8806721568365953e-07,
+ "loss": 3.1151,
+ "step": 1297
+ },
+ {
+ "epoch": 0.005770183974625416,
+ "grad_norm": 9.6875,
+ "learning_rate": 2.882894897698352e-07,
+ "loss": 3.0496,
+ "step": 1298
+ },
+ {
+ "epoch": 0.005774629416824665,
+ "grad_norm": 9.3125,
+ "learning_rate": 2.885117638560109e-07,
+ "loss": 3.0914,
+ "step": 1299
+ },
+ {
+ "epoch": 0.005779074859023914,
+ "grad_norm": 10.625,
+ "learning_rate": 2.8873403794218656e-07,
+ "loss": 2.9635,
+ "step": 1300
+ },
+ {
+ "epoch": 0.005783520301223163,
+ "grad_norm": 9.6875,
+ "learning_rate": 2.889563120283622e-07,
+ "loss": 3.0395,
+ "step": 1301
+ },
+ {
+ "epoch": 0.005787965743422413,
+ "grad_norm": 10.3125,
+ "learning_rate": 2.8917858611453787e-07,
+ "loss": 2.8086,
+ "step": 1302
+ },
+ {
+ "epoch": 0.005792411185621662,
+ "grad_norm": 9.6875,
+ "learning_rate": 2.8940086020071354e-07,
+ "loss": 3.0765,
+ "step": 1303
+ },
+ {
+ "epoch": 0.005796856627820911,
+ "grad_norm": 11.75,
+ "learning_rate": 2.896231342868892e-07,
+ "loss": 2.7715,
+ "step": 1304
+ },
+ {
+ "epoch": 0.00580130207002016,
+ "grad_norm": 8.4375,
+ "learning_rate": 2.8984540837306485e-07,
+ "loss": 3.1307,
+ "step": 1305
+ },
+ {
+ "epoch": 0.005805747512219409,
+ "grad_norm": 10.125,
+ "learning_rate": 2.9006768245924047e-07,
+ "loss": 2.9209,
+ "step": 1306
+ },
+ {
+ "epoch": 0.005810192954418658,
+ "grad_norm": 10.6875,
+ "learning_rate": 2.9028995654541615e-07,
+ "loss": 2.926,
+ "step": 1307
+ },
+ {
+ "epoch": 0.005814638396617907,
+ "grad_norm": 10.9375,
+ "learning_rate": 2.905122306315918e-07,
+ "loss": 2.8759,
+ "step": 1308
+ },
+ {
+ "epoch": 0.005819083838817156,
+ "grad_norm": 10.25,
+ "learning_rate": 2.907345047177675e-07,
+ "loss": 2.9456,
+ "step": 1309
+ },
+ {
+ "epoch": 0.0058235292810164055,
+ "grad_norm": 11.875,
+ "learning_rate": 2.909567788039432e-07,
+ "loss": 2.874,
+ "step": 1310
+ },
+ {
+ "epoch": 0.005827974723215655,
+ "grad_norm": 9.9375,
+ "learning_rate": 2.911790528901188e-07,
+ "loss": 2.9463,
+ "step": 1311
+ },
+ {
+ "epoch": 0.0058324201654149045,
+ "grad_norm": 9.8125,
+ "learning_rate": 2.914013269762945e-07,
+ "loss": 3.0364,
+ "step": 1312
+ },
+ {
+ "epoch": 0.0058368656076141535,
+ "grad_norm": 7.78125,
+ "learning_rate": 2.9162360106247016e-07,
+ "loss": 3.1732,
+ "step": 1313
+ },
+ {
+ "epoch": 0.005841311049813403,
+ "grad_norm": 8.625,
+ "learning_rate": 2.9184587514864584e-07,
+ "loss": 3.236,
+ "step": 1314
+ },
+ {
+ "epoch": 0.005845756492012652,
+ "grad_norm": 7.71875,
+ "learning_rate": 2.9206814923482146e-07,
+ "loss": 3.3732,
+ "step": 1315
+ },
+ {
+ "epoch": 0.005850201934211901,
+ "grad_norm": 7.0,
+ "learning_rate": 2.9229042332099714e-07,
+ "loss": 3.3333,
+ "step": 1316
+ },
+ {
+ "epoch": 0.00585464737641115,
+ "grad_norm": 7.375,
+ "learning_rate": 2.925126974071728e-07,
+ "loss": 3.2613,
+ "step": 1317
+ },
+ {
+ "epoch": 0.005859092818610399,
+ "grad_norm": 7.59375,
+ "learning_rate": 2.927349714933485e-07,
+ "loss": 3.2561,
+ "step": 1318
+ },
+ {
+ "epoch": 0.005863538260809649,
+ "grad_norm": 9.25,
+ "learning_rate": 2.929572455795241e-07,
+ "loss": 3.1236,
+ "step": 1319
+ },
+ {
+ "epoch": 0.005867983703008898,
+ "grad_norm": 7.40625,
+ "learning_rate": 2.931795196656998e-07,
+ "loss": 3.4843,
+ "step": 1320
+ },
+ {
+ "epoch": 0.005872429145208147,
+ "grad_norm": 10.3125,
+ "learning_rate": 2.934017937518755e-07,
+ "loss": 3.0469,
+ "step": 1321
+ },
+ {
+ "epoch": 0.005876874587407396,
+ "grad_norm": 10.1875,
+ "learning_rate": 2.9362406783805116e-07,
+ "loss": 2.9779,
+ "step": 1322
+ },
+ {
+ "epoch": 0.005881320029606645,
+ "grad_norm": 9.8125,
+ "learning_rate": 2.938463419242268e-07,
+ "loss": 3.0385,
+ "step": 1323
+ },
+ {
+ "epoch": 0.005885765471805894,
+ "grad_norm": 8.5,
+ "learning_rate": 2.9406861601040246e-07,
+ "loss": 3.2974,
+ "step": 1324
+ },
+ {
+ "epoch": 0.005890210914005143,
+ "grad_norm": 10.1875,
+ "learning_rate": 2.942908900965781e-07,
+ "loss": 2.9625,
+ "step": 1325
+ },
+ {
+ "epoch": 0.005894656356204392,
+ "grad_norm": 10.1875,
+ "learning_rate": 2.9451316418275376e-07,
+ "loss": 3.0334,
+ "step": 1326
+ },
+ {
+ "epoch": 0.005899101798403641,
+ "grad_norm": 11.4375,
+ "learning_rate": 2.9473543826892944e-07,
+ "loss": 2.8511,
+ "step": 1327
+ },
+ {
+ "epoch": 0.005903547240602891,
+ "grad_norm": 11.625,
+ "learning_rate": 2.949577123551051e-07,
+ "loss": 2.8895,
+ "step": 1328
+ },
+ {
+ "epoch": 0.00590799268280214,
+ "grad_norm": 9.4375,
+ "learning_rate": 2.9517998644128074e-07,
+ "loss": 3.0653,
+ "step": 1329
+ },
+ {
+ "epoch": 0.005912438125001389,
+ "grad_norm": 8.375,
+ "learning_rate": 2.954022605274564e-07,
+ "loss": 3.1183,
+ "step": 1330
+ },
+ {
+ "epoch": 0.005916883567200638,
+ "grad_norm": 9.4375,
+ "learning_rate": 2.956245346136321e-07,
+ "loss": 3.0215,
+ "step": 1331
+ },
+ {
+ "epoch": 0.0059213290093998875,
+ "grad_norm": 11.0,
+ "learning_rate": 2.958468086998078e-07,
+ "loss": 3.0208,
+ "step": 1332
+ },
+ {
+ "epoch": 0.0059257744515991365,
+ "grad_norm": 10.25,
+ "learning_rate": 2.960690827859834e-07,
+ "loss": 3.0723,
+ "step": 1333
+ },
+ {
+ "epoch": 0.005930219893798386,
+ "grad_norm": 9.5625,
+ "learning_rate": 2.962913568721591e-07,
+ "loss": 3.0629,
+ "step": 1334
+ },
+ {
+ "epoch": 0.005934665335997635,
+ "grad_norm": 11.3125,
+ "learning_rate": 2.9651363095833476e-07,
+ "loss": 2.9396,
+ "step": 1335
+ },
+ {
+ "epoch": 0.005939110778196885,
+ "grad_norm": 8.125,
+ "learning_rate": 2.9673590504451043e-07,
+ "loss": 3.3736,
+ "step": 1336
+ },
+ {
+ "epoch": 0.005943556220396134,
+ "grad_norm": 8.8125,
+ "learning_rate": 2.969581791306861e-07,
+ "loss": 3.0242,
+ "step": 1337
+ },
+ {
+ "epoch": 0.005948001662595383,
+ "grad_norm": 8.4375,
+ "learning_rate": 2.9718045321686174e-07,
+ "loss": 3.1464,
+ "step": 1338
+ },
+ {
+ "epoch": 0.005952447104794632,
+ "grad_norm": 9.0625,
+ "learning_rate": 2.974027273030374e-07,
+ "loss": 3.1056,
+ "step": 1339
+ },
+ {
+ "epoch": 0.005956892546993881,
+ "grad_norm": 9.875,
+ "learning_rate": 2.9762500138921304e-07,
+ "loss": 2.9336,
+ "step": 1340
+ },
+ {
+ "epoch": 0.00596133798919313,
+ "grad_norm": 9.3125,
+ "learning_rate": 2.978472754753887e-07,
+ "loss": 2.9919,
+ "step": 1341
+ },
+ {
+ "epoch": 0.005965783431392379,
+ "grad_norm": 9.25,
+ "learning_rate": 2.980695495615644e-07,
+ "loss": 3.1238,
+ "step": 1342
+ },
+ {
+ "epoch": 0.005970228873591628,
+ "grad_norm": 11.0625,
+ "learning_rate": 2.9829182364774e-07,
+ "loss": 2.907,
+ "step": 1343
+ },
+ {
+ "epoch": 0.005974674315790877,
+ "grad_norm": 11.0625,
+ "learning_rate": 2.985140977339157e-07,
+ "loss": 2.8599,
+ "step": 1344
+ },
+ {
+ "epoch": 0.005979119757990127,
+ "grad_norm": 9.4375,
+ "learning_rate": 2.987363718200914e-07,
+ "loss": 3.0859,
+ "step": 1345
+ },
+ {
+ "epoch": 0.005983565200189376,
+ "grad_norm": 9.0,
+ "learning_rate": 2.9895864590626705e-07,
+ "loss": 3.0753,
+ "step": 1346
+ },
+ {
+ "epoch": 0.005988010642388625,
+ "grad_norm": 9.0625,
+ "learning_rate": 2.9918091999244273e-07,
+ "loss": 3.0776,
+ "step": 1347
+ },
+ {
+ "epoch": 0.005992456084587874,
+ "grad_norm": 10.75,
+ "learning_rate": 2.9940319407861836e-07,
+ "loss": 2.9689,
+ "step": 1348
+ },
+ {
+ "epoch": 0.005996901526787123,
+ "grad_norm": 10.6875,
+ "learning_rate": 2.9962546816479403e-07,
+ "loss": 2.9394,
+ "step": 1349
+ },
+ {
+ "epoch": 0.006001346968986372,
+ "grad_norm": 9.5,
+ "learning_rate": 2.998477422509697e-07,
+ "loss": 2.9483,
+ "step": 1350
+ },
+ {
+ "epoch": 0.006005792411185621,
+ "grad_norm": 7.25,
+ "learning_rate": 3.000700163371454e-07,
+ "loss": 3.2638,
+ "step": 1351
+ },
+ {
+ "epoch": 0.0060102378533848705,
+ "grad_norm": 11.125,
+ "learning_rate": 3.00292290423321e-07,
+ "loss": 2.8421,
+ "step": 1352
+ },
+ {
+ "epoch": 0.00601468329558412,
+ "grad_norm": 10.3125,
+ "learning_rate": 3.005145645094967e-07,
+ "loss": 3.0807,
+ "step": 1353
+ },
+ {
+ "epoch": 0.0060191287377833694,
+ "grad_norm": 10.625,
+ "learning_rate": 3.0073683859567237e-07,
+ "loss": 3.0317,
+ "step": 1354
+ },
+ {
+ "epoch": 0.0060235741799826185,
+ "grad_norm": 10.3125,
+ "learning_rate": 3.0095911268184805e-07,
+ "loss": 2.9453,
+ "step": 1355
+ },
+ {
+ "epoch": 0.006028019622181868,
+ "grad_norm": 10.9375,
+ "learning_rate": 3.0118138676802367e-07,
+ "loss": 2.9334,
+ "step": 1356
+ },
+ {
+ "epoch": 0.006032465064381117,
+ "grad_norm": 10.0625,
+ "learning_rate": 3.014036608541993e-07,
+ "loss": 2.9875,
+ "step": 1357
+ },
+ {
+ "epoch": 0.006036910506580366,
+ "grad_norm": 10.25,
+ "learning_rate": 3.01625934940375e-07,
+ "loss": 2.9972,
+ "step": 1358
+ },
+ {
+ "epoch": 0.006041355948779615,
+ "grad_norm": 11.0625,
+ "learning_rate": 3.0184820902655065e-07,
+ "loss": 2.9373,
+ "step": 1359
+ },
+ {
+ "epoch": 0.006045801390978864,
+ "grad_norm": 11.9375,
+ "learning_rate": 3.0207048311272633e-07,
+ "loss": 2.8333,
+ "step": 1360
+ },
+ {
+ "epoch": 0.006050246833178113,
+ "grad_norm": 10.5625,
+ "learning_rate": 3.02292757198902e-07,
+ "loss": 2.9671,
+ "step": 1361
+ },
+ {
+ "epoch": 0.006054692275377363,
+ "grad_norm": 9.9375,
+ "learning_rate": 3.0251503128507763e-07,
+ "loss": 2.8971,
+ "step": 1362
+ },
+ {
+ "epoch": 0.006059137717576612,
+ "grad_norm": 11.9375,
+ "learning_rate": 3.027373053712533e-07,
+ "loss": 2.6949,
+ "step": 1363
+ },
+ {
+ "epoch": 0.006063583159775861,
+ "grad_norm": 10.75,
+ "learning_rate": 3.02959579457429e-07,
+ "loss": 2.9037,
+ "step": 1364
+ },
+ {
+ "epoch": 0.00606802860197511,
+ "grad_norm": 9.25,
+ "learning_rate": 3.0318185354360467e-07,
+ "loss": 3.2175,
+ "step": 1365
+ },
+ {
+ "epoch": 0.006072474044174359,
+ "grad_norm": 13.6875,
+ "learning_rate": 3.034041276297803e-07,
+ "loss": 2.669,
+ "step": 1366
+ },
+ {
+ "epoch": 0.006076919486373608,
+ "grad_norm": 7.96875,
+ "learning_rate": 3.0362640171595597e-07,
+ "loss": 3.303,
+ "step": 1367
+ },
+ {
+ "epoch": 0.006081364928572857,
+ "grad_norm": 7.0,
+ "learning_rate": 3.0384867580213165e-07,
+ "loss": 3.3917,
+ "step": 1368
+ },
+ {
+ "epoch": 0.006085810370772106,
+ "grad_norm": 10.25,
+ "learning_rate": 3.040709498883073e-07,
+ "loss": 2.9504,
+ "step": 1369
+ },
+ {
+ "epoch": 0.006090255812971356,
+ "grad_norm": 12.125,
+ "learning_rate": 3.0429322397448295e-07,
+ "loss": 2.7856,
+ "step": 1370
+ },
+ {
+ "epoch": 0.006094701255170605,
+ "grad_norm": 8.875,
+ "learning_rate": 3.0451549806065863e-07,
+ "loss": 3.1407,
+ "step": 1371
+ },
+ {
+ "epoch": 0.006099146697369854,
+ "grad_norm": 9.25,
+ "learning_rate": 3.047377721468343e-07,
+ "loss": 3.1882,
+ "step": 1372
+ },
+ {
+ "epoch": 0.006103592139569103,
+ "grad_norm": 10.0,
+ "learning_rate": 3.0496004623300993e-07,
+ "loss": 2.9092,
+ "step": 1373
+ },
+ {
+ "epoch": 0.006108037581768352,
+ "grad_norm": 11.3125,
+ "learning_rate": 3.051823203191856e-07,
+ "loss": 2.9245,
+ "step": 1374
+ },
+ {
+ "epoch": 0.0061124830239676015,
+ "grad_norm": 10.125,
+ "learning_rate": 3.054045944053613e-07,
+ "loss": 2.9332,
+ "step": 1375
+ },
+ {
+ "epoch": 0.0061169284661668506,
+ "grad_norm": 10.4375,
+ "learning_rate": 3.056268684915369e-07,
+ "loss": 2.942,
+ "step": 1376
+ },
+ {
+ "epoch": 0.0061213739083661,
+ "grad_norm": 8.75,
+ "learning_rate": 3.058491425777126e-07,
+ "loss": 3.152,
+ "step": 1377
+ },
+ {
+ "epoch": 0.006125819350565349,
+ "grad_norm": 10.3125,
+ "learning_rate": 3.0607141666388827e-07,
+ "loss": 2.9159,
+ "step": 1378
+ },
+ {
+ "epoch": 0.006130264792764599,
+ "grad_norm": 8.3125,
+ "learning_rate": 3.0629369075006394e-07,
+ "loss": 3.1517,
+ "step": 1379
+ },
+ {
+ "epoch": 0.006134710234963848,
+ "grad_norm": 11.0,
+ "learning_rate": 3.0651596483623957e-07,
+ "loss": 2.9173,
+ "step": 1380
+ },
+ {
+ "epoch": 0.006139155677163097,
+ "grad_norm": 9.0,
+ "learning_rate": 3.0673823892241525e-07,
+ "loss": 3.0811,
+ "step": 1381
+ },
+ {
+ "epoch": 0.006143601119362346,
+ "grad_norm": 10.75,
+ "learning_rate": 3.069605130085909e-07,
+ "loss": 2.8594,
+ "step": 1382
+ },
+ {
+ "epoch": 0.006148046561561595,
+ "grad_norm": 11.0625,
+ "learning_rate": 3.071827870947666e-07,
+ "loss": 3.0104,
+ "step": 1383
+ },
+ {
+ "epoch": 0.006152492003760844,
+ "grad_norm": 11.125,
+ "learning_rate": 3.0740506118094223e-07,
+ "loss": 2.8202,
+ "step": 1384
+ },
+ {
+ "epoch": 0.006156937445960093,
+ "grad_norm": 9.25,
+ "learning_rate": 3.076273352671179e-07,
+ "loss": 3.0002,
+ "step": 1385
+ },
+ {
+ "epoch": 0.006161382888159342,
+ "grad_norm": 10.625,
+ "learning_rate": 3.078496093532936e-07,
+ "loss": 2.8458,
+ "step": 1386
+ },
+ {
+ "epoch": 0.006165828330358592,
+ "grad_norm": 11.0625,
+ "learning_rate": 3.0807188343946926e-07,
+ "loss": 2.915,
+ "step": 1387
+ },
+ {
+ "epoch": 0.006170273772557841,
+ "grad_norm": 10.5,
+ "learning_rate": 3.0829415752564494e-07,
+ "loss": 3.0639,
+ "step": 1388
+ },
+ {
+ "epoch": 0.00617471921475709,
+ "grad_norm": 8.5,
+ "learning_rate": 3.0851643161182056e-07,
+ "loss": 3.1389,
+ "step": 1389
+ },
+ {
+ "epoch": 0.006179164656956339,
+ "grad_norm": 9.625,
+ "learning_rate": 3.087387056979962e-07,
+ "loss": 3.3029,
+ "step": 1390
+ },
+ {
+ "epoch": 0.006183610099155588,
+ "grad_norm": 9.6875,
+ "learning_rate": 3.0896097978417187e-07,
+ "loss": 3.0344,
+ "step": 1391
+ },
+ {
+ "epoch": 0.006188055541354837,
+ "grad_norm": 11.5625,
+ "learning_rate": 3.0918325387034754e-07,
+ "loss": 2.8049,
+ "step": 1392
+ },
+ {
+ "epoch": 0.006192500983554086,
+ "grad_norm": 10.6875,
+ "learning_rate": 3.094055279565232e-07,
+ "loss": 2.912,
+ "step": 1393
+ },
+ {
+ "epoch": 0.006196946425753335,
+ "grad_norm": 10.75,
+ "learning_rate": 3.0962780204269885e-07,
+ "loss": 2.9791,
+ "step": 1394
+ },
+ {
+ "epoch": 0.006201391867952585,
+ "grad_norm": 9.4375,
+ "learning_rate": 3.098500761288745e-07,
+ "loss": 2.98,
+ "step": 1395
+ },
+ {
+ "epoch": 0.006205837310151834,
+ "grad_norm": 11.5625,
+ "learning_rate": 3.100723502150502e-07,
+ "loss": 2.9625,
+ "step": 1396
+ },
+ {
+ "epoch": 0.0062102827523510835,
+ "grad_norm": 10.3125,
+ "learning_rate": 3.102946243012259e-07,
+ "loss": 2.9074,
+ "step": 1397
+ },
+ {
+ "epoch": 0.0062147281945503325,
+ "grad_norm": 8.9375,
+ "learning_rate": 3.1051689838740156e-07,
+ "loss": 3.0771,
+ "step": 1398
+ },
+ {
+ "epoch": 0.006219173636749582,
+ "grad_norm": 8.4375,
+ "learning_rate": 3.107391724735772e-07,
+ "loss": 3.1583,
+ "step": 1399
+ },
+ {
+ "epoch": 0.006223619078948831,
+ "grad_norm": 8.4375,
+ "learning_rate": 3.1096144655975286e-07,
+ "loss": 3.1964,
+ "step": 1400
+ },
+ {
+ "epoch": 0.00622806452114808,
+ "grad_norm": 12.4375,
+ "learning_rate": 3.1118372064592854e-07,
+ "loss": 2.7624,
+ "step": 1401
+ },
+ {
+ "epoch": 0.006232509963347329,
+ "grad_norm": 9.0,
+ "learning_rate": 3.114059947321042e-07,
+ "loss": 3.0169,
+ "step": 1402
+ },
+ {
+ "epoch": 0.006236955405546578,
+ "grad_norm": 8.875,
+ "learning_rate": 3.1162826881827984e-07,
+ "loss": 3.0511,
+ "step": 1403
+ },
+ {
+ "epoch": 0.006241400847745828,
+ "grad_norm": 8.25,
+ "learning_rate": 3.118505429044555e-07,
+ "loss": 3.3416,
+ "step": 1404
+ },
+ {
+ "epoch": 0.006245846289945077,
+ "grad_norm": 7.375,
+ "learning_rate": 3.120728169906312e-07,
+ "loss": 3.1489,
+ "step": 1405
+ },
+ {
+ "epoch": 0.006250291732144326,
+ "grad_norm": 9.0,
+ "learning_rate": 3.122950910768068e-07,
+ "loss": 3.1389,
+ "step": 1406
+ },
+ {
+ "epoch": 0.006254737174343575,
+ "grad_norm": 8.6875,
+ "learning_rate": 3.125173651629825e-07,
+ "loss": 3.1529,
+ "step": 1407
+ },
+ {
+ "epoch": 0.006259182616542824,
+ "grad_norm": 8.25,
+ "learning_rate": 3.127396392491582e-07,
+ "loss": 3.3158,
+ "step": 1408
+ },
+ {
+ "epoch": 0.006263628058742073,
+ "grad_norm": 10.1875,
+ "learning_rate": 3.1296191333533385e-07,
+ "loss": 3.0265,
+ "step": 1409
+ },
+ {
+ "epoch": 0.006268073500941322,
+ "grad_norm": 8.0625,
+ "learning_rate": 3.131841874215095e-07,
+ "loss": 3.1497,
+ "step": 1410
+ },
+ {
+ "epoch": 0.006272518943140571,
+ "grad_norm": 9.9375,
+ "learning_rate": 3.1340646150768516e-07,
+ "loss": 2.8999,
+ "step": 1411
+ },
+ {
+ "epoch": 0.006276964385339821,
+ "grad_norm": 9.625,
+ "learning_rate": 3.1362873559386083e-07,
+ "loss": 3.1192,
+ "step": 1412
+ },
+ {
+ "epoch": 0.00628140982753907,
+ "grad_norm": 11.25,
+ "learning_rate": 3.138510096800365e-07,
+ "loss": 2.8486,
+ "step": 1413
+ },
+ {
+ "epoch": 0.006285855269738319,
+ "grad_norm": 7.96875,
+ "learning_rate": 3.1407328376621214e-07,
+ "loss": 3.3656,
+ "step": 1414
+ },
+ {
+ "epoch": 0.006290300711937568,
+ "grad_norm": 9.5,
+ "learning_rate": 3.1429555785238776e-07,
+ "loss": 3.0808,
+ "step": 1415
+ },
+ {
+ "epoch": 0.006294746154136817,
+ "grad_norm": 11.375,
+ "learning_rate": 3.145178319385635e-07,
+ "loss": 2.8574,
+ "step": 1416
+ },
+ {
+ "epoch": 0.0062991915963360665,
+ "grad_norm": 6.75,
+ "learning_rate": 3.147401060247391e-07,
+ "loss": 3.3653,
+ "step": 1417
+ },
+ {
+ "epoch": 0.0063036370385353155,
+ "grad_norm": 10.0625,
+ "learning_rate": 3.149623801109148e-07,
+ "loss": 2.9331,
+ "step": 1418
+ },
+ {
+ "epoch": 0.006308082480734565,
+ "grad_norm": 11.75,
+ "learning_rate": 3.151846541970904e-07,
+ "loss": 2.87,
+ "step": 1419
+ },
+ {
+ "epoch": 0.006312527922933814,
+ "grad_norm": 8.625,
+ "learning_rate": 3.1540692828326615e-07,
+ "loss": 3.054,
+ "step": 1420
+ },
+ {
+ "epoch": 0.006316973365133064,
+ "grad_norm": 10.4375,
+ "learning_rate": 3.156292023694418e-07,
+ "loss": 2.9734,
+ "step": 1421
+ },
+ {
+ "epoch": 0.006321418807332313,
+ "grad_norm": 9.8125,
+ "learning_rate": 3.1585147645561745e-07,
+ "loss": 3.0222,
+ "step": 1422
+ },
+ {
+ "epoch": 0.006325864249531562,
+ "grad_norm": 8.75,
+ "learning_rate": 3.160737505417931e-07,
+ "loss": 3.1698,
+ "step": 1423
+ },
+ {
+ "epoch": 0.006330309691730811,
+ "grad_norm": 10.3125,
+ "learning_rate": 3.162960246279688e-07,
+ "loss": 3.1181,
+ "step": 1424
+ },
+ {
+ "epoch": 0.00633475513393006,
+ "grad_norm": 11.8125,
+ "learning_rate": 3.1651829871414443e-07,
+ "loss": 2.8102,
+ "step": 1425
+ },
+ {
+ "epoch": 0.006339200576129309,
+ "grad_norm": 9.375,
+ "learning_rate": 3.167405728003201e-07,
+ "loss": 3.0335,
+ "step": 1426
+ },
+ {
+ "epoch": 0.006343646018328558,
+ "grad_norm": 8.8125,
+ "learning_rate": 3.1696284688649574e-07,
+ "loss": 3.0745,
+ "step": 1427
+ },
+ {
+ "epoch": 0.006348091460527807,
+ "grad_norm": 7.0625,
+ "learning_rate": 3.1718512097267147e-07,
+ "loss": 3.4504,
+ "step": 1428
+ },
+ {
+ "epoch": 0.006352536902727057,
+ "grad_norm": 10.0,
+ "learning_rate": 3.174073950588471e-07,
+ "loss": 2.9273,
+ "step": 1429
+ },
+ {
+ "epoch": 0.006356982344926306,
+ "grad_norm": 9.25,
+ "learning_rate": 3.1762966914502277e-07,
+ "loss": 2.9806,
+ "step": 1430
+ },
+ {
+ "epoch": 0.006361427787125555,
+ "grad_norm": 8.1875,
+ "learning_rate": 3.178519432311984e-07,
+ "loss": 3.1518,
+ "step": 1431
+ },
+ {
+ "epoch": 0.006365873229324804,
+ "grad_norm": 9.25,
+ "learning_rate": 3.1807421731737407e-07,
+ "loss": 3.1738,
+ "step": 1432
+ },
+ {
+ "epoch": 0.006370318671524053,
+ "grad_norm": 10.9375,
+ "learning_rate": 3.1829649140354975e-07,
+ "loss": 2.9154,
+ "step": 1433
+ },
+ {
+ "epoch": 0.006374764113723302,
+ "grad_norm": 9.125,
+ "learning_rate": 3.185187654897254e-07,
+ "loss": 3.1293,
+ "step": 1434
+ },
+ {
+ "epoch": 0.006379209555922551,
+ "grad_norm": 11.9375,
+ "learning_rate": 3.1874103957590105e-07,
+ "loss": 2.8604,
+ "step": 1435
+ },
+ {
+ "epoch": 0.0063836549981218,
+ "grad_norm": 9.8125,
+ "learning_rate": 3.1896331366207673e-07,
+ "loss": 3.0019,
+ "step": 1436
+ },
+ {
+ "epoch": 0.0063881004403210494,
+ "grad_norm": 7.5625,
+ "learning_rate": 3.191855877482524e-07,
+ "loss": 3.3028,
+ "step": 1437
+ },
+ {
+ "epoch": 0.006392545882520299,
+ "grad_norm": 9.875,
+ "learning_rate": 3.1940786183442803e-07,
10075
+ "loss": 3.109,
10076
+ "step": 1438
10077
+ },
10078
+ {
10079
+ "epoch": 0.0063969913247195484,
10080
+ "grad_norm": 10.4375,
10081
+ "learning_rate": 3.1963013592060376e-07,
10082
+ "loss": 2.8731,
10083
+ "step": 1439
10084
+ },
10085
+ {
10086
+ "epoch": 0.0064014367669187975,
10087
+ "grad_norm": 10.5625,
10088
+ "learning_rate": 3.198524100067794e-07,
10089
+ "loss": 2.9233,
10090
+ "step": 1440
10091
+ },
10092
+ {
10093
+ "epoch": 0.0064058822091180466,
10094
+ "grad_norm": 8.9375,
10095
+ "learning_rate": 3.2007468409295507e-07,
10096
+ "loss": 3.174,
10097
+ "step": 1441
10098
+ },
10099
+ {
10100
+ "epoch": 0.006410327651317296,
10101
+ "grad_norm": 11.75,
10102
+ "learning_rate": 3.202969581791307e-07,
10103
+ "loss": 2.8942,
10104
+ "step": 1442
10105
+ },
10106
+ {
10107
+ "epoch": 0.006414773093516545,
10108
+ "grad_norm": 6.46875,
10109
+ "learning_rate": 3.205192322653064e-07,
10110
+ "loss": 3.3406,
10111
+ "step": 1443
10112
+ },
10113
+ {
10114
+ "epoch": 0.006419218535715794,
10115
+ "grad_norm": 9.9375,
10116
+ "learning_rate": 3.2074150635148205e-07,
10117
+ "loss": 2.9487,
10118
+ "step": 1444
10119
+ },
10120
+ {
10121
+ "epoch": 0.006423663977915043,
10122
+ "grad_norm": 11.4375,
10123
+ "learning_rate": 3.209637804376577e-07,
10124
+ "loss": 2.8494,
10125
+ "step": 1445
10126
+ },
10127
+ {
10128
+ "epoch": 0.006428109420114293,
10129
+ "grad_norm": 7.65625,
10130
+ "learning_rate": 3.2118605452383335e-07,
10131
+ "loss": 3.29,
10132
+ "step": 1446
10133
+ },
10134
+ {
10135
+ "epoch": 0.006432554862313542,
10136
+ "grad_norm": 10.3125,
10137
+ "learning_rate": 3.214083286100091e-07,
10138
+ "loss": 2.9935,
10139
+ "step": 1447
10140
+ },
10141
+ {
10142
+ "epoch": 0.006437000304512791,
10143
+ "grad_norm": 8.9375,
10144
+ "learning_rate": 3.216306026961847e-07,
10145
+ "loss": 3.0299,
10146
+ "step": 1448
10147
+ },
10148
+ {
10149
+ "epoch": 0.00644144574671204,
10150
+ "grad_norm": 9.1875,
10151
+ "learning_rate": 3.2185287678236033e-07,
10152
+ "loss": 3.094,
10153
+ "step": 1449
10154
+ },
10155
+ {
10156
+ "epoch": 0.006445891188911289,
10157
+ "grad_norm": 9.75,
10158
+ "learning_rate": 3.22075150868536e-07,
10159
+ "loss": 3.1475,
10160
+ "step": 1450
10161
+ },
+ {
+ "epoch": 0.006450336631110538,
+ "grad_norm": 6.9375,
+ "learning_rate": 3.2229742495471163e-07,
+ "loss": 3.4116,
+ "step": 1451
+ },
+ {
+ "epoch": 0.006454782073309787,
+ "grad_norm": 12.125,
+ "learning_rate": 3.2251969904088736e-07,
+ "loss": 2.7524,
+ "step": 1452
+ },
+ {
+ "epoch": 0.006459227515509036,
+ "grad_norm": 9.5625,
+ "learning_rate": 3.22741973127063e-07,
+ "loss": 2.9845,
+ "step": 1453
+ },
+ {
+ "epoch": 0.006463672957708285,
+ "grad_norm": 8.8125,
+ "learning_rate": 3.2296424721323867e-07,
+ "loss": 3.3037,
+ "step": 1454
+ },
+ {
+ "epoch": 0.006468118399907535,
+ "grad_norm": 10.9375,
+ "learning_rate": 3.231865212994143e-07,
+ "loss": 2.8459,
+ "step": 1455
+ },
+ {
+ "epoch": 0.006472563842106784,
+ "grad_norm": 9.3125,
+ "learning_rate": 3.2340879538559e-07,
+ "loss": 3.1079,
+ "step": 1456
+ },
+ {
+ "epoch": 0.006477009284306033,
+ "grad_norm": 10.3125,
+ "learning_rate": 3.2363106947176565e-07,
+ "loss": 2.9778,
+ "step": 1457
+ },
+ {
+ "epoch": 0.006481454726505282,
+ "grad_norm": 10.4375,
+ "learning_rate": 3.238533435579413e-07,
+ "loss": 2.9249,
+ "step": 1458
+ },
+ {
+ "epoch": 0.006485900168704531,
+ "grad_norm": 8.1875,
+ "learning_rate": 3.24075617644117e-07,
+ "loss": 3.2607,
+ "step": 1459
+ },
+ {
+ "epoch": 0.0064903456109037805,
+ "grad_norm": 11.625,
+ "learning_rate": 3.242978917302927e-07,
+ "loss": 2.9025,
+ "step": 1460
+ },
+ {
+ "epoch": 0.0064947910531030295,
+ "grad_norm": 10.0625,
+ "learning_rate": 3.245201658164683e-07,
+ "loss": 3.0274,
+ "step": 1461
+ },
+ {
+ "epoch": 0.006499236495302279,
+ "grad_norm": 9.3125,
+ "learning_rate": 3.24742439902644e-07,
+ "loss": 3.0148,
+ "step": 1462
+ },
+ {
+ "epoch": 0.0065036819375015285,
+ "grad_norm": 8.0625,
+ "learning_rate": 3.2496471398881966e-07,
+ "loss": 3.232,
+ "step": 1463
+ },
+ {
+ "epoch": 0.006508127379700778,
+ "grad_norm": 8.75,
+ "learning_rate": 3.251869880749953e-07,
+ "loss": 3.2344,
+ "step": 1464
+ },
+ {
+ "epoch": 0.006512572821900027,
+ "grad_norm": 11.375,
+ "learning_rate": 3.2540926216117096e-07,
+ "loss": 2.8236,
+ "step": 1465
+ },
+ {
+ "epoch": 0.006517018264099276,
+ "grad_norm": 7.84375,
+ "learning_rate": 3.256315362473466e-07,
+ "loss": 3.5294,
+ "step": 1466
+ },
+ {
+ "epoch": 0.006521463706298525,
+ "grad_norm": 7.34375,
+ "learning_rate": 3.258538103335223e-07,
+ "loss": 3.533,
+ "step": 1467
+ },
+ {
+ "epoch": 0.006525909148497774,
+ "grad_norm": 10.875,
+ "learning_rate": 3.2607608441969794e-07,
+ "loss": 2.8747,
+ "step": 1468
+ },
+ {
+ "epoch": 0.006530354590697023,
+ "grad_norm": 10.75,
+ "learning_rate": 3.262983585058736e-07,
+ "loss": 2.8983,
+ "step": 1469
+ },
+ {
+ "epoch": 0.006534800032896272,
+ "grad_norm": 11.875,
+ "learning_rate": 3.2652063259204925e-07,
+ "loss": 2.7522,
+ "step": 1470
+ },
+ {
+ "epoch": 0.006539245475095521,
+ "grad_norm": 9.625,
+ "learning_rate": 3.26742906678225e-07,
+ "loss": 3.1023,
+ "step": 1471
+ },
+ {
+ "epoch": 0.006543690917294771,
+ "grad_norm": 9.6875,
+ "learning_rate": 3.269651807644006e-07,
+ "loss": 3.0412,
+ "step": 1472
+ },
+ {
+ "epoch": 0.00654813635949402,
+ "grad_norm": 10.25,
+ "learning_rate": 3.271874548505763e-07,
+ "loss": 2.9,
+ "step": 1473
+ },
+ {
+ "epoch": 0.006552581801693269,
+ "grad_norm": 9.3125,
+ "learning_rate": 3.274097289367519e-07,
+ "loss": 3.0553,
+ "step": 1474
+ },
+ {
+ "epoch": 0.006557027243892518,
+ "grad_norm": 11.0,
+ "learning_rate": 3.2763200302292764e-07,
+ "loss": 3.0155,
+ "step": 1475
+ },
+ {
+ "epoch": 0.006561472686091767,
+ "grad_norm": 11.1875,
+ "learning_rate": 3.2785427710910326e-07,
+ "loss": 2.9113,
+ "step": 1476
+ },
+ {
+ "epoch": 0.006565918128291016,
+ "grad_norm": 9.6875,
+ "learning_rate": 3.2807655119527894e-07,
+ "loss": 2.9571,
+ "step": 1477
+ },
+ {
+ "epoch": 0.006570363570490265,
+ "grad_norm": 9.9375,
+ "learning_rate": 3.2829882528145456e-07,
+ "loss": 3.1552,
+ "step": 1478
+ },
+ {
+ "epoch": 0.006574809012689514,
+ "grad_norm": 7.4375,
+ "learning_rate": 3.285210993676303e-07,
+ "loss": 3.3457,
+ "step": 1479
+ },
+ {
+ "epoch": 0.006579254454888764,
+ "grad_norm": 11.6875,
+ "learning_rate": 3.287433734538059e-07,
+ "loss": 2.91,
+ "step": 1480
+ },
+ {
+ "epoch": 0.006583699897088013,
+ "grad_norm": 8.25,
+ "learning_rate": 3.2896564753998154e-07,
+ "loss": 3.1658,
+ "step": 1481
+ },
+ {
+ "epoch": 0.0065881453392872625,
+ "grad_norm": 10.375,
+ "learning_rate": 3.291879216261572e-07,
+ "loss": 2.9511,
+ "step": 1482
+ },
+ {
+ "epoch": 0.0065925907814865115,
+ "grad_norm": 9.8125,
+ "learning_rate": 3.294101957123329e-07,
+ "loss": 2.9954,
+ "step": 1483
+ },
+ {
+ "epoch": 0.006597036223685761,
+ "grad_norm": 10.375,
+ "learning_rate": 3.296324697985086e-07,
+ "loss": 2.8514,
+ "step": 1484
+ },
+ {
+ "epoch": 0.00660148166588501,
+ "grad_norm": 8.9375,
+ "learning_rate": 3.298547438846842e-07,
+ "loss": 3.0513,
+ "step": 1485
+ },
+ {
+ "epoch": 0.006605927108084259,
+ "grad_norm": 5.59375,
+ "learning_rate": 3.300770179708599e-07,
+ "loss": 3.5142,
+ "step": 1486
+ },
+ {
+ "epoch": 0.006610372550283508,
+ "grad_norm": 9.0625,
+ "learning_rate": 3.3029929205703556e-07,
+ "loss": 3.1915,
+ "step": 1487
+ },
+ {
+ "epoch": 0.006614817992482757,
+ "grad_norm": 8.25,
+ "learning_rate": 3.3052156614321124e-07,
+ "loss": 3.1742,
+ "step": 1488
+ },
+ {
+ "epoch": 0.006619263434682007,
+ "grad_norm": 8.125,
+ "learning_rate": 3.3074384022938686e-07,
+ "loss": 3.2412,
+ "step": 1489
+ },
+ {
+ "epoch": 0.006623708876881256,
+ "grad_norm": 9.9375,
+ "learning_rate": 3.309661143155626e-07,
+ "loss": 3.0882,
+ "step": 1490
+ },
+ {
+ "epoch": 0.006628154319080505,
+ "grad_norm": 10.4375,
+ "learning_rate": 3.311883884017382e-07,
+ "loss": 2.9808,
+ "step": 1491
+ },
+ {
+ "epoch": 0.006632599761279754,
+ "grad_norm": 11.1875,
+ "learning_rate": 3.314106624879139e-07,
+ "loss": 2.9024,
+ "step": 1492
+ },
+ {
+ "epoch": 0.006637045203479003,
+ "grad_norm": 11.25,
+ "learning_rate": 3.316329365740895e-07,
+ "loss": 2.8403,
+ "step": 1493
+ },
+ {
+ "epoch": 0.006641490645678252,
+ "grad_norm": 10.375,
+ "learning_rate": 3.3185521066026525e-07,
+ "loss": 2.984,
+ "step": 1494
+ },
+ {
+ "epoch": 0.006645936087877501,
+ "grad_norm": 10.0625,
+ "learning_rate": 3.320774847464409e-07,
+ "loss": 3.0457,
+ "step": 1495
+ },
+ {
+ "epoch": 0.00665038153007675,
+ "grad_norm": 8.5,
+ "learning_rate": 3.3229975883261655e-07,
+ "loss": 3.0898,
+ "step": 1496
+ },
+ {
+ "epoch": 0.006654826972276,
+ "grad_norm": 8.5625,
+ "learning_rate": 3.325220329187922e-07,
+ "loss": 3.1635,
+ "step": 1497
+ },
+ {
+ "epoch": 0.006659272414475249,
+ "grad_norm": 9.9375,
+ "learning_rate": 3.327443070049678e-07,
+ "loss": 3.1107,
+ "step": 1498
+ },
+ {
+ "epoch": 0.006663717856674498,
+ "grad_norm": 8.3125,
+ "learning_rate": 3.3296658109114353e-07,
+ "loss": 3.2937,
+ "step": 1499
+ },
+ {
+ "epoch": 0.006668163298873747,
+ "grad_norm": 10.0,
+ "learning_rate": 3.3318885517731916e-07,
+ "loss": 2.9609,
+ "step": 1500
+ },
+ {
+ "epoch": 0.006672608741072996,
+ "grad_norm": 9.0,
+ "learning_rate": 3.3341112926349484e-07,
+ "loss": 3.1486,
+ "step": 1501
+ },
+ {
+ "epoch": 0.0066770541832722454,
+ "grad_norm": 11.5625,
+ "learning_rate": 3.3363340334967046e-07,
+ "loss": 2.9105,
+ "step": 1502
+ },
+ {
+ "epoch": 0.0066814996254714945,
+ "grad_norm": 8.3125,
+ "learning_rate": 3.338556774358462e-07,
+ "loss": 3.233,
+ "step": 1503
+ },
+ {
+ "epoch": 0.006685945067670744,
+ "grad_norm": 9.0,
+ "learning_rate": 3.340779515220218e-07,
+ "loss": 3.1901,
+ "step": 1504
+ },
+ {
+ "epoch": 0.006690390509869993,
+ "grad_norm": 11.5625,
+ "learning_rate": 3.343002256081975e-07,
+ "loss": 2.9103,
+ "step": 1505
+ },
+ {
+ "epoch": 0.0066948359520692426,
+ "grad_norm": 12.125,
+ "learning_rate": 3.345224996943731e-07,
+ "loss": 2.8905,
+ "step": 1506
+ },
+ {
+ "epoch": 0.006699281394268492,
+ "grad_norm": 10.5625,
+ "learning_rate": 3.3474477378054885e-07,
+ "loss": 2.8387,
+ "step": 1507
+ },
+ {
+ "epoch": 0.006703726836467741,
+ "grad_norm": 11.9375,
+ "learning_rate": 3.349670478667245e-07,
+ "loss": 2.7865,
+ "step": 1508
+ },
+ {
+ "epoch": 0.00670817227866699,
+ "grad_norm": 9.125,
+ "learning_rate": 3.3518932195290015e-07,
+ "loss": 3.1315,
+ "step": 1509
+ },
+ {
+ "epoch": 0.006712617720866239,
+ "grad_norm": 9.0625,
+ "learning_rate": 3.3541159603907583e-07,
+ "loss": 3.0736,
+ "step": 1510
+ },
+ {
+ "epoch": 0.006717063163065488,
+ "grad_norm": 8.625,
+ "learning_rate": 3.356338701252515e-07,
+ "loss": 3.1138,
+ "step": 1511
+ },
+ {
+ "epoch": 0.006721508605264737,
+ "grad_norm": 10.6875,
+ "learning_rate": 3.3585614421142713e-07,
+ "loss": 3.0507,
+ "step": 1512
+ },
+ {
+ "epoch": 0.006725954047463986,
+ "grad_norm": 10.375,
+ "learning_rate": 3.360784182976028e-07,
+ "loss": 3.0222,
+ "step": 1513
+ },
+ {
+ "epoch": 0.006730399489663236,
+ "grad_norm": 12.0,
+ "learning_rate": 3.363006923837785e-07,
+ "loss": 2.7891,
+ "step": 1514
+ },
+ {
+ "epoch": 0.006734844931862485,
+ "grad_norm": 8.6875,
+ "learning_rate": 3.365229664699541e-07,
+ "loss": 3.027,
+ "step": 1515
+ },
+ {
+ "epoch": 0.006739290374061734,
+ "grad_norm": 9.875,
+ "learning_rate": 3.367452405561298e-07,
+ "loss": 2.9934,
+ "step": 1516
+ },
+ {
+ "epoch": 0.006743735816260983,
+ "grad_norm": 10.3125,
+ "learning_rate": 3.369675146423054e-07,
+ "loss": 2.9754,
+ "step": 1517
+ },
+ {
+ "epoch": 0.006748181258460232,
+ "grad_norm": 9.25,
+ "learning_rate": 3.3718978872848115e-07,
+ "loss": 3.003,
+ "step": 1518
+ },
+ {
+ "epoch": 0.006752626700659481,
+ "grad_norm": 10.875,
+ "learning_rate": 3.3741206281465677e-07,
+ "loss": 2.863,
+ "step": 1519
+ },
+ {
+ "epoch": 0.00675707214285873,
+ "grad_norm": 11.5,
+ "learning_rate": 3.3763433690083245e-07,
+ "loss": 2.8394,
+ "step": 1520
+ },
+ {
+ "epoch": 0.006761517585057979,
+ "grad_norm": 11.3125,
+ "learning_rate": 3.378566109870081e-07,
+ "loss": 2.8357,
+ "step": 1521
+ },
+ {
+ "epoch": 0.006765963027257228,
+ "grad_norm": 9.625,
+ "learning_rate": 3.380788850731838e-07,
+ "loss": 3.0925,
+ "step": 1522
+ },
+ {
+ "epoch": 0.006770408469456478,
+ "grad_norm": 9.375,
+ "learning_rate": 3.3830115915935943e-07,
+ "loss": 3.2608,
+ "step": 1523
+ },
+ {
+ "epoch": 0.006774853911655727,
+ "grad_norm": 9.5625,
+ "learning_rate": 3.385234332455351e-07,
+ "loss": 3.16,
+ "step": 1524
+ },
+ {
+ "epoch": 0.0067792993538549765,
+ "grad_norm": 8.625,
+ "learning_rate": 3.3874570733171073e-07,
+ "loss": 3.0973,
+ "step": 1525
+ },
+ {
+ "epoch": 0.0067837447960542255,
+ "grad_norm": 9.8125,
+ "learning_rate": 3.3896798141788646e-07,
+ "loss": 2.9419,
+ "step": 1526
+ },
+ {
+ "epoch": 0.006788190238253475,
+ "grad_norm": 11.625,
+ "learning_rate": 3.391902555040621e-07,
+ "loss": 2.8104,
+ "step": 1527
+ },
+ {
+ "epoch": 0.006792635680452724,
+ "grad_norm": 10.5,
+ "learning_rate": 3.3941252959023777e-07,
+ "loss": 2.9048,
+ "step": 1528
+ },
+ {
+ "epoch": 0.006797081122651973,
+ "grad_norm": 10.4375,
+ "learning_rate": 3.396348036764134e-07,
+ "loss": 3.0021,
+ "step": 1529
+ },
+ {
+ "epoch": 0.006801526564851222,
+ "grad_norm": 9.25,
+ "learning_rate": 3.398570777625891e-07,
+ "loss": 3.0744,
+ "step": 1530
+ },
+ {
+ "epoch": 0.006805972007050472,
+ "grad_norm": 8.0,
+ "learning_rate": 3.4007935184876475e-07,
+ "loss": 3.2261,
+ "step": 1531
+ },
+ {
+ "epoch": 0.006810417449249721,
+ "grad_norm": 11.75,
+ "learning_rate": 3.4030162593494037e-07,
+ "loss": 2.8506,
+ "step": 1532
+ },
+ {
+ "epoch": 0.00681486289144897,
+ "grad_norm": 9.1875,
+ "learning_rate": 3.4052390002111605e-07,
+ "loss": 3.1766,
+ "step": 1533
+ },
+ {
+ "epoch": 0.006819308333648219,
+ "grad_norm": 9.3125,
+ "learning_rate": 3.407461741072917e-07,
+ "loss": 3.0462,
+ "step": 1534
+ },
+ {
+ "epoch": 0.006823753775847468,
+ "grad_norm": 7.3125,
+ "learning_rate": 3.409684481934674e-07,
+ "loss": 3.1243,
+ "step": 1535
+ },
+ {
+ "epoch": 0.006828199218046717,
+ "grad_norm": 8.75,
+ "learning_rate": 3.4119072227964303e-07,
+ "loss": 3.0206,
+ "step": 1536
+ },
+ {
+ "epoch": 0.006832644660245966,
+ "grad_norm": 11.8125,
+ "learning_rate": 3.414129963658187e-07,
+ "loss": 2.8674,
+ "step": 1537
+ },
+ {
+ "epoch": 0.006837090102445215,
+ "grad_norm": 9.0625,
+ "learning_rate": 3.416352704519944e-07,
+ "loss": 3.0717,
+ "step": 1538
+ },
+ {
+ "epoch": 0.006841535544644464,
+ "grad_norm": 12.0625,
+ "learning_rate": 3.4185754453817006e-07,
+ "loss": 2.7562,
+ "step": 1539
+ },
+ {
+ "epoch": 0.006845980986843714,
+ "grad_norm": 9.5,
+ "learning_rate": 3.420798186243457e-07,
+ "loss": 3.0367,
+ "step": 1540
+ },
+ {
+ "epoch": 0.006850426429042963,
+ "grad_norm": 11.1875,
+ "learning_rate": 3.423020927105214e-07,
+ "loss": 2.8669,
+ "step": 1541
+ },
+ {
+ "epoch": 0.006854871871242212,
+ "grad_norm": 12.4375,
+ "learning_rate": 3.4252436679669704e-07,
+ "loss": 2.7301,
+ "step": 1542
+ },
+ {
+ "epoch": 0.006859317313441461,
+ "grad_norm": 11.1875,
+ "learning_rate": 3.427466408828727e-07,
+ "loss": 2.866,
+ "step": 1543
+ },
+ {
+ "epoch": 0.00686376275564071,
+ "grad_norm": 7.59375,
+ "learning_rate": 3.4296891496904835e-07,
+ "loss": 3.1486,
+ "step": 1544
+ },
+ {
+ "epoch": 0.0068682081978399595,
+ "grad_norm": 12.0,
+ "learning_rate": 3.431911890552241e-07,
+ "loss": 2.8157,
+ "step": 1545
+ },
+ {
+ "epoch": 0.0068726536400392085,
+ "grad_norm": 9.375,
+ "learning_rate": 3.434134631413997e-07,
+ "loss": 2.9621,
+ "step": 1546
+ },
+ {
+ "epoch": 0.006877099082238458,
+ "grad_norm": 8.1875,
+ "learning_rate": 3.436357372275753e-07,
+ "loss": 3.2594,
+ "step": 1547
+ },
+ {
+ "epoch": 0.0068815445244377075,
+ "grad_norm": 8.75,
+ "learning_rate": 3.43858011313751e-07,
+ "loss": 3.2303,
+ "step": 1548
+ },
+ {
+ "epoch": 0.006885989966636957,
+ "grad_norm": 9.8125,
+ "learning_rate": 3.4408028539992663e-07,
+ "loss": 2.9867,
+ "step": 1549
+ },
+ {
+ "epoch": 0.006890435408836206,
+ "grad_norm": 10.375,
+ "learning_rate": 3.4430255948610236e-07,
+ "loss": 3.0437,
+ "step": 1550
+ },
+ {
+ "epoch": 0.006894880851035455,
+ "grad_norm": 10.0625,
+ "learning_rate": 3.44524833572278e-07,
+ "loss": 3.0103,
+ "step": 1551
+ },
+ {
+ "epoch": 0.006899326293234704,
+ "grad_norm": 11.5,
+ "learning_rate": 3.4474710765845366e-07,
+ "loss": 2.8495,
+ "step": 1552
+ },
+ {
+ "epoch": 0.006903771735433953,
+ "grad_norm": 7.15625,
+ "learning_rate": 3.449693817446293e-07,
+ "loss": 3.3484,
+ "step": 1553
+ },
+ {
+ "epoch": 0.006908217177633202,
+ "grad_norm": 11.125,
+ "learning_rate": 3.45191655830805e-07,
+ "loss": 2.9013,
+ "step": 1554
+ },
+ {
+ "epoch": 0.006912662619832451,
+ "grad_norm": 8.25,
+ "learning_rate": 3.4541392991698064e-07,
+ "loss": 3.1855,
+ "step": 1555
+ },
+ {
+ "epoch": 0.0069171080620317,
+ "grad_norm": 11.6875,
+ "learning_rate": 3.456362040031563e-07,
+ "loss": 2.904,
+ "step": 1556
+ },
+ {
+ "epoch": 0.00692155350423095,
+ "grad_norm": 9.0625,
+ "learning_rate": 3.4585847808933195e-07,
+ "loss": 3.1332,
+ "step": 1557
+ },
+ {
+ "epoch": 0.006925998946430199,
+ "grad_norm": 8.6875,
+ "learning_rate": 3.460807521755077e-07,
+ "loss": 3.0808,
+ "step": 1558
+ },
+ {
+ "epoch": 0.006930444388629448,
+ "grad_norm": 8.25,
+ "learning_rate": 3.463030262616833e-07,
+ "loss": 3.4512,
+ "step": 1559
+ },
+ {
+ "epoch": 0.006934889830828697,
+ "grad_norm": 9.25,
+ "learning_rate": 3.46525300347859e-07,
+ "loss": 3.0079,
+ "step": 1560
+ },
+ {
+ "epoch": 0.006939335273027946,
+ "grad_norm": 8.8125,
+ "learning_rate": 3.4674757443403466e-07,
+ "loss": 3.2632,
+ "step": 1561
+ },
+ {
+ "epoch": 0.006943780715227195,
+ "grad_norm": 11.625,
+ "learning_rate": 3.4696984852021033e-07,
+ "loss": 2.7714,
+ "step": 1562
+ },
+ {
+ "epoch": 0.006948226157426444,
+ "grad_norm": 9.375,
+ "learning_rate": 3.4719212260638596e-07,
+ "loss": 3.0542,
+ "step": 1563
+ },
+ {
+ "epoch": 0.006952671599625693,
+ "grad_norm": 9.0625,
+ "learning_rate": 3.474143966925616e-07,
+ "loss": 3.121,
+ "step": 1564
+ },
+ {
+ "epoch": 0.006957117041824943,
+ "grad_norm": 9.1875,
+ "learning_rate": 3.476366707787373e-07,
+ "loss": 2.9955,
+ "step": 1565
+ },
+ {
+ "epoch": 0.006961562484024192,
+ "grad_norm": 10.1875,
+ "learning_rate": 3.4785894486491294e-07,
+ "loss": 3.0429,
+ "step": 1566
+ },
+ {
+ "epoch": 0.0069660079262234414,
+ "grad_norm": 10.875,
+ "learning_rate": 3.480812189510886e-07,
+ "loss": 2.9506,
+ "step": 1567
+ },
+ {
+ "epoch": 0.0069704533684226905,
+ "grad_norm": 11.75,
+ "learning_rate": 3.4830349303726424e-07,
+ "loss": 2.8532,
+ "step": 1568
+ },
+ {
+ "epoch": 0.00697489881062194,
+ "grad_norm": 8.5625,
+ "learning_rate": 3.4852576712343997e-07,
+ "loss": 3.1654,
+ "step": 1569
+ },
+ {
+ "epoch": 0.006979344252821189,
+ "grad_norm": 9.5,
+ "learning_rate": 3.487480412096156e-07,
+ "loss": 2.9531,
+ "step": 1570
+ },
+ {
+ "epoch": 0.006983789695020438,
+ "grad_norm": 12.25,
+ "learning_rate": 3.489703152957913e-07,
+ "loss": 2.7171,
+ "step": 1571
+ },
+ {
+ "epoch": 0.006988235137219687,
+ "grad_norm": 9.625,
+ "learning_rate": 3.491925893819669e-07,
+ "loss": 2.9861,
+ "step": 1572
+ },
+ {
+ "epoch": 0.006992680579418936,
+ "grad_norm": 9.8125,
+ "learning_rate": 3.4941486346814263e-07,
+ "loss": 3.0001,
+ "step": 1573
+ },
+ {
+ "epoch": 0.006997126021618186,
+ "grad_norm": 9.875,
+ "learning_rate": 3.4963713755431826e-07,
+ "loss": 2.9464,
+ "step": 1574
+ },
+ {
+ "epoch": 0.007001571463817435,
+ "grad_norm": 8.6875,
+ "learning_rate": 3.4985941164049393e-07,
+ "loss": 3.1696,
+ "step": 1575
+ },
+ {
+ "epoch": 0.007006016906016684,
+ "grad_norm": 9.8125,
+ "learning_rate": 3.5008168572666956e-07,
+ "loss": 2.996,
+ "step": 1576
+ },
+ {
+ "epoch": 0.007010462348215933,
+ "grad_norm": 11.5,
+ "learning_rate": 3.503039598128453e-07,
+ "loss": 2.8594,
+ "step": 1577
+ },
+ {
+ "epoch": 0.007014907790415182,
+ "grad_norm": 7.5,
+ "learning_rate": 3.505262338990209e-07,
+ "loss": 3.3521,
+ "step": 1578
+ },
+ {
+ "epoch": 0.007019353232614431,
+ "grad_norm": 7.90625,
+ "learning_rate": 3.507485079851966e-07,
+ "loss": 3.2562,
+ "step": 1579
+ },
+ {
+ "epoch": 0.00702379867481368,
+ "grad_norm": 8.25,
+ "learning_rate": 3.509707820713722e-07,
+ "loss": 3.3022,
+ "step": 1580
+ },
+ {
+ "epoch": 0.007028244117012929,
+ "grad_norm": 9.625,
+ "learning_rate": 3.5119305615754784e-07,
+ "loss": 2.986,
+ "step": 1581
+ },
+ {
+ "epoch": 0.007032689559212179,
+ "grad_norm": 8.125,
+ "learning_rate": 3.5141533024372357e-07,
+ "loss": 3.4736,
+ "step": 1582
+ },
+ {
+ "epoch": 0.007037135001411428,
+ "grad_norm": 12.3125,
+ "learning_rate": 3.516376043298992e-07,
+ "loss": 2.797,
+ "step": 1583
+ },
+ {
+ "epoch": 0.007041580443610677,
+ "grad_norm": 10.3125,
+ "learning_rate": 3.518598784160749e-07,
+ "loss": 3.002,
+ "step": 1584
+ },
+ {
+ "epoch": 0.007046025885809926,
+ "grad_norm": 9.625,
+ "learning_rate": 3.5208215250225055e-07,
+ "loss": 2.9461,
+ "step": 1585
+ },
+ {
+ "epoch": 0.007050471328009175,
+ "grad_norm": 10.25,
+ "learning_rate": 3.5230442658842623e-07,
+ "loss": 2.9823,
+ "step": 1586
+ },
+ {
+ "epoch": 0.0070549167702084244,
+ "grad_norm": 10.4375,
+ "learning_rate": 3.5252670067460186e-07,
+ "loss": 2.9964,
+ "step": 1587
+ },
+ {
+ "epoch": 0.0070593622124076735,
+ "grad_norm": 9.1875,
+ "learning_rate": 3.5274897476077753e-07,
+ "loss": 3.0419,
+ "step": 1588
+ },
+ {
+ "epoch": 0.0070638076546069226,
+ "grad_norm": 10.5,
+ "learning_rate": 3.529712488469532e-07,
+ "loss": 2.9279,
+ "step": 1589
+ },
+ {
+ "epoch": 0.007068253096806172,
+ "grad_norm": 9.375,
+ "learning_rate": 3.531935229331289e-07,
+ "loss": 2.9769,
+ "step": 1590
+ },
+ {
+ "epoch": 0.0070726985390054216,
+ "grad_norm": 8.5625,
+ "learning_rate": 3.534157970193045e-07,
+ "loss": 3.1274,
+ "step": 1591
+ },
+ {
+ "epoch": 0.007077143981204671,
+ "grad_norm": 10.125,
+ "learning_rate": 3.5363807110548024e-07,
+ "loss": 3.0245,
+ "step": 1592
+ },
+ {
+ "epoch": 0.00708158942340392,
+ "grad_norm": 10.375,
+ "learning_rate": 3.5386034519165587e-07,
+ "loss": 2.9498,
+ "step": 1593
+ },
+ {
+ "epoch": 0.007086034865603169,
+ "grad_norm": 11.125,
+ "learning_rate": 3.5408261927783155e-07,
+ "loss": 2.9692,
+ "step": 1594
+ },
+ {
+ "epoch": 0.007090480307802418,
+ "grad_norm": 10.8125,
+ "learning_rate": 3.5430489336400717e-07,
+ "loss": 3.0287,
+ "step": 1595
+ },
+ {
+ "epoch": 0.007094925750001667,
+ "grad_norm": 12.3125,
+ "learning_rate": 3.545271674501829e-07,
+ "loss": 2.7552,
+ "step": 1596
+ },
+ {
+ "epoch": 0.007099371192200916,
+ "grad_norm": 10.375,
+ "learning_rate": 3.5474944153635853e-07,
+ "loss": 2.9677,
+ "step": 1597
+ },
+ {
+ "epoch": 0.007103816634400165,
+ "grad_norm": 10.4375,
+ "learning_rate": 3.5497171562253415e-07,
+ "loss": 2.9863,
+ "step": 1598
+ },
+ {
+ "epoch": 0.007108262076599415,
+ "grad_norm": 11.3125,
+ "learning_rate": 3.5519398970870983e-07,
+ "loss": 2.8698,
+ "step": 1599
+ },
+ {
+ "epoch": 0.007112707518798664,
+ "grad_norm": 12.1875,
+ "learning_rate": 3.5541626379488546e-07,
+ "loss": 2.7243,
+ "step": 1600
  }
  ],
  "logging_steps": 1,

  "attributes": {}
  }
  },
+ "total_flos": 6.3339761762304e+16,
  "train_batch_size": 1,
  "trial_name": null,
  "trial_params": null