tranhuyHoang committed
Commit bfd497d · verified · 1 Parent(s): bee0bea

Training in progress, step 1000, checkpoint

last-checkpoint/model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:471137440a77902a41f0492a0392c17bd587418b6032b8103558056a929e9229
3
  size 91951912
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cc6b44951f71e38932dd738d841d6f489b985f24e3c3523e2a7adb893f8fdc5d
3
  size 91951912
last-checkpoint/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:83dc102b2790cabadc5eacc94ef7712a399dc1dc65c59a0275d5277933e37c2a
3
  size 183991226
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c02cb680c39a21e145a8cbcca42081ce685165e39759570b73fb2f82a2e28f37
3
  size 183991226
last-checkpoint/rng_state.pth CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:1ff264f99d31b522cc7e2a4eac9d38606d0c58a34c0adc74d71e0ca8b371dc36
3
  size 14244
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9196a1e708bf24d6abba41cce3f8558820acc3e50f9394c5955e29eb41ffea3d
3
  size 14244
last-checkpoint/scheduler.pt CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:e5753f749b911fa96d59764d60c465d1a8b86e6805c89de87ea330f387654115
3
  size 1064
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cc0ead927c2d87154ea1c8600519341bfb2c3090b1e4a3776c279253dd2e14c0
3
  size 1064
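
Each binary file above is stored as a Git LFS pointer: three text lines giving the spec version, a SHA-256 object ID, and the payload size in bytes. In every pointer diff shown, the size is unchanged and only the oid differs. A minimal sketch of reading such a pointer and comparing two revisions, assuming the pointer text is already available as a string; the helper names `parse_lfs_pointer` and `payload_changed` are hypothetical and are not part of any Git LFS tooling:

```python
# Hypothetical helpers for illustration only; not part of git-lfs itself.

def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid line has the form '<algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def payload_changed(old: str, new: str) -> bool:
    """Return True when two pointers refer to different stored objects."""
    a, b = parse_lfs_pointer(old), parse_lfs_pointer(new)
    return a["digest"] != b["digest"] or a["size"] != b["size"]


# Pointer contents copied from the scheduler.pt diff above.
OLD = """version https://git-lfs.github.com/spec/v1
oid sha256:e5753f749b911fa96d59764d60c465d1a8b86e6805c89de87ea330f387654115
size 1064
"""
NEW = """version https://git-lfs.github.com/spec/v1
oid sha256:cc0ead927c2d87154ea1c8600519341bfb2c3090b1e4a3776c279253dd2e14c0
size 1064
"""

if __name__ == "__main__":
    print(parse_lfs_pointer(NEW)["size"])
    print(payload_changed(OLD, NEW))
```

Because the byte size is identical in both revisions, the digest is the only field that shows the checkpoint content was rewritten.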
last-checkpoint/trainer_state.json CHANGED
@@ -2,9 +2,9 @@
2
  "best_global_step": null,
3
  "best_metric": null,
4
  "best_model_checkpoint": null,
5
- "epoch": 5e-05,
6
  "eval_steps": 500,
7
- "global_step": 500,
8
  "is_hyper_param_search": false,
9
  "is_local_process_zero": true,
10
  "is_world_process_zero": true,
@@ -3516,6 +3516,3514 @@
3516
  "eval_samples_per_second": 27.452,
3517
  "eval_steps_per_second": 1.716,
3518
  "step": 500
3519
  }
3520
  ],
3521
  "logging_steps": 1,
 
2
  "best_global_step": null,
3
  "best_metric": null,
4
  "best_model_checkpoint": null,
5
+ "epoch": 0.0001,
6
  "eval_steps": 500,
7
+ "global_step": 1000,
8
  "is_hyper_param_search": false,
9
  "is_local_process_zero": true,
10
  "is_world_process_zero": true,
 
3516
  "eval_samples_per_second": 27.452,
3517
  "eval_steps_per_second": 1.716,
3518
  "step": 500
3519
+ },
3520
+ {
3521
+ "epoch": 5.01e-05,
3522
+ "grad_norm": 12.755992889404297,
3523
+ "learning_rate": 5e-06,
3524
+ "loss": 72.125,
3525
+ "step": 501
3526
+ },
3527
+ {
3528
+ "epoch": 5.02e-05,
3529
+ "grad_norm": 13.567408561706543,
3530
+ "learning_rate": 5.01e-06,
3531
+ "loss": 72.1875,
3532
+ "step": 502
3533
+ },
3534
+ {
3535
+ "epoch": 5.03e-05,
3536
+ "grad_norm": 13.13305377960205,
3537
+ "learning_rate": 5.02e-06,
3538
+ "loss": 72.125,
3539
+ "step": 503
3540
+ },
3541
+ {
3542
+ "epoch": 5.04e-05,
3543
+ "grad_norm": 13.306023597717285,
3544
+ "learning_rate": 5.03e-06,
3545
+ "loss": 72.0625,
3546
+ "step": 504
3547
+ },
3548
+ {
3549
+ "epoch": 5.05e-05,
3550
+ "grad_norm": 13.41368579864502,
3551
+ "learning_rate": 5.04e-06,
3552
+ "loss": 72.1875,
3553
+ "step": 505
3554
+ },
3555
+ {
3556
+ "epoch": 5.06e-05,
3557
+ "grad_norm": 13.642829895019531,
3558
+ "learning_rate": 5.05e-06,
3559
+ "loss": 72.0625,
3560
+ "step": 506
3561
+ },
3562
+ {
3563
+ "epoch": 5.07e-05,
3564
+ "grad_norm": 13.747808456420898,
3565
+ "learning_rate": 5.06e-06,
3566
+ "loss": 72.0625,
3567
+ "step": 507
3568
+ },
3569
+ {
3570
+ "epoch": 5.08e-05,
3571
+ "grad_norm": 13.421524047851562,
3572
+ "learning_rate": 5.070000000000001e-06,
3573
+ "loss": 72.125,
3574
+ "step": 508
3575
+ },
3576
+ {
3577
+ "epoch": 5.09e-05,
3578
+ "grad_norm": 13.79678726196289,
3579
+ "learning_rate": 5.0800000000000005e-06,
3580
+ "loss": 72.0,
3581
+ "step": 509
3582
+ },
3583
+ {
3584
+ "epoch": 5.1e-05,
3585
+ "grad_norm": 13.362953186035156,
3586
+ "learning_rate": 5.09e-06,
3587
+ "loss": 72.0,
3588
+ "step": 510
3589
+ },
3590
+ {
3591
+ "epoch": 5.11e-05,
3592
+ "grad_norm": 13.886173248291016,
3593
+ "learning_rate": 5.0999999999999995e-06,
3594
+ "loss": 72.0,
3595
+ "step": 511
3596
+ },
3597
+ {
3598
+ "epoch": 5.12e-05,
3599
+ "grad_norm": 13.49032211303711,
3600
+ "learning_rate": 5.11e-06,
3601
+ "loss": 72.0625,
3602
+ "step": 512
3603
+ },
3604
+ {
3605
+ "epoch": 5.13e-05,
3606
+ "grad_norm": 13.661921501159668,
3607
+ "learning_rate": 5.12e-06,
3608
+ "loss": 72.0625,
3609
+ "step": 513
3610
+ },
3611
+ {
3612
+ "epoch": 5.14e-05,
3613
+ "grad_norm": 13.80624771118164,
3614
+ "learning_rate": 5.13e-06,
3615
+ "loss": 72.0625,
3616
+ "step": 514
3617
+ },
3618
+ {
3619
+ "epoch": 5.15e-05,
3620
+ "grad_norm": 13.84620475769043,
3621
+ "learning_rate": 5.140000000000001e-06,
3622
+ "loss": 72.0,
3623
+ "step": 515
3624
+ },
3625
+ {
3626
+ "epoch": 5.16e-05,
3627
+ "grad_norm": 13.160120010375977,
3628
+ "learning_rate": 5.15e-06,
3629
+ "loss": 72.0625,
3630
+ "step": 516
3631
+ },
3632
+ {
3633
+ "epoch": 5.17e-05,
3634
+ "grad_norm": 13.24973201751709,
3635
+ "learning_rate": 5.16e-06,
3636
+ "loss": 72.125,
3637
+ "step": 517
3638
+ },
3639
+ {
3640
+ "epoch": 5.18e-05,
3641
+ "grad_norm": 14.063905715942383,
3642
+ "learning_rate": 5.17e-06,
3643
+ "loss": 72.0,
3644
+ "step": 518
3645
+ },
3646
+ {
3647
+ "epoch": 5.19e-05,
3648
+ "grad_norm": 13.477499008178711,
3649
+ "learning_rate": 5.18e-06,
3650
+ "loss": 72.0,
3651
+ "step": 519
3652
+ },
3653
+ {
3654
+ "epoch": 5.2e-05,
3655
+ "grad_norm": 13.442255020141602,
3656
+ "learning_rate": 5.19e-06,
3657
+ "loss": 72.0,
3658
+ "step": 520
3659
+ },
3660
+ {
3661
+ "epoch": 5.21e-05,
3662
+ "grad_norm": 13.900424003601074,
3663
+ "learning_rate": 5.2e-06,
3664
+ "loss": 72.0,
3665
+ "step": 521
3666
+ },
3667
+ {
3668
+ "epoch": 5.22e-05,
3669
+ "grad_norm": 14.005378723144531,
3670
+ "learning_rate": 5.21e-06,
3671
+ "loss": 72.0,
3672
+ "step": 522
3673
+ },
3674
+ {
3675
+ "epoch": 5.23e-05,
3676
+ "grad_norm": 13.34553337097168,
3677
+ "learning_rate": 5.22e-06,
3678
+ "loss": 71.9375,
3679
+ "step": 523
3680
+ },
3681
+ {
3682
+ "epoch": 5.24e-05,
3683
+ "grad_norm": 13.30509090423584,
3684
+ "learning_rate": 5.23e-06,
3685
+ "loss": 71.9375,
3686
+ "step": 524
3687
+ },
3688
+ {
3689
+ "epoch": 5.25e-05,
3690
+ "grad_norm": 13.428346633911133,
3691
+ "learning_rate": 5.240000000000001e-06,
3692
+ "loss": 72.0,
3693
+ "step": 525
3694
+ },
3695
+ {
3696
+ "epoch": 5.26e-05,
3697
+ "grad_norm": 13.481391906738281,
3698
+ "learning_rate": 5.2500000000000006e-06,
3699
+ "loss": 72.0,
3700
+ "step": 526
3701
+ },
3702
+ {
3703
+ "epoch": 5.27e-05,
3704
+ "grad_norm": 13.900728225708008,
3705
+ "learning_rate": 5.26e-06,
3706
+ "loss": 72.0,
3707
+ "step": 527
3708
+ },
3709
+ {
3710
+ "epoch": 5.28e-05,
3711
+ "grad_norm": 12.912371635437012,
3712
+ "learning_rate": 5.2699999999999995e-06,
3713
+ "loss": 72.0,
3714
+ "step": 528
3715
+ },
3716
+ {
3717
+ "epoch": 5.29e-05,
3718
+ "grad_norm": 13.814693450927734,
3719
+ "learning_rate": 5.28e-06,
3720
+ "loss": 72.0,
3721
+ "step": 529
3722
+ },
3723
+ {
3724
+ "epoch": 5.3e-05,
3725
+ "grad_norm": 13.151686668395996,
3726
+ "learning_rate": 5.29e-06,
3727
+ "loss": 72.0,
3728
+ "step": 530
3729
+ },
3730
+ {
3731
+ "epoch": 5.31e-05,
3732
+ "grad_norm": 13.657590866088867,
3733
+ "learning_rate": 5.3e-06,
3734
+ "loss": 71.9375,
3735
+ "step": 531
3736
+ },
3737
+ {
3738
+ "epoch": 5.32e-05,
3739
+ "grad_norm": 13.253593444824219,
3740
+ "learning_rate": 5.31e-06,
3741
+ "loss": 71.9375,
3742
+ "step": 532
3743
+ },
3744
+ {
3745
+ "epoch": 5.33e-05,
3746
+ "grad_norm": 13.740532875061035,
3747
+ "learning_rate": 5.32e-06,
3748
+ "loss": 71.9375,
3749
+ "step": 533
3750
+ },
3751
+ {
3752
+ "epoch": 5.34e-05,
3753
+ "grad_norm": 14.07413387298584,
3754
+ "learning_rate": 5.33e-06,
3755
+ "loss": 71.875,
3756
+ "step": 534
3757
+ },
3758
+ {
3759
+ "epoch": 5.35e-05,
3760
+ "grad_norm": 13.681927680969238,
3761
+ "learning_rate": 5.34e-06,
3762
+ "loss": 72.0,
3763
+ "step": 535
3764
+ },
3765
+ {
3766
+ "epoch": 5.36e-05,
3767
+ "grad_norm": 13.324553489685059,
3768
+ "learning_rate": 5.3500000000000004e-06,
3769
+ "loss": 71.9375,
3770
+ "step": 536
3771
+ },
3772
+ {
3773
+ "epoch": 5.37e-05,
3774
+ "grad_norm": 13.417908668518066,
3775
+ "learning_rate": 5.36e-06,
3776
+ "loss": 71.9375,
3777
+ "step": 537
3778
+ },
3779
+ {
3780
+ "epoch": 5.38e-05,
3781
+ "grad_norm": 12.827329635620117,
3782
+ "learning_rate": 5.37e-06,
3783
+ "loss": 71.9375,
3784
+ "step": 538
3785
+ },
3786
+ {
3787
+ "epoch": 5.39e-05,
3788
+ "grad_norm": 13.676140785217285,
3789
+ "learning_rate": 5.38e-06,
3790
+ "loss": 71.9375,
3791
+ "step": 539
3792
+ },
3793
+ {
3794
+ "epoch": 5.4e-05,
3795
+ "grad_norm": 13.923197746276855,
3796
+ "learning_rate": 5.39e-06,
3797
+ "loss": 71.875,
3798
+ "step": 540
3799
+ },
3800
+ {
3801
+ "epoch": 5.41e-05,
3802
+ "grad_norm": 13.1552152633667,
3803
+ "learning_rate": 5.4e-06,
3804
+ "loss": 71.875,
3805
+ "step": 541
3806
+ },
3807
+ {
3808
+ "epoch": 5.42e-05,
3809
+ "grad_norm": 13.265161514282227,
3810
+ "learning_rate": 5.410000000000001e-06,
3811
+ "loss": 71.9375,
3812
+ "step": 542
3813
+ },
3814
+ {
3815
+ "epoch": 5.43e-05,
3816
+ "grad_norm": 13.648545265197754,
3817
+ "learning_rate": 5.420000000000001e-06,
3818
+ "loss": 71.875,
3819
+ "step": 543
3820
+ },
3821
+ {
3822
+ "epoch": 5.44e-05,
3823
+ "grad_norm": 14.054911613464355,
3824
+ "learning_rate": 5.43e-06,
3825
+ "loss": 71.75,
3826
+ "step": 544
3827
+ },
3828
+ {
3829
+ "epoch": 5.45e-05,
3830
+ "grad_norm": 13.410968780517578,
3831
+ "learning_rate": 5.44e-06,
3832
+ "loss": 71.8125,
3833
+ "step": 545
3834
+ },
3835
+ {
3836
+ "epoch": 5.46e-05,
3837
+ "grad_norm": 14.150003433227539,
3838
+ "learning_rate": 5.45e-06,
3839
+ "loss": 71.8125,
3840
+ "step": 546
3841
+ },
3842
+ {
3843
+ "epoch": 5.47e-05,
3844
+ "grad_norm": 13.140257835388184,
3845
+ "learning_rate": 5.46e-06,
3846
+ "loss": 71.75,
3847
+ "step": 547
3848
+ },
3849
+ {
3850
+ "epoch": 5.48e-05,
3851
+ "grad_norm": 13.407617568969727,
3852
+ "learning_rate": 5.47e-06,
3853
+ "loss": 71.875,
3854
+ "step": 548
3855
+ },
3856
+ {
3857
+ "epoch": 5.49e-05,
3858
+ "grad_norm": 13.361193656921387,
3859
+ "learning_rate": 5.48e-06,
3860
+ "loss": 71.8125,
3861
+ "step": 549
3862
+ },
3863
+ {
3864
+ "epoch": 5.5e-05,
3865
+ "grad_norm": 13.742301940917969,
3866
+ "learning_rate": 5.49e-06,
3867
+ "loss": 71.75,
3868
+ "step": 550
3869
+ },
3870
+ {
3871
+ "epoch": 5.51e-05,
3872
+ "grad_norm": 13.647250175476074,
3873
+ "learning_rate": 5.5e-06,
3874
+ "loss": 71.8125,
3875
+ "step": 551
3876
+ },
3877
+ {
3878
+ "epoch": 5.52e-05,
3879
+ "grad_norm": 13.841861724853516,
3880
+ "learning_rate": 5.51e-06,
3881
+ "loss": 71.625,
3882
+ "step": 552
3883
+ },
3884
+ {
3885
+ "epoch": 5.53e-05,
3886
+ "grad_norm": 13.940226554870605,
3887
+ "learning_rate": 5.5200000000000005e-06,
3888
+ "loss": 71.5625,
3889
+ "step": 553
3890
+ },
3891
+ {
3892
+ "epoch": 5.54e-05,
3893
+ "grad_norm": 14.098892211914062,
3894
+ "learning_rate": 5.53e-06,
3895
+ "loss": 71.875,
3896
+ "step": 554
3897
+ },
3898
+ {
3899
+ "epoch": 5.55e-05,
3900
+ "grad_norm": 13.767340660095215,
3901
+ "learning_rate": 5.54e-06,
3902
+ "loss": 71.8125,
3903
+ "step": 555
3904
+ },
3905
+ {
3906
+ "epoch": 5.56e-05,
3907
+ "grad_norm": 13.485721588134766,
3908
+ "learning_rate": 5.55e-06,
3909
+ "loss": 71.5625,
3910
+ "step": 556
3911
+ },
3912
+ {
3913
+ "epoch": 5.57e-05,
3914
+ "grad_norm": 13.09888744354248,
3915
+ "learning_rate": 5.56e-06,
3916
+ "loss": 71.75,
3917
+ "step": 557
3918
+ },
3919
+ {
3920
+ "epoch": 5.58e-05,
3921
+ "grad_norm": 13.47565746307373,
3922
+ "learning_rate": 5.57e-06,
3923
+ "loss": 71.75,
3924
+ "step": 558
3925
+ },
3926
+ {
3927
+ "epoch": 5.59e-05,
3928
+ "grad_norm": 13.305706977844238,
3929
+ "learning_rate": 5.580000000000001e-06,
3930
+ "loss": 71.5625,
3931
+ "step": 559
3932
+ },
3933
+ {
3934
+ "epoch": 5.6e-05,
3935
+ "grad_norm": 13.260443687438965,
3936
+ "learning_rate": 5.59e-06,
3937
+ "loss": 71.5625,
3938
+ "step": 560
3939
+ },
3940
+ {
3941
+ "epoch": 5.61e-05,
3942
+ "grad_norm": 13.876309394836426,
3943
+ "learning_rate": 5.6e-06,
3944
+ "loss": 71.625,
3945
+ "step": 561
3946
+ },
3947
+ {
3948
+ "epoch": 5.62e-05,
3949
+ "grad_norm": 13.163431167602539,
3950
+ "learning_rate": 5.61e-06,
3951
+ "loss": 71.5625,
3952
+ "step": 562
3953
+ },
3954
+ {
3955
+ "epoch": 5.63e-05,
3956
+ "grad_norm": 13.940244674682617,
3957
+ "learning_rate": 5.62e-06,
3958
+ "loss": 71.5625,
3959
+ "step": 563
3960
+ },
3961
+ {
3962
+ "epoch": 5.64e-05,
3963
+ "grad_norm": 14.002641677856445,
3964
+ "learning_rate": 5.63e-06,
3965
+ "loss": 71.625,
3966
+ "step": 564
3967
+ },
3968
+ {
3969
+ "epoch": 5.65e-05,
3970
+ "grad_norm": 13.585519790649414,
3971
+ "learning_rate": 5.64e-06,
3972
+ "loss": 71.5,
3973
+ "step": 565
3974
+ },
3975
+ {
3976
+ "epoch": 5.66e-05,
3977
+ "grad_norm": 13.694162368774414,
3978
+ "learning_rate": 5.65e-06,
3979
+ "loss": 71.625,
3980
+ "step": 566
3981
+ },
3982
+ {
3983
+ "epoch": 5.67e-05,
3984
+ "grad_norm": 13.502567291259766,
3985
+ "learning_rate": 5.66e-06,
3986
+ "loss": 71.625,
3987
+ "step": 567
3988
+ },
3989
+ {
3990
+ "epoch": 5.68e-05,
3991
+ "grad_norm": 13.50383186340332,
3992
+ "learning_rate": 5.67e-06,
3993
+ "loss": 71.625,
3994
+ "step": 568
3995
+ },
3996
+ {
3997
+ "epoch": 5.69e-05,
3998
+ "grad_norm": 13.493083953857422,
3999
+ "learning_rate": 5.68e-06,
4000
+ "loss": 71.5625,
4001
+ "step": 569
4002
+ },
4003
+ {
4004
+ "epoch": 5.7e-05,
4005
+ "grad_norm": 13.770458221435547,
4006
+ "learning_rate": 5.690000000000001e-06,
4007
+ "loss": 71.4375,
4008
+ "step": 570
4009
+ },
4010
+ {
4011
+ "epoch": 5.71e-05,
4012
+ "grad_norm": 13.740925788879395,
4013
+ "learning_rate": 5.7000000000000005e-06,
4014
+ "loss": 71.5625,
4015
+ "step": 571
4016
+ },
4017
+ {
4018
+ "epoch": 5.72e-05,
4019
+ "grad_norm": 13.495709419250488,
4020
+ "learning_rate": 5.7099999999999995e-06,
4021
+ "loss": 71.5625,
4022
+ "step": 572
4023
+ },
4024
+ {
4025
+ "epoch": 5.73e-05,
4026
+ "grad_norm": 14.037687301635742,
4027
+ "learning_rate": 5.72e-06,
4028
+ "loss": 71.5,
4029
+ "step": 573
4030
+ },
4031
+ {
4032
+ "epoch": 5.74e-05,
4033
+ "grad_norm": 13.584688186645508,
4034
+ "learning_rate": 5.73e-06,
4035
+ "loss": 71.5625,
4036
+ "step": 574
4037
+ },
4038
+ {
4039
+ "epoch": 5.75e-05,
4040
+ "grad_norm": 13.670408248901367,
4041
+ "learning_rate": 5.74e-06,
4042
+ "loss": 71.5,
4043
+ "step": 575
4044
+ },
4045
+ {
4046
+ "epoch": 5.76e-05,
4047
+ "grad_norm": 13.831013679504395,
4048
+ "learning_rate": 5.75e-06,
4049
+ "loss": 71.5,
4050
+ "step": 576
4051
+ },
4052
+ {
4053
+ "epoch": 5.77e-05,
4054
+ "grad_norm": 13.588455200195312,
4055
+ "learning_rate": 5.76e-06,
4056
+ "loss": 71.5,
4057
+ "step": 577
4058
+ },
4059
+ {
4060
+ "epoch": 5.78e-05,
4061
+ "grad_norm": 14.016668319702148,
4062
+ "learning_rate": 5.77e-06,
4063
+ "loss": 71.5,
4064
+ "step": 578
4065
+ },
4066
+ {
4067
+ "epoch": 5.79e-05,
4068
+ "grad_norm": 13.188124656677246,
4069
+ "learning_rate": 5.78e-06,
4070
+ "loss": 71.5625,
4071
+ "step": 579
4072
+ },
4073
+ {
4074
+ "epoch": 5.8e-05,
4075
+ "grad_norm": 14.161450386047363,
4076
+ "learning_rate": 5.7900000000000005e-06,
4077
+ "loss": 71.5,
4078
+ "step": 580
4079
+ },
4080
+ {
4081
+ "epoch": 5.81e-05,
4082
+ "grad_norm": 13.826251983642578,
4083
+ "learning_rate": 5.8e-06,
4084
+ "loss": 71.4375,
4085
+ "step": 581
4086
+ },
4087
+ {
4088
+ "epoch": 5.82e-05,
4089
+ "grad_norm": 13.522049903869629,
4090
+ "learning_rate": 5.81e-06,
4091
+ "loss": 71.5,
4092
+ "step": 582
4093
+ },
4094
+ {
4095
+ "epoch": 5.83e-05,
4096
+ "grad_norm": 13.675570487976074,
4097
+ "learning_rate": 5.82e-06,
4098
+ "loss": 71.5625,
4099
+ "step": 583
4100
+ },
4101
+ {
4102
+ "epoch": 5.84e-05,
4103
+ "grad_norm": 13.925761222839355,
4104
+ "learning_rate": 5.83e-06,
4105
+ "loss": 71.5,
4106
+ "step": 584
4107
+ },
4108
+ {
4109
+ "epoch": 5.85e-05,
4110
+ "grad_norm": 13.596080780029297,
4111
+ "learning_rate": 5.84e-06,
4112
+ "loss": 71.5,
4113
+ "step": 585
4114
+ },
4115
+ {
4116
+ "epoch": 5.86e-05,
4117
+ "grad_norm": 13.48724365234375,
4118
+ "learning_rate": 5.85e-06,
4119
+ "loss": 71.5,
4120
+ "step": 586
4121
+ },
4122
+ {
4123
+ "epoch": 5.87e-05,
4124
+ "grad_norm": 13.393336296081543,
4125
+ "learning_rate": 5.860000000000001e-06,
4126
+ "loss": 71.3125,
4127
+ "step": 587
4128
+ },
4129
+ {
4130
+ "epoch": 5.88e-05,
4131
+ "grad_norm": 13.0198392868042,
4132
+ "learning_rate": 5.87e-06,
4133
+ "loss": 71.5,
4134
+ "step": 588
4135
+ },
4136
+ {
4137
+ "epoch": 5.89e-05,
4138
+ "grad_norm": 14.366602897644043,
4139
+ "learning_rate": 5.88e-06,
4140
+ "loss": 71.3125,
4141
+ "step": 589
4142
+ },
4143
+ {
4144
+ "epoch": 5.9e-05,
4145
+ "grad_norm": 13.80266284942627,
4146
+ "learning_rate": 5.89e-06,
4147
+ "loss": 71.4375,
4148
+ "step": 590
4149
+ },
4150
+ {
4151
+ "epoch": 5.91e-05,
4152
+ "grad_norm": 14.043784141540527,
4153
+ "learning_rate": 5.899999999999999e-06,
4154
+ "loss": 71.4375,
4155
+ "step": 591
4156
+ },
4157
+ {
4158
+ "epoch": 5.92e-05,
4159
+ "grad_norm": 14.366748809814453,
4160
+ "learning_rate": 5.91e-06,
4161
+ "loss": 71.5,
4162
+ "step": 592
4163
+ },
4164
+ {
4165
+ "epoch": 5.93e-05,
4166
+ "grad_norm": 13.664860725402832,
4167
+ "learning_rate": 5.92e-06,
4168
+ "loss": 71.4375,
4169
+ "step": 593
4170
+ },
4171
+ {
4172
+ "epoch": 5.94e-05,
4173
+ "grad_norm": 12.838698387145996,
4174
+ "learning_rate": 5.93e-06,
4175
+ "loss": 71.4375,
4176
+ "step": 594
4177
+ },
4178
+ {
4179
+ "epoch": 5.95e-05,
4180
+ "grad_norm": 13.87798023223877,
4181
+ "learning_rate": 5.94e-06,
4182
+ "loss": 71.3125,
4183
+ "step": 595
4184
+ },
4185
+ {
4186
+ "epoch": 5.96e-05,
4187
+ "grad_norm": 13.455665588378906,
4188
+ "learning_rate": 5.950000000000001e-06,
4189
+ "loss": 71.3125,
4190
+ "step": 596
4191
+ },
4192
+ {
4193
+ "epoch": 5.97e-05,
4194
+ "grad_norm": 13.53802490234375,
4195
+ "learning_rate": 5.96e-06,
4196
+ "loss": 71.4375,
4197
+ "step": 597
4198
+ },
4199
+ {
4200
+ "epoch": 5.98e-05,
4201
+ "grad_norm": 14.010590553283691,
4202
+ "learning_rate": 5.9700000000000004e-06,
4203
+ "loss": 71.5,
4204
+ "step": 598
4205
+ },
4206
+ {
4207
+ "epoch": 5.99e-05,
4208
+ "grad_norm": 13.441306114196777,
4209
+ "learning_rate": 5.98e-06,
4210
+ "loss": 71.375,
4211
+ "step": 599
4212
+ },
4213
+ {
4214
+ "epoch": 6e-05,
4215
+ "grad_norm": 13.627511024475098,
4216
+ "learning_rate": 5.989999999999999e-06,
4217
+ "loss": 71.4375,
4218
+ "step": 600
4219
+ },
4220
+ {
4221
+ "epoch": 6.01e-05,
4222
+ "grad_norm": 13.531881332397461,
4223
+ "learning_rate": 6e-06,
4224
+ "loss": 71.4375,
4225
+ "step": 601
4226
+ },
4227
+ {
4228
+ "epoch": 6.02e-05,
4229
+ "grad_norm": 13.889076232910156,
4230
+ "learning_rate": 6.010000000000001e-06,
4231
+ "loss": 71.3125,
4232
+ "step": 602
4233
+ },
4234
+ {
4235
+ "epoch": 6.03e-05,
4236
+ "grad_norm": 13.621227264404297,
4237
+ "learning_rate": 6.02e-06,
4238
+ "loss": 71.25,
4239
+ "step": 603
4240
+ },
4241
+ {
4242
+ "epoch": 6.04e-05,
4243
+ "grad_norm": 13.696484565734863,
4244
+ "learning_rate": 6.030000000000001e-06,
4245
+ "loss": 71.375,
4246
+ "step": 604
4247
+ },
4248
+ {
4249
+ "epoch": 6.05e-05,
4250
+ "grad_norm": 13.138842582702637,
4251
+ "learning_rate": 6.04e-06,
4252
+ "loss": 71.3125,
4253
+ "step": 605
4254
+ },
4255
+ {
4256
+ "epoch": 6.06e-05,
4257
+ "grad_norm": 14.300238609313965,
4258
+ "learning_rate": 6.05e-06,
4259
+ "loss": 71.25,
4260
+ "step": 606
4261
+ },
4262
+ {
4263
+ "epoch": 6.07e-05,
4264
+ "grad_norm": 13.133456230163574,
4265
+ "learning_rate": 6.0600000000000004e-06,
4266
+ "loss": 71.4375,
4267
+ "step": 607
4268
+ },
4269
+ {
4270
+ "epoch": 6.08e-05,
4271
+ "grad_norm": 13.548200607299805,
4272
+ "learning_rate": 6.0699999999999995e-06,
4273
+ "loss": 71.375,
4274
+ "step": 608
4275
+ },
4276
+ {
4277
+ "epoch": 6.09e-05,
4278
+ "grad_norm": 13.798404693603516,
4279
+ "learning_rate": 6.08e-06,
4280
+ "loss": 71.25,
4281
+ "step": 609
4282
+ },
4283
+ {
4284
+ "epoch": 6.1e-05,
4285
+ "grad_norm": 13.428898811340332,
4286
+ "learning_rate": 6.09e-06,
4287
+ "loss": 71.375,
4288
+ "step": 610
4289
+ },
4290
+ {
4291
+ "epoch": 6.11e-05,
4292
+ "grad_norm": 13.93244457244873,
4293
+ "learning_rate": 6.1e-06,
4294
+ "loss": 71.25,
4295
+ "step": 611
4296
+ },
4297
+ {
4298
+ "epoch": 6.12e-05,
4299
+ "grad_norm": 14.050407409667969,
4300
+ "learning_rate": 6.11e-06,
4301
+ "loss": 71.375,
4302
+ "step": 612
4303
+ },
4304
+ {
4305
+ "epoch": 6.13e-05,
4306
+ "grad_norm": 13.533824920654297,
4307
+ "learning_rate": 6.120000000000001e-06,
4308
+ "loss": 71.3125,
4309
+ "step": 613
4310
+ },
4311
+ {
4312
+ "epoch": 6.14e-05,
4313
+ "grad_norm": 13.453178405761719,
4314
+ "learning_rate": 6.13e-06,
4315
+ "loss": 71.25,
4316
+ "step": 614
4317
+ },
4318
+ {
4319
+ "epoch": 6.15e-05,
4320
+ "grad_norm": 13.044660568237305,
4321
+ "learning_rate": 6.1400000000000005e-06,
4322
+ "loss": 71.4375,
4323
+ "step": 615
4324
+ },
4325
+ {
4326
+ "epoch": 6.16e-05,
4327
+ "grad_norm": 13.782415390014648,
4328
+ "learning_rate": 6.15e-06,
4329
+ "loss": 71.25,
4330
+ "step": 616
4331
+ },
4332
+ {
4333
+ "epoch": 6.17e-05,
4334
+ "grad_norm": 14.078405380249023,
4335
+ "learning_rate": 6.1599999999999995e-06,
4336
+ "loss": 71.25,
4337
+ "step": 617
4338
+ },
4339
+ {
4340
+ "epoch": 6.18e-05,
4341
+ "grad_norm": 13.621548652648926,
4342
+ "learning_rate": 6.17e-06,
4343
+ "loss": 71.3125,
4344
+ "step": 618
4345
+ },
4346
+ {
4347
+ "epoch": 6.19e-05,
4348
+ "grad_norm": 13.817481994628906,
4349
+ "learning_rate": 6.180000000000001e-06,
4350
+ "loss": 71.3125,
4351
+ "step": 619
4352
+ },
4353
+ {
4354
+ "epoch": 6.2e-05,
4355
+ "grad_norm": 13.902446746826172,
4356
+ "learning_rate": 6.19e-06,
4357
+ "loss": 71.125,
4358
+ "step": 620
4359
+ },
4360
+ {
4361
+ "epoch": 6.21e-05,
4362
+ "grad_norm": 14.227055549621582,
4363
+ "learning_rate": 6.2e-06,
4364
+ "loss": 71.0625,
4365
+ "step": 621
4366
+ },
4367
+ {
4368
+ "epoch": 6.22e-05,
4369
+ "grad_norm": 14.249224662780762,
4370
+ "learning_rate": 6.21e-06,
4371
+ "loss": 71.0,
4372
+ "step": 622
4373
+ },
4374
+ {
4375
+ "epoch": 6.23e-05,
4376
+ "grad_norm": 13.914899826049805,
4377
+ "learning_rate": 6.22e-06,
4378
+ "loss": 71.1875,
4379
+ "step": 623
4380
+ },
4381
+ {
4382
+ "epoch": 6.24e-05,
4383
+ "grad_norm": 13.97021770477295,
4384
+ "learning_rate": 6.2300000000000005e-06,
4385
+ "loss": 71.0,
4386
+ "step": 624
4387
+ },
4388
+ {
4389
+ "epoch": 6.25e-05,
4390
+ "grad_norm": 13.667683601379395,
4391
+ "learning_rate": 6.2399999999999995e-06,
4392
+ "loss": 71.0625,
4393
+ "step": 625
4394
+ },
4395
+ {
4396
+ "epoch": 6.26e-05,
4397
+ "grad_norm": 13.566893577575684,
4398
+ "learning_rate": 6.25e-06,
4399
+ "loss": 71.1875,
4400
+ "step": 626
4401
+ },
4402
+ {
4403
+ "epoch": 6.27e-05,
4404
+ "grad_norm": 14.130186080932617,
4405
+ "learning_rate": 6.26e-06,
4406
+ "loss": 71.0625,
4407
+ "step": 627
4408
+ },
4409
+ {
4410
+ "epoch": 6.28e-05,
4411
+ "grad_norm": 13.867085456848145,
4412
+ "learning_rate": 6.269999999999999e-06,
4413
+ "loss": 71.1875,
4414
+ "step": 628
4415
+ },
4416
+ {
4417
+ "epoch": 6.29e-05,
4418
+ "grad_norm": 14.15766716003418,
4419
+ "learning_rate": 6.28e-06,
4420
+ "loss": 71.0625,
4421
+ "step": 629
4422
+ },
4423
+ {
4424
+ "epoch": 6.3e-05,
4425
+ "grad_norm": 13.474671363830566,
4426
+ "learning_rate": 6.290000000000001e-06,
4427
+ "loss": 71.1875,
4428
+ "step": 630
4429
+ },
4430
+ {
4431
+ "epoch": 6.31e-05,
4432
+ "grad_norm": 13.981532096862793,
4433
+ "learning_rate": 6.3e-06,
4434
+ "loss": 71.0,
4435
+ "step": 631
4436
+ },
4437
+ {
4438
+ "epoch": 6.32e-05,
4439
+ "grad_norm": 13.78842544555664,
4440
+ "learning_rate": 6.3100000000000006e-06,
4441
+ "loss": 71.0625,
4442
+ "step": 632
4443
+ },
4444
+ {
4445
+ "epoch": 6.33e-05,
4446
+ "grad_norm": 14.002915382385254,
4447
+ "learning_rate": 6.3200000000000005e-06,
4448
+ "loss": 71.1875,
4449
+ "step": 633
4450
+ },
4451
+ {
4452
+ "epoch": 6.34e-05,
4453
+ "grad_norm": 13.505497932434082,
4454
+ "learning_rate": 6.3299999999999995e-06,
4455
+ "loss": 71.125,
4456
+ "step": 634
4457
+ },
4458
+ {
4459
+ "epoch": 6.35e-05,
4460
+ "grad_norm": 13.425030708312988,
4461
+ "learning_rate": 6.34e-06,
4462
+ "loss": 71.0625,
4463
+ "step": 635
4464
+ },
4465
+ {
4466
+ "epoch": 6.36e-05,
4467
+ "grad_norm": 13.330291748046875,
4468
+ "learning_rate": 6.350000000000001e-06,
4469
+ "loss": 71.1875,
4470
+ "step": 636
4471
+ },
4472
+ {
4473
+ "epoch": 6.37e-05,
4474
+ "grad_norm": 14.178326606750488,
4475
+ "learning_rate": 6.36e-06,
4476
+ "loss": 71.0625,
4477
+ "step": 637
4478
+ },
4479
+ {
4480
+ "epoch": 6.38e-05,
4481
+ "grad_norm": 13.922074317932129,
4482
+ "learning_rate": 6.37e-06,
4483
+ "loss": 71.0,
4484
+ "step": 638
4485
+ },
4486
+ {
4487
+ "epoch": 6.39e-05,
4488
+ "grad_norm": 13.808489799499512,
4489
+ "learning_rate": 6.38e-06,
4490
+ "loss": 71.0625,
4491
+ "step": 639
4492
+ },
4493
+ {
4494
+ "epoch": 6.4e-05,
4495
+ "grad_norm": 13.859967231750488,
4496
+ "learning_rate": 6.39e-06,
4497
+ "loss": 71.0625,
4498
+ "step": 640
4499
+ },
4500
+ {
4501
+ "epoch": 6.41e-05,
4502
+ "grad_norm": 14.012020111083984,
4503
+ "learning_rate": 6.4000000000000006e-06,
4504
+ "loss": 71.0,
4505
+ "step": 641
4506
+ },
4507
+ {
4508
+ "epoch": 6.42e-05,
4509
+ "grad_norm": 13.656329154968262,
4510
+ "learning_rate": 6.41e-06,
4511
+ "loss": 71.0,
4512
+ "step": 642
4513
+ },
4514
+ {
4515
+ "epoch": 6.43e-05,
4516
+ "grad_norm": 13.934961318969727,
4517
+ "learning_rate": 6.42e-06,
4518
+ "loss": 71.0625,
4519
+ "step": 643
4520
+ },
4521
+ {
4522
+ "epoch": 6.44e-05,
4523
+ "grad_norm": 13.582793235778809,
4524
+ "learning_rate": 6.43e-06,
4525
+ "loss": 71.0,
4526
+ "step": 644
4527
+ },
4528
+ {
4529
+ "epoch": 6.45e-05,
4530
+ "grad_norm": 13.44519329071045,
4531
+ "learning_rate": 6.439999999999999e-06,
4532
+ "loss": 71.0625,
4533
+ "step": 645
4534
+ },
4535
+ {
4536
+ "epoch": 6.46e-05,
4537
+ "grad_norm": 14.572875022888184,
4538
+ "learning_rate": 6.45e-06,
4539
+ "loss": 70.9375,
4540
+ "step": 646
4541
+ },
4542
+ {
4543
+ "epoch": 6.47e-05,
4544
+ "grad_norm": 13.89630126953125,
4545
+ "learning_rate": 6.460000000000001e-06,
4546
+ "loss": 70.9375,
4547
+ "step": 647
4548
+ },
4549
+ {
4550
+ "epoch": 6.48e-05,
4551
+ "grad_norm": 13.762505531311035,
4552
+ "learning_rate": 6.47e-06,
4553
+ "loss": 70.875,
4554
+ "step": 648
4555
+ },
4556
+ {
4557
+ "epoch": 6.49e-05,
4558
+ "grad_norm": 13.604170799255371,
4559
+ "learning_rate": 6.480000000000001e-06,
4560
+ "loss": 70.9375,
4561
+ "step": 649
4562
+ },
4563
+ {
4564
+ "epoch": 6.5e-05,
4565
+ "grad_norm": 13.724708557128906,
4566
+ "learning_rate": 6.4900000000000005e-06,
4567
+ "loss": 70.9375,
4568
+ "step": 650
4569
+ },
4570
+ {
4571
+ "epoch": 6.51e-05,
4572
+ "grad_norm": 13.948774337768555,
4573
+ "learning_rate": 6.5e-06,
4574
+ "loss": 70.9375,
4575
+ "step": 651
4576
+ },
4577
+ {
4578
+ "epoch": 6.52e-05,
4579
+ "grad_norm": 14.045595169067383,
4580
+ "learning_rate": 6.51e-06,
4581
+ "loss": 70.9375,
4582
+ "step": 652
4583
+ },
4584
+ {
4585
+ "epoch": 6.53e-05,
4586
+ "grad_norm": 13.71200180053711,
4587
+ "learning_rate": 6.520000000000001e-06,
4588
+ "loss": 70.9375,
4589
+ "step": 653
4590
+ },
4591
+ {
4592
+ "epoch": 6.54e-05,
4593
+ "grad_norm": 14.408919334411621,
4594
+ "learning_rate": 6.53e-06,
4595
+ "loss": 70.8125,
4596
+ "step": 654
4597
+ },
4598
+ {
4599
+ "epoch": 6.55e-05,
4600
+ "grad_norm": 13.958176612854004,
4601
+ "learning_rate": 6.54e-06,
4602
+ "loss": 70.9375,
4603
+ "step": 655
4604
+ },
4605
+ {
4606
+ "epoch": 6.56e-05,
4607
+ "grad_norm": 13.734806060791016,
4608
+ "learning_rate": 6.549999999999999e-06,
4609
+ "loss": 71.0625,
4610
+ "step": 656
4611
+ },
4612
+ {
4613
+ "epoch": 6.57e-05,
4614
+ "grad_norm": 14.043981552124023,
4615
+ "learning_rate": 6.56e-06,
4616
+ "loss": 70.8125,
4617
+ "step": 657
4618
+ },
4619
+ {
4620
+ "epoch": 6.58e-05,
4621
+ "grad_norm": 13.461383819580078,
4622
+ "learning_rate": 6.570000000000001e-06,
4623
+ "loss": 71.0,
4624
+ "step": 658
4625
+ },
4626
+ {
4627
+ "epoch": 6.59e-05,
4628
+ "grad_norm": 13.893697738647461,
4629
+ "learning_rate": 6.58e-06,
4630
+ "loss": 70.9375,
4631
+ "step": 659
4632
+ },
4633
+ {
4634
+ "epoch": 6.6e-05,
4635
+ "grad_norm": 14.152552604675293,
4636
+ "learning_rate": 6.5900000000000004e-06,
4637
+ "loss": 70.8125,
4638
+ "step": 660
4639
+ },
4640
+ {
4641
+ "epoch": 6.61e-05,
4642
+ "grad_norm": 14.377528190612793,
4643
+ "learning_rate": 6.6e-06,
4644
+ "loss": 70.75,
4645
+ "step": 661
4646
+ },
4647
+ {
4648
+ "epoch": 6.62e-05,
4649
+ "grad_norm": 13.767778396606445,
4650
+ "learning_rate": 6.609999999999999e-06,
4651
+ "loss": 70.8125,
4652
+ "step": 662
4653
+ },
4654
+ {
4655
+ "epoch": 6.63e-05,
4656
+ "grad_norm": 13.831917762756348,
4657
+ "learning_rate": 6.62e-06,
4658
+ "loss": 70.8125,
4659
+ "step": 663
4660
+ },
4661
+ {
4662
+ "epoch": 6.64e-05,
4663
+ "grad_norm": 14.295366287231445,
4664
+ "learning_rate": 6.630000000000001e-06,
4665
+ "loss": 70.8125,
4666
+ "step": 664
4667
+ },
4668
+ {
4669
+ "epoch": 6.65e-05,
4670
+ "grad_norm": 13.494677543640137,
4671
+ "learning_rate": 6.64e-06,
4672
+ "loss": 71.0,
4673
+ "step": 665
4674
+ },
4675
+ {
4676
+ "epoch": 6.66e-05,
4677
+ "grad_norm": 14.52921199798584,
4678
+ "learning_rate": 6.65e-06,
4679
+ "loss": 70.6875,
4680
+ "step": 666
4681
+ },
4682
+ {
4683
+ "epoch": 6.67e-05,
4684
+ "grad_norm": 14.000243186950684,
4685
+ "learning_rate": 6.660000000000001e-06,
4686
+ "loss": 70.625,
4687
+ "step": 667
4688
+ },
4689
+ {
4690
+ "epoch": 6.68e-05,
4691
+ "grad_norm": 13.905571937561035,
4692
+ "learning_rate": 6.67e-06,
4693
+ "loss": 70.8125,
4694
+ "step": 668
4695
+ },
4696
+ {
4697
+ "epoch": 6.69e-05,
4698
+ "grad_norm": 13.985931396484375,
4699
+ "learning_rate": 6.68e-06,
4700
+ "loss": 70.8125,
4701
+ "step": 669
4702
+ },
4703
+ {
4704
+ "epoch": 6.7e-05,
4705
+ "grad_norm": 13.165407180786133,
4706
+ "learning_rate": 6.690000000000001e-06,
4707
+ "loss": 70.875,
4708
+ "step": 670
4709
+ },
4710
+ {
4711
+ "epoch": 6.71e-05,
4712
+ "grad_norm": 14.570108413696289,
4713
+ "learning_rate": 6.7e-06,
4714
+ "loss": 70.5625,
4715
+ "step": 671
4716
+ },
4717
+ {
4718
+ "epoch": 6.72e-05,
4719
+ "grad_norm": 14.332723617553711,
4720
+ "learning_rate": 6.71e-06,
4721
+ "loss": 70.625,
4722
+ "step": 672
4723
+ },
4724
+ {
4725
+ "epoch": 6.73e-05,
4726
+ "grad_norm": 13.870111465454102,
4727
+ "learning_rate": 6.719999999999999e-06,
4728
+ "loss": 70.5625,
4729
+ "step": 673
4730
+ },
4731
+ {
4732
+ "epoch": 6.74e-05,
4733
+ "grad_norm": 14.580185890197754,
4734
+ "learning_rate": 6.73e-06,
4735
+ "loss": 70.75,
4736
+ "step": 674
4737
+ },
4738
+ {
4739
+ "epoch": 6.75e-05,
4740
+ "grad_norm": 13.757917404174805,
4741
+ "learning_rate": 6.740000000000001e-06,
4742
+ "loss": 70.5625,
4743
+ "step": 675
4744
+ },
4745
+ {
4746
+ "epoch": 6.76e-05,
4747
+ "grad_norm": 14.098215103149414,
4748
+ "learning_rate": 6.75e-06,
4749
+ "loss": 70.625,
4750
+ "step": 676
4751
+ },
4752
+ {
4753
+ "epoch": 6.77e-05,
4754
+ "grad_norm": 13.889370918273926,
4755
+ "learning_rate": 6.7600000000000005e-06,
4756
+ "loss": 70.625,
4757
+ "step": 677
4758
+ },
4759
+ {
4760
+ "epoch": 6.78e-05,
4761
+ "grad_norm": 13.543923377990723,
4762
+ "learning_rate": 6.77e-06,
4763
+ "loss": 70.75,
4764
+ "step": 678
4765
+ },
4766
+ {
4767
+ "epoch": 6.79e-05,
4768
+ "grad_norm": 14.494497299194336,
4769
+ "learning_rate": 6.7799999999999995e-06,
4770
+ "loss": 70.4375,
4771
+ "step": 679
4772
+ },
4773
+ {
4774
+ "epoch": 6.8e-05,
4775
+ "grad_norm": 13.892583847045898,
4776
+ "learning_rate": 6.79e-06,
4777
+ "loss": 70.6875,
4778
+ "step": 680
4779
+ },
4780
+ {
4781
+ "epoch": 6.81e-05,
4782
+ "grad_norm": 14.069755554199219,
4783
+ "learning_rate": 6.800000000000001e-06,
4784
+ "loss": 70.5,
4785
+ "step": 681
4786
+ },
4787
+ {
4788
+ "epoch": 6.82e-05,
4789
+ "grad_norm": 14.10586166381836,
4790
+ "learning_rate": 6.81e-06,
4791
+ "loss": 70.4375,
4792
+ "step": 682
4793
+ },
4794
+ {
4795
+ "epoch": 6.83e-05,
4796
+ "grad_norm": 14.291821479797363,
4797
+ "learning_rate": 6.82e-06,
4798
+ "loss": 70.5625,
4799
+ "step": 683
4800
+ },
4801
+ {
4802
+ "epoch": 6.84e-05,
4803
+ "grad_norm": 14.351411819458008,
4804
+ "learning_rate": 6.830000000000001e-06,
4805
+ "loss": 70.5625,
4806
+ "step": 684
4807
+ },
4808
+ {
4809
+ "epoch": 6.85e-05,
4810
+ "grad_norm": 13.590216636657715,
4811
+ "learning_rate": 6.84e-06,
4812
+ "loss": 70.4375,
4813
+ "step": 685
4814
+ },
4815
+ {
4816
+ "epoch": 6.86e-05,
4817
+ "grad_norm": 14.20474910736084,
4818
+ "learning_rate": 6.8500000000000005e-06,
4819
+ "loss": 70.5,
4820
+ "step": 686
4821
+ },
4822
+ {
4823
+ "epoch": 6.87e-05,
4824
+ "grad_norm": 14.491683959960938,
4825
+ "learning_rate": 6.86e-06,
4826
+ "loss": 70.25,
4827
+ "step": 687
4828
+ },
4829
+ {
4830
+ "epoch": 6.88e-05,
4831
+ "grad_norm": 14.437154769897461,
4832
+ "learning_rate": 6.87e-06,
4833
+ "loss": 70.3125,
4834
+ "step": 688
4835
+ },
4836
+ {
4837
+ "epoch": 6.89e-05,
4838
+ "grad_norm": 14.214030265808105,
4839
+ "learning_rate": 6.88e-06,
4840
+ "loss": 70.375,
4841
+ "step": 689
4842
+ },
4843
+ {
4844
+ "epoch": 6.9e-05,
4845
+ "grad_norm": 13.905532836914062,
4846
+ "learning_rate": 6.889999999999999e-06,
4847
+ "loss": 70.4375,
4848
+ "step": 690
4849
+ },
4850
+ {
4851
+ "epoch": 6.91e-05,
4852
+ "grad_norm": 14.396810531616211,
4853
+ "learning_rate": 6.9e-06,
4854
+ "loss": 70.3125,
4855
+ "step": 691
4856
+ },
4857
+ {
4858
+ "epoch": 6.92e-05,
4859
+ "grad_norm": 13.368425369262695,
4860
+ "learning_rate": 6.910000000000001e-06,
4861
+ "loss": 70.625,
4862
+ "step": 692
4863
+ },
4864
+ {
4865
+ "epoch": 6.93e-05,
4866
+ "grad_norm": 14.467592239379883,
4867
+ "learning_rate": 6.92e-06,
4868
+ "loss": 70.125,
4869
+ "step": 693
4870
+ },
4871
+ {
4872
+ "epoch": 6.94e-05,
4873
+ "grad_norm": 14.07659912109375,
4874
+ "learning_rate": 6.93e-06,
4875
+ "loss": 70.25,
4876
+ "step": 694
4877
+ },
4878
+ {
4879
+ "epoch": 6.95e-05,
4880
+ "grad_norm": 13.678894996643066,
4881
+ "learning_rate": 6.9400000000000005e-06,
4882
+ "loss": 70.25,
4883
+ "step": 695
4884
+ },
4885
+ {
4886
+ "epoch": 6.96e-05,
4887
+ "grad_norm": 13.846305847167969,
4888
+ "learning_rate": 6.9499999999999995e-06,
4889
+ "loss": 70.4375,
4890
+ "step": 696
4891
+ },
4892
+ {
4893
+ "epoch": 6.97e-05,
4894
+ "grad_norm": 14.344511032104492,
4895
+ "learning_rate": 6.96e-06,
4896
+ "loss": 70.125,
4897
+ "step": 697
4898
+ },
4899
+ {
4900
+ "epoch": 6.98e-05,
4901
+ "grad_norm": 13.975683212280273,
4902
+ "learning_rate": 6.970000000000001e-06,
4903
+ "loss": 70.3125,
4904
+ "step": 698
4905
+ },
4906
+ {
4907
+ "epoch": 6.99e-05,
4908
+ "grad_norm": 13.725882530212402,
4909
+ "learning_rate": 6.98e-06,
4910
+ "loss": 70.25,
4911
+ "step": 699
4912
+ },
4913
+ {
4914
+ "epoch": 7e-05,
4915
+ "grad_norm": 14.02601146697998,
4916
+ "learning_rate": 6.99e-06,
4917
+ "loss": 70.1875,
4918
+ "step": 700
4919
+ },
4920
+ {
4921
+ "epoch": 7.01e-05,
4922
+ "grad_norm": 14.149601936340332,
4923
+ "learning_rate": 7.000000000000001e-06,
4924
+ "loss": 70.25,
4925
+ "step": 701
4926
+ },
4927
+ {
4928
+ "epoch": 7.02e-05,
4929
+ "grad_norm": 14.683643341064453,
4930
+ "learning_rate": 7.01e-06,
4931
+ "loss": 69.9375,
4932
+ "step": 702
4933
+ },
4934
+ {
4935
+ "epoch": 7.03e-05,
4936
+ "grad_norm": 14.377092361450195,
4937
+ "learning_rate": 7.0200000000000006e-06,
4938
+ "loss": 70.0625,
4939
+ "step": 703
4940
+ },
4941
+ {
4942
+ "epoch": 7.04e-05,
4943
+ "grad_norm": 13.508744239807129,
4944
+ "learning_rate": 7.03e-06,
4945
+ "loss": 70.125,
4946
+ "step": 704
4947
+ },
4948
+ {
4949
+ "epoch": 7.05e-05,
4950
+ "grad_norm": 14.27863883972168,
4951
+ "learning_rate": 7.04e-06,
4952
+ "loss": 70.0,
4953
+ "step": 705
4954
+ },
4955
+ {
4956
+ "epoch": 7.06e-05,
4957
+ "grad_norm": 14.04149055480957,
4958
+ "learning_rate": 7.05e-06,
4959
+ "loss": 70.0625,
4960
+ "step": 706
4961
+ },
4962
+ {
4963
+ "epoch": 7.07e-05,
4964
+ "grad_norm": 13.799717903137207,
4965
+ "learning_rate": 7.059999999999999e-06,
4966
+ "loss": 70.125,
4967
+ "step": 707
4968
+ },
4969
+ {
4970
+ "epoch": 7.08e-05,
4971
+ "grad_norm": 13.9593505859375,
4972
+ "learning_rate": 7.07e-06,
4973
+ "loss": 70.0,
4974
+ "step": 708
4975
+ },
4976
+ {
4977
+ "epoch": 7.09e-05,
4978
+ "grad_norm": 14.294474601745605,
4979
+ "learning_rate": 7.080000000000001e-06,
4980
+ "loss": 69.875,
4981
+ "step": 709
4982
+ },
4983
+ {
4984
+ "epoch": 7.1e-05,
4985
+ "grad_norm": 14.307499885559082,
4986
+ "learning_rate": 7.09e-06,
4987
+ "loss": 70.0,
4988
+ "step": 710
4989
+ },
4990
+ {
4991
+ "epoch": 7.11e-05,
4992
+ "grad_norm": 13.888057708740234,
4993
+ "learning_rate": 7.1e-06,
4994
+ "loss": 70.0,
4995
+ "step": 711
4996
+ },
4997
+ {
4998
+ "epoch": 7.12e-05,
4999
+ "grad_norm": 13.905348777770996,
5000
+ "learning_rate": 7.1100000000000005e-06,
5001
+ "loss": 70.0,
5002
+ "step": 712
5003
+ },
5004
+ {
5005
+ "epoch": 7.13e-05,
5006
+ "grad_norm": 14.086119651794434,
5007
+ "learning_rate": 7.12e-06,
5008
+ "loss": 69.875,
5009
+ "step": 713
5010
+ },
5011
+ {
5012
+ "epoch": 7.14e-05,
5013
+ "grad_norm": 14.221412658691406,
5014
+ "learning_rate": 7.13e-06,
5015
+ "loss": 69.9375,
5016
+ "step": 714
5017
+ },
5018
+ {
5019
+ "epoch": 7.15e-05,
5020
+ "grad_norm": 14.204227447509766,
5021
+ "learning_rate": 7.14e-06,
5022
+ "loss": 69.875,
5023
+ "step": 715
5024
+ },
5025
+ {
5026
+ "epoch": 7.16e-05,
5027
+ "grad_norm": 14.413752555847168,
5028
+ "learning_rate": 7.15e-06,
5029
+ "loss": 69.75,
5030
+ "step": 716
5031
+ },
5032
+ {
5033
+ "epoch": 7.17e-05,
5034
+ "grad_norm": 14.447648048400879,
5035
+ "learning_rate": 7.16e-06,
5036
+ "loss": 69.625,
5037
+ "step": 717
5038
+ },
5039
+ {
5040
+ "epoch": 7.18e-05,
5041
+ "grad_norm": 13.275869369506836,
5042
+ "learning_rate": 7.170000000000001e-06,
5043
+ "loss": 69.9375,
5044
+ "step": 718
5045
+ },
5046
+ {
5047
+ "epoch": 7.19e-05,
5048
+ "grad_norm": 14.1619873046875,
5049
+ "learning_rate": 7.18e-06,
5050
+ "loss": 69.6875,
5051
+ "step": 719
5052
+ },
5053
+ {
5054
+ "epoch": 7.2e-05,
5055
+ "grad_norm": 14.165518760681152,
5056
+ "learning_rate": 7.190000000000001e-06,
5057
+ "loss": 69.6875,
5058
+ "step": 720
5059
+ },
5060
+ {
5061
+ "epoch": 7.21e-05,
5062
+ "grad_norm": 13.807438850402832,
5063
+ "learning_rate": 7.2e-06,
5064
+ "loss": 69.75,
5065
+ "step": 721
5066
+ },
5067
+ {
5068
+ "epoch": 7.22e-05,
5069
+ "grad_norm": 14.538020133972168,
5070
+ "learning_rate": 7.21e-06,
5071
+ "loss": 69.6875,
5072
+ "step": 722
5073
+ },
5074
+ {
5075
+ "epoch": 7.23e-05,
5076
+ "grad_norm": 14.57617473602295,
5077
+ "learning_rate": 7.22e-06,
5078
+ "loss": 69.5625,
5079
+ "step": 723
5080
+ },
5081
+ {
5082
+ "epoch": 7.24e-05,
5083
+ "grad_norm": 13.881351470947266,
5084
+ "learning_rate": 7.229999999999999e-06,
5085
+ "loss": 69.625,
5086
+ "step": 724
5087
+ },
5088
+ {
5089
+ "epoch": 7.25e-05,
5090
+ "grad_norm": 14.827073097229004,
5091
+ "learning_rate": 7.24e-06,
5092
+ "loss": 69.5625,
5093
+ "step": 725
5094
+ },
5095
+ {
5096
+ "epoch": 7.26e-05,
5097
+ "grad_norm": 14.633003234863281,
5098
+ "learning_rate": 7.250000000000001e-06,
5099
+ "loss": 69.375,
5100
+ "step": 726
5101
+ },
5102
+ {
5103
+ "epoch": 7.27e-05,
5104
+ "grad_norm": 13.844290733337402,
5105
+ "learning_rate": 7.26e-06,
5106
+ "loss": 69.4375,
5107
+ "step": 727
5108
+ },
5109
+ {
5110
+ "epoch": 7.28e-05,
5111
+ "grad_norm": 14.124848365783691,
5112
+ "learning_rate": 7.27e-06,
5113
+ "loss": 69.4375,
5114
+ "step": 728
5115
+ },
5116
+ {
5117
+ "epoch": 7.29e-05,
5118
+ "grad_norm": 14.088972091674805,
5119
+ "learning_rate": 7.280000000000001e-06,
5120
+ "loss": 69.5,
5121
+ "step": 729
5122
+ },
5123
+ {
5124
+ "epoch": 7.3e-05,
5125
+ "grad_norm": 14.448264122009277,
5126
+ "learning_rate": 7.29e-06,
5127
+ "loss": 69.4375,
5128
+ "step": 730
5129
+ },
5130
+ {
5131
+ "epoch": 7.31e-05,
5132
+ "grad_norm": 14.05547046661377,
5133
+ "learning_rate": 7.3e-06,
5134
+ "loss": 69.375,
5135
+ "step": 731
5136
+ },
5137
+ {
5138
+ "epoch": 7.32e-05,
5139
+ "grad_norm": 14.225979804992676,
5140
+ "learning_rate": 7.31e-06,
5141
+ "loss": 69.375,
5142
+ "step": 732
5143
+ },
5144
+ {
5145
+ "epoch": 7.33e-05,
5146
+ "grad_norm": 14.301802635192871,
5147
+ "learning_rate": 7.32e-06,
5148
+ "loss": 69.25,
5149
+ "step": 733
5150
+ },
5151
+ {
5152
+ "epoch": 7.34e-05,
5153
+ "grad_norm": 14.079911231994629,
5154
+ "learning_rate": 7.33e-06,
5155
+ "loss": 69.375,
5156
+ "step": 734
5157
+ },
5158
+ {
5159
+ "epoch": 7.35e-05,
5160
+ "grad_norm": 13.954413414001465,
5161
+ "learning_rate": 7.340000000000001e-06,
5162
+ "loss": 69.375,
5163
+ "step": 735
5164
+ },
5165
+ {
5166
+ "epoch": 7.36e-05,
5167
+ "grad_norm": 14.633395195007324,
5168
+ "learning_rate": 7.35e-06,
5169
+ "loss": 69.1875,
5170
+ "step": 736
5171
+ },
5172
+ {
5173
+ "epoch": 7.37e-05,
5174
+ "grad_norm": 13.73357105255127,
5175
+ "learning_rate": 7.360000000000001e-06,
5176
+ "loss": 69.3125,
5177
+ "step": 737
5178
+ },
5179
+ {
5180
+ "epoch": 7.38e-05,
5181
+ "grad_norm": 13.533805847167969,
5182
+ "learning_rate": 7.37e-06,
5183
+ "loss": 69.375,
5184
+ "step": 738
5185
+ },
5186
+ {
5187
+ "epoch": 7.39e-05,
5188
+ "grad_norm": 13.967336654663086,
5189
+ "learning_rate": 7.38e-06,
5190
+ "loss": 69.125,
5191
+ "step": 739
5192
+ },
5193
+ {
5194
+ "epoch": 7.4e-05,
5195
+ "grad_norm": 13.907071113586426,
5196
+ "learning_rate": 7.39e-06,
5197
+ "loss": 69.25,
5198
+ "step": 740
5199
+ },
5200
+ {
5201
+ "epoch": 7.41e-05,
5202
+ "grad_norm": 14.33962345123291,
5203
+ "learning_rate": 7.3999999999999995e-06,
5204
+ "loss": 69.0625,
5205
+ "step": 741
5206
+ },
5207
+ {
5208
+ "epoch": 7.42e-05,
5209
+ "grad_norm": 14.036276817321777,
5210
+ "learning_rate": 7.41e-06,
5211
+ "loss": 69.0625,
5212
+ "step": 742
5213
+ },
5214
+ {
5215
+ "epoch": 7.43e-05,
5216
+ "grad_norm": 14.373995780944824,
5217
+ "learning_rate": 7.420000000000001e-06,
5218
+ "loss": 69.0,
5219
+ "step": 743
5220
+ },
5221
+ {
5222
+ "epoch": 7.44e-05,
5223
+ "grad_norm": 13.279586791992188,
5224
+ "learning_rate": 7.43e-06,
5225
+ "loss": 69.25,
5226
+ "step": 744
5227
+ },
5228
+ {
5229
+ "epoch": 7.45e-05,
5230
+ "grad_norm": 13.835576057434082,
5231
+ "learning_rate": 7.44e-06,
5232
+ "loss": 69.0625,
5233
+ "step": 745
5234
+ },
5235
+ {
5236
+ "epoch": 7.46e-05,
5237
+ "grad_norm": 14.343767166137695,
5238
+ "learning_rate": 7.450000000000001e-06,
5239
+ "loss": 69.0,
5240
+ "step": 746
5241
+ },
5242
+ {
5243
+ "epoch": 7.47e-05,
5244
+ "grad_norm": 14.20533275604248,
5245
+ "learning_rate": 7.46e-06,
5246
+ "loss": 68.875,
5247
+ "step": 747
5248
+ },
5249
+ {
5250
+ "epoch": 7.48e-05,
5251
+ "grad_norm": 14.370162010192871,
5252
+ "learning_rate": 7.4700000000000005e-06,
5253
+ "loss": 68.75,
5254
+ "step": 748
5255
+ },
5256
+ {
5257
+ "epoch": 7.49e-05,
5258
+ "grad_norm": 14.102258682250977,
5259
+ "learning_rate": 7.48e-06,
5260
+ "loss": 68.8125,
5261
+ "step": 749
5262
+ },
5263
+ {
5264
+ "epoch": 7.5e-05,
5265
+ "grad_norm": 14.238746643066406,
5266
+ "learning_rate": 7.4899999999999994e-06,
5267
+ "loss": 68.75,
5268
+ "step": 750
5269
+ },
5270
+ {
5271
+ "epoch": 7.51e-05,
5272
+ "grad_norm": 14.459688186645508,
5273
+ "learning_rate": 7.5e-06,
5274
+ "loss": 68.75,
5275
+ "step": 751
5276
+ },
5277
+ {
5278
+ "epoch": 7.52e-05,
5279
+ "grad_norm": 13.407461166381836,
5280
+ "learning_rate": 7.510000000000001e-06,
5281
+ "loss": 68.9375,
5282
+ "step": 752
5283
+ },
5284
+ {
5285
+ "epoch": 7.53e-05,
5286
+ "grad_norm": 13.310074806213379,
5287
+ "learning_rate": 7.52e-06,
5288
+ "loss": 68.9375,
5289
+ "step": 753
5290
+ },
5291
+ {
5292
+ "epoch": 7.54e-05,
5293
+ "grad_norm": 14.59634017944336,
5294
+ "learning_rate": 7.530000000000001e-06,
5295
+ "loss": 68.5625,
5296
+ "step": 754
5297
+ },
5298
+ {
5299
+ "epoch": 7.55e-05,
5300
+ "grad_norm": 13.75051212310791,
5301
+ "learning_rate": 7.54e-06,
5302
+ "loss": 68.6875,
5303
+ "step": 755
5304
+ },
5305
+ {
5306
+ "epoch": 7.56e-05,
5307
+ "grad_norm": 14.305959701538086,
5308
+ "learning_rate": 7.55e-06,
5309
+ "loss": 68.5625,
5310
+ "step": 756
5311
+ },
5312
+ {
5313
+ "epoch": 7.57e-05,
5314
+ "grad_norm": 14.588299751281738,
5315
+ "learning_rate": 7.5600000000000005e-06,
5316
+ "loss": 68.375,
5317
+ "step": 757
5318
+ },
5319
+ {
5320
+ "epoch": 7.58e-05,
5321
+ "grad_norm": 14.081939697265625,
5322
+ "learning_rate": 7.5699999999999995e-06,
5323
+ "loss": 68.5,
5324
+ "step": 758
5325
+ },
5326
+ {
5327
+ "epoch": 7.59e-05,
5328
+ "grad_norm": 13.225693702697754,
5329
+ "learning_rate": 7.58e-06,
5330
+ "loss": 68.9375,
5331
+ "step": 759
5332
+ },
5333
+ {
5334
+ "epoch": 7.6e-05,
5335
+ "grad_norm": 13.850139617919922,
5336
+ "learning_rate": 7.59e-06,
5337
+ "loss": 68.625,
5338
+ "step": 760
5339
+ },
5340
+ {
5341
+ "epoch": 7.61e-05,
5342
+ "grad_norm": 13.827278137207031,
5343
+ "learning_rate": 7.6e-06,
5344
+ "loss": 68.5,
5345
+ "step": 761
5346
+ },
5347
+ {
5348
+ "epoch": 7.62e-05,
5349
+ "grad_norm": 14.146428108215332,
5350
+ "learning_rate": 7.61e-06,
5351
+ "loss": 68.3125,
5352
+ "step": 762
5353
+ },
5354
+ {
5355
+ "epoch": 7.63e-05,
5356
+ "grad_norm": 13.658434867858887,
5357
+ "learning_rate": 7.620000000000001e-06,
5358
+ "loss": 68.4375,
5359
+ "step": 763
5360
+ },
5361
+ {
5362
+ "epoch": 7.64e-05,
5363
+ "grad_norm": 14.356409072875977,
5364
+ "learning_rate": 7.63e-06,
5365
+ "loss": 68.3125,
5366
+ "step": 764
5367
+ },
5368
+ {
5369
+ "epoch": 7.65e-05,
5370
+ "grad_norm": 14.146637916564941,
5371
+ "learning_rate": 7.64e-06,
5372
+ "loss": 68.3125,
5373
+ "step": 765
5374
+ },
5375
+ {
5376
+ "epoch": 7.66e-05,
5377
+ "grad_norm": 13.119014739990234,
5378
+ "learning_rate": 7.650000000000001e-06,
5379
+ "loss": 68.5625,
5380
+ "step": 766
5381
+ },
5382
+ {
5383
+ "epoch": 7.67e-05,
5384
+ "grad_norm": 14.340291976928711,
5385
+ "learning_rate": 7.66e-06,
5386
+ "loss": 68.1875,
5387
+ "step": 767
5388
+ },
5389
+ {
5390
+ "epoch": 7.68e-05,
5391
+ "grad_norm": 13.567488670349121,
5392
+ "learning_rate": 7.670000000000001e-06,
5393
+ "loss": 68.375,
5394
+ "step": 768
5395
+ },
5396
+ {
5397
+ "epoch": 7.69e-05,
5398
+ "grad_norm": 14.072280883789062,
5399
+ "learning_rate": 7.680000000000001e-06,
5400
+ "loss": 68.125,
5401
+ "step": 769
5402
+ },
5403
+ {
5404
+ "epoch": 7.7e-05,
5405
+ "grad_norm": 13.906455993652344,
5406
+ "learning_rate": 7.69e-06,
5407
+ "loss": 68.0625,
5408
+ "step": 770
5409
+ },
5410
+ {
5411
+ "epoch": 7.71e-05,
5412
+ "grad_norm": 14.400522232055664,
5413
+ "learning_rate": 7.7e-06,
5414
+ "loss": 67.9375,
5415
+ "step": 771
5416
+ },
5417
+ {
5418
+ "epoch": 7.72e-05,
5419
+ "grad_norm": 14.000092506408691,
5420
+ "learning_rate": 7.709999999999999e-06,
5421
+ "loss": 68.0,
5422
+ "step": 772
5423
+ },
5424
+ {
5425
+ "epoch": 7.73e-05,
5426
+ "grad_norm": 13.759057998657227,
5427
+ "learning_rate": 7.72e-06,
5428
+ "loss": 68.125,
5429
+ "step": 773
5430
+ },
5431
+ {
5432
+ "epoch": 7.74e-05,
5433
+ "grad_norm": 13.556885719299316,
5434
+ "learning_rate": 7.73e-06,
5435
+ "loss": 68.1875,
5436
+ "step": 774
5437
+ },
5438
+ {
5439
+ "epoch": 7.75e-05,
5440
+ "grad_norm": 13.875627517700195,
5441
+ "learning_rate": 7.74e-06,
5442
+ "loss": 68.0,
5443
+ "step": 775
5444
+ },
5445
+ {
5446
+ "epoch": 7.76e-05,
5447
+ "grad_norm": 14.190862655639648,
5448
+ "learning_rate": 7.75e-06,
5449
+ "loss": 67.75,
5450
+ "step": 776
5451
+ },
5452
+ {
5453
+ "epoch": 7.77e-05,
5454
+ "grad_norm": 14.008695602416992,
5455
+ "learning_rate": 7.76e-06,
5456
+ "loss": 67.8125,
5457
+ "step": 777
5458
+ },
5459
+ {
5460
+ "epoch": 7.78e-05,
5461
+ "grad_norm": 13.735383033752441,
5462
+ "learning_rate": 7.77e-06,
5463
+ "loss": 67.6875,
5464
+ "step": 778
5465
+ },
5466
+ {
5467
+ "epoch": 7.79e-05,
5468
+ "grad_norm": 13.905145645141602,
5469
+ "learning_rate": 7.78e-06,
5470
+ "loss": 67.875,
5471
+ "step": 779
5472
+ },
5473
+ {
5474
+ "epoch": 7.8e-05,
5475
+ "grad_norm": 13.315927505493164,
5476
+ "learning_rate": 7.79e-06,
5477
+ "loss": 68.0,
5478
+ "step": 780
5479
+ },
5480
+ {
5481
+ "epoch": 7.81e-05,
5482
+ "grad_norm": 14.015225410461426,
5483
+ "learning_rate": 7.8e-06,
5484
+ "loss": 67.75,
5485
+ "step": 781
5486
+ },
5487
+ {
5488
+ "epoch": 7.82e-05,
5489
+ "grad_norm": 13.459209442138672,
5490
+ "learning_rate": 7.81e-06,
5491
+ "loss": 67.875,
5492
+ "step": 782
5493
+ },
5494
+ {
5495
+ "epoch": 7.83e-05,
5496
+ "grad_norm": 13.911338806152344,
5497
+ "learning_rate": 7.820000000000001e-06,
5498
+ "loss": 67.75,
5499
+ "step": 783
5500
+ },
5501
+ {
5502
+ "epoch": 7.84e-05,
5503
+ "grad_norm": 13.783612251281738,
5504
+ "learning_rate": 7.83e-06,
5505
+ "loss": 67.8125,
5506
+ "step": 784
5507
+ },
5508
+ {
5509
+ "epoch": 7.85e-05,
5510
+ "grad_norm": 14.066487312316895,
5511
+ "learning_rate": 7.840000000000001e-06,
5512
+ "loss": 67.375,
5513
+ "step": 785
5514
+ },
5515
+ {
5516
+ "epoch": 7.86e-05,
5517
+ "grad_norm": 13.927901268005371,
5518
+ "learning_rate": 7.850000000000001e-06,
5519
+ "loss": 67.5,
5520
+ "step": 786
5521
+ },
5522
+ {
5523
+ "epoch": 7.87e-05,
5524
+ "grad_norm": 13.599596977233887,
5525
+ "learning_rate": 7.86e-06,
5526
+ "loss": 67.5,
5527
+ "step": 787
5528
+ },
5529
+ {
5530
+ "epoch": 7.88e-05,
5531
+ "grad_norm": 13.849498748779297,
5532
+ "learning_rate": 7.870000000000001e-06,
5533
+ "loss": 67.4375,
5534
+ "step": 788
5535
+ },
5536
+ {
5537
+ "epoch": 7.89e-05,
5538
+ "grad_norm": 13.334601402282715,
5539
+ "learning_rate": 7.879999999999999e-06,
5540
+ "loss": 67.5,
5541
+ "step": 789
5542
+ },
5543
+ {
5544
+ "epoch": 7.9e-05,
5545
+ "grad_norm": 13.693037986755371,
5546
+ "learning_rate": 7.89e-06,
5547
+ "loss": 67.375,
5548
+ "step": 790
5549
+ },
5550
+ {
5551
+ "epoch": 7.91e-05,
5552
+ "grad_norm": 13.442035675048828,
5553
+ "learning_rate": 7.9e-06,
5554
+ "loss": 67.5,
5555
+ "step": 791
5556
+ },
5557
+ {
5558
+ "epoch": 7.92e-05,
5559
+ "grad_norm": 13.878938674926758,
5560
+ "learning_rate": 7.91e-06,
5561
+ "loss": 67.0625,
5562
+ "step": 792
5563
+ },
5564
+ {
5565
+ "epoch": 7.93e-05,
5566
+ "grad_norm": 13.525701522827148,
5567
+ "learning_rate": 7.92e-06,
5568
+ "loss": 67.5,
5569
+ "step": 793
5570
+ },
5571
+ {
5572
+ "epoch": 7.94e-05,
5573
+ "grad_norm": 13.480079650878906,
5574
+ "learning_rate": 7.93e-06,
5575
+ "loss": 67.3125,
5576
+ "step": 794
5577
+ },
5578
+ {
5579
+ "epoch": 7.95e-05,
5580
+ "grad_norm": 13.224320411682129,
5581
+ "learning_rate": 7.94e-06,
5582
+ "loss": 67.375,
5583
+ "step": 795
5584
+ },
5585
+ {
5586
+ "epoch": 7.96e-05,
5587
+ "grad_norm": 13.158392906188965,
5588
+ "learning_rate": 7.95e-06,
5589
+ "loss": 67.3125,
5590
+ "step": 796
5591
+ },
5592
+ {
5593
+ "epoch": 7.97e-05,
5594
+ "grad_norm": 13.183201789855957,
5595
+ "learning_rate": 7.96e-06,
5596
+ "loss": 67.375,
5597
+ "step": 797
5598
+ },
5599
+ {
5600
+ "epoch": 7.98e-05,
5601
+ "grad_norm": 13.698829650878906,
5602
+ "learning_rate": 7.97e-06,
5603
+ "loss": 67.0,
5604
+ "step": 798
5605
+ },
5606
+ {
5607
+ "epoch": 7.99e-05,
5608
+ "grad_norm": 13.33995532989502,
5609
+ "learning_rate": 7.98e-06,
5610
+ "loss": 67.0,
5611
+ "step": 799
5612
+ },
5613
+ {
5614
+ "epoch": 8e-05,
5615
+ "grad_norm": 13.223627090454102,
5616
+ "learning_rate": 7.990000000000001e-06,
5617
+ "loss": 67.1875,
5618
+ "step": 800
5619
+ },
5620
+ {
5621
+ "epoch": 8.01e-05,
5622
+ "grad_norm": 13.501330375671387,
5623
+ "learning_rate": 8e-06,
5624
+ "loss": 66.9375,
5625
+ "step": 801
5626
+ },
5627
+ {
5628
+ "epoch": 8.02e-05,
5629
+ "grad_norm": 13.263901710510254,
5630
+ "learning_rate": 8.01e-06,
5631
+ "loss": 66.9375,
5632
+ "step": 802
5633
+ },
5634
+ {
5635
+ "epoch": 8.03e-05,
5636
+ "grad_norm": 13.32174301147461,
5637
+ "learning_rate": 8.02e-06,
5638
+ "loss": 66.875,
5639
+ "step": 803
5640
+ },
5641
+ {
5642
+ "epoch": 8.04e-05,
5643
+ "grad_norm": 13.487302780151367,
5644
+ "learning_rate": 8.03e-06,
5645
+ "loss": 66.75,
5646
+ "step": 804
5647
+ },
5648
+ {
5649
+ "epoch": 8.05e-05,
5650
+ "grad_norm": 13.301477432250977,
5651
+ "learning_rate": 8.040000000000001e-06,
5652
+ "loss": 66.875,
5653
+ "step": 805
5654
+ },
5655
+ {
5656
+ "epoch": 8.06e-05,
5657
+ "grad_norm": 13.382040023803711,
5658
+ "learning_rate": 8.05e-06,
5659
+ "loss": 66.625,
5660
+ "step": 806
5661
+ },
5662
+ {
5663
+ "epoch": 8.07e-05,
5664
+ "grad_norm": 13.217137336730957,
5665
+ "learning_rate": 8.06e-06,
5666
+ "loss": 66.875,
5667
+ "step": 807
5668
+ },
5669
+ {
5670
+ "epoch": 8.08e-05,
5671
+ "grad_norm": 13.310396194458008,
5672
+ "learning_rate": 8.07e-06,
5673
+ "loss": 66.6875,
5674
+ "step": 808
5675
+ },
5676
+ {
5677
+ "epoch": 8.09e-05,
5678
+ "grad_norm": 12.70387077331543,
5679
+ "learning_rate": 8.079999999999999e-06,
5680
+ "loss": 67.0,
5681
+ "step": 809
5682
+ },
5683
+ {
5684
+ "epoch": 8.1e-05,
5685
+ "grad_norm": 13.554513931274414,
5686
+ "learning_rate": 8.09e-06,
5687
+ "loss": 66.4375,
5688
+ "step": 810
5689
+ },
5690
+ {
5691
+ "epoch": 8.11e-05,
5692
+ "grad_norm": 13.117542266845703,
5693
+ "learning_rate": 8.1e-06,
5694
+ "loss": 66.5625,
5695
+ "step": 811
5696
+ },
5697
+ {
5698
+ "epoch": 8.12e-05,
5699
+ "grad_norm": 12.712786674499512,
5700
+ "learning_rate": 8.11e-06,
5701
+ "loss": 66.6875,
5702
+ "step": 812
5703
+ },
5704
+ {
5705
+ "epoch": 8.13e-05,
5706
+ "grad_norm": 13.098764419555664,
5707
+ "learning_rate": 8.12e-06,
5708
+ "loss": 66.5,
5709
+ "step": 813
5710
+ },
5711
+ {
5712
+ "epoch": 8.14e-05,
5713
+ "grad_norm": 13.022997856140137,
5714
+ "learning_rate": 8.13e-06,
5715
+ "loss": 66.4375,
5716
+ "step": 814
5717
+ },
5718
+ {
5719
+ "epoch": 8.15e-05,
5720
+ "grad_norm": 12.858806610107422,
5721
+ "learning_rate": 8.14e-06,
5722
+ "loss": 66.625,
5723
+ "step": 815
5724
+ },
5725
+ {
5726
+ "epoch": 8.16e-05,
5727
+ "grad_norm": 13.087048530578613,
5728
+ "learning_rate": 8.15e-06,
5729
+ "loss": 66.5,
5730
+ "step": 816
5731
+ },
5732
+ {
5733
+ "epoch": 8.17e-05,
5734
+ "grad_norm": 12.747725486755371,
5735
+ "learning_rate": 8.160000000000001e-06,
5736
+ "loss": 66.4375,
5737
+ "step": 817
5738
+ },
5739
+ {
5740
+ "epoch": 8.18e-05,
5741
+ "grad_norm": 13.237361907958984,
5742
+ "learning_rate": 8.17e-06,
5743
+ "loss": 66.1875,
5744
+ "step": 818
5745
+ },
5746
+ {
5747
+ "epoch": 8.19e-05,
5748
+ "grad_norm": 12.671786308288574,
5749
+ "learning_rate": 8.18e-06,
5750
+ "loss": 66.4375,
5751
+ "step": 819
5752
+ },
5753
+ {
5754
+ "epoch": 8.2e-05,
5755
+ "grad_norm": 12.85875129699707,
5756
+ "learning_rate": 8.19e-06,
5757
+ "loss": 66.3125,
5758
+ "step": 820
5759
+ },
5760
+ {
5761
+ "epoch": 8.21e-05,
5762
+ "grad_norm": 12.405821800231934,
5763
+ "learning_rate": 8.2e-06,
5764
+ "loss": 66.5,
5765
+ "step": 821
5766
+ },
5767
+ {
5768
+ "epoch": 8.22e-05,
5769
+ "grad_norm": 13.07856559753418,
5770
+ "learning_rate": 8.210000000000001e-06,
5771
+ "loss": 66.0625,
5772
+ "step": 822
5773
+ },
5774
+ {
5775
+ "epoch": 8.23e-05,
5776
+ "grad_norm": 13.018475532531738,
5777
+ "learning_rate": 8.22e-06,
5778
+ "loss": 66.1875,
5779
+ "step": 823
5780
+ },
5781
+ {
5782
+ "epoch": 8.24e-05,
5783
+ "grad_norm": 13.105154037475586,
5784
+ "learning_rate": 8.23e-06,
5785
+ "loss": 65.9375,
5786
+ "step": 824
5787
+ },
5788
+ {
5789
+ "epoch": 8.25e-05,
5790
+ "grad_norm": 12.811441421508789,
5791
+ "learning_rate": 8.24e-06,
5792
+ "loss": 66.0,
5793
+ "step": 825
5794
+ },
5795
+ {
5796
+ "epoch": 8.26e-05,
5797
+ "grad_norm": 12.24394702911377,
5798
+ "learning_rate": 8.249999999999999e-06,
5799
+ "loss": 66.5,
5800
+ "step": 826
5801
+ },
5802
+ {
5803
+ "epoch": 8.27e-05,
5804
+ "grad_norm": 12.856313705444336,
5805
+ "learning_rate": 8.26e-06,
5806
+ "loss": 66.1875,
5807
+ "step": 827
5808
+ },
5809
+ {
5810
+ "epoch": 8.28e-05,
5811
+ "grad_norm": 12.286417007446289,
5812
+ "learning_rate": 8.27e-06,
5813
+ "loss": 66.0625,
5814
+ "step": 828
5815
+ },
5816
+ {
5817
+ "epoch": 8.29e-05,
5818
+ "grad_norm": 12.537483215332031,
5819
+ "learning_rate": 8.28e-06,
5820
+ "loss": 65.9375,
5821
+ "step": 829
5822
+ },
5823
+ {
5824
+ "epoch": 8.3e-05,
5825
+ "grad_norm": 12.375907897949219,
5826
+ "learning_rate": 8.29e-06,
5827
+ "loss": 65.875,
5828
+ "step": 830
5829
+ },
5830
+ {
5831
+ "epoch": 8.31e-05,
5832
+ "grad_norm": 12.420113563537598,
5833
+ "learning_rate": 8.3e-06,
5834
+ "loss": 66.0,
5835
+ "step": 831
5836
+ },
5837
+ {
5838
+ "epoch": 8.32e-05,
5839
+ "grad_norm": 12.373382568359375,
5840
+ "learning_rate": 8.31e-06,
5841
+ "loss": 65.6875,
5842
+ "step": 832
5843
+ },
5844
+ {
5845
+ "epoch": 8.33e-05,
5846
+ "grad_norm": 11.9176607131958,
5847
+ "learning_rate": 8.32e-06,
5848
+ "loss": 65.875,
5849
+ "step": 833
5850
+ },
5851
+ {
5852
+ "epoch": 8.34e-05,
5853
+ "grad_norm": 12.355897903442383,
5854
+ "learning_rate": 8.330000000000002e-06,
5855
+ "loss": 65.9375,
5856
+ "step": 834
5857
+ },
5858
+ {
5859
+ "epoch": 8.35e-05,
5860
+ "grad_norm": 12.412797927856445,
5861
+ "learning_rate": 8.34e-06,
5862
+ "loss": 65.75,
5863
+ "step": 835
5864
+ },
5865
+ {
5866
+ "epoch": 8.36e-05,
5867
+ "grad_norm": 12.29367733001709,
5868
+ "learning_rate": 8.35e-06,
5869
+ "loss": 65.8125,
5870
+ "step": 836
5871
+ },
5872
+ {
5873
+ "epoch": 8.37e-05,
5874
+ "grad_norm": 12.362329483032227,
5875
+ "learning_rate": 8.36e-06,
5876
+ "loss": 65.5625,
5877
+ "step": 837
5878
+ },
5879
+ {
5880
+ "epoch": 8.38e-05,
5881
+ "grad_norm": 12.16931438446045,
5882
+ "learning_rate": 8.37e-06,
5883
+ "loss": 65.875,
5884
+ "step": 838
5885
+ },
5886
+ {
5887
+ "epoch": 8.39e-05,
5888
+ "grad_norm": 12.073834419250488,
5889
+ "learning_rate": 8.380000000000001e-06,
5890
+ "loss": 65.6875,
5891
+ "step": 839
5892
+ },
5893
+ {
5894
+ "epoch": 8.4e-05,
5895
+ "grad_norm": 12.422995567321777,
5896
+ "learning_rate": 8.39e-06,
5897
+ "loss": 65.5,
5898
+ "step": 840
5899
+ },
5900
+ {
5901
+ "epoch": 8.41e-05,
5902
+ "grad_norm": 12.235881805419922,
5903
+ "learning_rate": 8.400000000000001e-06,
5904
+ "loss": 65.5,
5905
+ "step": 841
5906
+ },
5907
+ {
5908
+ "epoch": 8.42e-05,
5909
+ "grad_norm": 11.909436225891113,
5910
+ "learning_rate": 8.41e-06,
5911
+ "loss": 65.6875,
5912
+ "step": 842
5913
+ },
5914
+ {
5915
+ "epoch": 8.43e-05,
5916
+ "grad_norm": 12.046891212463379,
5917
+ "learning_rate": 8.419999999999999e-06,
5918
+ "loss": 65.6875,
5919
+ "step": 843
5920
+ },
5921
+ {
5922
+ "epoch": 8.44e-05,
5923
+ "grad_norm": 12.069416999816895,
5924
+ "learning_rate": 8.43e-06,
5925
+ "loss": 65.25,
5926
+ "step": 844
5927
+ },
5928
+ {
5929
+ "epoch": 8.45e-05,
5930
+ "grad_norm": 11.80577278137207,
5931
+ "learning_rate": 8.44e-06,
5932
+ "loss": 65.375,
5933
+ "step": 845
5934
+ },
5935
+ {
5936
+ "epoch": 8.46e-05,
5937
+ "grad_norm": 11.65450382232666,
5938
+ "learning_rate": 8.45e-06,
5939
+ "loss": 65.5,
5940
+ "step": 846
5941
+ },
5942
+ {
5943
+ "epoch": 8.47e-05,
5944
+ "grad_norm": 11.70303726196289,
5945
+ "learning_rate": 8.46e-06,
5946
+ "loss": 65.4375,
5947
+ "step": 847
5948
+ },
5949
+ {
5950
+ "epoch": 8.48e-05,
5951
+ "grad_norm": 12.221776008605957,
5952
+ "learning_rate": 8.47e-06,
5953
+ "loss": 65.25,
5954
+ "step": 848
5955
+ },
5956
+ {
5957
+ "epoch": 8.49e-05,
5958
+ "grad_norm": 11.785761833190918,
5959
+ "learning_rate": 8.48e-06,
5960
+ "loss": 65.4375,
5961
+ "step": 849
5962
+ },
5963
+ {
5964
+ "epoch": 8.5e-05,
5965
+ "grad_norm": 12.039600372314453,
5966
+ "learning_rate": 8.49e-06,
5967
+ "loss": 65.25,
5968
+ "step": 850
5969
+ },
5970
+ {
5971
+ "epoch": 8.51e-05,
5972
+ "grad_norm": 11.632035255432129,
5973
+ "learning_rate": 8.500000000000002e-06,
5974
+ "loss": 65.4375,
5975
+ "step": 851
5976
+ },
5977
+ {
5978
+ "epoch": 8.52e-05,
5979
+ "grad_norm": 11.960968017578125,
5980
+ "learning_rate": 8.51e-06,
5981
+ "loss": 65.125,
5982
+ "step": 852
5983
+ },
5984
+ {
5985
+ "epoch": 8.53e-05,
5986
+ "grad_norm": 11.791102409362793,
5987
+ "learning_rate": 8.52e-06,
5988
+ "loss": 65.0625,
5989
+ "step": 853
5990
+ },
5991
+ {
5992
+ "epoch": 8.54e-05,
5993
+ "grad_norm": 11.28836727142334,
5994
+ "learning_rate": 8.53e-06,
5995
+ "loss": 65.5,
5996
+ "step": 854
5997
+ },
5998
+ {
5999
+ "epoch": 8.55e-05,
6000
+ "grad_norm": 11.553174018859863,
6001
+ "learning_rate": 8.54e-06,
6002
+ "loss": 65.125,
6003
+ "step": 855
6004
+ },
6005
+ {
6006
+ "epoch": 8.56e-05,
6007
+ "grad_norm": 11.61713981628418,
6008
+ "learning_rate": 8.550000000000001e-06,
6009
+ "loss": 65.0625,
6010
+ "step": 856
6011
+ },
6012
+ {
6013
+ "epoch": 8.57e-05,
6014
+ "grad_norm": 11.704262733459473,
6015
+ "learning_rate": 8.56e-06,
6016
+ "loss": 64.8125,
6017
+ "step": 857
6018
+ },
6019
+ {
6020
+ "epoch": 8.58e-05,
6021
+ "grad_norm": 11.702911376953125,
6022
+ "learning_rate": 8.57e-06,
6023
+ "loss": 64.875,
6024
+ "step": 858
6025
+ },
6026
+ {
6027
+ "epoch": 8.59e-05,
6028
+ "grad_norm": 11.293460845947266,
6029
+ "learning_rate": 8.580000000000001e-06,
6030
+ "loss": 65.0,
6031
+ "step": 859
6032
+ },
6033
+ {
6034
+ "epoch": 8.6e-05,
6035
+ "grad_norm": 11.456600189208984,
6036
+ "learning_rate": 8.589999999999999e-06,
6037
+ "loss": 64.7812,
6038
+ "step": 860
6039
+ },
6040
+ {
6041
+ "epoch": 8.61e-05,
6042
+ "grad_norm": 11.203218460083008,
6043
+ "learning_rate": 8.6e-06,
6044
+ "loss": 65.125,
6045
+ "step": 861
6046
+ },
6047
+ {
6048
+ "epoch": 8.62e-05,
6049
+ "grad_norm": 11.105107307434082,
6050
+ "learning_rate": 8.61e-06,
6051
+ "loss": 65.0,
6052
+ "step": 862
6053
+ },
6054
+ {
6055
+ "epoch": 8.63e-05,
6056
+ "grad_norm": 11.6665620803833,
6057
+ "learning_rate": 8.62e-06,
6058
+ "loss": 64.625,
6059
+ "step": 863
6060
+ },
6061
+ {
6062
+ "epoch": 8.64e-05,
6063
+ "grad_norm": 11.519858360290527,
6064
+ "learning_rate": 8.63e-06,
6065
+ "loss": 64.8125,
6066
+ "step": 864
6067
+ },
6068
+ {
6069
+ "epoch": 8.65e-05,
6070
+ "grad_norm": 11.364030838012695,
6071
+ "learning_rate": 8.64e-06,
6072
+ "loss": 64.8125,
6073
+ "step": 865
6074
+ },
6075
+ {
6076
+ "epoch": 8.66e-05,
6077
+ "grad_norm": 11.232035636901855,
6078
+ "learning_rate": 8.65e-06,
6079
+ "loss": 64.8125,
6080
+ "step": 866
6081
+ },
6082
+ {
6083
+ "epoch": 8.67e-05,
6084
+ "grad_norm": 11.68748950958252,
6085
+ "learning_rate": 8.66e-06,
6086
+ "loss": 64.5,
6087
+ "step": 867
6088
+ },
6089
+ {
6090
+ "epoch": 8.68e-05,
6091
+ "grad_norm": 10.944489479064941,
6092
+ "learning_rate": 8.67e-06,
6093
+ "loss": 64.875,
6094
+ "step": 868
6095
+ },
6096
+ {
6097
+ "epoch": 8.69e-05,
6098
+ "grad_norm": 11.102535247802734,
6099
+ "learning_rate": 8.68e-06,
6100
+ "loss": 64.5,
6101
+ "step": 869
6102
+ },
6103
+ {
6104
+ "epoch": 8.7e-05,
6105
+ "grad_norm": 10.927079200744629,
6106
+ "learning_rate": 8.69e-06,
6107
+ "loss": 64.9375,
6108
+ "step": 870
6109
+ },
6110
+ {
6111
+ "epoch": 8.71e-05,
6112
+ "grad_norm": 10.939054489135742,
6113
+ "learning_rate": 8.7e-06,
6114
+ "loss": 64.4688,
6115
+ "step": 871
6116
+ },
6117
+ {
6118
+ "epoch": 8.72e-05,
6119
+ "grad_norm": 11.497695922851562,
6120
+ "learning_rate": 8.71e-06,
6121
+ "loss": 64.625,
6122
+ "step": 872
6123
+ },
6124
+ {
6125
+ "epoch": 8.73e-05,
6126
+ "grad_norm": 11.157756805419922,
6127
+ "learning_rate": 8.720000000000001e-06,
6128
+ "loss": 64.8125,
6129
+ "step": 873
6130
+ },
6131
+ {
6132
+ "epoch": 8.74e-05,
6133
+ "grad_norm": 10.892260551452637,
6134
+ "learning_rate": 8.73e-06,
6135
+ "loss": 64.5,
6136
+ "step": 874
6137
+ },
6138
+ {
6139
+ "epoch": 8.75e-05,
6140
+ "grad_norm": 10.81840991973877,
6141
+ "learning_rate": 8.74e-06,
6142
+ "loss": 64.625,
6143
+ "step": 875
6144
+ },
6145
+ {
6146
+ "epoch": 8.76e-05,
6147
+ "grad_norm": 10.906977653503418,
6148
+ "learning_rate": 8.750000000000001e-06,
6149
+ "loss": 64.4062,
6150
+ "step": 876
6151
+ },
6152
+ {
6153
+ "epoch": 8.77e-05,
6154
+ "grad_norm": 10.821674346923828,
6155
+ "learning_rate": 8.759999999999999e-06,
6156
+ "loss": 64.5,
6157
+ "step": 877
6158
+ },
6159
+ {
6160
+ "epoch": 8.78e-05,
6161
+ "grad_norm": 10.746541023254395,
6162
+ "learning_rate": 8.77e-06,
6163
+ "loss": 64.5625,
6164
+ "step": 878
6165
+ },
6166
+ {
6167
+ "epoch": 8.79e-05,
6168
+ "grad_norm": 11.237996101379395,
6169
+ "learning_rate": 8.78e-06,
6170
+ "loss": 64.125,
6171
+ "step": 879
6172
+ },
6173
+ {
6174
+ "epoch": 8.8e-05,
6175
+ "grad_norm": 11.085122108459473,
6176
+ "learning_rate": 8.79e-06,
6177
+ "loss": 64.25,
6178
+ "step": 880
6179
+ },
6180
+ {
6181
+ "epoch": 8.81e-05,
6182
+ "grad_norm": 10.644852638244629,
6183
+ "learning_rate": 8.8e-06,
6184
+ "loss": 64.4062,
6185
+ "step": 881
6186
+ },
6187
+ {
6188
+ "epoch": 8.82e-05,
6189
+ "grad_norm": 10.829193115234375,
6190
+ "learning_rate": 8.81e-06,
6191
+ "loss": 64.0938,
6192
+ "step": 882
6193
+ },
6194
+ {
6195
+ "epoch": 8.83e-05,
6196
+ "grad_norm": 10.780670166015625,
6197
+ "learning_rate": 8.82e-06,
6198
+ "loss": 64.4062,
6199
+ "step": 883
6200
+ },
6201
+ {
6202
+ "epoch": 8.84e-05,
6203
+ "grad_norm": 10.732144355773926,
6204
+ "learning_rate": 8.83e-06,
6205
+ "loss": 64.4375,
6206
+ "step": 884
6207
+ },
6208
+ {
6209
+ "epoch": 8.85e-05,
6210
+ "grad_norm": 10.749310493469238,
6211
+ "learning_rate": 8.84e-06,
6212
+ "loss": 64.0625,
6213
+ "step": 885
6214
+ },
6215
+ {
6216
+ "epoch": 8.86e-05,
6217
+ "grad_norm": 10.903584480285645,
6218
+ "learning_rate": 8.85e-06,
6219
+ "loss": 63.8125,
6220
+ "step": 886
6221
+ },
6222
+ {
6223
+ "epoch": 8.87e-05,
6224
+ "grad_norm": 10.870782852172852,
6225
+ "learning_rate": 8.86e-06,
6226
+ "loss": 63.5625,
6227
+ "step": 887
6228
+ },
6229
+ {
6230
+ "epoch": 8.88e-05,
6231
+ "grad_norm": 10.494131088256836,
6232
+ "learning_rate": 8.87e-06,
6233
+ "loss": 64.2188,
6234
+ "step": 888
6235
+ },
6236
+ {
6237
+ "epoch": 8.89e-05,
6238
+ "grad_norm": 10.62818717956543,
6239
+ "learning_rate": 8.88e-06,
6240
+ "loss": 64.3438,
6241
+ "step": 889
6242
+ },
6243
+ {
6244
+ "epoch": 8.9e-05,
6245
+ "grad_norm": 10.806772232055664,
6246
+ "learning_rate": 8.890000000000001e-06,
6247
+ "loss": 63.7812,
6248
+ "step": 890
6249
+ },
6250
+ {
6251
+ "epoch": 8.91e-05,
6252
+ "grad_norm": 10.955464363098145,
6253
+ "learning_rate": 8.9e-06,
6254
+ "loss": 63.75,
6255
+ "step": 891
6256
+ },
6257
+ {
6258
+ "epoch": 8.92e-05,
6259
+ "grad_norm": 10.537474632263184,
6260
+ "learning_rate": 8.91e-06,
6261
+ "loss": 63.6875,
6262
+ "step": 892
6263
+ },
6264
+ {
6265
+ "epoch": 8.93e-05,
6266
+ "grad_norm": 10.190065383911133,
6267
+ "learning_rate": 8.920000000000001e-06,
6268
+ "loss": 64.1562,
6269
+ "step": 893
6270
+ },
6271
+ {
6272
+ "epoch": 8.94e-05,
6273
+ "grad_norm": 10.386507987976074,
6274
+ "learning_rate": 8.93e-06,
6275
+ "loss": 63.9375,
6276
+ "step": 894
6277
+ },
6278
+ {
6279
+ "epoch": 8.95e-05,
6280
+ "grad_norm": 10.566305160522461,
6281
+ "learning_rate": 8.94e-06,
6282
+ "loss": 63.5,
6283
+ "step": 895
6284
+ },
6285
+ {
6286
+ "epoch": 8.96e-05,
6287
+ "grad_norm": 10.46068000793457,
6288
+ "learning_rate": 8.95e-06,
6289
+ "loss": 63.6562,
6290
+ "step": 896
6291
+ },
6292
+ {
6293
+ "epoch": 8.97e-05,
6294
+ "grad_norm": 10.569717407226562,
6295
+ "learning_rate": 8.96e-06,
6296
+ "loss": 63.5625,
6297
+ "step": 897
6298
+ },
6299
+ {
6300
+ "epoch": 8.98e-05,
6301
+ "grad_norm": 10.547991752624512,
6302
+ "learning_rate": 8.97e-06,
6303
+ "loss": 63.625,
6304
+ "step": 898
6305
+ },
6306
+ {
6307
+ "epoch": 8.99e-05,
6308
+ "grad_norm": 10.377073287963867,
6309
+ "learning_rate": 8.98e-06,
6310
+ "loss": 63.4688,
6311
+ "step": 899
6312
+ },
6313
+ {
6314
+ "epoch": 9e-05,
6315
+ "grad_norm": 10.192441940307617,
6316
+ "learning_rate": 8.99e-06,
6317
+ "loss": 63.75,
6318
+ "step": 900
6319
+ },
6320
+ {
6321
+ "epoch": 9.01e-05,
6322
+ "grad_norm": 10.346672058105469,
6323
+ "learning_rate": 9e-06,
6324
+ "loss": 63.5312,
6325
+ "step": 901
6326
+ },
6327
+ {
6328
+ "epoch": 9.02e-05,
6329
+ "grad_norm": 10.225927352905273,
6330
+ "learning_rate": 9.01e-06,
6331
+ "loss": 63.5,
6332
+ "step": 902
6333
+ },
6334
+ {
6335
+ "epoch": 9.03e-05,
6336
+ "grad_norm": 10.153120994567871,
6337
+ "learning_rate": 9.02e-06,
6338
+ "loss": 63.6562,
6339
+ "step": 903
6340
+ },
6341
+ {
6342
+ "epoch": 9.04e-05,
6343
+ "grad_norm": 10.05332088470459,
6344
+ "learning_rate": 9.03e-06,
6345
+ "loss": 63.6875,
6346
+ "step": 904
6347
+ },
6348
+ {
6349
+ "epoch": 9.05e-05,
6350
+ "grad_norm": 10.070221900939941,
6351
+ "learning_rate": 9.04e-06,
6352
+ "loss": 63.5938,
6353
+ "step": 905
6354
+ },
6355
+ {
6356
+ "epoch": 9.06e-05,
6357
+ "grad_norm": 10.313018798828125,
6358
+ "learning_rate": 9.05e-06,
6359
+ "loss": 63.3438,
6360
+ "step": 906
6361
+ },
6362
+ {
6363
+ "epoch": 9.07e-05,
6364
+ "grad_norm": 10.222672462463379,
6365
+ "learning_rate": 9.060000000000001e-06,
6366
+ "loss": 63.3438,
6367
+ "step": 907
6368
+ },
6369
+ {
6370
+ "epoch": 9.08e-05,
6371
+ "grad_norm": 10.298966407775879,
6372
+ "learning_rate": 9.07e-06,
6373
+ "loss": 63.3125,
6374
+ "step": 908
6375
+ },
6376
+ {
6377
+ "epoch": 9.09e-05,
6378
+ "grad_norm": 9.815750122070312,
6379
+ "learning_rate": 9.08e-06,
6380
+ "loss": 63.625,
6381
+ "step": 909
6382
+ },
6383
+ {
6384
+ "epoch": 9.1e-05,
6385
+ "grad_norm": 9.667336463928223,
6386
+ "learning_rate": 9.090000000000001e-06,
6387
+ "loss": 63.625,
6388
+ "step": 910
6389
+ },
6390
+ {
6391
+ "epoch": 9.11e-05,
6392
+ "grad_norm": 9.815160751342773,
6393
+ "learning_rate": 9.1e-06,
6394
+ "loss": 63.4062,
6395
+ "step": 911
6396
+ },
6397
+ {
6398
+ "epoch": 9.12e-05,
6399
+ "grad_norm": 10.02402114868164,
6400
+ "learning_rate": 9.110000000000001e-06,
6401
+ "loss": 63.1562,
6402
+ "step": 912
6403
+ },
6404
+ {
6405
+ "epoch": 9.13e-05,
6406
+ "grad_norm": 10.039457321166992,
6407
+ "learning_rate": 9.12e-06,
6408
+ "loss": 63.0,
6409
+ "step": 913
6410
+ },
6411
+ {
6412
+ "epoch": 9.14e-05,
6413
+ "grad_norm": 10.062280654907227,
6414
+ "learning_rate": 9.13e-06,
6415
+ "loss": 63.2812,
6416
+ "step": 914
6417
+ },
6418
+ {
6419
+ "epoch": 9.15e-05,
6420
+ "grad_norm": 9.866779327392578,
6421
+ "learning_rate": 9.14e-06,
6422
+ "loss": 63.5,
6423
+ "step": 915
6424
+ },
6425
+ {
6426
+ "epoch": 9.16e-05,
6427
+ "grad_norm": 9.92796516418457,
6428
+ "learning_rate": 9.15e-06,
6429
+ "loss": 63.0938,
6430
+ "step": 916
6431
+ },
6432
+ {
6433
+ "epoch": 9.17e-05,
6434
+ "grad_norm": 9.809439659118652,
6435
+ "learning_rate": 9.16e-06,
6436
+ "loss": 63.3438,
6437
+ "step": 917
6438
+ },
6439
+ {
6440
+ "epoch": 9.18e-05,
6441
+ "grad_norm": 9.726333618164062,
6442
+ "learning_rate": 9.17e-06,
6443
+ "loss": 63.4062,
6444
+ "step": 918
6445
+ },
6446
+ {
6447
+ "epoch": 9.19e-05,
6448
+ "grad_norm": 10.27943229675293,
6449
+ "learning_rate": 9.18e-06,
6450
+ "loss": 62.7812,
6451
+ "step": 919
6452
+ },
6453
+ {
6454
+ "epoch": 9.2e-05,
6455
+ "grad_norm": 9.923724174499512,
6456
+ "learning_rate": 9.19e-06,
6457
+ "loss": 63.125,
6458
+ "step": 920
6459
+ },
6460
+ {
6461
+ "epoch": 9.21e-05,
6462
+ "grad_norm": 9.776262283325195,
6463
+ "learning_rate": 9.2e-06,
6464
+ "loss": 63.1562,
6465
+ "step": 921
6466
+ },
6467
+ {
6468
+ "epoch": 9.22e-05,
6469
+ "grad_norm": 10.05753231048584,
6470
+ "learning_rate": 9.21e-06,
6471
+ "loss": 62.5312,
6472
+ "step": 922
6473
+ },
6474
+ {
6475
+ "epoch": 9.23e-05,
6476
+ "grad_norm": 10.067216873168945,
6477
+ "learning_rate": 9.22e-06,
6478
+ "loss": 62.6875,
6479
+ "step": 923
6480
+ },
6481
+ {
6482
+ "epoch": 9.24e-05,
6483
+ "grad_norm": 9.938775062561035,
6484
+ "learning_rate": 9.23e-06,
6485
+ "loss": 62.5312,
6486
+ "step": 924
6487
+ },
6488
+ {
6489
+ "epoch": 9.25e-05,
6490
+ "grad_norm": 9.898990631103516,
6491
+ "learning_rate": 9.24e-06,
6492
+ "loss": 62.5938,
6493
+ "step": 925
6494
+ },
6495
+ {
6496
+ "epoch": 9.26e-05,
6497
+ "grad_norm": 9.89505672454834,
6498
+ "learning_rate": 9.25e-06,
6499
+ "loss": 62.75,
6500
+ "step": 926
6501
+ },
6502
+ {
6503
+ "epoch": 9.27e-05,
6504
+ "grad_norm": 9.820269584655762,
6505
+ "learning_rate": 9.260000000000001e-06,
6506
+ "loss": 62.8125,
6507
+ "step": 927
6508
+ },
6509
+ {
6510
+ "epoch": 9.28e-05,
6511
+ "grad_norm": 9.785930633544922,
6512
+ "learning_rate": 9.27e-06,
6513
+ "loss": 62.7188,
6514
+ "step": 928
6515
+ },
6516
+ {
6517
+ "epoch": 9.29e-05,
6518
+ "grad_norm": 9.649805068969727,
6519
+ "learning_rate": 9.280000000000001e-06,
6520
+ "loss": 62.75,
6521
+ "step": 929
6522
+ },
6523
+ {
6524
+ "epoch": 9.3e-05,
6525
+ "grad_norm": 9.778607368469238,
6526
+ "learning_rate": 9.29e-06,
6527
+ "loss": 62.5312,
6528
+ "step": 930
6529
+ },
6530
+ {
6531
+ "epoch": 9.31e-05,
6532
+ "grad_norm": 9.530630111694336,
6533
+ "learning_rate": 9.299999999999999e-06,
6534
+ "loss": 62.75,
6535
+ "step": 931
6536
+ },
6537
+ {
6538
+ "epoch": 9.32e-05,
6539
+ "grad_norm": 9.770853042602539,
6540
+ "learning_rate": 9.31e-06,
6541
+ "loss": 62.375,
6542
+ "step": 932
6543
+ },
6544
+ {
6545
+ "epoch": 9.33e-05,
6546
+ "grad_norm": 9.653392791748047,
6547
+ "learning_rate": 9.32e-06,
6548
+ "loss": 62.6875,
6549
+ "step": 933
6550
+ },
6551
+ {
6552
+ "epoch": 9.34e-05,
6553
+ "grad_norm": 9.52197265625,
6554
+ "learning_rate": 9.33e-06,
6555
+ "loss": 62.4688,
6556
+ "step": 934
6557
+ },
6558
+ {
6559
+ "epoch": 9.35e-05,
6560
+ "grad_norm": 9.621892929077148,
6561
+ "learning_rate": 9.34e-06,
6562
+ "loss": 62.4375,
6563
+ "step": 935
6564
+ },
6565
+ {
6566
+ "epoch": 9.36e-05,
6567
+ "grad_norm": 9.49044132232666,
6568
+ "learning_rate": 9.35e-06,
6569
+ "loss": 63.0938,
6570
+ "step": 936
6571
+ },
6572
+ {
6573
+ "epoch": 9.37e-05,
6574
+ "grad_norm": 9.625767707824707,
6575
+ "learning_rate": 9.36e-06,
6576
+ "loss": 62.6875,
6577
+ "step": 937
6578
+ },
6579
+ {
6580
+ "epoch": 9.38e-05,
6581
+ "grad_norm": 9.668294906616211,
6582
+ "learning_rate": 9.37e-06,
6583
+ "loss": 62.2812,
6584
+ "step": 938
6585
+ },
6586
+ {
6587
+ "epoch": 9.39e-05,
6588
+ "grad_norm": 9.578293800354004,
6589
+ "learning_rate": 9.38e-06,
6590
+ "loss": 62.3125,
6591
+ "step": 939
6592
+ },
6593
+ {
6594
+ "epoch": 9.4e-05,
6595
+ "grad_norm": 9.879005432128906,
6596
+ "learning_rate": 9.39e-06,
6597
+ "loss": 62.3125,
6598
+ "step": 940
6599
+ },
6600
+ {
6601
+ "epoch": 9.41e-05,
6602
+ "grad_norm": 9.26992130279541,
6603
+ "learning_rate": 9.4e-06,
6604
+ "loss": 62.5938,
6605
+ "step": 941
6606
+ },
6607
+ {
6608
+ "epoch": 9.42e-05,
6609
+ "grad_norm": 9.741432189941406,
6610
+ "learning_rate": 9.41e-06,
6611
+ "loss": 62.125,
6612
+ "step": 942
6613
+ },
6614
+ {
6615
+ "epoch": 9.43e-05,
6616
+ "grad_norm": 9.542763710021973,
6617
+ "learning_rate": 9.42e-06,
6618
+ "loss": 62.4062,
6619
+ "step": 943
6620
+ },
6621
+ {
6622
+ "epoch": 9.44e-05,
6623
+ "grad_norm": 9.331867218017578,
6624
+ "learning_rate": 9.430000000000001e-06,
6625
+ "loss": 62.375,
6626
+ "step": 944
6627
+ },
6628
+ {
6629
+ "epoch": 9.45e-05,
6630
+ "grad_norm": 9.504682540893555,
6631
+ "learning_rate": 9.44e-06,
6632
+ "loss": 62.0312,
6633
+ "step": 945
6634
+ },
6635
+ {
6636
+ "epoch": 9.46e-05,
6637
+ "grad_norm": 9.386909484863281,
6638
+ "learning_rate": 9.450000000000001e-06,
6639
+ "loss": 62.375,
6640
+ "step": 946
6641
+ },
6642
+ {
6643
+ "epoch": 9.47e-05,
6644
+ "grad_norm": 9.543030738830566,
6645
+ "learning_rate": 9.460000000000001e-06,
6646
+ "loss": 62.0938,
6647
+ "step": 947
6648
+ },
6649
+ {
6650
+ "epoch": 9.48e-05,
6651
+ "grad_norm": 9.453919410705566,
6652
+ "learning_rate": 9.469999999999999e-06,
6653
+ "loss": 62.375,
6654
+ "step": 948
6655
+ },
6656
+ {
6657
+ "epoch": 9.49e-05,
6658
+ "grad_norm": 9.36674690246582,
6659
+ "learning_rate": 9.48e-06,
6660
+ "loss": 62.0312,
6661
+ "step": 949
6662
+ },
6663
+ {
6664
+ "epoch": 9.5e-05,
6665
+ "grad_norm": 9.571072578430176,
6666
+ "learning_rate": 9.49e-06,
6667
+ "loss": 62.0312,
6668
+ "step": 950
6669
+ },
6670
+ {
6671
+ "epoch": 9.51e-05,
6672
+ "grad_norm": 9.615150451660156,
6673
+ "learning_rate": 9.5e-06,
6674
+ "loss": 61.8125,
6675
+ "step": 951
6676
+ },
6677
+ {
6678
+ "epoch": 9.52e-05,
6679
+ "grad_norm": 9.266009330749512,
6680
+ "learning_rate": 9.51e-06,
6681
+ "loss": 62.375,
6682
+ "step": 952
6683
+ },
6684
+ {
6685
+ "epoch": 9.53e-05,
6686
+ "grad_norm": 9.239862442016602,
6687
+ "learning_rate": 9.52e-06,
6688
+ "loss": 62.3438,
6689
+ "step": 953
6690
+ },
6691
+ {
6692
+ "epoch": 9.54e-05,
6693
+ "grad_norm": 9.388384819030762,
6694
+ "learning_rate": 9.53e-06,
6695
+ "loss": 61.9062,
6696
+ "step": 954
6697
+ },
6698
+ {
6699
+ "epoch": 9.55e-05,
6700
+ "grad_norm": 9.385138511657715,
6701
+ "learning_rate": 9.54e-06,
6702
+ "loss": 61.9375,
6703
+ "step": 955
6704
+ },
6705
+ {
6706
+ "epoch": 9.56e-05,
6707
+ "grad_norm": 9.419622421264648,
6708
+ "learning_rate": 9.55e-06,
6709
+ "loss": 61.7188,
6710
+ "step": 956
6711
+ },
6712
+ {
6713
+ "epoch": 9.57e-05,
6714
+ "grad_norm": 9.560093879699707,
6715
+ "learning_rate": 9.56e-06,
6716
+ "loss": 61.6562,
6717
+ "step": 957
6718
+ },
6719
+ {
6720
+ "epoch": 9.58e-05,
6721
+ "grad_norm": 9.4889554977417,
6722
+ "learning_rate": 9.57e-06,
6723
+ "loss": 62.0625,
6724
+ "step": 958
6725
+ },
6726
+ {
6727
+ "epoch": 9.59e-05,
6728
+ "grad_norm": 8.988265991210938,
6729
+ "learning_rate": 9.58e-06,
6730
+ "loss": 62.3125,
6731
+ "step": 959
6732
+ },
6733
+ {
6734
+ "epoch": 9.6e-05,
6735
+ "grad_norm": 9.52734661102295,
6736
+ "learning_rate": 9.59e-06,
6737
+ "loss": 61.5312,
6738
+ "step": 960
6739
+ },
6740
+ {
6741
+ "epoch": 9.61e-05,
6742
+ "grad_norm": 9.097179412841797,
6743
+ "learning_rate": 9.600000000000001e-06,
6744
+ "loss": 62.1562,
6745
+ "step": 961
6746
+ },
6747
+ {
6748
+ "epoch": 9.62e-05,
6749
+ "grad_norm": 9.454554557800293,
6750
+ "learning_rate": 9.61e-06,
6751
+ "loss": 61.4688,
6752
+ "step": 962
6753
+ },
6754
+ {
6755
+ "epoch": 9.63e-05,
6756
+ "grad_norm": 9.153898239135742,
6757
+ "learning_rate": 9.620000000000001e-06,
6758
+ "loss": 62.125,
6759
+ "step": 963
6760
+ },
6761
+ {
6762
+ "epoch": 9.64e-05,
6763
+ "grad_norm": 9.330815315246582,
6764
+ "learning_rate": 9.630000000000001e-06,
6765
+ "loss": 61.6562,
6766
+ "step": 964
6767
+ },
6768
+ {
6769
+ "epoch": 9.65e-05,
6770
+ "grad_norm": 9.72275161743164,
6771
+ "learning_rate": 9.64e-06,
6772
+ "loss": 61.3125,
6773
+ "step": 965
6774
+ },
6775
+ {
6776
+ "epoch": 9.66e-05,
6777
+ "grad_norm": 9.388081550598145,
6778
+ "learning_rate": 9.65e-06,
6779
+ "loss": 61.4688,
6780
+ "step": 966
6781
+ },
6782
+ {
6783
+ "epoch": 9.67e-05,
6784
+ "grad_norm": 9.236139297485352,
6785
+ "learning_rate": 9.66e-06,
6786
+ "loss": 61.5938,
6787
+ "step": 967
6788
+ },
6789
+ {
6790
+ "epoch": 9.68e-05,
6791
+ "grad_norm": 9.027604103088379,
6792
+ "learning_rate": 9.67e-06,
6793
+ "loss": 62.0312,
6794
+ "step": 968
6795
+ },
6796
+ {
6797
+ "epoch": 9.69e-05,
6798
+ "grad_norm": 9.193624496459961,
6799
+ "learning_rate": 9.68e-06,
6800
+ "loss": 61.5,
6801
+ "step": 969
6802
+ },
6803
+ {
6804
+ "epoch": 9.7e-05,
6805
+ "grad_norm": 9.14177131652832,
6806
+ "learning_rate": 9.69e-06,
6807
+ "loss": 61.75,
6808
+ "step": 970
6809
+ },
6810
+ {
6811
+ "epoch": 9.71e-05,
6812
+ "grad_norm": 9.21058177947998,
6813
+ "learning_rate": 9.7e-06,
6814
+ "loss": 61.75,
6815
+ "step": 971
6816
+ },
6817
+ {
6818
+ "epoch": 9.72e-05,
6819
+ "grad_norm": 9.118620872497559,
6820
+ "learning_rate": 9.71e-06,
6821
+ "loss": 61.75,
6822
+ "step": 972
6823
+ },
6824
+ {
6825
+ "epoch": 9.73e-05,
6826
+ "grad_norm": 9.088716506958008,
6827
+ "learning_rate": 9.72e-06,
6828
+ "loss": 61.625,
6829
+ "step": 973
6830
+ },
6831
+ {
6832
+ "epoch": 9.74e-05,
6833
+ "grad_norm": 9.206241607666016,
6834
+ "learning_rate": 9.73e-06,
6835
+ "loss": 61.25,
6836
+ "step": 974
6837
+ },
6838
+ {
6839
+ "epoch": 9.75e-05,
6840
+ "grad_norm": 9.151651382446289,
6841
+ "learning_rate": 9.74e-06,
6842
+ "loss": 61.2188,
6843
+ "step": 975
6844
+ },
6845
+ {
6846
+ "epoch": 9.76e-05,
6847
+ "grad_norm": 9.240182876586914,
6848
+ "learning_rate": 9.75e-06,
6849
+ "loss": 61.4688,
6850
+ "step": 976
6851
+ },
6852
+ {
6853
+ "epoch": 9.77e-05,
6854
+ "grad_norm": 9.352987289428711,
6855
+ "learning_rate": 9.76e-06,
6856
+ "loss": 61.25,
6857
+ "step": 977
6858
+ },
6859
+ {
6860
+ "epoch": 9.78e-05,
6861
+ "grad_norm": 9.508001327514648,
6862
+ "learning_rate": 9.770000000000001e-06,
6863
+ "loss": 61.0,
6864
+ "step": 978
6865
+ },
6866
+ {
6867
+ "epoch": 9.79e-05,
6868
+ "grad_norm": 8.883227348327637,
6869
+ "learning_rate": 9.78e-06,
6870
+ "loss": 62.0,
6871
+ "step": 979
6872
+ },
6873
+ {
6874
+ "epoch": 9.8e-05,
6875
+ "grad_norm": 9.131720542907715,
6876
+ "learning_rate": 9.790000000000001e-06,
6877
+ "loss": 61.125,
6878
+ "step": 980
6879
+ },
6880
+ {
6881
+ "epoch": 9.81e-05,
6882
+ "grad_norm": 9.582850456237793,
6883
+ "learning_rate": 9.800000000000001e-06,
6884
+ "loss": 60.6875,
6885
+ "step": 981
6886
+ },
6887
+ {
6888
+ "epoch": 9.82e-05,
6889
+ "grad_norm": 9.125800132751465,
6890
+ "learning_rate": 9.81e-06,
6891
+ "loss": 61.0625,
6892
+ "step": 982
6893
+ },
6894
+ {
6895
+ "epoch": 9.83e-05,
6896
+ "grad_norm": 9.01922607421875,
6897
+ "learning_rate": 9.820000000000001e-06,
6898
+ "loss": 61.125,
6899
+ "step": 983
6900
+ },
6901
+ {
6902
+ "epoch": 9.84e-05,
6903
+ "grad_norm": 9.03088092803955,
6904
+ "learning_rate": 9.83e-06,
6905
+ "loss": 61.0625,
6906
+ "step": 984
6907
+ },
6908
+ {
6909
+ "epoch": 9.85e-05,
6910
+ "grad_norm": 9.083083152770996,
6911
+ "learning_rate": 9.84e-06,
6912
+ "loss": 61.0312,
6913
+ "step": 985
6914
+ },
6915
+ {
6916
+ "epoch": 9.86e-05,
6917
+ "grad_norm": 9.077742576599121,
6918
+ "learning_rate": 9.85e-06,
6919
+ "loss": 61.0312,
6920
+ "step": 986
6921
+ },
6922
+ {
6923
+ "epoch": 9.87e-05,
6924
+ "grad_norm": 9.057887077331543,
6925
+ "learning_rate": 9.859999999999999e-06,
6926
+ "loss": 60.875,
6927
+ "step": 987
6928
+ },
6929
+ {
6930
+ "epoch": 9.88e-05,
6931
+ "grad_norm": 8.55154800415039,
6932
+ "learning_rate": 9.87e-06,
6933
+ "loss": 62.375,
6934
+ "step": 988
6935
+ },
6936
+ {
6937
+ "epoch": 9.89e-05,
6938
+ "grad_norm": 8.761177062988281,
6939
+ "learning_rate": 9.88e-06,
6940
+ "loss": 61.125,
6941
+ "step": 989
6942
+ },
6943
+ {
6944
+ "epoch": 9.9e-05,
6945
+ "grad_norm": 9.016249656677246,
6946
+ "learning_rate": 9.89e-06,
6947
+ "loss": 61.2188,
6948
+ "step": 990
6949
+ },
6950
+ {
6951
+ "epoch": 9.91e-05,
6952
+ "grad_norm": 8.754048347473145,
6953
+ "learning_rate": 9.9e-06,
6954
+ "loss": 61.1875,
6955
+ "step": 991
6956
+ },
6957
+ {
6958
+ "epoch": 9.92e-05,
6959
+ "grad_norm": 9.200346946716309,
6960
+ "learning_rate": 9.91e-06,
6961
+ "loss": 60.5,
6962
+ "step": 992
6963
+ },
6964
+ {
6965
+ "epoch": 9.93e-05,
6966
+ "grad_norm": 8.876838684082031,
6967
+ "learning_rate": 9.92e-06,
6968
+ "loss": 61.0,
6969
+ "step": 993
6970
+ },
6971
+ {
6972
+ "epoch": 9.94e-05,
6973
+ "grad_norm": 8.757495880126953,
6974
+ "learning_rate": 9.93e-06,
6975
+ "loss": 61.4062,
6976
+ "step": 994
6977
+ },
6978
+ {
6979
+ "epoch": 9.95e-05,
6980
+ "grad_norm": 8.726040840148926,
6981
+ "learning_rate": 9.940000000000001e-06,
6982
+ "loss": 61.6562,
6983
+ "step": 995
6984
+ },
6985
+ {
6986
+ "epoch": 9.96e-05,
6987
+ "grad_norm": 8.834831237792969,
6988
+ "learning_rate": 9.95e-06,
6989
+ "loss": 61.125,
6990
+ "step": 996
6991
+ },
6992
+ {
6993
+ "epoch": 9.97e-05,
6994
+ "grad_norm": 8.835776329040527,
6995
+ "learning_rate": 9.96e-06,
6996
+ "loss": 60.6562,
6997
+ "step": 997
6998
+ },
6999
+ {
7000
+ "epoch": 9.98e-05,
7001
+ "grad_norm": 8.862153053283691,
7002
+ "learning_rate": 9.970000000000001e-06,
7003
+ "loss": 61.0312,
7004
+ "step": 998
7005
+ },
7006
+ {
7007
+ "epoch": 9.99e-05,
7008
+ "grad_norm": 9.10541820526123,
7009
+ "learning_rate": 9.98e-06,
7010
+ "loss": 60.5625,
7011
+ "step": 999
7012
+ },
7013
+ {
7014
+ "epoch": 0.0001,
7015
+ "grad_norm": 9.29044246673584,
7016
+ "learning_rate": 9.990000000000001e-06,
7017
+ "loss": 60.375,
7018
+ "step": 1000
7019
+ },
7020
+ {
7021
+ "epoch": 0.0001,
7022
+ "eval_loss": 7.538768768310547,
7023
+ "eval_runtime": 364.1443,
7024
+ "eval_samples_per_second": 27.462,
7025
+ "eval_steps_per_second": 1.716,
7026
+ "step": 1000
7027
  }
7028
  ],
7029
  "logging_steps": 1,