PurplelinkPL committed on
Commit 149c317 · verified · 1 Parent(s): 6ad49d8

Upload 10 files

Files changed (6)
  1. model.safetensors +1 -1
  2. optimizer.pt +1 -1
  3. rng_state.pth +1 -1
  4. scheduler.pt +1 -1
  5. trainer_state.json +3201 -3
  6. training_args.bin +1 -1
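Nearly all changed lines in this commit are new log entries in trainer_state.json (+3201 -3). As a minimal sketch, such a file can be summarized as below. It assumes the standard Hugging Face Trainer layout, in which the entries sit under a `log_history` key (that key is not visible in this excerpt). The function name `summarize_log_history` is hypothetical, and the sample entries are a small subset of values copied from this diff.

```python
import json


def summarize_log_history(state):
    """Split log entries into training and evaluation records and summarize them.

    Assumes the Trainer convention: training records carry "loss",
    evaluation records carry "eval_loss".
    """
    history = state.get("log_history", [])
    train = [entry for entry in history if "loss" in entry]
    evals = [entry for entry in history if "eval_loss" in entry]
    best = min(evals, key=lambda entry: entry["eval_loss"]) if evals else None
    return {
        "train_records": len(train),
        "eval_records": len(evals),
        "last_train_loss": train[-1]["loss"] if train else None,
        "best_eval_step": best["step"] if best else None,
        "best_eval_loss": best["eval_loss"] if best else None,
    }


# A small sample in the same shape as the entries added by this commit.
SAMPLE = json.loads("""
{
  "global_step": 270000,
  "log_history": [
    {"loss": 2.2435, "learning_rate": 1.4680084931507482e-05, "step": 230000},
    {"eval_loss": 2.144761085510254, "step": 230000},
    {"loss": 2.2423, "learning_rate": 1.4478831619972107e-05, "step": 231000},
    {"eval_loss": 2.142361640930176, "step": 231000},
    {"loss": 2.2469, "learning_rate": 1.4278403500886716e-05, "step": 232000},
    {"eval_loss": 2.145362377166748, "step": 232000}
  ]
}
""")

summary = summarize_log_history(SAMPLE)
print(summary)
```

For a real checkpoint, the dictionary would come from `json.load` on the downloaded trainer_state.json instead of the inline sample.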
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:ff261834fa34536f963b44d61629d171e8297d50ec29c9ecd77e55f8f4e30a75
+ oid sha256:c4e557f28dc70179a12b755e5e60b628849ed5aef82d5494c23999f1e52f3551
  size 598635032
optimizer.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:0f2cf42e7a86053bde9a697bcec92154da3f0357dc3b6970a4a5c01522d0c4e6
+ oid sha256:93bba69e56e6b384a3969e59f495cf9a1964ef6b3dd15dc118184bc7ad1cbb79
  size 1197359627
rng_state.pth CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:159e82523ca477221cb6ee71e6e1fe789822217510366cfeda983df59cb19ad5
+ oid sha256:119a4626b2861d53d5e22a804e127273c20e502df505ffeabd204f28a1b0f1bb
  size 14645
scheduler.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:c6edc5e7ebf57018d51595ed4fff24582a6a8bfe9d84e42ed6a378983c113ffb
+ oid sha256:1273113bfb573ac8637333edb62886abe994c2a9319c15312bcea1286ef43c5b
  size 1465
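The binary files above change only in their Git LFS pointer: the `oid sha256:` line differs while `size` stays the same. As a rough sketch, a downloaded file could be checked against such a pointer as follows. The helper names `parse_lfs_pointer` and `stream_matches_pointer` are hypothetical and not part of any Git LFS tooling; the pointer text is the new-side pointer for model.safetensors from this commit.

```python
import hashlib
import io


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def stream_matches_pointer(stream, pointer, chunk_size=1 << 20):
    """Hash a binary stream in chunks and compare size and digest to the pointer."""
    if pointer["algo"] != "sha256":
        raise ValueError("unsupported hash: " + pointer["algo"])
    hasher = hashlib.sha256()
    total = 0
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
        total += len(chunk)
    return total == pointer["size"] and hasher.hexdigest() == pointer["digest"]


# New-side pointer for model.safetensors, copied from this commit.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:c4e557f28dc70179a12b755e5e60b628849ed5aef82d5494c23999f1e52f3551
size 598635032
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["size"], pointer["digest"][:12])

# Stand-in check on in-memory bytes, since the real weights are not available here.
demo = b"hello"
demo_pointer = {
    "version": pointer["version"],
    "algo": "sha256",
    "digest": hashlib.sha256(demo).hexdigest(),
    "size": len(demo),
}
print(stream_matches_pointer(io.BytesIO(demo), demo_pointer))
```

For the real file, the stream would be `open("model.safetensors", "rb")`; chunked reading keeps memory use flat for a file of roughly 600 MB.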
trainer_state.json CHANGED
@@ -2,9 +2,9 @@
2
  "best_global_step": null,
3
  "best_metric": null,
4
  "best_model_checkpoint": null,
5
- "epoch": 0.0390714393360088,
6
  "eval_steps": 1000,
7
- "global_step": 229000,
8
  "is_hyper_param_search": false,
9
  "is_local_process_zero": true,
10
  "is_world_process_zero": true,
@@ -17877,6 +17877,3204 @@
17877
  "eval_samples_per_second": 198.036,
17878
  "eval_steps_per_second": 1.554,
17879
  "step": 229000
17880
  }
17881
  ],
17882
  "logging_steps": 100,
@@ -17896,7 +21094,7 @@
17896
  "attributes": {}
17897
  }
17898
  },
17899
- "total_flos": 1.9985381902516224e+19,
17900
  "train_batch_size": 128,
17901
  "trial_name": null,
17902
  "trial_params": null
 
2
  "best_global_step": null,
3
  "best_metric": null,
4
  "best_model_checkpoint": null,
5
+ "epoch": 0.0446530735268672,
6
  "eval_steps": 1000,
7
+ "global_step": 270000,
8
  "is_hyper_param_search": false,
9
  "is_local_process_zero": true,
10
  "is_world_process_zero": true,
 
17877
  "eval_samples_per_second": 198.036,
17878
  "eval_steps_per_second": 1.554,
17879
  "step": 229000
17880
+ },
17881
+ {
17882
+ "epoch": 0.00027908170954291995,
17883
+ "grad_norm": 2.2197189331054688,
17884
+ "learning_rate": 1.4861905406757642e-05,
17885
+ "loss": 2.2665,
17886
+ "step": 229100
17887
+ },
17888
+ {
17889
+ "epoch": 0.0005581634190858399,
17890
+ "grad_norm": 2.3529903888702393,
17891
+ "learning_rate": 1.4841671114173825e-05,
17892
+ "loss": 2.2607,
17893
+ "step": 229200
17894
+ },
17895
+ {
17896
+ "epoch": 0.0008372451286287599,
17897
+ "grad_norm": 2.280348300933838,
17898
+ "learning_rate": 1.4821444788985119e-05,
17899
+ "loss": 2.2596,
17900
+ "step": 229300
17901
+ },
17902
+ {
17903
+ "epoch": 0.0011163268381716798,
17904
+ "grad_norm": 2.2226016521453857,
17905
+ "learning_rate": 1.4801226447055449e-05,
17906
+ "loss": 2.259,
17907
+ "step": 229400
17908
+ },
17909
+ {
17910
+ "epoch": 0.0013954085477146,
17911
+ "grad_norm": 2.238063335418701,
17912
+ "learning_rate": 1.4781016104242502e-05,
17913
+ "loss": 2.2592,
17914
+ "step": 229500
17915
+ },
17916
+ {
17917
+ "epoch": 0.0016744902572575198,
17918
+ "grad_norm": 2.200965642929077,
17919
+ "learning_rate": 1.476081377639768e-05,
17920
+ "loss": 2.255,
17921
+ "step": 229600
17922
+ },
17923
+ {
17924
+ "epoch": 0.00195357196680044,
17925
+ "grad_norm": 2.0392613410949707,
17926
+ "learning_rate": 1.4740619479366114e-05,
17927
+ "loss": 2.2506,
17928
+ "step": 229700
17929
+ },
17930
+ {
17931
+ "epoch": 0.0022326536763433596,
17932
+ "grad_norm": 2.3026771545410156,
17933
+ "learning_rate": 1.47204332289866e-05,
17934
+ "loss": 2.2568,
17935
+ "step": 229800
17936
+ },
17937
+ {
17938
+ "epoch": 0.0025117353858862797,
17939
+ "grad_norm": 2.056729555130005,
17940
+ "learning_rate": 1.4700255041091663e-05,
17941
+ "loss": 2.2553,
17942
+ "step": 229900
17943
+ },
17944
+ {
17945
+ "epoch": 0.0027908170954292,
17946
+ "grad_norm": 2.2352523803710938,
17947
+ "learning_rate": 1.4680084931507482e-05,
17948
+ "loss": 2.2435,
17949
+ "step": 230000
17950
+ },
17951
+ {
17952
+ "epoch": 0.0027908170954292,
17953
+ "eval_loss": 2.144761085510254,
17954
+ "eval_runtime": 52.5572,
17955
+ "eval_samples_per_second": 193.96,
17956
+ "eval_steps_per_second": 1.522,
17957
+ "step": 230000
17958
+ },
17959
+ {
17960
+ "epoch": 0.00306989880497212,
17961
+ "grad_norm": 2.1020162105560303,
17962
+ "learning_rate": 1.4659922916053925e-05,
17963
+ "loss": 2.2631,
17964
+ "step": 230100
17965
+ },
17966
+ {
17967
+ "epoch": 0.0033489805145150396,
17968
+ "grad_norm": 2.259777545928955,
17969
+ "learning_rate": 1.4639769010544466e-05,
17970
+ "loss": 2.2601,
17971
+ "step": 230200
17972
+ },
17973
+ {
17974
+ "epoch": 0.0036280622240579597,
17975
+ "grad_norm": 2.2175509929656982,
17976
+ "learning_rate": 1.4619623230786262e-05,
17977
+ "loss": 2.249,
17978
+ "step": 230300
17979
+ },
17980
+ {
17981
+ "epoch": 0.00390714393360088,
17982
+ "grad_norm": 2.2818946838378906,
17983
+ "learning_rate": 1.459948559258007e-05,
17984
+ "loss": 2.2602,
17985
+ "step": 230400
17986
+ },
17987
+ {
17988
+ "epoch": 0.0041862256431437995,
17989
+ "grad_norm": 2.2155439853668213,
17990
+ "learning_rate": 1.4579356111720282e-05,
17991
+ "loss": 2.2534,
17992
+ "step": 230500
17993
+ },
17994
+ {
17995
+ "epoch": 0.004465307352686719,
17996
+ "grad_norm": 2.2038073539733887,
17997
+ "learning_rate": 1.455923480399488e-05,
17998
+ "loss": 2.2556,
17999
+ "step": 230600
18000
+ },
18001
+ {
18002
+ "epoch": 0.00474438906222964,
18003
+ "grad_norm": 2.2248752117156982,
18004
+ "learning_rate": 1.4539121685185426e-05,
18005
+ "loss": 2.2457,
18006
+ "step": 230700
18007
+ },
18008
+ {
18009
+ "epoch": 0.005023470771772559,
18010
+ "grad_norm": 2.232311487197876,
18011
+ "learning_rate": 1.4519016771067073e-05,
18012
+ "loss": 2.2528,
18013
+ "step": 230800
18014
+ },
18015
+ {
18016
+ "epoch": 0.00530255248131548,
18017
+ "grad_norm": 2.3795855045318604,
18018
+ "learning_rate": 1.4498920077408551e-05,
18019
+ "loss": 2.2463,
18020
+ "step": 230900
18021
+ },
18022
+ {
18023
+ "epoch": 0.0055816341908584,
18024
+ "grad_norm": 2.296515941619873,
18025
+ "learning_rate": 1.4478831619972107e-05,
18026
+ "loss": 2.2423,
18027
+ "step": 231000
18028
+ },
18029
+ {
18030
+ "epoch": 0.0055816341908584,
18031
+ "eval_loss": 2.142361640930176,
18032
+ "eval_runtime": 51.7261,
18033
+ "eval_samples_per_second": 197.077,
18034
+ "eval_steps_per_second": 1.547,
18035
+ "step": 231000
18036
+ },
18037
+ {
18038
+ "epoch": 0.005860715900401319,
18039
+ "grad_norm": 2.3775389194488525,
18040
+ "learning_rate": 1.445875141451356e-05,
18041
+ "loss": 2.2486,
18042
+ "step": 231100
18043
+ },
18044
+ {
18045
+ "epoch": 0.00613979760994424,
18046
+ "grad_norm": 2.1756324768066406,
18047
+ "learning_rate": 1.4438679476782241e-05,
18048
+ "loss": 2.2403,
18049
+ "step": 231200
18050
+ },
18051
+ {
18052
+ "epoch": 0.0064188793194871595,
18053
+ "grad_norm": 2.409360408782959,
18054
+ "learning_rate": 1.4418615822521009e-05,
18055
+ "loss": 2.2332,
18056
+ "step": 231300
18057
+ },
18058
+ {
18059
+ "epoch": 0.006697961029030079,
18060
+ "grad_norm": 2.2292256355285645,
18061
+ "learning_rate": 1.4398560467466218e-05,
18062
+ "loss": 2.2484,
18063
+ "step": 231400
18064
+ },
18065
+ {
18066
+ "epoch": 0.006977042738573,
18067
+ "grad_norm": 2.317793369293213,
18068
+ "learning_rate": 1.43785134273477e-05,
18069
+ "loss": 2.2578,
18070
+ "step": 231500
18071
+ },
18072
+ {
18073
+ "epoch": 0.0072561244481159195,
18074
+ "grad_norm": 2.13801908493042,
18075
+ "learning_rate": 1.4358474717888787e-05,
18076
+ "loss": 2.2562,
18077
+ "step": 231600
18078
+ },
18079
+ {
18080
+ "epoch": 0.007535206157658839,
18081
+ "grad_norm": 2.161449432373047,
18082
+ "learning_rate": 1.4338444354806269e-05,
18083
+ "loss": 2.2434,
18084
+ "step": 231700
18085
+ },
18086
+ {
18087
+ "epoch": 0.00781428786720176,
18088
+ "grad_norm": 2.174821376800537,
18089
+ "learning_rate": 1.4318422353810395e-05,
18090
+ "loss": 2.2448,
18091
+ "step": 231800
18092
+ },
18093
+ {
18094
+ "epoch": 0.00809336957674468,
18095
+ "grad_norm": 2.315488338470459,
18096
+ "learning_rate": 1.4298408730604845e-05,
18097
+ "loss": 2.2507,
18098
+ "step": 231900
18099
+ },
18100
+ {
18101
+ "epoch": 0.008372451286287599,
18102
+ "grad_norm": 2.229074478149414,
18103
+ "learning_rate": 1.4278403500886716e-05,
18104
+ "loss": 2.2469,
18105
+ "step": 232000
18106
+ },
18107
+ {
18108
+ "epoch": 0.008372451286287599,
18109
+ "eval_loss": 2.145362377166748,
18110
+ "eval_runtime": 51.5487,
18111
+ "eval_samples_per_second": 197.755,
18112
+ "eval_steps_per_second": 1.552,
18113
+ "step": 232000
18114
+ },
18115
+ {
18116
+ "epoch": 0.008651532995830519,
18117
+ "grad_norm": 2.3216850757598877,
18118
+ "learning_rate": 1.4258406680346559e-05,
18119
+ "loss": 2.2483,
18120
+ "step": 232100
18121
+ },
18122
+ {
18123
+ "epoch": 0.008930614705373438,
18124
+ "grad_norm": 2.247835874557495,
18125
+ "learning_rate": 1.4238418284668309e-05,
18126
+ "loss": 2.229,
18127
+ "step": 232200
18128
+ },
18129
+ {
18130
+ "epoch": 0.00920969641491636,
18131
+ "grad_norm": 2.2360994815826416,
18132
+ "learning_rate": 1.4218438329529276e-05,
18133
+ "loss": 2.2504,
18134
+ "step": 232300
18135
+ },
18136
+ {
18137
+ "epoch": 0.00948877812445928,
18138
+ "grad_norm": 2.2599055767059326,
18139
+ "learning_rate": 1.4198466830600183e-05,
18140
+ "loss": 2.238,
18141
+ "step": 232400
18142
+ },
18143
+ {
18144
+ "epoch": 0.0097678598340022,
18145
+ "grad_norm": 2.299938440322876,
18146
+ "learning_rate": 1.4178503803545096e-05,
18147
+ "loss": 2.2389,
18148
+ "step": 232500
18149
+ },
18150
+ {
18151
+ "epoch": 0.010046941543545119,
18152
+ "grad_norm": 2.140632390975952,
18153
+ "learning_rate": 1.415854926402146e-05,
18154
+ "loss": 2.2454,
18155
+ "step": 232600
18156
+ },
18157
+ {
18158
+ "epoch": 0.010326023253088039,
18159
+ "grad_norm": 2.2972567081451416,
18160
+ "learning_rate": 1.4138603227680026e-05,
18161
+ "loss": 2.2421,
18162
+ "step": 232700
18163
+ },
18164
+ {
18165
+ "epoch": 0.01060510496263096,
18166
+ "grad_norm": 2.119060516357422,
18167
+ "learning_rate": 1.4118665710164908e-05,
18168
+ "loss": 2.25,
18169
+ "step": 232800
18170
+ },
18171
+ {
18172
+ "epoch": 0.01088418667217388,
18173
+ "grad_norm": 2.258012533187866,
18174
+ "learning_rate": 1.4098736727113529e-05,
18175
+ "loss": 2.2384,
18176
+ "step": 232900
18177
+ },
18178
+ {
18179
+ "epoch": 0.0111632683817168,
18180
+ "grad_norm": 2.2731425762176514,
18181
+ "learning_rate": 1.4078816294156626e-05,
18182
+ "loss": 2.2315,
18183
+ "step": 233000
18184
+ },
18185
+ {
18186
+ "epoch": 0.0111632683817168,
18187
+ "eval_loss": 2.1522159576416016,
18188
+ "eval_runtime": 51.7751,
18189
+ "eval_samples_per_second": 196.89,
18190
+ "eval_steps_per_second": 1.545,
18191
+ "step": 233000
18192
+ },
18193
+ {
18194
+ "epoch": 0.011442350091259719,
18195
+ "grad_norm": 2.2772982120513916,
18196
+ "learning_rate": 1.405890442691821e-05,
18197
+ "loss": 2.2507,
18198
+ "step": 233100
18199
+ },
18200
+ {
18201
+ "epoch": 0.011721431800802639,
18202
+ "grad_norm": 2.1224279403686523,
18203
+ "learning_rate": 1.4039001141015595e-05,
18204
+ "loss": 2.252,
18205
+ "step": 233200
18206
+ },
18207
+ {
18208
+ "epoch": 0.012000513510345558,
18209
+ "grad_norm": 2.3541483879089355,
18210
+ "learning_rate": 1.4019106452059338e-05,
18211
+ "loss": 2.2445,
18212
+ "step": 233300
18213
+ },
18214
+ {
18215
+ "epoch": 0.01227959521988848,
18216
+ "grad_norm": 2.249394416809082,
18217
+ "learning_rate": 1.399922037565329e-05,
18218
+ "loss": 2.2282,
18219
+ "step": 233400
18220
+ },
18221
+ {
18222
+ "epoch": 0.0125586769294314,
18223
+ "grad_norm": 2.2116713523864746,
18224
+ "learning_rate": 1.3979342927394509e-05,
18225
+ "loss": 2.2295,
18226
+ "step": 233500
18227
+ },
18228
+ {
18229
+ "epoch": 0.012837758638974319,
18230
+ "grad_norm": 2.350203275680542,
18231
+ "learning_rate": 1.3959474122873311e-05,
18232
+ "loss": 2.2294,
18233
+ "step": 233600
18234
+ },
18235
+ {
18236
+ "epoch": 0.013116840348517239,
18237
+ "grad_norm": 2.2494752407073975,
18238
+ "learning_rate": 1.3939613977673227e-05,
18239
+ "loss": 2.2258,
18240
+ "step": 233700
18241
+ },
18242
+ {
18243
+ "epoch": 0.013395922058060158,
18244
+ "grad_norm": 2.126502513885498,
18245
+ "learning_rate": 1.3919762507371007e-05,
18246
+ "loss": 2.2066,
18247
+ "step": 233800
18248
+ },
18249
+ {
18250
+ "epoch": 0.013675003767603078,
18251
+ "grad_norm": 2.2300145626068115,
18252
+ "learning_rate": 1.3899919727536559e-05,
18253
+ "loss": 2.2102,
18254
+ "step": 233900
18255
+ },
18256
+ {
18257
+ "epoch": 0.013954085477146,
18258
+ "grad_norm": 2.1797988414764404,
18259
+ "learning_rate": 1.3880085653733014e-05,
18260
+ "loss": 2.2193,
18261
+ "step": 234000
18262
+ },
18263
+ {
18264
+ "epoch": 0.013954085477146,
18265
+ "eval_loss": 2.137155532836914,
18266
+ "eval_runtime": 51.6878,
18267
+ "eval_samples_per_second": 197.222,
18268
+ "eval_steps_per_second": 1.548,
18269
+ "step": 234000
18270
+ },
18271
+ {
18272
+ "epoch": 0.01423316718668892,
18273
+ "grad_norm": 2.1567306518554688,
18274
+ "learning_rate": 1.3860260301516659e-05,
18275
+ "loss": 2.2073,
18276
+ "step": 234100
18277
+ },
18278
+ {
18279
+ "epoch": 0.014512248896231839,
18280
+ "grad_norm": 2.185314178466797,
18281
+ "learning_rate": 1.3840443686436949e-05,
18282
+ "loss": 2.2035,
18283
+ "step": 234200
18284
+ },
18285
+ {
18286
+ "epoch": 0.014791330605774759,
18287
+ "grad_norm": 2.074904203414917,
18288
+ "learning_rate": 1.3820635824036482e-05,
18289
+ "loss": 2.2055,
18290
+ "step": 234300
18291
+ },
18292
+ {
18293
+ "epoch": 0.015070412315317678,
18294
+ "grad_norm": 1.9893766641616821,
18295
+ "learning_rate": 1.3800836729850972e-05,
18296
+ "loss": 2.2006,
18297
+ "step": 234400
18298
+ },
18299
+ {
18300
+ "epoch": 0.015349494024860598,
18301
+ "grad_norm": 2.0517656803131104,
18302
+ "learning_rate": 1.3781046419409294e-05,
18303
+ "loss": 2.2047,
18304
+ "step": 234500
18305
+ },
18306
+ {
18307
+ "epoch": 0.01562857573440352,
18308
+ "grad_norm": 2.0877463817596436,
18309
+ "learning_rate": 1.3761264908233395e-05,
18310
+ "loss": 2.2147,
18311
+ "step": 234600
18312
+ },
18313
+ {
18314
+ "epoch": 0.015907657443946437,
18315
+ "grad_norm": 2.173692464828491,
18316
+ "learning_rate": 1.3741492211838353e-05,
18317
+ "loss": 2.2037,
18318
+ "step": 234700
18319
+ },
18320
+ {
18321
+ "epoch": 0.01618673915348936,
18322
+ "grad_norm": 2.1288397312164307,
18323
+ "learning_rate": 1.3721728345732299e-05,
18324
+ "loss": 2.2081,
18325
+ "step": 234800
18326
+ },
18327
+ {
18328
+ "epoch": 0.01646582086303228,
18329
+ "grad_norm": 2.2528440952301025,
18330
+ "learning_rate": 1.370197332541647e-05,
18331
+ "loss": 2.2004,
18332
+ "step": 234900
18333
+ },
18334
+ {
18335
+ "epoch": 0.016744902572575198,
18336
+ "grad_norm": 2.060171127319336,
18337
+ "learning_rate": 1.3682227166385148e-05,
18338
+ "loss": 2.1902,
18339
+ "step": 235000
18340
+ },
18341
+ {
18342
+ "epoch": 0.016744902572575198,
18343
+ "eval_loss": 2.1390364170074463,
18344
+ "eval_runtime": 51.7756,
18345
+ "eval_samples_per_second": 196.888,
18346
+ "eval_steps_per_second": 1.545,
18347
+ "step": 235000
18348
+ },
18349
+ {
18350
+ "epoch": 0.01702398428211812,
18351
+ "grad_norm": 2.295802593231201,
18352
+ "learning_rate": 1.3662489884125684e-05,
18353
+ "loss": 2.199,
18354
+ "step": 235100
18355
+ },
18356
+ {
18357
+ "epoch": 0.017303065991661037,
18358
+ "grad_norm": 2.1694698333740234,
18359
+ "learning_rate": 1.3642761494118426e-05,
18360
+ "loss": 2.1802,
18361
+ "step": 235200
18362
+ },
18363
+ {
18364
+ "epoch": 0.01758214770120396,
18365
+ "grad_norm": 2.199690818786621,
18366
+ "learning_rate": 1.3623042011836784e-05,
18367
+ "loss": 2.2079,
18368
+ "step": 235300
18369
+ },
18370
+ {
18371
+ "epoch": 0.017861229410746877,
18372
+ "grad_norm": 2.1490964889526367,
18373
+ "learning_rate": 1.3603331452747176e-05,
18374
+ "loss": 2.1914,
18375
+ "step": 235400
18376
+ },
18377
+ {
18378
+ "epoch": 0.018140311120289798,
18379
+ "grad_norm": 2.0006728172302246,
18380
+ "learning_rate": 1.358362983230902e-05,
18381
+ "loss": 2.188,
18382
+ "step": 235500
18383
+ },
18384
+ {
18385
+ "epoch": 0.01841939282983272,
18386
+ "grad_norm": 2.1562349796295166,
18387
+ "learning_rate": 1.35639371659747e-05,
18388
+ "loss": 2.1896,
18389
+ "step": 235600
18390
+ },
18391
+ {
18392
+ "epoch": 0.018698474539375638,
18393
+ "grad_norm": 2.129549980163574,
18394
+ "learning_rate": 1.354425346918961e-05,
18395
+ "loss": 2.1935,
18396
+ "step": 235700
18397
+ },
18398
+ {
18399
+ "epoch": 0.01897755624891856,
18400
+ "grad_norm": 2.2000486850738525,
18401
+ "learning_rate": 1.3524578757392103e-05,
18402
+ "loss": 2.1936,
18403
+ "step": 235800
18404
+ },
18405
+ {
18406
+ "epoch": 0.019256637958461477,
18407
+ "grad_norm": 2.061960220336914,
18408
+ "learning_rate": 1.3504913046013456e-05,
18409
+ "loss": 2.1902,
18410
+ "step": 235900
18411
+ },
18412
+ {
18413
+ "epoch": 0.0195357196680044,
18414
+ "grad_norm": 2.2487034797668457,
18415
+ "learning_rate": 1.3485256350477931e-05,
18416
+ "loss": 2.1836,
18417
+ "step": 236000
18418
+ },
18419
+ {
18420
+ "epoch": 0.0195357196680044,
18421
+ "eval_loss": 2.1369080543518066,
18422
+ "eval_runtime": 51.9173,
18423
+ "eval_samples_per_second": 196.351,
18424
+ "eval_steps_per_second": 1.541,
18425
+ "step": 236000
18426
+ },
18427
+ {
18428
+ "epoch": 0.01981480137754732,
18429
+ "grad_norm": 2.135773181915283,
18430
+ "learning_rate": 1.3465608686202672e-05,
18431
+ "loss": 2.1847,
18432
+ "step": 236100
18433
+ },
18434
+ {
18435
+ "epoch": 0.020093883087090238,
18436
+ "grad_norm": 2.2150774002075195,
18437
+ "learning_rate": 1.3445970068597774e-05,
18438
+ "loss": 2.193,
18439
+ "step": 236200
18440
+ },
18441
+ {
18442
+ "epoch": 0.02037296479663316,
18443
+ "grad_norm": 2.102975368499756,
18444
+ "learning_rate": 1.342634051306624e-05,
18445
+ "loss": 2.1916,
18446
+ "step": 236300
18447
+ },
18448
+ {
18449
+ "epoch": 0.020652046506176077,
18450
+ "grad_norm": 2.2181150913238525,
18451
+ "learning_rate": 1.3406720035003928e-05,
18452
+ "loss": 2.1875,
18453
+ "step": 236400
18454
+ },
18455
+ {
18456
+ "epoch": 0.020931128215719,
18457
+ "grad_norm": 2.1293530464172363,
18458
+ "learning_rate": 1.3387108649799607e-05,
18459
+ "loss": 2.1907,
18460
+ "step": 236500
18461
+ },
18462
+ {
18463
+ "epoch": 0.02121020992526192,
18464
+ "grad_norm": 2.229583501815796,
18465
+ "learning_rate": 1.3367506372834915e-05,
18466
+ "loss": 2.1913,
18467
+ "step": 236600
18468
+ },
18469
+ {
18470
+ "epoch": 0.021489291634804838,
18471
+ "grad_norm": 2.1583094596862793,
18472
+ "learning_rate": 1.3347913219484336e-05,
18473
+ "loss": 2.1895,
18474
+ "step": 236700
18475
+ },
18476
+ {
18477
+ "epoch": 0.02176837334434776,
18478
+ "grad_norm": 2.043151617050171,
18479
+ "learning_rate": 1.3328329205115191e-05,
18480
+ "loss": 2.1967,
18481
+ "step": 236800
18482
+ },
18483
+ {
18484
+ "epoch": 0.022047455053890677,
18485
+ "grad_norm": 2.052990436553955,
18486
+ "learning_rate": 1.3308754345087646e-05,
18487
+ "loss": 2.1919,
18488
+ "step": 236900
18489
+ },
18490
+ {
18491
+ "epoch": 0.0223265367634336,
18492
+ "grad_norm": 2.242903232574463,
18493
+ "learning_rate": 1.3289188654754686e-05,
18494
+ "loss": 2.1793,
18495
+ "step": 237000
18496
+ },
18497
+ {
18498
+ "epoch": 0.0223265367634336,
18499
+ "eval_loss": 2.135045051574707,
18500
+ "eval_runtime": 51.9719,
18501
+ "eval_samples_per_second": 196.144,
18502
+ "eval_steps_per_second": 1.539,
18503
+ "step": 237000
18504
+ },
18505
+ {
18506
+ "epoch": 0.022605618472976517,
18507
+ "grad_norm": 2.035278081893921,
18508
+ "learning_rate": 1.3269632149462111e-05,
18509
+ "loss": 2.1832,
18510
+ "step": 237100
18511
+ },
18512
+ {
18513
+ "epoch": 0.022884700182519438,
18514
+ "grad_norm": 2.1866884231567383,
18515
+ "learning_rate": 1.3250084844548488e-05,
18516
+ "loss": 2.2013,
18517
+ "step": 237200
18518
+ },
18519
+ {
18520
+ "epoch": 0.02316378189206236,
18521
+ "grad_norm": 2.1690127849578857,
18522
+ "learning_rate": 1.3230546755345202e-05,
18523
+ "loss": 2.1791,
18524
+ "step": 237300
18525
+ },
18526
+ {
18527
+ "epoch": 0.023442863601605277,
18528
+ "grad_norm": 2.116481304168701,
18529
+ "learning_rate": 1.3211017897176384e-05,
18530
+ "loss": 2.1849,
18531
+ "step": 237400
18532
+ },
18533
+ {
18534
+ "epoch": 0.0237219453111482,
18535
+ "grad_norm": 2.1610612869262695,
18536
+ "learning_rate": 1.3191498285358939e-05,
18537
+ "loss": 2.18,
18538
+ "step": 237500
18539
+ },
18540
+ {
18541
+ "epoch": 0.024001027020691117,
18542
+ "grad_norm": 2.0972769260406494,
18543
+ "learning_rate": 1.317198793520253e-05,
18544
+ "loss": 2.1791,
18545
+ "step": 237600
18546
+ },
18547
+ {
18548
+ "epoch": 0.024280108730234038,
18549
+ "grad_norm": 2.1171815395355225,
18550
+ "learning_rate": 1.3152486862009521e-05,
18551
+ "loss": 2.1865,
18552
+ "step": 237700
18553
+ },
18554
+ {
18555
+ "epoch": 0.02455919043977696,
18556
+ "grad_norm": 2.1703948974609375,
18557
+ "learning_rate": 1.3132995081075038e-05,
18558
+ "loss": 2.1841,
18559
+ "step": 237800
18560
+ },
18561
+ {
18562
+ "epoch": 0.024838272149319877,
18563
+ "grad_norm": 2.2092208862304688,
18564
+ "learning_rate": 1.3113512607686895e-05,
18565
+ "loss": 2.1961,
18566
+ "step": 237900
18567
+ },
18568
+ {
18569
+ "epoch": 0.0251173538588628,
18570
+ "grad_norm": 2.2685964107513428,
18571
+ "learning_rate": 1.3094039457125623e-05,
18572
+ "loss": 2.1812,
18573
+ "step": 238000
18574
+ },
18575
+ {
18576
+ "epoch": 0.0251173538588628,
18577
+ "eval_loss": 2.133305549621582,
18578
+ "eval_runtime": 51.8888,
18579
+ "eval_samples_per_second": 196.459,
18580
+ "eval_steps_per_second": 1.542,
18581
+ "step": 238000
18582
+ },
18583
+ {
18584
+ "epoch": 0.025396435568405717,
18585
+ "grad_norm": 2.251767873764038,
18586
+ "learning_rate": 1.307457564466442e-05,
18587
+ "loss": 2.1865,
18588
+ "step": 238100
18589
+ },
18590
+ {
18591
+ "epoch": 0.025675517277948638,
18592
+ "grad_norm": 2.094947576522827,
18593
+ "learning_rate": 1.3055121185569171e-05,
18594
+ "loss": 2.1822,
18595
+ "step": 238200
18596
+ },
18597
+ {
18598
+ "epoch": 0.025954598987491556,
18599
+ "grad_norm": 2.155735731124878,
18600
+ "learning_rate": 1.3035676095098434e-05,
18601
+ "loss": 2.1836,
18602
+ "step": 238300
18603
+ },
18604
+ {
18605
+ "epoch": 0.026233680697034478,
18606
+ "grad_norm": 2.0563013553619385,
18607
+ "learning_rate": 1.3016240388503415e-05,
18608
+ "loss": 2.1685,
18609
+ "step": 238400
18610
+ },
18611
+ {
18612
+ "epoch": 0.0265127624065774,
18613
+ "grad_norm": 2.171740770339966,
18614
+ "learning_rate": 1.2996814081027936e-05,
18615
+ "loss": 2.1751,
18616
+ "step": 238500
18617
+ },
18618
+ {
18619
+ "epoch": 0.026791844116120317,
18620
+ "grad_norm": 2.242612361907959,
18621
+ "learning_rate": 1.2977397187908492e-05,
18622
+ "loss": 2.182,
18623
+ "step": 238600
18624
+ },
18625
+ {
18626
+ "epoch": 0.02707092582566324,
18627
+ "grad_norm": 2.174983263015747,
18628
+ "learning_rate": 1.2957989724374137e-05,
18629
+ "loss": 2.1864,
18630
+ "step": 238700
18631
+ },
18632
+ {
18633
+ "epoch": 0.027350007535206156,
18634
+ "grad_norm": 2.335228204727173,
18635
+ "learning_rate": 1.2938591705646591e-05,
18636
+ "loss": 2.1797,
18637
+ "step": 238800
18638
+ },
18639
+ {
18640
+ "epoch": 0.027629089244749078,
18641
+ "grad_norm": 2.131338357925415,
18642
+ "learning_rate": 1.2919203146940113e-05,
18643
+ "loss": 2.1832,
18644
+ "step": 238900
18645
+ },
18646
+ {
18647
+ "epoch": 0.027908170954292,
18648
+ "grad_norm": 2.1694304943084717,
18649
+ "learning_rate": 1.2899824063461574e-05,
18650
+ "loss": 2.1738,
18651
+ "step": 239000
18652
+ },
18653
+ {
18654
+ "epoch": 0.027908170954292,
18655
+ "eval_loss": 2.1172709465026855,
18656
+ "eval_runtime": 52.0193,
18657
+ "eval_samples_per_second": 195.966,
18658
+ "eval_steps_per_second": 1.538,
18659
+ "step": 239000
18660
+ },
18661
+ {
18662
+ "epoch": 0.028187252663834917,
18663
+ "grad_norm": 2.2490172386169434,
18664
+ "learning_rate": 1.2880454470410405e-05,
18665
+ "loss": 2.1752,
18666
+ "step": 239100
18667
+ },
18668
+ {
18669
+ "epoch": 0.02846633437337784,
18670
+ "grad_norm": 2.0559821128845215,
18671
+ "learning_rate": 1.2861094382978603e-05,
18672
+ "loss": 2.1812,
18673
+ "step": 239200
18674
+ },
18675
+ {
18676
+ "epoch": 0.028745416082920756,
18677
+ "grad_norm": 2.3845577239990234,
18678
+ "learning_rate": 1.284174381635068e-05,
18679
+ "loss": 2.1728,
18680
+ "step": 239300
18681
+ },
18682
+ {
18683
+ "epoch": 0.029024497792463678,
18684
+ "grad_norm": 2.1738946437835693,
18685
+ "learning_rate": 1.2822402785703708e-05,
18686
+ "loss": 2.1659,
18687
+ "step": 239400
18688
+ },
18689
+ {
18690
+ "epoch": 0.0293035795020066,
18691
+ "grad_norm": 2.2159125804901123,
18692
+ "learning_rate": 1.2803071306207276e-05,
18693
+ "loss": 2.1672,
18694
+ "step": 239500
18695
+ },
18696
+ {
18697
+ "epoch": 0.029582661211549517,
18698
+ "grad_norm": 2.1570019721984863,
18699
+ "learning_rate": 1.2783749393023486e-05,
18700
+ "loss": 2.1808,
18701
+ "step": 239600
18702
+ },
18703
+ {
18704
+ "epoch": 0.02986174292109244,
18705
+ "grad_norm": 2.138986110687256,
18706
+ "learning_rate": 1.2764437061306909e-05,
18707
+ "loss": 2.1742,
18708
+ "step": 239700
18709
+ },
18710
+ {
18711
+ "epoch": 0.030140824630635357,
18712
+ "grad_norm": 2.287302017211914,
18713
+ "learning_rate": 1.2745134326204638e-05,
18714
+ "loss": 2.1791,
18715
+ "step": 239800
18716
+ },
18717
+ {
18718
+ "epoch": 0.030419906340178278,
18719
+ "grad_norm": 2.1529247760772705,
18720
+ "learning_rate": 1.2725841202856203e-05,
18721
+ "loss": 2.1742,
18722
+ "step": 239900
18723
+ },
18724
+ {
18725
+ "epoch": 0.030698988049721196,
18726
+ "grad_norm": 2.148305892944336,
18727
+ "learning_rate": 1.270655770639364e-05,
18728
+ "loss": 2.1799,
18729
+ "step": 240000
18730
+ },
18731
+ {
18732
+ "epoch": 0.030698988049721196,
18733
+ "eval_loss": 2.1318085193634033,
18734
+ "eval_runtime": 51.882,
18735
+ "eval_samples_per_second": 196.484,
18736
+ "eval_steps_per_second": 1.542,
18737
+ "step": 240000
18738
+ },
18739
+ {
18740
+ "epoch": 0.030978069759264117,
18741
+ "grad_norm": 2.213149070739746,
18742
+ "learning_rate": 1.2687283851941381e-05,
18743
+ "loss": 2.1723,
18744
+ "step": 240100
18745
+ },
18746
+ {
18747
+ "epoch": 0.03125715146880704,
18748
+ "grad_norm": 2.2811832427978516,
18749
+ "learning_rate": 1.2668019654616337e-05,
18750
+ "loss": 2.176,
18751
+ "step": 240200
18752
+ },
18753
+ {
18754
+ "epoch": 0.03153623317834996,
18755
+ "grad_norm": 2.0813376903533936,
18756
+ "learning_rate": 1.2648765129527829e-05,
18757
+ "loss": 2.1701,
18758
+ "step": 240300
18759
+ },
18760
+ {
18761
+ "epoch": 0.031815314887892875,
18762
+ "grad_norm": 2.146592617034912,
18763
+ "learning_rate": 1.2629520291777597e-05,
18764
+ "loss": 2.1738,
18765
+ "step": 240400
18766
+ },
18767
+ {
18768
+ "epoch": 0.0320943965974358,
18769
+ "grad_norm": 2.2072625160217285,
18770
+ "learning_rate": 1.2610285156459783e-05,
18771
+ "loss": 2.1762,
18772
+ "step": 240500
18773
+ },
18774
+ {
18775
+ "epoch": 0.03237347830697872,
18776
+ "grad_norm": 2.159395694732666,
18777
+ "learning_rate": 1.2591059738660904e-05,
18778
+ "loss": 2.1626,
18779
+ "step": 240600
18780
+ },
18781
+ {
18782
+ "epoch": 0.032652560016521635,
18783
+ "grad_norm": 2.3325672149658203,
18784
+ "learning_rate": 1.2571844053459875e-05,
18785
+ "loss": 2.169,
18786
+ "step": 240700
18787
+ },
18788
+ {
18789
+ "epoch": 0.03293164172606456,
18790
+ "grad_norm": 2.2384254932403564,
18791
+ "learning_rate": 1.2552638115927966e-05,
18792
+ "loss": 2.1681,
18793
+ "step": 240800
18794
+ },
18795
+ {
18796
+ "epoch": 0.03321072343560748,
18797
+ "grad_norm": 2.148493766784668,
18798
+ "learning_rate": 1.253344194112882e-05,
18799
+ "loss": 2.1672,
18800
+ "step": 240900
18801
+ },
18802
+ {
18803
+ "epoch": 0.033489805145150396,
18804
+ "grad_norm": 2.0037782192230225,
18805
+ "learning_rate": 1.2514255544118387e-05,
18806
+ "loss": 2.1678,
18807
+ "step": 241000
18808
+ },
18809
+ {
18810
+ "epoch": 0.033489805145150396,
18811
+ "eval_loss": 2.127737045288086,
18812
+ "eval_runtime": 51.9867,
18813
+ "eval_samples_per_second": 196.089,
18814
+ "eval_steps_per_second": 1.539,
18815
+ "step": 241000
18816
+ },
18817
+ {
18818
+ "epoch": 0.033768886854693314,
18819
+ "grad_norm": 2.1538326740264893,
18820
+ "learning_rate": 1.2495078939944987e-05,
18821
+ "loss": 2.1763,
18822
+ "step": 241100
18823
+ },
18824
+ {
18825
+ "epoch": 0.03404796856423624,
18826
+ "grad_norm": 2.2411739826202393,
18827
+ "learning_rate": 1.2475912143649224e-05,
18828
+ "loss": 2.1725,
18829
+ "step": 241200
18830
+ },
18831
+ {
18832
+ "epoch": 0.03432705027377916,
18833
+ "grad_norm": 2.161746025085449,
18834
+ "learning_rate": 1.2456755170264047e-05,
18835
+ "loss": 2.1658,
18836
+ "step": 241300
18837
+ },
18838
+ {
18839
+ "epoch": 0.034606131983322075,
18840
+ "grad_norm": 2.173823595046997,
18841
+ "learning_rate": 1.2437608034814663e-05,
18842
+ "loss": 2.1476,
18843
+ "step": 241400
18844
+ },
18845
+ {
18846
+ "epoch": 0.034885213692865,
18847
+ "grad_norm": 2.1510214805603027,
18848
+ "learning_rate": 1.2418470752318586e-05,
18849
+ "loss": 2.1408,
18850
+ "step": 241500
18851
+ },
18852
+ {
18853
+ "epoch": 0.03516429540240792,
18854
+ "grad_norm": 2.1306464672088623,
18855
+ "learning_rate": 1.2399343337785602e-05,
18856
+ "loss": 2.1489,
18857
+ "step": 241600
18858
+ },
18859
+ {
18860
+ "epoch": 0.035443377111950836,
18861
+ "grad_norm": 2.0896074771881104,
18862
+ "learning_rate": 1.2380225806217757e-05,
18863
+ "loss": 2.1384,
18864
+ "step": 241700
18865
+ },
18866
+ {
18867
+ "epoch": 0.035722458821493754,
18868
+ "grad_norm": 2.154553174972534,
18869
+ "learning_rate": 1.2361118172609326e-05,
18870
+ "loss": 2.1427,
18871
+ "step": 241800
18872
+ },
18873
+ {
18874
+ "epoch": 0.03600154053103668,
18875
+ "grad_norm": 2.235761880874634,
18876
+ "learning_rate": 1.2342020451946843e-05,
18877
+ "loss": 2.1545,
18878
+ "step": 241900
18879
+ },
18880
+ {
18881
+ "epoch": 0.036280622240579596,
18882
+ "grad_norm": 2.165769100189209,
18883
+ "learning_rate": 1.2322932659209057e-05,
18884
+ "loss": 2.1387,
18885
+ "step": 242000
18886
+ },
18887
+ {
18888
+ "epoch": 0.036280622240579596,
18889
+ "eval_loss": 2.122012138366699,
18890
+ "eval_runtime": 52.0197,
18891
+ "eval_samples_per_second": 195.964,
18892
+ "eval_steps_per_second": 1.538,
18893
+ "step": 242000
18894
+ },
18895
+ {
18896
+ "epoch": 0.00027908170954291995,
18897
+ "grad_norm": 2.312960624694824,
18898
+ "learning_rate": 1.2303854809366949e-05,
18899
+ "loss": 2.1314,
18900
+ "step": 242100
18901
+ },
18902
+ {
18903
+ "epoch": 0.0005581634190858399,
18904
+ "grad_norm": 2.210526943206787,
18905
+ "learning_rate": 1.2284786917383661e-05,
18906
+ "loss": 2.1418,
18907
+ "step": 242200
18908
+ },
18909
+ {
18910
+ "epoch": 0.0008372451286287599,
18911
+ "grad_norm": 2.0691890716552734,
18912
+ "learning_rate": 1.2265728998214562e-05,
18913
+ "loss": 2.1397,
18914
+ "step": 242300
18915
+ },
18916
+ {
18917
+ "epoch": 0.0011163268381716798,
18918
+ "grad_norm": 2.160215377807617,
18919
+ "learning_rate": 1.2246681066807195e-05,
18920
+ "loss": 2.136,
18921
+ "step": 242400
18922
+ },
18923
+ {
18924
+ "epoch": 0.0013954085477146,
18925
+ "grad_norm": 2.01823353767395,
18926
+ "learning_rate": 1.2227643138101242e-05,
18927
+ "loss": 2.136,
18928
+ "step": 242500
18929
+ },
18930
+ {
18931
+ "epoch": 0.0016744902572575198,
18932
+ "grad_norm": 2.073930501937866,
18933
+ "learning_rate": 1.2208615227028577e-05,
18934
+ "loss": 2.1447,
18935
+ "step": 242600
18936
+ },
18937
+ {
18938
+ "epoch": 0.00195357196680044,
18939
+ "grad_norm": 2.1365602016448975,
18940
+ "learning_rate": 1.2189597348513183e-05,
18941
+ "loss": 2.1365,
18942
+ "step": 242700
18943
+ },
18944
+ {
18945
+ "epoch": 0.0022326536763433596,
18946
+ "grad_norm": 2.1082065105438232,
18947
+ "learning_rate": 1.2170589517471193e-05,
18948
+ "loss": 2.1502,
18949
+ "step": 242800
18950
+ },
18951
+ {
18952
+ "epoch": 0.0025117353858862797,
18953
+ "grad_norm": 2.0763580799102783,
18954
+ "learning_rate": 1.215159174881087e-05,
18955
+ "loss": 2.1318,
18956
+ "step": 242900
18957
+ },
18958
+ {
18959
+ "epoch": 0.0027908170954292,
18960
+ "grad_norm": 2.2010183334350586,
18961
+ "learning_rate": 1.2132604057432551e-05,
18962
+ "loss": 2.14,
18963
+ "step": 243000
18964
+ },
18965
+ {
18966
+ "epoch": 0.0027908170954292,
18967
+ "eval_loss": 2.1260664463043213,
18968
+ "eval_runtime": 51.6429,
18969
+ "eval_samples_per_second": 197.394,
18970
+ "eval_steps_per_second": 1.549,
18971
+ "step": 243000
18972
+ },
18973
+ {
18974
+ "epoch": 0.00306989880497212,
18975
+ "grad_norm": 2.1431691646575928,
18976
+ "learning_rate": 1.21136264582287e-05,
18977
+ "loss": 2.1348,
18978
+ "step": 243100
18979
+ },
18980
+ {
18981
+ "epoch": 0.0033489805145150396,
18982
+ "grad_norm": 2.1176159381866455,
18983
+ "learning_rate": 1.2094658966083853e-05,
18984
+ "loss": 2.141,
18985
+ "step": 243200
18986
+ },
18987
+ {
18988
+ "epoch": 0.0036280622240579597,
18989
+ "grad_norm": 2.1691203117370605,
18990
+ "learning_rate": 1.207570159587463e-05,
18991
+ "loss": 2.1408,
18992
+ "step": 243300
18993
+ },
18994
+ {
18995
+ "epoch": 0.00390714393360088,
18996
+ "grad_norm": 2.2299644947052,
18997
+ "learning_rate": 1.2056754362469688e-05,
18998
+ "loss": 2.1231,
18999
+ "step": 243400
19000
+ },
19001
+ {
19002
+ "epoch": 0.0041862256431437995,
19003
+ "grad_norm": 2.0889995098114014,
19004
+ "learning_rate": 1.2037817280729755e-05,
19005
+ "loss": 2.1263,
19006
+ "step": 243500
19007
+ },
19008
+ {
19009
+ "epoch": 0.004465307352686719,
19010
+ "grad_norm": 2.0144128799438477,
19011
+ "learning_rate": 1.2018890365507587e-05,
19012
+ "loss": 2.1297,
19013
+ "step": 243600
19014
+ },
19015
+ {
19016
+ "epoch": 0.00474438906222964,
19017
+ "grad_norm": 2.1843581199645996,
19018
+ "learning_rate": 1.1999973631647984e-05,
19019
+ "loss": 2.1339,
19020
+ "step": 243700
19021
+ },
19022
+ {
19023
+ "epoch": 0.005023470771772559,
19024
+ "grad_norm": 2.199553966522217,
19025
+ "learning_rate": 1.1981067093987724e-05,
19026
+ "loss": 2.1246,
19027
+ "step": 243800
19028
+ },
19029
+ {
19030
+ "epoch": 0.00530255248131548,
19031
+ "grad_norm": 2.118969440460205,
19032
+ "learning_rate": 1.1962170767355633e-05,
19033
+ "loss": 2.1357,
19034
+ "step": 243900
19035
+ },
19036
+ {
19037
+ "epoch": 0.0055816341908584,
19038
+ "grad_norm": 2.2393405437469482,
19039
+ "learning_rate": 1.1943284666572479e-05,
19040
+ "loss": 2.1312,
19041
+ "step": 244000
19042
+ },
19043
+ {
19044
+ "epoch": 0.0055816341908584,
19045
+ "eval_loss": 2.127676486968994,
19046
+ "eval_runtime": 51.305,
19047
+ "eval_samples_per_second": 198.694,
19048
+ "eval_steps_per_second": 1.559,
19049
+ "step": 244000
19050
+ },
19051
+ {
19052
+ "epoch": 0.005860715900401319,
19053
+ "grad_norm": 2.1407392024993896,
19054
+ "learning_rate": 1.192440880645105e-05,
19055
+ "loss": 2.1227,
19056
+ "step": 244100
19057
+ },
19058
+ {
19059
+ "epoch": 0.00613979760994424,
19060
+ "grad_norm": 2.188948154449463,
19061
+ "learning_rate": 1.1905543201796097e-05,
19062
+ "loss": 2.1432,
19063
+ "step": 244200
19064
+ },
19065
+ {
19066
+ "epoch": 0.0064188793194871595,
19067
+ "grad_norm": 2.155181407928467,
19068
+ "learning_rate": 1.1886687867404295e-05,
19069
+ "loss": 2.1286,
19070
+ "step": 244300
19071
+ },
19072
+ {
19073
+ "epoch": 0.006697961029030079,
19074
+ "grad_norm": 2.2504708766937256,
19075
+ "learning_rate": 1.1867842818064304e-05,
19076
+ "loss": 2.132,
19077
+ "step": 244400
19078
+ },
19079
+ {
19080
+ "epoch": 0.006977042738573,
19081
+ "grad_norm": 2.1262989044189453,
19082
+ "learning_rate": 1.1849008068556692e-05,
19083
+ "loss": 2.1373,
19084
+ "step": 244500
19085
+ },
19086
+ {
19087
+ "epoch": 0.0072561244481159195,
19088
+ "grad_norm": 2.098700523376465,
19089
+ "learning_rate": 1.1830183633653971e-05,
19090
+ "loss": 2.1201,
19091
+ "step": 244600
19092
+ },
19093
+ {
19094
+ "epoch": 0.007535206157658839,
19095
+ "grad_norm": 2.1884560585021973,
19096
+ "learning_rate": 1.181136952812053e-05,
19097
+ "loss": 2.1291,
19098
+ "step": 244700
19099
+ },
19100
+ {
19101
+ "epoch": 0.00781428786720176,
19102
+ "grad_norm": 2.124889850616455,
19103
+ "learning_rate": 1.1792565766712684e-05,
19104
+ "loss": 2.127,
19105
+ "step": 244800
19106
+ },
19107
+ {
19108
+ "epoch": 0.00809336957674468,
19109
+ "grad_norm": 2.2935962677001953,
19110
+ "learning_rate": 1.1773772364178626e-05,
19111
+ "loss": 2.1384,
19112
+ "step": 244900
19113
+ },
19114
+ {
19115
+ "epoch": 0.008372451286287599,
19116
+ "grad_norm": 2.247445821762085,
19117
+ "learning_rate": 1.1754989335258432e-05,
19118
+ "loss": 2.1298,
19119
+ "step": 245000
19120
+ },
19121
+ {
19122
+ "epoch": 0.008372451286287599,
19123
+ "eval_loss": 2.123136520385742,
19124
+ "eval_runtime": 51.1588,
19125
+ "eval_samples_per_second": 199.262,
19126
+ "eval_steps_per_second": 1.564,
19127
+ "step": 245000
19128
+ },
19129
+ {
19130
+ "epoch": 0.008651532995830519,
19131
+ "grad_norm": 2.1179420948028564,
19132
+ "learning_rate": 1.1736216694684019e-05,
19133
+ "loss": 2.1381,
19134
+ "step": 245100
19135
+ },
19136
+ {
19137
+ "epoch": 0.008930614705373438,
19138
+ "grad_norm": 2.1974310874938965,
19139
+ "learning_rate": 1.1717454457179186e-05,
19140
+ "loss": 2.1324,
19141
+ "step": 245200
19142
+ },
19143
+ {
19144
+ "epoch": 0.00920969641491636,
19145
+ "grad_norm": 2.1987204551696777,
19146
+ "learning_rate": 1.1698702637459543e-05,
19147
+ "loss": 2.1341,
19148
+ "step": 245300
19149
+ },
19150
+ {
19151
+ "epoch": 0.00948877812445928,
19152
+ "grad_norm": 2.1850931644439697,
19153
+ "learning_rate": 1.167996125023256e-05,
19154
+ "loss": 2.1222,
19155
+ "step": 245400
19156
+ },
19157
+ {
19158
+ "epoch": 0.0097678598340022,
19159
+ "grad_norm": 2.0711605548858643,
19160
+ "learning_rate": 1.1661230310197494e-05,
19161
+ "loss": 2.1305,
19162
+ "step": 245500
19163
+ },
19164
+ {
19165
+ "epoch": 0.010046941543545119,
19166
+ "grad_norm": 2.1303722858428955,
19167
+ "learning_rate": 1.1642509832045428e-05,
19168
+ "loss": 2.1223,
19169
+ "step": 245600
19170
+ },
19171
+ {
19172
+ "epoch": 0.010326023253088039,
19173
+ "grad_norm": 2.0326571464538574,
19174
+ "learning_rate": 1.1623799830459236e-05,
19175
+ "loss": 2.1458,
19176
+ "step": 245700
19177
+ },
19178
+ {
19179
+ "epoch": 0.01060510496263096,
19180
+ "grad_norm": 2.179598331451416,
19181
+ "learning_rate": 1.1605100320113585e-05,
19182
+ "loss": 2.1191,
19183
+ "step": 245800
19184
+ },
19185
+ {
19186
+ "epoch": 0.01088418667217388,
19187
+ "grad_norm": 2.1782078742980957,
19188
+ "learning_rate": 1.158641131567488e-05,
19189
+ "loss": 2.1253,
19190
+ "step": 245900
19191
+ },
19192
+ {
19193
+ "epoch": 0.0111632683817168,
19194
+ "grad_norm": 2.117875099182129,
19195
+ "learning_rate": 1.1567732831801316e-05,
19196
+ "loss": 2.1177,
19197
+ "step": 246000
19198
+ },
19199
+ {
19200
+ "epoch": 0.0111632683817168,
19201
+ "eval_loss": 2.1340432167053223,
19202
+ "eval_runtime": 51.3485,
19203
+ "eval_samples_per_second": 198.526,
19204
+ "eval_steps_per_second": 1.558,
19205
+ "step": 246000
19206
+ },
19207
+ {
19208
+ "epoch": 0.011442350091259719,
19209
+ "grad_norm": 2.203430414199829,
19210
+ "learning_rate": 1.1549064883142832e-05,
19211
+ "loss": 2.1243,
19212
+ "step": 246100
19213
+ },
19214
+ {
19215
+ "epoch": 0.011721431800802639,
19216
+ "grad_norm": 2.1262495517730713,
19217
+ "learning_rate": 1.1530407484341108e-05,
19218
+ "loss": 2.1271,
19219
+ "step": 246200
19220
+ },
19221
+ {
19222
+ "epoch": 0.012000513510345558,
19223
+ "grad_norm": 2.0860841274261475,
19224
+ "learning_rate": 1.151176065002952e-05,
19225
+ "loss": 2.1139,
19226
+ "step": 246300
19227
+ },
19228
+ {
19229
+ "epoch": 0.01227959521988848,
19230
+ "grad_norm": 2.167961835861206,
19231
+ "learning_rate": 1.1493124394833196e-05,
19232
+ "loss": 2.1299,
19233
+ "step": 246400
19234
+ },
19235
+ {
19236
+ "epoch": 0.0125586769294314,
19237
+ "grad_norm": 2.213210344314575,
19238
+ "learning_rate": 1.1474498733368957e-05,
19239
+ "loss": 2.1187,
19240
+ "step": 246500
19241
+ },
19242
+ {
19243
+ "epoch": 0.012837758638974319,
19244
+ "grad_norm": 2.1387743949890137,
19245
+ "learning_rate": 1.1455883680245285e-05,
19246
+ "loss": 2.1185,
19247
+ "step": 246600
19248
+ },
19249
+ {
19250
+ "epoch": 0.013116840348517239,
19251
+ "grad_norm": 2.21766996383667,
19252
+ "learning_rate": 1.143727925006239e-05,
19253
+ "loss": 2.1257,
19254
+ "step": 246700
19255
+ },
19256
+ {
19257
+ "epoch": 0.013395922058060158,
19258
+ "grad_norm": 2.1220784187316895,
19259
+ "learning_rate": 1.1418685457412103e-05,
19260
+ "loss": 2.1358,
19261
+ "step": 246800
19262
+ },
19263
+ {
19264
+ "epoch": 0.013675003767603078,
19265
+ "grad_norm": 2.175532341003418,
19266
+ "learning_rate": 1.1400102316877948e-05,
19267
+ "loss": 2.1127,
19268
+ "step": 246900
19269
+ },
19270
+ {
19271
+ "epoch": 0.013954085477146,
19272
+ "grad_norm": 2.1164164543151855,
19273
+ "learning_rate": 1.1381529843035077e-05,
19274
+ "loss": 2.1382,
19275
+ "step": 247000
19276
+ },
19277
+ {
19278
+ "epoch": 0.013954085477146,
19279
+ "eval_loss": 2.1271872520446777,
19280
+ "eval_runtime": 51.251,
19281
+ "eval_samples_per_second": 198.903,
19282
+ "eval_steps_per_second": 1.561,
19283
+ "step": 247000
19284
+ },
19285
+ {
19286
+ "epoch": 0.01423316718668892,
19287
+ "grad_norm": 2.076045036315918,
19288
+ "learning_rate": 1.1362968050450287e-05,
19289
+ "loss": 2.1156,
19290
+ "step": 247100
19291
+ },
19292
+ {
19293
+ "epoch": 0.014512248896231839,
19294
+ "grad_norm": 2.122551679611206,
19295
+ "learning_rate": 1.1344416953681974e-05,
19296
+ "loss": 2.1248,
19297
+ "step": 247200
19298
+ },
19299
+ {
19300
+ "epoch": 0.014791330605774759,
19301
+ "grad_norm": 2.206083059310913,
19302
+ "learning_rate": 1.132587656728017e-05,
19303
+ "loss": 2.1265,
19304
+ "step": 247300
19305
+ },
19306
+ {
19307
+ "epoch": 0.015070412315317678,
19308
+ "grad_norm": 2.175978660583496,
19309
+ "learning_rate": 1.1307346905786498e-05,
19310
+ "loss": 2.1175,
19311
+ "step": 247400
19312
+ },
19313
+ {
19314
+ "epoch": 0.015349494024860598,
19315
+ "grad_norm": 2.058006525039673,
19316
+ "learning_rate": 1.1288827983734173e-05,
19317
+ "loss": 2.1116,
19318
+ "step": 247500
19319
+ },
19320
+ {
19321
+ "epoch": 0.01562857573440352,
19322
+ "grad_norm": 2.055121660232544,
19323
+ "learning_rate": 1.1270319815647972e-05,
19324
+ "loss": 2.1198,
19325
+ "step": 247600
19326
+ },
19327
+ {
19328
+ "epoch": 0.015907657443946437,
19329
+ "grad_norm": 2.154327392578125,
19330
+ "learning_rate": 1.1251822416044252e-05,
19331
+ "loss": 2.131,
19332
+ "step": 247700
19333
+ },
19334
+ {
19335
+ "epoch": 0.01618673915348936,
19336
+ "grad_norm": 2.2314298152923584,
19337
+ "learning_rate": 1.1233335799430933e-05,
19338
+ "loss": 2.119,
19339
+ "step": 247800
19340
+ },
19341
+ {
19342
+ "epoch": 0.01646582086303228,
19343
+ "grad_norm": 2.142615795135498,
19344
+ "learning_rate": 1.1214859980307448e-05,
19345
+ "loss": 2.1223,
19346
+ "step": 247900
19347
+ },
19348
+ {
19349
+ "epoch": 0.016744902572575198,
19350
+ "grad_norm": 2.1668100357055664,
19351
+ "learning_rate": 1.1196394973164778e-05,
19352
+ "loss": 2.1211,
19353
+ "step": 248000
19354
+ },
19355
+ {
19356
+ "epoch": 0.016744902572575198,
19357
+ "eval_loss": 2.1204113960266113,
19358
+ "eval_runtime": 51.2734,
19359
+ "eval_samples_per_second": 198.816,
19360
+ "eval_steps_per_second": 1.56,
19361
+ "step": 248000
19362
+ },
19363
+ {
19364
+ "epoch": 0.01702398428211812,
19365
+ "grad_norm": 2.069814443588257,
19366
+ "learning_rate": 1.1177940792485428e-05,
19367
+ "loss": 2.1099,
19368
+ "step": 248100
19369
+ },
19370
+ {
19371
+ "epoch": 0.017303065991661037,
19372
+ "grad_norm": 2.072711229324341,
19373
+ "learning_rate": 1.1159497452743409e-05,
19374
+ "loss": 2.1382,
19375
+ "step": 248200
19376
+ },
19377
+ {
19378
+ "epoch": 0.01758214770120396,
19379
+ "grad_norm": 2.114495038986206,
19380
+ "learning_rate": 1.1141064968404236e-05,
19381
+ "loss": 2.243,
19382
+ "step": 248300
19383
+ },
19384
+ {
19385
+ "epoch": 0.017861229410746877,
19386
+ "grad_norm": 1.239190936088562,
19387
+ "learning_rate": 1.112264335392488e-05,
19388
+ "loss": 2.2247,
19389
+ "step": 248400
19390
+ },
19391
+ {
19392
+ "epoch": 0.018140311120289798,
19393
+ "grad_norm": 2.097609519958496,
19394
+ "learning_rate": 1.1104232623753824e-05,
19395
+ "loss": 2.214,
19396
+ "step": 248500
19397
+ },
19398
+ {
19399
+ "epoch": 0.01841939282983272,
19400
+ "grad_norm": 1.0227768421173096,
19401
+ "learning_rate": 1.1085832792330996e-05,
19402
+ "loss": 2.2084,
19403
+ "step": 248600
19404
+ },
19405
+ {
19406
+ "epoch": 0.018698474539375638,
19407
+ "grad_norm": 2.114027738571167,
19408
+ "learning_rate": 1.1067443874087785e-05,
19409
+ "loss": 2.1914,
19410
+ "step": 248700
19411
+ },
19412
+ {
19413
+ "epoch": 0.01897755624891856,
19414
+ "grad_norm": 1.0278912782669067,
19415
+ "learning_rate": 1.1049065883446999e-05,
19416
+ "loss": 2.2196,
19417
+ "step": 248800
19418
+ },
19419
+ {
19420
+ "epoch": 0.019256637958461477,
19421
+ "grad_norm": 2.1365833282470703,
19422
+ "learning_rate": 1.1030698834822895e-05,
19423
+ "loss": 2.2016,
19424
+ "step": 248900
19425
+ },
19426
+ {
19427
+ "epoch": 0.0195357196680044,
19428
+ "grad_norm": 2.083474636077881,
19429
+ "learning_rate": 1.1012342742621145e-05,
19430
+ "loss": 2.2319,
19431
+ "step": 249000
19432
+ },
19433
+ {
19434
+ "epoch": 0.0195357196680044,
19435
+ "eval_loss": 2.1132681369781494,
19436
+ "eval_runtime": 51.5743,
19437
+ "eval_samples_per_second": 197.657,
19438
+ "eval_steps_per_second": 1.551,
19439
+ "step": 249000
19440
+ },
19441
+ {
19442
+ "epoch": 0.01981480137754732,
19443
+ "grad_norm": 1.1456485986709595,
19444
+ "learning_rate": 1.0993997621238836e-05,
19445
+ "loss": 2.1891,
19446
+ "step": 249100
19447
+ },
19448
+ {
19449
+ "epoch": 0.020093883087090238,
19450
+ "grad_norm": 2.1170260906219482,
19451
+ "learning_rate": 1.097566348506443e-05,
19452
+ "loss": 2.1947,
19453
+ "step": 249200
19454
+ },
19455
+ {
19456
+ "epoch": 0.02037296479663316,
19457
+ "grad_norm": 2.1078946590423584,
19458
+ "learning_rate": 1.0957340348477771e-05,
19459
+ "loss": 2.183,
19460
+ "step": 249300
19461
+ },
19462
+ {
19463
+ "epoch": 0.020652046506176077,
19464
+ "grad_norm": 1.6463110446929932,
19465
+ "learning_rate": 1.09390282258501e-05,
19466
+ "loss": 2.1768,
19467
+ "step": 249400
19468
+ },
19469
+ {
19470
+ "epoch": 0.020931128215719,
19471
+ "grad_norm": 1.4292881488800049,
19472
+ "learning_rate": 1.092072713154402e-05,
19473
+ "loss": 2.1944,
19474
+ "step": 249500
19475
+ },
19476
+ {
19477
+ "epoch": 0.02121020992526192,
19478
+ "grad_norm": 2.064131259918213,
19479
+ "learning_rate": 1.0902437079913447e-05,
19480
+ "loss": 2.1829,
19481
+ "step": 249600
19482
+ },
19483
+ {
19484
+ "epoch": 0.021489291634804838,
19485
+ "grad_norm": 2.1172714233398438,
19486
+ "learning_rate": 1.088415808530367e-05,
19487
+ "loss": 2.1588,
19488
+ "step": 249700
19489
+ },
19490
+ {
19491
+ "epoch": 0.02176837334434776,
19492
+ "grad_norm": 2.112825393676758,
19493
+ "learning_rate": 1.08658901620513e-05,
19494
+ "loss": 2.1718,
19495
+ "step": 249800
19496
+ },
19497
+ {
19498
+ "epoch": 0.022047455053890677,
19499
+ "grad_norm": 2.0806221961975098,
19500
+ "learning_rate": 1.0847633324484261e-05,
19501
+ "loss": 2.1993,
19502
+ "step": 249900
19503
+ },
19504
+ {
19505
+ "epoch": 0.0223265367634336,
19506
+ "grad_norm": 1.1664646863937378,
19507
+ "learning_rate": 1.0829387586921785e-05,
19508
+ "loss": 2.185,
19509
+ "step": 250000
19510
+ },
19511
+ {
19512
+ "epoch": 0.0223265367634336,
19513
+ "eval_loss": 2.100121021270752,
19514
+ "eval_runtime": 51.6801,
19515
+ "eval_samples_per_second": 197.252,
19516
+ "eval_steps_per_second": 1.548,
19517
+ "step": 250000
19518
+ },
19519
+ {
19520
+ "epoch": 0.022605618472976517,
19521
+ "grad_norm": 2.2250313758850098,
19522
+ "learning_rate": 1.0811152963674384e-05,
19523
+ "loss": 2.1625,
19524
+ "step": 250100
19525
+ },
19526
+ {
19527
+ "epoch": 0.022884700182519438,
19528
+ "grad_norm": 2.1046063899993896,
19529
+ "learning_rate": 1.079292946904387e-05,
19530
+ "loss": 2.1639,
19531
+ "step": 250200
19532
+ },
19533
+ {
19534
+ "epoch": 0.02316378189206236,
19535
+ "grad_norm": 1.1374410390853882,
19536
+ "learning_rate": 1.077471711732333e-05,
19537
+ "loss": 2.1794,
19538
+ "step": 250300
19539
+ },
19540
+ {
19541
+ "epoch": 0.023442863601605277,
19542
+ "grad_norm": 1.7234727144241333,
19543
+ "learning_rate": 1.075651592279708e-05,
19544
+ "loss": 2.1634,
19545
+ "step": 250400
19546
+ },
19547
+ {
19548
+ "epoch": 0.0237219453111482,
19549
+ "grad_norm": 2.190262794494629,
19550
+ "learning_rate": 1.0738325899740733e-05,
19551
+ "loss": 2.1585,
19552
+ "step": 250500
19553
+ },
19554
+ {
19555
+ "epoch": 0.024001027020691117,
19556
+ "grad_norm": 2.0651297569274902,
19557
+ "learning_rate": 1.072014706242109e-05,
19558
+ "loss": 2.1685,
19559
+ "step": 250600
19560
+ },
19561
+ {
19562
+ "epoch": 0.024280108730234038,
19563
+ "grad_norm": 2.1923389434814453,
19564
+ "learning_rate": 1.0701979425096212e-05,
19565
+ "loss": 2.1704,
19566
+ "step": 250700
19567
+ },
19568
+ {
19569
+ "epoch": 0.02455919043977696,
19570
+ "grad_norm": 1.0421335697174072,
19571
+ "learning_rate": 1.0683823002015378e-05,
19572
+ "loss": 2.156,
19573
+ "step": 250800
19574
+ },
19575
+ {
19576
+ "epoch": 0.024838272149319877,
19577
+ "grad_norm": 0.9804383516311646,
19578
+ "learning_rate": 1.0665677807419038e-05,
19579
+ "loss": 2.1738,
19580
+ "step": 250900
19581
+ },
19582
+ {
19583
+ "epoch": 0.0251173538588628,
19584
+ "grad_norm": 2.0791897773742676,
19585
+ "learning_rate": 1.0647543855538871e-05,
19586
+ "loss": 2.1742,
19587
+ "step": 251000
19588
+ },
19589
+ {
19590
+ "epoch": 0.0251173538588628,
19591
+ "eval_loss": 2.099701404571533,
19592
+ "eval_runtime": 51.5047,
19593
+ "eval_samples_per_second": 197.924,
19594
+ "eval_steps_per_second": 1.553,
19595
+ "step": 251000
19596
+ },
19597
+ {
19598
+ "epoch": 0.025396435568405717,
19599
+ "grad_norm": 1.1516467332839966,
19600
+ "learning_rate": 1.0629421160597724e-05,
19601
+ "loss": 2.153,
19602
+ "step": 251100
19603
+ },
19604
+ {
19605
+ "epoch": 0.025675517277948638,
19606
+ "grad_norm": 2.0791218280792236,
19607
+ "learning_rate": 1.0611309736809618e-05,
19608
+ "loss": 2.1469,
19609
+ "step": 251200
19610
+ },
19611
+ {
19612
+ "epoch": 0.025954598987491556,
19613
+ "grad_norm": 1.9377182722091675,
19614
+ "learning_rate": 1.0593209598379719e-05,
19615
+ "loss": 2.1601,
19616
+ "step": 251300
19617
+ },
19618
+ {
19619
+ "epoch": 0.026233680697034478,
19620
+ "grad_norm": 2.2188608646392822,
19621
+ "learning_rate": 1.0575120759504362e-05,
19622
+ "loss": 2.1604,
19623
+ "step": 251400
19624
+ },
19625
+ {
19626
+ "epoch": 0.0265127624065774,
19627
+ "grad_norm": 1.0223076343536377,
19628
+ "learning_rate": 1.0557043234371006e-05,
19629
+ "loss": 2.1496,
19630
+ "step": 251500
19631
+ },
19632
+ {
19633
+ "epoch": 0.026791844116120317,
19634
+ "grad_norm": 2.205202579498291,
19635
+ "learning_rate": 1.0538977037158254e-05,
19636
+ "loss": 2.1409,
19637
+ "step": 251600
19638
+ },
19639
+ {
19640
+ "epoch": 0.02707092582566324,
19641
+ "grad_norm": 2.187514543533325,
19642
+ "learning_rate": 1.0520922182035798e-05,
19643
+ "loss": 2.1522,
19644
+ "step": 251700
19645
+ },
19646
+ {
19647
+ "epoch": 0.027350007535206156,
19648
+ "grad_norm": 2.1280906200408936,
19649
+ "learning_rate": 1.0502878683164458e-05,
19650
+ "loss": 2.1501,
19651
+ "step": 251800
19652
+ },
19653
+ {
19654
+ "epoch": 0.027629089244749078,
19655
+ "grad_norm": 1.2866263389587402,
19656
+ "learning_rate": 1.0484846554696123e-05,
19657
+ "loss": 2.1556,
19658
+ "step": 251900
19659
+ },
19660
+ {
19661
+ "epoch": 0.027908170954292,
19662
+ "grad_norm": 1.111025333404541,
19663
+ "learning_rate": 1.0466825810773796e-05,
19664
+ "loss": 2.1422,
19665
+ "step": 252000
19666
+ },
19667
+ {
19668
+ "epoch": 0.027908170954292,
19669
+ "eval_loss": 2.102551221847534,
19670
+ "eval_runtime": 51.511,
19671
+ "eval_samples_per_second": 197.899,
19672
+ "eval_steps_per_second": 1.553,
19673
+ "step": 252000
19674
+ },
19675
+ {
19676
+ "epoch": 0.028187252663834917,
19677
+ "grad_norm": 2.0444447994232178,
19678
+ "learning_rate": 1.0448816465531513e-05,
19679
+ "loss": 2.147,
19680
+ "step": 252100
19681
+ },
19682
+ {
19683
+ "epoch": 0.02846633437337784,
19684
+ "grad_norm": 2.0270159244537354,
19685
+ "learning_rate": 1.0430818533094403e-05,
19686
+ "loss": 2.1492,
19687
+ "step": 252200
19688
+ },
19689
+ {
19690
+ "epoch": 0.028745416082920756,
19691
+ "grad_norm": 2.180891275405884,
19692
+ "learning_rate": 1.0412832027578622e-05,
19693
+ "loss": 2.152,
19694
+ "step": 252300
19695
+ },
19696
+ {
19697
+ "epoch": 0.029024497792463678,
19698
+ "grad_norm": 1.3767441511154175,
19699
+ "learning_rate": 1.039485696309139e-05,
19700
+ "loss": 2.1437,
19701
+ "step": 252400
19702
+ },
19703
+ {
19704
+ "epoch": 0.0293035795020066,
19705
+ "grad_norm": 2.1430041790008545,
19706
+ "learning_rate": 1.0376893353730913e-05,
19707
+ "loss": 2.1453,
19708
+ "step": 252500
19709
+ },
19710
+ {
19711
+ "epoch": 0.029582661211549517,
19712
+ "grad_norm": 1.045134425163269,
19713
+ "learning_rate": 1.0358941213586443e-05,
19714
+ "loss": 2.1367,
19715
+ "step": 252600
19716
+ },
19717
+ {
19718
+ "epoch": 0.02986174292109244,
19719
+ "grad_norm": 2.374004602432251,
19720
+ "learning_rate": 1.0341000556738229e-05,
19721
+ "loss": 2.1499,
19722
+ "step": 252700
19723
+ },
19724
+ {
19725
+ "epoch": 0.030140824630635357,
19726
+ "grad_norm": 0.9447265863418579,
19727
+ "learning_rate": 1.0323071397257514e-05,
19728
+ "loss": 2.1664,
19729
+ "step": 252800
19730
+ },
19731
+ {
19732
+ "epoch": 0.030419906340178278,
19733
+ "grad_norm": 0.9592018127441406,
19734
+ "learning_rate": 1.0305153749206531e-05,
19735
+ "loss": 2.2281,
19736
+ "step": 252900
19737
+ },
19738
+ {
19739
+ "epoch": 0.030698988049721196,
19740
+ "grad_norm": 0.9208105802536011,
19741
+ "learning_rate": 1.0287247626638455e-05,
19742
+ "loss": 2.2157,
19743
+ "step": 253000
19744
+ },
19745
+ {
19746
+ "epoch": 0.030698988049721196,
19747
+ "eval_loss": 2.099860429763794,
19748
+ "eval_runtime": 51.562,
19749
+ "eval_samples_per_second": 197.704,
19750
+ "eval_steps_per_second": 1.552,
19751
+ "step": 253000
19752
+ },
19753
+ {
19754
+ "epoch": 0.030978069759264117,
19755
+ "grad_norm": 0.9355520606040955,
19756
+ "learning_rate": 1.0269353043597463e-05,
19757
+ "loss": 2.218,
19758
+ "step": 253100
19759
+ },
19760
+ {
19761
+ "epoch": 0.03125715146880704,
19762
+ "grad_norm": 0.9669928550720215,
19763
+ "learning_rate": 1.0251470014118641e-05,
19764
+ "loss": 2.2214,
19765
+ "step": 253200
19766
+ },
19767
+ {
19768
+ "epoch": 0.03153623317834996,
19769
+ "grad_norm": 0.9217200875282288,
19770
+ "learning_rate": 1.0233598552228049e-05,
19771
+ "loss": 2.1921,
19772
+ "step": 253300
19773
+ },
19774
+ {
19775
+ "epoch": 0.031815314887892875,
19776
+ "grad_norm": 0.9365478754043579,
19777
+ "learning_rate": 1.021573867194264e-05,
19778
+ "loss": 2.1832,
19779
+ "step": 253400
19780
+ },
19781
+ {
19782
+ "epoch": 0.0320943965974358,
19783
+ "grad_norm": 0.952186107635498,
19784
+ "learning_rate": 1.0197890387270311e-05,
19785
+ "loss": 2.1899,
19786
+ "step": 253500
19787
+ },
19788
+ {
19789
+ "epoch": 0.03237347830697872,
19790
+ "grad_norm": 0.950163722038269,
19791
+ "learning_rate": 1.0180053712209855e-05,
19792
+ "loss": 2.1778,
19793
+ "step": 253600
19794
+ },
19795
+ {
19796
+ "epoch": 0.032652560016521635,
19797
+ "grad_norm": 0.9093700051307678,
19798
+ "learning_rate": 1.0162228660750967e-05,
19799
+ "loss": 2.1641,
19800
+ "step": 253700
19801
+ },
19802
+ {
19803
+ "epoch": 0.03293164172606456,
19804
+ "grad_norm": 0.9941834211349487,
19805
+ "learning_rate": 1.0144415246874198e-05,
19806
+ "loss": 2.1762,
19807
+ "step": 253800
19808
+ },
19809
+ {
19810
+ "epoch": 0.03321072343560748,
19811
+ "grad_norm": 0.9335500001907349,
19812
+ "learning_rate": 1.0126613484550997e-05,
19813
+ "loss": 2.1727,
19814
+ "step": 253900
19815
+ },
19816
+ {
19817
+ "epoch": 0.033489805145150396,
19818
+ "grad_norm": 0.963545560836792,
19819
+ "learning_rate": 1.0108823387743674e-05,
19820
+ "loss": 2.171,
19821
+ "step": 254000
19822
+ },
19823
+ {
19824
+ "epoch": 0.033489805145150396,
19825
+ "eval_loss": 2.112656354904175,
19826
+ "eval_runtime": 51.6053,
19827
+ "eval_samples_per_second": 197.538,
19828
+ "eval_steps_per_second": 1.55,
19829
+ "step": 254000
19830
+ },
19831
+ {
19832
+ "epoch": 0.00027908170954291995,
19833
+ "grad_norm": 0.9451214075088501,
19834
+ "learning_rate": 1.0091044970405386e-05,
19835
+ "loss": 2.1437,
19836
+ "step": 254100
19837
+ },
19838
+ {
19839
+ "epoch": 0.0005581634190858399,
19840
+ "grad_norm": 0.9729546904563904,
19841
+ "learning_rate": 1.0073278246480113e-05,
19842
+ "loss": 2.1442,
19843
+ "step": 254200
19844
+ },
19845
+ {
19846
+ "epoch": 0.0008372451286287599,
19847
+ "grad_norm": 0.9474550485610962,
19848
+ "learning_rate": 1.0055523229902686e-05,
19849
+ "loss": 2.1518,
19850
+ "step": 254300
19851
+ },
19852
+ {
19853
+ "epoch": 0.0011163268381716798,
19854
+ "grad_norm": 0.984620213508606,
19855
+ "learning_rate": 1.0037779934598754e-05,
19856
+ "loss": 2.1314,
19857
+ "step": 254400
19858
+ },
19859
+ {
19860
+ "epoch": 0.0013954085477146,
19861
+ "grad_norm": 0.9437354803085327,
19862
+ "learning_rate": 1.0020048374484745e-05,
19863
+ "loss": 2.1189,
19864
+ "step": 254500
19865
+ },
19866
+ {
19867
+ "epoch": 0.0016744902572575198,
19868
+ "grad_norm": 0.9442464113235474,
19869
+ "learning_rate": 1.0002328563467917e-05,
19870
+ "loss": 2.123,
19871
+ "step": 254600
19872
+ },
19873
+ {
19874
+ "epoch": 0.00195357196680044,
19875
+ "grad_norm": 0.9733272790908813,
19876
+ "learning_rate": 9.984620515446283e-06,
19877
+ "loss": 2.1317,
19878
+ "step": 254700
19879
+ },
19880
+ {
19881
+ "epoch": 0.0022326536763433596,
19882
+ "grad_norm": 0.9501616358757019,
19883
+ "learning_rate": 9.966924244308656e-06,
19884
+ "loss": 2.1229,
19885
+ "step": 254800
19886
+ },
19887
+ {
19888
+ "epoch": 0.0025117353858862797,
19889
+ "grad_norm": 0.9912092089653015,
19890
+ "learning_rate": 9.949239763934603e-06,
19891
+ "loss": 2.1353,
19892
+ "step": 254900
19893
+ },
19894
+ {
19895
+ "epoch": 0.0027908170954292,
19896
+ "grad_norm": 0.9838760495185852,
19897
+ "learning_rate": 9.931567088194429e-06,
19898
+ "loss": 2.1175,
19899
+ "step": 255000
19900
+ },
19901
+ {
19902
+ "epoch": 0.0027908170954292,
19903
+ "eval_loss": 2.1234488487243652,
19904
+ "eval_runtime": 52.035,
19905
+ "eval_samples_per_second": 195.907,
19906
+ "eval_steps_per_second": 1.537,
19907
+ "step": 255000
19908
+ },
19909
+ {
19910
+ "epoch": 0.00306989880497212,
19911
+ "grad_norm": 0.9307495951652527,
19912
+ "learning_rate": 9.913906230949201e-06,
19913
+ "loss": 2.1272,
19914
+ "step": 255100
19915
+ },
19916
+ {
19917
+ "epoch": 0.0033489805145150396,
19918
+ "grad_norm": 0.9529586434364319,
19919
+ "learning_rate": 9.896257206050705e-06,
19920
+ "loss": 2.1184,
19921
+ "step": 255200
19922
+ },
19923
+ {
19924
+ "epoch": 0.0036280622240579597,
19925
+ "grad_norm": 1.0264242887496948,
19926
+ "learning_rate": 9.87862002734146e-06,
19927
+ "loss": 2.1008,
19928
+ "step": 255300
19929
+ },
19930
+ {
19931
+ "epoch": 0.00390714393360088,
19932
+ "grad_norm": 0.9989575147628784,
19933
+ "learning_rate": 9.860994708654663e-06,
19934
+ "loss": 2.0978,
19935
+ "step": 255400
19936
+ },
19937
+ {
19938
+ "epoch": 0.0041862256431437995,
19939
+ "grad_norm": 0.9874339699745178,
19940
+ "learning_rate": 9.843381263814242e-06,
19941
+ "loss": 2.1105,
19942
+ "step": 255500
19943
+ },
19944
+ {
19945
+ "epoch": 0.004465307352686719,
19946
+ "grad_norm": 0.8986032605171204,
19947
+ "learning_rate": 9.8257797066348e-06,
19948
+ "loss": 2.1097,
19949
+ "step": 255600
19950
+ },
19951
+ {
19952
+ "epoch": 0.00474438906222964,
19953
+ "grad_norm": 0.968845546245575,
19954
+ "learning_rate": 9.808190050921618e-06,
19955
+ "loss": 2.0813,
19956
+ "step": 255700
19957
+ },
19958
+ {
19959
+ "epoch": 0.005023470771772559,
19960
+ "grad_norm": 0.9542651176452637,
19961
+ "learning_rate": 9.790612310470637e-06,
19962
+ "loss": 2.1062,
19963
+ "step": 255800
19964
+ },
19965
+ {
19966
+ "epoch": 0.00530255248131548,
19967
+ "grad_norm": 0.9598912596702576,
19968
+ "learning_rate": 9.773046499068447e-06,
19969
+ "loss": 2.1088,
19970
+ "step": 255900
19971
+ },
19972
+ {
19973
+ "epoch": 0.0055816341908584,
19974
+ "grad_norm": 0.9718886017799377,
19975
+ "learning_rate": 9.755492630492296e-06,
19976
+ "loss": 2.1028,
19977
+ "step": 256000
19978
+ },
19979
+ {
19980
+ "epoch": 0.0055816341908584,
19981
+ "eval_loss": 2.1331796646118164,
19982
+ "eval_runtime": 51.757,
19983
+ "eval_samples_per_second": 196.959,
19984
+ "eval_steps_per_second": 1.546,
19985
+ "step": 256000
19986
+ },
19987
+ {
19988
+ "epoch": 0.005860715900401319,
19989
+ "grad_norm": 0.9607440829277039,
19990
+ "learning_rate": 9.73795071851006e-06,
19991
+ "loss": 2.0957,
19992
+ "step": 256100
19993
+ },
19994
+ {
19995
+ "epoch": 0.00613979760994424,
19996
+ "grad_norm": 0.9715221524238586,
19997
+ "learning_rate": 9.720420776880248e-06,
19998
+ "loss": 2.0837,
19999
+ "step": 256200
20000
+ },
20001
+ {
20002
+ "epoch": 0.0064188793194871595,
20003
+ "grad_norm": 0.9565144777297974,
20004
+ "learning_rate": 9.70290281935195e-06,
20005
+ "loss": 2.0889,
20006
+ "step": 256300
20007
+ },
20008
+ {
20009
+ "epoch": 0.006697961029030079,
20010
+ "grad_norm": 0.921954333782196,
20011
+ "learning_rate": 9.685396859664883e-06,
20012
+ "loss": 2.0754,
20013
+ "step": 256400
20014
+ },
20015
+ {
20016
+ "epoch": 0.006977042738573,
20017
+ "grad_norm": 0.9448678493499756,
20018
+ "learning_rate": 9.667902911549348e-06,
20019
+ "loss": 2.0773,
20020
+ "step": 256500
20021
+ },
20022
+ {
20023
+ "epoch": 0.0072561244481159195,
20024
+ "grad_norm": 0.9528698325157166,
20025
+ "learning_rate": 9.650420988726231e-06,
20026
+ "loss": 2.0708,
20027
+ "step": 256600
20028
+ },
20029
+ {
20030
+ "epoch": 0.007535206157658839,
20031
+ "grad_norm": 0.9941433668136597,
20032
+ "learning_rate": 9.632951104906962e-06,
20033
+ "loss": 2.0687,
20034
+ "step": 256700
20035
+ },
20036
+ {
20037
+ "epoch": 0.00781428786720176,
20038
+ "grad_norm": 0.9805740118026733,
20039
+ "learning_rate": 9.615493273793555e-06,
20040
+ "loss": 2.0667,
20041
+ "step": 256800
20042
+ },
20043
+ {
20044
+ "epoch": 0.00809336957674468,
20045
+ "grad_norm": 1.0303276777267456,
20046
+ "learning_rate": 9.598047509078562e-06,
20047
+ "loss": 2.072,
20048
+ "step": 256900
20049
+ },
20050
+ {
20051
+ "epoch": 0.008372451286287599,
20052
+ "grad_norm": 0.9494913816452026,
20053
+ "learning_rate": 9.580613824445076e-06,
20054
+ "loss": 2.0597,
20055
+ "step": 257000
20056
+ },
20057
+ {
20058
+ "epoch": 0.008372451286287599,
20059
+ "eval_loss": 2.1403818130493164,
20060
+ "eval_runtime": 51.6861,
20061
+ "eval_samples_per_second": 197.229,
20062
+ "eval_steps_per_second": 1.548,
20063
+ "step": 257000
20064
+ },
20065
+ {
20066
+ "epoch": 0.008651532995830519,
20067
+ "grad_norm": 0.9716615080833435,
20068
+ "learning_rate": 9.563192233566701e-06,
20069
+ "loss": 2.0605,
20070
+ "step": 257100
20071
+ },
20072
+ {
20073
+ "epoch": 0.008930614705373438,
20074
+ "grad_norm": 0.9704664349555969,
20075
+ "learning_rate": 9.54578275010756e-06,
20076
+ "loss": 2.0707,
20077
+ "step": 257200
20078
+ },
20079
+ {
20080
+ "epoch": 0.00920969641491636,
20081
+ "grad_norm": 0.981453537940979,
20082
+ "learning_rate": 9.528385387722285e-06,
20083
+ "loss": 2.058,
20084
+ "step": 257300
20085
+ },
20086
+ {
20087
+ "epoch": 0.00948877812445928,
20088
+ "grad_norm": 0.9496264457702637,
20089
+ "learning_rate": 9.511000160056016e-06,
20090
+ "loss": 2.0587,
20091
+ "step": 257400
20092
+ },
20093
+ {
20094
+ "epoch": 0.0097678598340022,
20095
+ "grad_norm": 0.9844343662261963,
20096
+ "learning_rate": 9.493627080744341e-06,
20097
+ "loss": 2.0689,
20098
+ "step": 257500
20099
+ },
20100
+ {
20101
+ "epoch": 0.010046941543545119,
20102
+ "grad_norm": 1.0004667043685913,
20103
+ "learning_rate": 9.476266163413345e-06,
20104
+ "loss": 2.0594,
20105
+ "step": 257600
20106
+ },
20107
+ {
20108
+ "epoch": 0.010326023253088039,
20109
+ "grad_norm": 1.0279428958892822,
20110
+ "learning_rate": 9.458917421679568e-06,
20111
+ "loss": 2.055,
20112
+ "step": 257700
20113
+ },
20114
+ {
20115
+ "epoch": 0.01060510496263096,
20116
+ "grad_norm": 0.9908036589622498,
20117
+ "learning_rate": 9.44158086915001e-06,
20118
+ "loss": 2.0572,
20119
+ "step": 257800
20120
+ },
20121
+ {
20122
+ "epoch": 0.01088418667217388,
20123
+ "grad_norm": 0.9733301401138306,
20124
+ "learning_rate": 9.42425651942208e-06,
20125
+ "loss": 2.0424,
20126
+ "step": 257900
20127
+ },
20128
+ {
20129
+ "epoch": 0.0111632683817168,
20130
+ "grad_norm": 0.98853999376297,
20131
+ "learning_rate": 9.406944386083652e-06,
20132
+ "loss": 2.0598,
20133
+ "step": 258000
20134
+ },
20135
+ {
20136
+ "epoch": 0.0111632683817168,
20137
+ "eval_loss": 2.1505215167999268,
20138
+ "eval_runtime": 51.7324,
20139
+ "eval_samples_per_second": 197.053,
20140
+ "eval_steps_per_second": 1.546,
20141
+ "step": 258000
20142
+ },
20143
+ {
20144
+ "epoch": 0.011442350091259719,
20145
+ "grad_norm": 0.9915367960929871,
20146
+ "learning_rate": 9.389644482712997e-06,
20147
+ "loss": 2.0376,
20148
+ "step": 258100
20149
+ },
20150
+ {
20151
+ "epoch": 0.011721431800802639,
20152
+ "grad_norm": 0.9439958930015564,
20153
+ "learning_rate": 9.372356822878813e-06,
20154
+ "loss": 2.0403,
20155
+ "step": 258200
20156
+ },
20157
+ {
20158
+ "epoch": 0.012000513510345558,
20159
+ "grad_norm": 0.9918619394302368,
20160
+ "learning_rate": 9.355081420140164e-06,
20161
+ "loss": 2.0297,
20162
+ "step": 258300
20163
+ },
20164
+ {
20165
+ "epoch": 0.01227959521988848,
20166
+ "grad_norm": 0.9755305647850037,
20167
+ "learning_rate": 9.337818288046535e-06,
20168
+ "loss": 2.042,
20169
+ "step": 258400
20170
+ },
20171
+ {
20172
+ "epoch": 0.0125586769294314,
20173
+ "grad_norm": 0.9416148662567139,
20174
+ "learning_rate": 9.32056744013775e-06,
20175
+ "loss": 2.0301,
20176
+ "step": 258500
20177
+ },
20178
+ {
20179
+ "epoch": 0.012837758638974319,
20180
+ "grad_norm": 0.9932397603988647,
20181
+ "learning_rate": 9.303328889944044e-06,
20182
+ "loss": 2.0513,
20183
+ "step": 258600
20184
+ },
20185
+ {
20186
+ "epoch": 0.013116840348517239,
20187
+ "grad_norm": 0.9738903045654297,
20188
+ "learning_rate": 9.286102650985957e-06,
20189
+ "loss": 2.0442,
20190
+ "step": 258700
20191
+ },
20192
+ {
20193
+ "epoch": 0.013395922058060158,
20194
+ "grad_norm": 0.9329037070274353,
20195
+ "learning_rate": 9.268888736774408e-06,
20196
+ "loss": 2.0121,
20197
+ "step": 258800
20198
+ },
20199
+ {
20200
+ "epoch": 0.013675003767603078,
20201
+ "grad_norm": 0.9853471517562866,
20202
+ "learning_rate": 9.251687160810643e-06,
20203
+ "loss": 2.0339,
20204
+ "step": 258900
20205
+ },
20206
+ {
20207
+ "epoch": 0.013954085477146,
20208
+ "grad_norm": 0.9692677855491638,
20209
+ "learning_rate": 9.23449793658622e-06,
20210
+ "loss": 2.0281,
20211
+ "step": 259000
20212
+ },
20213
+ {
20214
+ "epoch": 0.013954085477146,
20215
+ "eval_loss": 2.1587061882019043,
20216
+ "eval_runtime": 51.7564,
20217
+ "eval_samples_per_second": 196.961,
20218
+ "eval_steps_per_second": 1.546,
20219
+ "step": 259000
20220
+ },
20221
+ {
20222
+ "epoch": 0.01423316718668892,
20223
+ "grad_norm": 0.9744108319282532,
20224
+ "learning_rate": 9.21732107758303e-06,
20225
+ "loss": 2.0165,
20226
+ "step": 259100
20227
+ },
20228
+ {
20229
+ "epoch": 0.014512248896231839,
20230
+ "grad_norm": 0.9786638617515564,
20231
+ "learning_rate": 9.200156597273235e-06,
20232
+ "loss": 2.0186,
20233
+ "step": 259200
20234
+ },
20235
+ {
20236
+ "epoch": 0.014791330605774759,
20237
+ "grad_norm": 0.9883900880813599,
20238
+ "learning_rate": 9.183004509119308e-06,
20239
+ "loss": 2.0363,
20240
+ "step": 259300
20241
+ },
20242
+ {
20243
+ "epoch": 0.015070412315317678,
20244
+ "grad_norm": 0.9344263076782227,
20245
+ "learning_rate": 9.165864826574003e-06,
20246
+ "loss": 2.0281,
20247
+ "step": 259400
20248
+ },
20249
+ {
20250
+ "epoch": 0.015349494024860598,
20251
+ "grad_norm": 0.9897043704986572,
20252
+ "learning_rate": 9.148737563080348e-06,
20253
+ "loss": 2.0178,
20254
+ "step": 259500
20255
+ },
20256
+ {
20257
+ "epoch": 0.01562857573440352,
20258
+ "grad_norm": 1.0104656219482422,
20259
+ "learning_rate": 9.131622732071607e-06,
20260
+ "loss": 2.0277,
20261
+ "step": 259600
20262
+ },
20263
+ {
20264
+ "epoch": 0.015907657443946437,
20265
+ "grad_norm": 0.9919978976249695,
20266
+ "learning_rate": 9.114520346971324e-06,
20267
+ "loss": 2.012,
20268
+ "step": 259700
20269
+ },
20270
+ {
20271
+ "epoch": 0.01618673915348936,
20272
+ "grad_norm": 0.9721410274505615,
20273
+ "learning_rate": 9.097430421193254e-06,
20274
+ "loss": 2.0232,
20275
+ "step": 259800
20276
+ },
20277
+ {
20278
+ "epoch": 0.01646582086303228,
20279
+ "grad_norm": 0.9424166083335876,
20280
+ "learning_rate": 9.080352968141404e-06,
20281
+ "loss": 2.0118,
20282
+ "step": 259900
20283
+ },
20284
+ {
20285
+ "epoch": 0.016744902572575198,
20286
+ "grad_norm": 0.9954734444618225,
20287
+ "learning_rate": 9.063288001209969e-06,
20288
+ "loss": 2.017,
20289
+ "step": 260000
20290
+ },
20291
+ {
20292
+ "epoch": 0.016744902572575198,
20293
+ "eval_loss": 2.156609296798706,
20294
+ "eval_runtime": 51.7856,
20295
+ "eval_samples_per_second": 196.85,
20296
+ "eval_steps_per_second": 1.545,
20297
+ "step": 260000
20298
+ },
20299
+ {
20300
+ "epoch": 0.01702398428211812,
20301
+ "grad_norm": 0.9650399684906006,
20302
+ "learning_rate": 9.046235533783381e-06,
20303
+ "loss": 2.0188,
20304
+ "step": 260100
20305
+ },
20306
+ {
20307
+ "epoch": 0.017303065991661037,
20308
+ "grad_norm": 1.014950156211853,
20309
+ "learning_rate": 9.029195579236252e-06,
20310
+ "loss": 2.0116,
20311
+ "step": 260200
20312
+ },
20313
+ {
20314
+ "epoch": 0.01758214770120396,
20315
+ "grad_norm": 1.0187995433807373,
20316
+ "learning_rate": 9.012168150933394e-06,
20317
+ "loss": 2.015,
20318
+ "step": 260300
20319
+ },
20320
+ {
20321
+ "epoch": 0.017861229410746877,
20322
+ "grad_norm": 0.959968090057373,
20323
+ "learning_rate": 8.995153262229769e-06,
20324
+ "loss": 2.009,
20325
+ "step": 260400
20326
+ },
20327
+ {
20328
+ "epoch": 0.018140311120289798,
20329
+ "grad_norm": 0.9517911076545715,
20330
+ "learning_rate": 8.978150926470524e-06,
20331
+ "loss": 1.986,
20332
+ "step": 260500
20333
+ },
20334
+ {
20335
+ "epoch": 0.01841939282983272,
20336
+ "grad_norm": 0.9722422957420349,
20337
+ "learning_rate": 8.961161156990958e-06,
20338
+ "loss": 1.9976,
20339
+ "step": 260600
20340
+ },
20341
+ {
20342
+ "epoch": 0.018698474539375638,
20343
+ "grad_norm": 0.9573680758476257,
20344
+ "learning_rate": 8.944183967116519e-06,
20345
+ "loss": 2.0034,
20346
+ "step": 260700
20347
+ },
20348
+ {
20349
+ "epoch": 0.01897755624891856,
20350
+ "grad_norm": 0.9799074530601501,
20351
+ "learning_rate": 8.92721937016276e-06,
20352
+ "loss": 1.992,
20353
+ "step": 260800
20354
+ },
20355
+ {
20356
+ "epoch": 0.019256637958461477,
20357
+ "grad_norm": 0.9554262161254883,
20358
+ "learning_rate": 8.910267379435391e-06,
20359
+ "loss": 2.009,
20360
+ "step": 260900
20361
+ },
20362
+ {
20363
+ "epoch": 0.0195357196680044,
20364
+ "grad_norm": 0.988070547580719,
20365
+ "learning_rate": 8.893328008230231e-06,
20366
+ "loss": 1.9862,
20367
+ "step": 261000
20368
+ },
20369
+ {
20370
+ "epoch": 0.0195357196680044,
20371
+ "eval_loss": 2.1644506454467773,
20372
+ "eval_runtime": 51.9499,
20373
+ "eval_samples_per_second": 196.227,
20374
+ "eval_steps_per_second": 1.54,
20375
+ "step": 261000
20376
+ },
20377
+ {
20378
+ "epoch": 0.01981480137754732,
20379
+ "grad_norm": 0.9487972259521484,
20380
+ "learning_rate": 8.876401269833173e-06,
20381
+ "loss": 1.9909,
20382
+ "step": 261100
20383
+ },
20384
+ {
20385
+ "epoch": 0.020093883087090238,
20386
+ "grad_norm": 0.9944525957107544,
20387
+ "learning_rate": 8.859487177520237e-06,
20388
+ "loss": 2.0028,
20389
+ "step": 261200
20390
+ },
20391
+ {
20392
+ "epoch": 0.02037296479663316,
20393
+ "grad_norm": 1.0023555755615234,
20394
+ "learning_rate": 8.842585744557493e-06,
20395
+ "loss": 1.9972,
20396
+ "step": 261300
20397
+ },
20398
+ {
20399
+ "epoch": 0.020652046506176077,
20400
+ "grad_norm": 0.9712923765182495,
20401
+ "learning_rate": 8.825696984201107e-06,
20402
+ "loss": 1.996,
20403
+ "step": 261400
20404
+ },
20405
+ {
20406
+ "epoch": 0.020931128215719,
20407
+ "grad_norm": 0.9709236025810242,
20408
+ "learning_rate": 8.8088209096973e-06,
20409
+ "loss": 1.9829,
20410
+ "step": 261500
20411
+ },
20412
+ {
20413
+ "epoch": 0.02121020992526192,
20414
+ "grad_norm": 1.0036200284957886,
20415
+ "learning_rate": 8.791957534282322e-06,
20416
+ "loss": 1.9891,
20417
+ "step": 261600
20418
+ },
20419
+ {
20420
+ "epoch": 0.021489291634804838,
20421
+ "grad_norm": 1.0096827745437622,
20422
+ "learning_rate": 8.775106871182492e-06,
20423
+ "loss": 1.9834,
20424
+ "step": 261700
20425
+ },
20426
+ {
20427
+ "epoch": 0.02176837334434776,
20428
+ "grad_norm": 1.0069483518600464,
20429
+ "learning_rate": 8.758268933614148e-06,
20430
+ "loss": 1.9952,
20431
+ "step": 261800
20432
+ },
20433
+ {
20434
+ "epoch": 0.022047455053890677,
20435
+ "grad_norm": 0.9731675982475281,
20436
+ "learning_rate": 8.741443734783646e-06,
20437
+ "loss": 1.9898,
20438
+ "step": 261900
20439
+ },
20440
+ {
20441
+ "epoch": 0.0223265367634336,
20442
+ "grad_norm": 0.9884027242660522,
20443
+ "learning_rate": 8.724631287887342e-06,
20444
+ "loss": 1.9831,
20445
+ "step": 262000
20446
+ },
20447
+ {
20448
+ "epoch": 0.0223265367634336,
20449
+ "eval_loss": 2.1643013954162598,
20450
+ "eval_runtime": 52.068,
20451
+ "eval_samples_per_second": 195.783,
20452
+ "eval_steps_per_second": 1.536,
20453
+ "step": 262000
20454
+ },
20455
+ {
20456
+ "epoch": 0.022605618472976517,
20457
+ "grad_norm": 0.9630898237228394,
20458
+ "learning_rate": 8.7078316061116e-06,
20459
+ "loss": 1.9869,
20460
+ "step": 262100
20461
+ },
20462
+ {
20463
+ "epoch": 0.022884700182519438,
20464
+ "grad_norm": 0.9709094762802124,
20465
+ "learning_rate": 8.691044702632775e-06,
20466
+ "loss": 1.9807,
20467
+ "step": 262200
20468
+ },
20469
+ {
20470
+ "epoch": 0.02316378189206236,
20471
+ "grad_norm": 0.9887747168540955,
20472
+ "learning_rate": 8.674270590617201e-06,
20473
+ "loss": 1.9784,
20474
+ "step": 262300
20475
+ },
20476
+ {
20477
+ "epoch": 0.023442863601605277,
20478
+ "grad_norm": 0.9830183386802673,
20479
+ "learning_rate": 8.657509283221157e-06,
20480
+ "loss": 1.9951,
20481
+ "step": 262400
20482
+ },
20483
+ {
20484
+ "epoch": 0.0237219453111482,
20485
+ "grad_norm": 0.9983903765678406,
20486
+ "learning_rate": 8.640760793590915e-06,
20487
+ "loss": 1.982,
20488
+ "step": 262500
20489
+ },
20490
+ {
20491
+ "epoch": 0.024001027020691117,
20492
+ "grad_norm": 0.9943532347679138,
20493
+ "learning_rate": 8.624025134862654e-06,
20494
+ "loss": 1.9767,
20495
+ "step": 262600
20496
+ },
20497
+ {
20498
+ "epoch": 0.024280108730234038,
20499
+ "grad_norm": 1.0101203918457031,
20500
+ "learning_rate": 8.607302320162522e-06,
20501
+ "loss": 1.9672,
20502
+ "step": 262700
20503
+ },
20504
+ {
20505
+ "epoch": 0.02455919043977696,
20506
+ "grad_norm": 0.9875810146331787,
20507
+ "learning_rate": 8.590592362606587e-06,
20508
+ "loss": 1.987,
20509
+ "step": 262800
20510
+ },
20511
+ {
20512
+ "epoch": 0.024838272149319877,
20513
+ "grad_norm": 0.997082531452179,
20514
+ "learning_rate": 8.573895275300811e-06,
20515
+ "loss": 1.9738,
20516
+ "step": 262900
20517
+ },
20518
+ {
20519
+ "epoch": 0.0251173538588628,
20520
+ "grad_norm": 0.9400111436843872,
20521
+ "learning_rate": 8.557211071341084e-06,
20522
+ "loss": 1.9804,
20523
+ "step": 263000
20524
+ },
20525
+ {
20526
+ "epoch": 0.0251173538588628,
20527
+ "eval_loss": 2.1664023399353027,
20528
+ "eval_runtime": 52.0751,
20529
+ "eval_samples_per_second": 195.756,
20530
+ "eval_steps_per_second": 1.536,
20531
+ "step": 263000
20532
+ },
20533
+ {
20534
+ "epoch": 0.025396435568405717,
20535
+ "grad_norm": 1.0243192911148071,
20536
+ "learning_rate": 8.540539763813187e-06,
20537
+ "loss": 1.9642,
20538
+ "step": 263100
20539
+ },
20540
+ {
20541
+ "epoch": 0.025675517277948638,
20542
+ "grad_norm": 1.0003769397735596,
20543
+ "learning_rate": 8.523881365792794e-06,
20544
+ "loss": 1.9634,
20545
+ "step": 263200
20546
+ },
20547
+ {
20548
+ "epoch": 0.025954598987491556,
20549
+ "grad_norm": 0.9745362997055054,
20550
+ "learning_rate": 8.507235890345424e-06,
20551
+ "loss": 1.9727,
20552
+ "step": 263300
20553
+ },
20554
+ {
20555
+ "epoch": 0.026233680697034478,
20556
+ "grad_norm": 0.9894079566001892,
20557
+ "learning_rate": 8.490603350526489e-06,
20558
+ "loss": 1.9687,
20559
+ "step": 263400
20560
+ },
20561
+ {
20562
+ "epoch": 0.0265127624065774,
20563
+ "grad_norm": 1.0135899782180786,
20564
+ "learning_rate": 8.473983759381247e-06,
20565
+ "loss": 1.9791,
20566
+ "step": 263500
20567
+ },
20568
+ {
20569
+ "epoch": 0.026791844116120317,
20570
+ "grad_norm": 0.9646082520484924,
20571
+ "learning_rate": 8.457377129944805e-06,
20572
+ "loss": 1.9704,
20573
+ "step": 263600
20574
+ },
20575
+ {
20576
+ "epoch": 0.02707092582566324,
20577
+ "grad_norm": 0.9717702269554138,
20578
+ "learning_rate": 8.440783475242086e-06,
20579
+ "loss": 1.9584,
20580
+ "step": 263700
20581
+ },
20582
+ {
20583
+ "epoch": 0.027350007535206156,
20584
+ "grad_norm": 0.9921123385429382,
20585
+ "learning_rate": 8.424202808287865e-06,
20586
+ "loss": 1.9765,
20587
+ "step": 263800
20588
+ },
20589
+ {
20590
+ "epoch": 0.027629089244749078,
20591
+ "grad_norm": 0.9913526177406311,
20592
+ "learning_rate": 8.407635142086698e-06,
20593
+ "loss": 1.9592,
20594
+ "step": 263900
20595
+ },
20596
+ {
20597
+ "epoch": 0.027908170954292,
20598
+ "grad_norm": 0.9921115040779114,
20599
+ "learning_rate": 8.391080489632974e-06,
20600
+ "loss": 1.9656,
20601
+ "step": 264000
20602
+ },
20603
+ {
20604
+ "epoch": 0.027908170954292,
20605
+ "eval_loss": 2.1658823490142822,
20606
+ "eval_runtime": 52.007,
20607
+ "eval_samples_per_second": 196.012,
20608
+ "eval_steps_per_second": 1.538,
20609
+ "step": 264000
20610
+ },
20611
+ {
20612
+ "epoch": 0.028187252663834917,
20613
+ "grad_norm": 1.003160834312439,
20614
+ "learning_rate": 8.37453886391085e-06,
20615
+ "loss": 1.956,
20616
+ "step": 264100
20617
+ },
20618
+ {
20619
+ "epoch": 0.02846633437337784,
20620
+ "grad_norm": 0.9790710806846619,
20621
+ "learning_rate": 8.358010277894282e-06,
20622
+ "loss": 1.9423,
20623
+ "step": 264200
20624
+ },
20625
+ {
20626
+ "epoch": 0.028745416082920756,
20627
+ "grad_norm": 1.0112133026123047,
20628
+ "learning_rate": 8.341494744546995e-06,
20629
+ "loss": 1.9547,
20630
+ "step": 264300
20631
+ },
20632
+ {
20633
+ "epoch": 0.029024497792463678,
20634
+ "grad_norm": 1.0514979362487793,
20635
+ "learning_rate": 8.324992276822489e-06,
20636
+ "loss": 1.9609,
20637
+ "step": 264400
20638
+ },
20639
+ {
20640
+ "epoch": 0.0293035795020066,
20641
+ "grad_norm": 1.0433865785598755,
20642
+ "learning_rate": 8.30850288766398e-06,
20643
+ "loss": 1.9587,
20644
+ "step": 264500
20645
+ },
20646
+ {
20647
+ "epoch": 0.029582661211549517,
20648
+ "grad_norm": 0.986564040184021,
20649
+ "learning_rate": 8.29202659000446e-06,
20650
+ "loss": 1.9445,
20651
+ "step": 264600
20652
+ },
20653
+ {
20654
+ "epoch": 0.02986174292109244,
20655
+ "grad_norm": 0.9676020741462708,
20656
+ "learning_rate": 8.275563396766643e-06,
20657
+ "loss": 1.9563,
20658
+ "step": 264700
20659
+ },
20660
+ {
20661
+ "epoch": 0.030140824630635357,
20662
+ "grad_norm": 1.0368075370788574,
20663
+ "learning_rate": 8.259113320862971e-06,
20664
+ "loss": 1.9514,
20665
+ "step": 264800
20666
+ },
20667
+ {
20668
+ "epoch": 0.030419906340178278,
20669
+ "grad_norm": 1.046763300895691,
20670
+ "learning_rate": 8.24267637519558e-06,
20671
+ "loss": 1.9756,
20672
+ "step": 264900
20673
+ },
20674
+ {
20675
+ "epoch": 0.030698988049721196,
20676
+ "grad_norm": 0.9866767525672913,
20677
+ "learning_rate": 8.22625257265632e-06,
20678
+ "loss": 1.9415,
20679
+ "step": 265000
20680
+ },
20681
+ {
20682
+ "epoch": 0.030698988049721196,
20683
+ "eval_loss": 2.180581569671631,
20684
+ "eval_runtime": 52.0428,
20685
+ "eval_samples_per_second": 195.877,
20686
+ "eval_steps_per_second": 1.537,
20687
+ "step": 265000
20688
+ },
20689
+ {
20690
+ "epoch": 0.030978069759264117,
20691
+ "grad_norm": 0.9820157289505005,
20692
+ "learning_rate": 8.209841926126744e-06,
20693
+ "loss": 1.9674,
20694
+ "step": 265100
20695
+ },
20696
+ {
20697
+ "epoch": 0.03125715146880704,
20698
+ "grad_norm": 0.9998511672019958,
20699
+ "learning_rate": 8.193444448478054e-06,
20700
+ "loss": 1.9582,
20701
+ "step": 265200
20702
+ },
20703
+ {
20704
+ "epoch": 0.03153623317834996,
20705
+ "grad_norm": 1.0134938955307007,
20706
+ "learning_rate": 8.177060152571165e-06,
20707
+ "loss": 1.9455,
20708
+ "step": 265300
20709
+ },
20710
+ {
20711
+ "epoch": 0.031815314887892875,
20712
+ "grad_norm": 0.993373692035675,
20713
+ "learning_rate": 8.16068905125661e-06,
20714
+ "loss": 1.9548,
20715
+ "step": 265400
20716
+ },
20717
+ {
20718
+ "epoch": 0.0320943965974358,
20719
+ "grad_norm": 0.9485350847244263,
20720
+ "learning_rate": 8.144331157374604e-06,
20721
+ "loss": 1.936,
20722
+ "step": 265500
20723
+ },
20724
+ {
20725
+ "epoch": 0.03237347830697872,
20726
+ "grad_norm": 0.9534901976585388,
20727
+ "learning_rate": 8.127986483754996e-06,
20728
+ "loss": 1.9494,
20729
+ "step": 265600
20730
+ },
20731
+ {
20732
+ "epoch": 0.032652560016521635,
20733
+ "grad_norm": 0.989976167678833,
20734
+ "learning_rate": 8.111655043217274e-06,
20735
+ "loss": 1.9456,
20736
+ "step": 265700
20737
+ },
20738
+ {
20739
+ "epoch": 0.03293164172606456,
20740
+ "grad_norm": 0.9938270449638367,
20741
+ "learning_rate": 8.095336848570512e-06,
20742
+ "loss": 1.9265,
20743
+ "step": 265800
20744
+ },
20745
+ {
20746
+ "epoch": 0.03321072343560748,
20747
+ "grad_norm": 1.0147507190704346,
20748
+ "learning_rate": 8.079031912613436e-06,
20749
+ "loss": 1.9714,
20750
+ "step": 265900
20751
+ },
20752
+ {
20753
+ "epoch": 0.033489805145150396,
20754
+ "grad_norm": 0.9483819007873535,
20755
+ "learning_rate": 8.06274024813435e-06,
20756
+ "loss": 1.9384,
20757
+ "step": 266000
20758
+ },
20759
+ {
20760
+ "epoch": 0.033489805145150396,
20761
+ "eval_loss": 2.1801016330718994,
20762
+ "eval_runtime": 52.1929,
20763
+ "eval_samples_per_second": 195.314,
20764
+ "eval_steps_per_second": 1.533,
20765
+ "step": 266000
20766
+ },
20767
+ {
20768
+ "epoch": 0.033768886854693314,
20769
+ "grad_norm": 0.9632075428962708,
20770
+ "learning_rate": 8.046461867911173e-06,
20771
+ "loss": 1.9424,
20772
+ "step": 266100
20773
+ },
20774
+ {
20775
+ "epoch": 0.03404796856423624,
20776
+ "grad_norm": 0.9533110857009888,
20777
+ "learning_rate": 8.030196784711364e-06,
20778
+ "loss": 1.9376,
20779
+ "step": 266200
20780
+ },
20781
+ {
20782
+ "epoch": 0.03432705027377916,
20783
+ "grad_norm": 1.0330917835235596,
20784
+ "learning_rate": 8.013945011291996e-06,
20785
+ "loss": 1.9395,
20786
+ "step": 266300
20787
+ },
20788
+ {
20789
+ "epoch": 0.034606131983322075,
20790
+ "grad_norm": 1.0646517276763916,
20791
+ "learning_rate": 7.997706560399665e-06,
20792
+ "loss": 1.931,
20793
+ "step": 266400
20794
+ },
20795
+ {
20796
+ "epoch": 0.034885213692865,
20797
+ "grad_norm": 0.9834697842597961,
20798
+ "learning_rate": 7.981481444770552e-06,
20799
+ "loss": 1.9239,
20800
+ "step": 266500
20801
+ },
20802
+ {
20803
+ "epoch": 0.03516429540240792,
20804
+ "grad_norm": 0.9982201457023621,
20805
+ "learning_rate": 7.965269677130349e-06,
20806
+ "loss": 1.9457,
20807
+ "step": 266600
20808
+ },
20809
+ {
20810
+ "epoch": 0.035443377111950836,
20811
+ "grad_norm": 0.9639807343482971,
20812
+ "learning_rate": 7.949071270194303e-06,
20813
+ "loss": 1.951,
20814
+ "step": 266700
20815
+ },
20816
+ {
20817
+ "epoch": 0.035722458821493754,
20818
+ "grad_norm": 0.959434986114502,
20819
+ "learning_rate": 7.932886236667163e-06,
20820
+ "loss": 1.9321,
20821
+ "step": 266800
20822
+ },
20823
+ {
20824
+ "epoch": 0.03600154053103668,
20825
+ "grad_norm": 1.0044972896575928,
20826
+ "learning_rate": 7.916714589243215e-06,
20827
+ "loss": 1.9204,
20828
+ "step": 266900
20829
+ },
20830
+ {
20831
+ "epoch": 0.036280622240579596,
20832
+ "grad_norm": 1.0055923461914062,
20833
+ "learning_rate": 7.90055634060621e-06,
20834
+ "loss": 1.9338,
20835
+ "step": 267000
20836
+ },
20837
+ {
20838
+ "epoch": 0.036280622240579596,
20839
+ "eval_loss": 2.1793017387390137,
20840
+ "eval_runtime": 52.1353,
20841
+ "eval_samples_per_second": 195.53,
20842
+ "eval_steps_per_second": 1.534,
20843
+ "step": 267000
20844
+ },
20845
+ {
20846
+ "epoch": 0.036559703950122514,
20847
+ "grad_norm": 1.01926851272583,
20848
+ "learning_rate": 7.884411503429415e-06,
20849
+ "loss": 1.9398,
20850
+ "step": 267100
20851
+ },
20852
+ {
20853
+ "epoch": 0.03683878565966544,
20854
+ "grad_norm": 0.9919995665550232,
20855
+ "learning_rate": 7.868280090375574e-06,
20856
+ "loss": 1.9266,
20857
+ "step": 267200
20858
+ },
20859
+ {
20860
+ "epoch": 0.03711786736920836,
20861
+ "grad_norm": 1.0640407800674438,
20862
+ "learning_rate": 7.852162114096905e-06,
20863
+ "loss": 1.9299,
20864
+ "step": 267300
20865
+ },
20866
+ {
20867
+ "epoch": 0.037396949078751275,
20868
+ "grad_norm": 0.9906590580940247,
20869
+ "learning_rate": 7.836057587235068e-06,
20870
+ "loss": 1.9352,
20871
+ "step": 267400
20872
+ },
20873
+ {
20874
+ "epoch": 0.0376760307882942,
20875
+ "grad_norm": 1.0245193243026733,
20876
+ "learning_rate": 7.819966522421199e-06,
20877
+ "loss": 1.9367,
20878
+ "step": 267500
20879
+ },
20880
+ {
20881
+ "epoch": 0.03795511249783712,
20882
+ "grad_norm": 1.0126677751541138,
20883
+ "learning_rate": 7.803888932275872e-06,
20884
+ "loss": 1.9239,
20885
+ "step": 267600
20886
+ },
20887
+ {
20888
+ "epoch": 0.038234194207380036,
20889
+ "grad_norm": 1.0562597513198853,
20890
+ "learning_rate": 7.787824829409066e-06,
20891
+ "loss": 1.9371,
20892
+ "step": 267700
20893
+ },
20894
+ {
20895
+ "epoch": 0.038513275916922954,
20896
+ "grad_norm": 1.034492015838623,
20897
+ "learning_rate": 7.771774226420219e-06,
20898
+ "loss": 1.9432,
20899
+ "step": 267800
20900
+ },
20901
+ {
20902
+ "epoch": 0.03879235762646588,
20903
+ "grad_norm": 1.0192219018936157,
20904
+ "learning_rate": 7.75573713589815e-06,
20905
+ "loss": 1.9177,
20906
+ "step": 267900
20907
+ },
20908
+ {
20909
+ "epoch": 0.0390714393360088,
20910
+ "grad_norm": 0.9676885008811951,
20911
+ "learning_rate": 7.739713570421098e-06,
20912
+ "loss": 1.9144,
20913
+ "step": 268000
20914
+ },
20915
+ {
20916
+ "epoch": 0.0390714393360088,
20917
+ "eval_loss": 2.1744418144226074,
20918
+ "eval_runtime": 52.0494,
20919
+ "eval_samples_per_second": 195.852,
20920
+ "eval_steps_per_second": 1.537,
20921
+ "step": 268000
20922
+ },
20923
+ {
20924
+ "epoch": 0.039350521045551715,
20925
+ "grad_norm": 1.0004535913467407,
20926
+ "learning_rate": 7.72370354255669e-06,
20927
+ "loss": 1.9302,
20928
+ "step": 268100
20929
+ },
20930
+ {
20931
+ "epoch": 0.03962960275509464,
20932
+ "grad_norm": 0.9848925471305847,
20933
+ "learning_rate": 7.707707064861941e-06,
20934
+ "loss": 1.913,
20935
+ "step": 268200
20936
+ },
20937
+ {
20938
+ "epoch": 0.03990868446463756,
20939
+ "grad_norm": 0.9667465090751648,
20940
+ "learning_rate": 7.691724149883217e-06,
20941
+ "loss": 1.9257,
20942
+ "step": 268300
20943
+ },
20944
+ {
20945
+ "epoch": 0.040187766174180475,
20946
+ "grad_norm": 1.0000005960464478,
20947
+ "learning_rate": 7.67575481015627e-06,
20948
+ "loss": 1.905,
20949
+ "step": 268400
20950
+ },
20951
+ {
20952
+ "epoch": 0.04046684788372339,
20953
+ "grad_norm": 0.9792052507400513,
20954
+ "learning_rate": 7.659799058206188e-06,
20955
+ "loss": 1.9354,
20956
+ "step": 268500
20957
+ },
20958
+ {
20959
+ "epoch": 0.04074592959326632,
20960
+ "grad_norm": 0.9806801676750183,
20961
+ "learning_rate": 7.643856906547425e-06,
20962
+ "loss": 1.9173,
20963
+ "step": 268600
20964
+ },
20965
+ {
20966
+ "epoch": 0.041025011302809236,
20967
+ "grad_norm": 1.023319125175476,
20968
+ "learning_rate": 7.627928367683735e-06,
20969
+ "loss": 1.919,
20970
+ "step": 268700
20971
+ },
20972
+ {
20973
+ "epoch": 0.041304093012352154,
20974
+ "grad_norm": 1.0140260457992554,
20975
+ "learning_rate": 7.612013454108219e-06,
20976
+ "loss": 1.9271,
20977
+ "step": 268800
20978
+ },
20979
+ {
20980
+ "epoch": 0.04158317472189508,
20981
+ "grad_norm": 1.0070114135742188,
20982
+ "learning_rate": 7.596112178303291e-06,
20983
+ "loss": 1.918,
20984
+ "step": 268900
20985
+ },
20986
+ {
20987
+ "epoch": 0.041862256431438,
20988
+ "grad_norm": 0.9720276594161987,
20989
+ "learning_rate": 7.58022455274065e-06,
20990
+ "loss": 1.9272,
20991
+ "step": 269000
20992
+ },
20993
+ {
20994
+ "epoch": 0.041862256431438,
20995
+ "eval_loss": 2.1881604194641113,
20996
+ "eval_runtime": 52.0271,
20997
+ "eval_samples_per_second": 195.936,
20998
+ "eval_steps_per_second": 1.538,
20999
+ "step": 269000
21000
+ },
21001
+ {
+ "epoch": 0.042141338140980915,
+ "grad_norm": 1.0044163465499878,
+ "learning_rate": 7.564350589881317e-06,
+ "loss": 1.9334,
+ "step": 269100
+ },
+ {
+ "epoch": 0.04242041985052384,
+ "grad_norm": 0.9554632902145386,
+ "learning_rate": 7.548490302175565e-06,
+ "loss": 1.9105,
+ "step": 269200
+ },
+ {
+ "epoch": 0.04269950156006676,
+ "grad_norm": 0.9780552387237549,
+ "learning_rate": 7.532643702062963e-06,
+ "loss": 1.9146,
+ "step": 269300
+ },
+ {
+ "epoch": 0.042978583269609676,
+ "grad_norm": 0.9981115460395813,
+ "learning_rate": 7.516810801972348e-06,
+ "loss": 1.9328,
+ "step": 269400
+ },
+ {
+ "epoch": 0.043257664979152594,
+ "grad_norm": 1.0094027519226074,
+ "learning_rate": 7.500991614321792e-06,
+ "loss": 1.9343,
+ "step": 269500
+ },
+ {
+ "epoch": 0.04353674668869552,
+ "grad_norm": 0.993590772151947,
+ "learning_rate": 7.485186151518625e-06,
+ "loss": 1.9142,
+ "step": 269600
+ },
+ {
+ "epoch": 0.043815828398238436,
+ "grad_norm": 0.9575207829475403,
+ "learning_rate": 7.469394425959411e-06,
+ "loss": 1.9234,
+ "step": 269700
+ },
+ {
+ "epoch": 0.044094910107781354,
+ "grad_norm": 1.001413106918335,
+ "learning_rate": 7.453616450029951e-06,
+ "loss": 1.9087,
+ "step": 269800
+ },
+ {
+ "epoch": 0.04437399181732428,
+ "grad_norm": 0.9725953340530396,
+ "learning_rate": 7.437852236105231e-06,
+ "loss": 1.9153,
+ "step": 269900
+ },
+ {
+ "epoch": 0.0446530735268672,
+ "grad_norm": 0.9421985149383545,
+ "learning_rate": 7.422101796549466e-06,
+ "loss": 1.8918,
+ "step": 270000
+ },
+ {
+ "epoch": 0.0446530735268672,
+ "eval_loss": 2.1810696125030518,
+ "eval_runtime": 52.1502,
+ "eval_samples_per_second": 195.474,
+ "eval_steps_per_second": 1.534,
+ "step": 270000
  }
  ],
  "logging_steps": 100,
 
  "attributes": {}
  }
  },
+ "total_flos": 2.356355071475712e+19,
  "train_batch_size": 128,
  "trial_name": null,
  "trial_params": null
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:8f7b845168445732fd0c73bfeaca5509fec78a0bea7de873006a9dc759b752ca
+ oid sha256:4a02f378eb43a11dd8f33d3260f082abc6de1be9c9ee104cd03f04d37fc9b629
  size 5777