gary109 committed on
Commit 4dc1362 · 1 Parent(s): f202794

End of training

all_results.json ADDED
@@ -0,0 +1,15 @@
+ {
+ "epoch": 19.99,
+ "eval_accuracy": 0.7371052344006119,
+ "eval_loss": 1.2142982482910156,
+ "eval_runtime": 13.1659,
+ "eval_samples": 496,
+ "eval_samples_per_second": 37.673,
+ "eval_steps_per_second": 4.709,
+ "perplexity": 3.3679297843679636,
+ "train_loss": 1.3203882475157043,
+ "train_runtime": 9311.1992,
+ "train_samples": 4798,
+ "train_samples_per_second": 10.306,
+ "train_steps_per_second": 0.079
+ }
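The keys in all_results.json appear to be the union of the keys in train_results.json and eval_results.json from this same commit, with "epoch" shared between them. The sketch below is only an illustration of that relationship, not the script that produced the files; the variable names are hypothetical and the values are copied from the diff.

```python
# Values copied from train_results.json and eval_results.json in this commit.
train_results = {
    "epoch": 19.99,
    "train_loss": 1.3203882475157043,
    "train_runtime": 9311.1992,
    "train_samples": 4798,
    "train_samples_per_second": 10.306,
    "train_steps_per_second": 0.079,
}
eval_results = {
    "epoch": 19.99,
    "eval_accuracy": 0.7371052344006119,
    "eval_loss": 1.2142982482910156,
    "eval_runtime": 13.1659,
    "eval_samples": 496,
    "eval_samples_per_second": 37.673,
    "eval_steps_per_second": 4.709,
    "perplexity": 3.3679297843679636,
}

# Assumption: the combined file is a plain dictionary merge of the two.
all_results = {**train_results, **eval_results}

for key in sorted(all_results):
    print(key, all_results[key])
```

The merged dictionary has 13 keys, matching the 13 key/value lines in the all_results.json hunk.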
eval_results.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "epoch": 19.99,
+ "eval_accuracy": 0.7371052344006119,
+ "eval_loss": 1.2142982482910156,
+ "eval_runtime": 13.1659,
+ "eval_samples": 496,
+ "eval_samples_per_second": 37.673,
+ "eval_steps_per_second": 4.709,
+ "perplexity": 3.3679297843679636
+ }
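The reported "perplexity" is consistent with the exponential of "eval_loss", which is the usual definition when the loss is a mean cross-entropy in natural-log units. A minimal check, assuming that definition (the diff itself does not state how the value was computed):

```python
import math

# Values copied from eval_results.json in this commit.
eval_loss = 1.2142982482910156
reported_perplexity = 3.3679297843679636

# Assumption: perplexity = exp(mean cross-entropy loss).
perplexity = math.exp(eval_loss)

print(perplexity)
print(abs(perplexity - reported_perplexity) < 1e-6)
```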
runs/Jun17_03-59-07_2a431abae87c/events.out.tfevents.1655448203.2a431abae87c.2935.2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:38eb53c598cbec41a0316d0cdba06841ceaf65ec618888753d97f4cdf8b707e8
+ size 363
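The three lines added for the TensorBoard event file are a Git LFS pointer rather than the file's contents: a spec version, an object ID (hash algorithm and digest), and the size in bytes of the real file. The following sketch parses such a pointer; `parse_lfs_pointer` is a hypothetical helper written for illustration, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash-algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer text copied from the diff above (without the leading "+ ").
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:38eb53c598cbec41a0316d0cdba06841ceaf65ec618888753d97f4cdf8b707e8
size 363
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```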
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 19.99,
+ "train_loss": 1.3203882475157043,
+ "train_runtime": 9311.1992,
+ "train_samples": 4798,
+ "train_samples_per_second": 10.306,
+ "train_steps_per_second": 0.079
+ }
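The two throughput figures can be reproduced from the other values in this commit. The sketch below assumes the run was configured for 20 epochs (the files report 19.99) and takes the step count of 740 from "global_step" in trainer_state.json; both are labeled as assumptions because train_results.json does not state them directly.

```python
# Values copied from train_results.json in this commit.
train_runtime = 9311.1992  # seconds
train_samples = 4798

# Assumptions: 20 configured epochs; 740 optimizer steps
# ("global_step" in trainer_state.json from the same commit).
num_epochs = 20
global_step = 740

samples_per_second = train_samples * num_epochs / train_runtime
steps_per_second = global_step / train_runtime

print(round(samples_per_second, 3))
print(round(steps_per_second, 3))
```

Under these assumptions the rounded results agree with the reported 10.306 samples per second and 0.079 steps per second.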
trainer_state.json ADDED
@@ -0,0 +1,4645 @@
+ {
+ "best_metric": 1.2186108827590942,
+ "best_model_checkpoint": "wikitext_roberta-base/checkpoint-666",
+ "epoch": 19.986666666666668,
+ "global_step": 740,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.03,
+ "learning_rate": 1.0000000000000002e-06,
+ "loss": 1.8988,
+ "step": 1
+ },
+ {
+ "epoch": 0.05,
+ "learning_rate": 2.0000000000000003e-06,
+ "loss": 1.8877,
+ "step": 2
+ },
+ {
+ "epoch": 0.08,
+ "learning_rate": 3e-06,
+ "loss": 1.9457,
+ "step": 3
+ },
+ {
+ "epoch": 0.11,
+ "learning_rate": 4.000000000000001e-06,
+ "loss": 1.8727,
+ "step": 4
+ },
+ {
+ "epoch": 0.13,
+ "learning_rate": 5e-06,
+ "loss": 1.9502,
+ "step": 5
+ },
+ {
+ "epoch": 0.16,
+ "learning_rate": 6e-06,
+ "loss": 1.911,
+ "step": 6
+ },
+ {
+ "epoch": 0.19,
+ "learning_rate": 7.000000000000001e-06,
+ "loss": 1.925,
+ "step": 7
+ },
+ {
+ "epoch": 0.21,
+ "learning_rate": 8.000000000000001e-06,
+ "loss": 1.8293,
+ "step": 8
+ },
+ {
+ "epoch": 0.24,
+ "learning_rate": 9e-06,
+ "loss": 1.7548,
+ "step": 9
+ },
+ {
+ "epoch": 0.27,
+ "learning_rate": 1e-05,
+ "loss": 1.8128,
+ "step": 10
+ },
+ {
+ "epoch": 0.29,
+ "learning_rate": 1.1000000000000001e-05,
+ "loss": 1.7672,
+ "step": 11
+ },
+ {
+ "epoch": 0.32,
+ "learning_rate": 1.2e-05,
+ "loss": 1.789,
+ "step": 12
+ },
+ {
+ "epoch": 0.35,
+ "learning_rate": 1.3000000000000001e-05,
+ "loss": 1.7381,
+ "step": 13
+ },
+ {
+ "epoch": 0.37,
+ "learning_rate": 1.4000000000000001e-05,
+ "loss": 1.6687,
+ "step": 14
+ },
+ {
+ "epoch": 0.4,
+ "learning_rate": 1.5e-05,
+ "loss": 1.7386,
+ "step": 15
+ },
+ {
+ "epoch": 0.43,
+ "learning_rate": 1.6000000000000003e-05,
+ "loss": 1.6627,
+ "step": 16
+ },
+ {
+ "epoch": 0.45,
+ "learning_rate": 1.7000000000000003e-05,
+ "loss": 1.5889,
+ "step": 17
+ },
+ {
+ "epoch": 0.48,
+ "learning_rate": 1.8e-05,
+ "loss": 1.5649,
+ "step": 18
+ },
+ {
+ "epoch": 0.51,
+ "learning_rate": 1.9e-05,
+ "loss": 1.5465,
+ "step": 19
+ },
+ {
+ "epoch": 0.53,
+ "learning_rate": 2e-05,
+ "loss": 1.5523,
+ "step": 20
+ },
+ {
+ "epoch": 0.56,
+ "learning_rate": 2e-05,
+ "loss": 1.5625,
+ "step": 21
+ },
+ {
+ "epoch": 0.59,
+ "learning_rate": 2.1e-05,
+ "loss": 1.639,
+ "step": 22
+ },
+ {
+ "epoch": 0.61,
+ "learning_rate": 2.2000000000000003e-05,
+ "loss": 1.5218,
+ "step": 23
+ },
+ {
+ "epoch": 0.64,
+ "learning_rate": 2.3000000000000003e-05,
+ "loss": 1.5603,
+ "step": 24
+ },
+ {
+ "epoch": 0.67,
+ "learning_rate": 2.4e-05,
+ "loss": 1.5555,
+ "step": 25
+ },
+ {
+ "epoch": 0.69,
+ "learning_rate": 2.5e-05,
+ "loss": 1.5245,
+ "step": 26
+ },
+ {
+ "epoch": 0.72,
+ "learning_rate": 2.6000000000000002e-05,
+ "loss": 1.5195,
+ "step": 27
+ },
+ {
+ "epoch": 0.75,
+ "learning_rate": 2.7000000000000002e-05,
+ "loss": 1.5227,
+ "step": 28
+ },
+ {
+ "epoch": 0.77,
+ "learning_rate": 2.8000000000000003e-05,
+ "loss": 1.4743,
+ "step": 29
+ },
+ {
+ "epoch": 0.8,
+ "learning_rate": 2.9e-05,
+ "loss": 1.5171,
+ "step": 30
+ },
+ {
+ "epoch": 0.83,
+ "learning_rate": 3e-05,
+ "loss": 1.4961,
+ "step": 31
+ },
+ {
+ "epoch": 0.85,
+ "learning_rate": 3.1e-05,
+ "loss": 1.5427,
+ "step": 32
+ },
+ {
+ "epoch": 0.88,
+ "learning_rate": 3.2000000000000005e-05,
+ "loss": 1.4519,
+ "step": 33
+ },
+ {
+ "epoch": 0.91,
+ "learning_rate": 3.3e-05,
+ "loss": 1.4714,
+ "step": 34
+ },
+ {
+ "epoch": 0.93,
+ "learning_rate": 3.4000000000000007e-05,
+ "loss": 1.4477,
+ "step": 35
+ },
+ {
+ "epoch": 0.96,
+ "learning_rate": 3.5e-05,
+ "loss": 1.4796,
+ "step": 36
+ },
+ {
+ "epoch": 0.99,
+ "learning_rate": 3.6e-05,
+ "loss": 1.4175,
+ "step": 37
+ },
+ {
+ "epoch": 0.99,
+ "eval_accuracy": 0.7193657677192031,
+ "eval_loss": 1.3355050086975098,
+ "eval_runtime": 13.2653,
+ "eval_samples_per_second": 37.391,
+ "eval_steps_per_second": 4.674,
+ "step": 37
+ },
+ {
+ "epoch": 1.03,
+ "learning_rate": 3.7e-05,
+ "loss": 2.1673,
+ "step": 38
+ },
+ {
+ "epoch": 1.05,
+ "learning_rate": 3.8e-05,
+ "loss": 1.4677,
+ "step": 39
+ },
+ {
+ "epoch": 1.08,
+ "learning_rate": 3.9000000000000006e-05,
+ "loss": 1.4678,
+ "step": 40
+ },
+ {
+ "epoch": 1.11,
+ "learning_rate": 4e-05,
+ "loss": 1.4979,
+ "step": 41
+ },
+ {
+ "epoch": 1.13,
+ "learning_rate": 4.1e-05,
+ "loss": 1.4639,
+ "step": 42
+ },
+ {
+ "epoch": 1.16,
+ "learning_rate": 4.2e-05,
+ "loss": 1.4553,
+ "step": 43
+ },
+ {
+ "epoch": 1.19,
+ "learning_rate": 4.3e-05,
+ "loss": 1.3852,
+ "step": 44
+ },
+ {
+ "epoch": 1.21,
+ "learning_rate": 4.4000000000000006e-05,
+ "loss": 1.4783,
+ "step": 45
+ },
+ {
+ "epoch": 1.24,
+ "learning_rate": 4.5e-05,
+ "loss": 1.416,
+ "step": 46
+ },
+ {
+ "epoch": 1.27,
+ "learning_rate": 4.600000000000001e-05,
+ "loss": 1.4261,
+ "step": 47
+ },
+ {
+ "epoch": 1.29,
+ "learning_rate": 4.7e-05,
+ "loss": 1.3403,
+ "step": 48
+ },
+ {
+ "epoch": 1.32,
+ "learning_rate": 4.8e-05,
+ "loss": 1.4469,
+ "step": 49
+ },
+ {
+ "epoch": 1.35,
+ "learning_rate": 4.9e-05,
+ "loss": 1.3988,
+ "step": 50
+ },
+ {
+ "epoch": 1.37,
+ "learning_rate": 5e-05,
+ "loss": 1.412,
+ "step": 51
+ },
+ {
+ "epoch": 1.4,
+ "learning_rate": 4.9927536231884056e-05,
+ "loss": 1.4766,
+ "step": 52
+ },
+ {
+ "epoch": 1.43,
+ "learning_rate": 4.985507246376812e-05,
+ "loss": 1.4986,
+ "step": 53
+ },
+ {
+ "epoch": 1.45,
+ "learning_rate": 4.9782608695652176e-05,
+ "loss": 1.4841,
+ "step": 54
+ },
+ {
+ "epoch": 1.48,
+ "learning_rate": 4.9710144927536237e-05,
+ "loss": 1.4311,
+ "step": 55
+ },
+ {
+ "epoch": 1.51,
+ "learning_rate": 4.963768115942029e-05,
+ "loss": 1.4505,
+ "step": 56
+ },
+ {
+ "epoch": 1.53,
+ "learning_rate": 4.956521739130435e-05,
+ "loss": 1.4436,
+ "step": 57
+ },
+ {
+ "epoch": 1.56,
+ "learning_rate": 4.949275362318841e-05,
+ "loss": 1.3686,
+ "step": 58
+ },
+ {
+ "epoch": 1.59,
+ "learning_rate": 4.9420289855072464e-05,
+ "loss": 1.4193,
+ "step": 59
+ },
+ {
+ "epoch": 1.61,
+ "learning_rate": 4.9347826086956524e-05,
+ "loss": 1.4409,
+ "step": 60
+ },
+ {
+ "epoch": 1.64,
+ "learning_rate": 4.9275362318840584e-05,
+ "loss": 1.4257,
+ "step": 61
+ },
+ {
+ "epoch": 1.67,
+ "learning_rate": 4.920289855072464e-05,
+ "loss": 1.3458,
+ "step": 62
+ },
+ {
+ "epoch": 1.69,
+ "learning_rate": 4.91304347826087e-05,
+ "loss": 1.3916,
+ "step": 63
+ },
+ {
+ "epoch": 1.72,
+ "learning_rate": 4.905797101449275e-05,
+ "loss": 1.3797,
+ "step": 64
+ },
+ {
+ "epoch": 1.75,
+ "learning_rate": 4.898550724637682e-05,
+ "loss": 1.4372,
+ "step": 65
+ },
+ {
+ "epoch": 1.77,
+ "learning_rate": 4.891304347826087e-05,
+ "loss": 1.4756,
+ "step": 66
+ },
+ {
+ "epoch": 1.8,
+ "learning_rate": 4.884057971014493e-05,
+ "loss": 1.3883,
+ "step": 67
+ },
+ {
+ "epoch": 1.83,
+ "learning_rate": 4.8768115942028986e-05,
+ "loss": 1.3913,
+ "step": 68
+ },
+ {
+ "epoch": 1.85,
+ "learning_rate": 4.8695652173913046e-05,
+ "loss": 1.3826,
+ "step": 69
+ },
+ {
+ "epoch": 1.88,
+ "learning_rate": 4.8623188405797106e-05,
+ "loss": 1.4326,
+ "step": 70
+ },
+ {
+ "epoch": 1.91,
+ "learning_rate": 4.855072463768116e-05,
+ "loss": 1.4112,
+ "step": 71
+ },
+ {
+ "epoch": 1.93,
+ "learning_rate": 4.847826086956522e-05,
+ "loss": 1.4015,
+ "step": 72
+ },
+ {
+ "epoch": 1.96,
+ "learning_rate": 4.840579710144928e-05,
+ "loss": 1.3996,
+ "step": 73
+ },
+ {
+ "epoch": 1.99,
+ "learning_rate": 4.8333333333333334e-05,
+ "loss": 1.438,
+ "step": 74
+ },
+ {
+ "epoch": 1.99,
+ "eval_accuracy": 0.7249340724395889,
+ "eval_loss": 1.2952723503112793,
+ "eval_runtime": 13.2107,
+ "eval_samples_per_second": 37.545,
+ "eval_steps_per_second": 4.693,
+ "step": 74
+ },
+ {
+ "epoch": 2.03,
+ "learning_rate": 4.8260869565217394e-05,
+ "loss": 2.0645,
+ "step": 75
+ },
+ {
+ "epoch": 2.05,
+ "learning_rate": 4.818840579710145e-05,
+ "loss": 1.4472,
+ "step": 76
+ },
+ {
+ "epoch": 2.08,
+ "learning_rate": 4.8115942028985514e-05,
+ "loss": 1.3352,
+ "step": 77
+ },
+ {
+ "epoch": 2.11,
+ "learning_rate": 4.804347826086957e-05,
+ "loss": 1.4051,
+ "step": 78
+ },
+ {
+ "epoch": 2.13,
+ "learning_rate": 4.797101449275362e-05,
+ "loss": 1.4046,
+ "step": 79
+ },
+ {
+ "epoch": 2.16,
+ "learning_rate": 4.789855072463768e-05,
+ "loss": 1.39,
+ "step": 80
+ },
+ {
+ "epoch": 2.19,
+ "learning_rate": 4.782608695652174e-05,
+ "loss": 1.4223,
+ "step": 81
+ },
+ {
+ "epoch": 2.21,
+ "learning_rate": 4.77536231884058e-05,
+ "loss": 1.3412,
+ "step": 82
+ },
+ {
+ "epoch": 2.24,
+ "learning_rate": 4.7681159420289855e-05,
+ "loss": 1.3806,
+ "step": 83
+ },
+ {
+ "epoch": 2.27,
+ "learning_rate": 4.7608695652173916e-05,
+ "loss": 1.4172,
+ "step": 84
+ },
+ {
+ "epoch": 2.29,
+ "learning_rate": 4.7536231884057976e-05,
+ "loss": 1.3621,
+ "step": 85
+ },
+ {
+ "epoch": 2.32,
+ "learning_rate": 4.746376811594203e-05,
+ "loss": 1.403,
+ "step": 86
+ },
+ {
+ "epoch": 2.35,
+ "learning_rate": 4.739130434782609e-05,
+ "loss": 1.3762,
+ "step": 87
+ },
+ {
+ "epoch": 2.37,
+ "learning_rate": 4.731884057971015e-05,
+ "loss": 1.3764,
+ "step": 88
+ },
+ {
+ "epoch": 2.4,
+ "learning_rate": 4.72463768115942e-05,
+ "loss": 1.3957,
+ "step": 89
+ },
+ {
+ "epoch": 2.43,
+ "learning_rate": 4.7173913043478264e-05,
+ "loss": 1.3773,
+ "step": 90
+ },
+ {
+ "epoch": 2.45,
+ "learning_rate": 4.710144927536232e-05,
+ "loss": 1.3872,
+ "step": 91
+ },
+ {
+ "epoch": 2.48,
+ "learning_rate": 4.7028985507246384e-05,
+ "loss": 1.3579,
+ "step": 92
+ },
+ {
+ "epoch": 2.51,
+ "learning_rate": 4.695652173913044e-05,
+ "loss": 1.3718,
+ "step": 93
+ },
+ {
+ "epoch": 2.53,
+ "learning_rate": 4.68840579710145e-05,
+ "loss": 1.3576,
+ "step": 94
+ },
+ {
+ "epoch": 2.56,
+ "learning_rate": 4.681159420289855e-05,
+ "loss": 1.3508,
+ "step": 95
+ },
+ {
+ "epoch": 2.59,
+ "learning_rate": 4.673913043478261e-05,
+ "loss": 1.3476,
+ "step": 96
+ },
+ {
+ "epoch": 2.61,
+ "learning_rate": 4.666666666666667e-05,
+ "loss": 1.3608,
+ "step": 97
+ },
+ {
+ "epoch": 2.64,
+ "learning_rate": 4.6594202898550725e-05,
+ "loss": 1.3911,
+ "step": 98
+ },
+ {
+ "epoch": 2.67,
+ "learning_rate": 4.6521739130434785e-05,
+ "loss": 1.3748,
+ "step": 99
+ },
+ {
+ "epoch": 2.69,
+ "learning_rate": 4.6449275362318846e-05,
+ "loss": 1.3628,
+ "step": 100
+ },
+ {
+ "epoch": 2.72,
+ "learning_rate": 4.63768115942029e-05,
+ "loss": 1.3678,
+ "step": 101
+ },
+ {
+ "epoch": 2.75,
+ "learning_rate": 4.630434782608696e-05,
+ "loss": 1.3286,
+ "step": 102
+ },
+ {
+ "epoch": 2.77,
+ "learning_rate": 4.623188405797101e-05,
+ "loss": 1.3592,
+ "step": 103
+ },
+ {
+ "epoch": 2.8,
+ "learning_rate": 4.615942028985508e-05,
+ "loss": 1.3754,
+ "step": 104
+ },
+ {
+ "epoch": 2.83,
+ "learning_rate": 4.608695652173913e-05,
+ "loss": 1.3061,
+ "step": 105
+ },
+ {
+ "epoch": 2.85,
+ "learning_rate": 4.601449275362319e-05,
+ "loss": 1.3843,
+ "step": 106
+ },
+ {
+ "epoch": 2.88,
+ "learning_rate": 4.594202898550725e-05,
+ "loss": 1.3153,
+ "step": 107
+ },
+ {
+ "epoch": 2.91,
+ "learning_rate": 4.586956521739131e-05,
+ "loss": 1.3638,
+ "step": 108
+ },
+ {
+ "epoch": 2.93,
+ "learning_rate": 4.579710144927537e-05,
+ "loss": 1.3712,
+ "step": 109
+ },
+ {
+ "epoch": 2.96,
+ "learning_rate": 4.572463768115942e-05,
+ "loss": 1.3601,
+ "step": 110
+ },
+ {
+ "epoch": 2.99,
+ "learning_rate": 4.565217391304348e-05,
+ "loss": 1.4363,
+ "step": 111
+ },
+ {
+ "epoch": 2.99,
+ "eval_accuracy": 0.7276007863625347,
+ "eval_loss": 1.2758572101593018,
+ "eval_runtime": 13.2518,
+ "eval_samples_per_second": 37.429,
+ "eval_steps_per_second": 4.679,
+ "step": 111
+ },
+ {
+ "epoch": 3.03,
+ "learning_rate": 4.557971014492754e-05,
+ "loss": 2.0676,
+ "step": 112
+ },
+ {
+ "epoch": 3.05,
+ "learning_rate": 4.5507246376811595e-05,
+ "loss": 1.3703,
+ "step": 113
+ },
+ {
+ "epoch": 3.08,
+ "learning_rate": 4.5434782608695655e-05,
+ "loss": 1.3295,
+ "step": 114
+ },
+ {
+ "epoch": 3.11,
+ "learning_rate": 4.5362318840579715e-05,
+ "loss": 1.3613,
+ "step": 115
+ },
+ {
+ "epoch": 3.13,
+ "learning_rate": 4.528985507246377e-05,
+ "loss": 1.4229,
+ "step": 116
+ },
+ {
+ "epoch": 3.16,
+ "learning_rate": 4.521739130434783e-05,
+ "loss": 1.4115,
+ "step": 117
+ },
+ {
+ "epoch": 3.19,
+ "learning_rate": 4.514492753623188e-05,
+ "loss": 1.4115,
+ "step": 118
+ },
+ {
+ "epoch": 3.21,
+ "learning_rate": 4.507246376811595e-05,
+ "loss": 1.3827,
+ "step": 119
+ },
+ {
+ "epoch": 3.24,
+ "learning_rate": 4.5e-05,
+ "loss": 1.338,
+ "step": 120
+ },
+ {
+ "epoch": 3.27,
+ "learning_rate": 4.492753623188406e-05,
+ "loss": 1.434,
+ "step": 121
+ },
+ {
+ "epoch": 3.29,
+ "learning_rate": 4.4855072463768117e-05,
+ "loss": 1.3443,
+ "step": 122
+ },
+ {
+ "epoch": 3.32,
+ "learning_rate": 4.478260869565218e-05,
+ "loss": 1.3195,
+ "step": 123
+ },
+ {
+ "epoch": 3.35,
+ "learning_rate": 4.471014492753624e-05,
+ "loss": 1.3589,
+ "step": 124
+ },
+ {
+ "epoch": 3.37,
+ "learning_rate": 4.463768115942029e-05,
+ "loss": 1.3323,
+ "step": 125
+ },
+ {
+ "epoch": 3.4,
+ "learning_rate": 4.456521739130435e-05,
+ "loss": 1.3151,
+ "step": 126
+ },
+ {
+ "epoch": 3.43,
+ "learning_rate": 4.449275362318841e-05,
+ "loss": 1.3844,
+ "step": 127
+ },
+ {
+ "epoch": 3.45,
+ "learning_rate": 4.4420289855072464e-05,
+ "loss": 1.3213,
+ "step": 128
+ },
+ {
+ "epoch": 3.48,
+ "learning_rate": 4.4347826086956525e-05,
+ "loss": 1.3764,
+ "step": 129
+ },
+ {
+ "epoch": 3.51,
+ "learning_rate": 4.427536231884058e-05,
+ "loss": 1.3449,
+ "step": 130
+ },
+ {
+ "epoch": 3.53,
+ "learning_rate": 4.4202898550724645e-05,
+ "loss": 1.3704,
+ "step": 131
+ },
+ {
+ "epoch": 3.56,
+ "learning_rate": 4.41304347826087e-05,
+ "loss": 1.3712,
+ "step": 132
+ },
+ {
+ "epoch": 3.59,
+ "learning_rate": 4.405797101449275e-05,
+ "loss": 1.3191,
+ "step": 133
+ },
+ {
+ "epoch": 3.61,
+ "learning_rate": 4.398550724637681e-05,
+ "loss": 1.3795,
+ "step": 134
+ },
+ {
+ "epoch": 3.64,
+ "learning_rate": 4.391304347826087e-05,
+ "loss": 1.3025,
+ "step": 135
+ },
+ {
+ "epoch": 3.67,
+ "learning_rate": 4.384057971014493e-05,
+ "loss": 1.3285,
+ "step": 136
+ },
+ {
+ "epoch": 3.69,
+ "learning_rate": 4.3768115942028986e-05,
+ "loss": 1.3112,
+ "step": 137
+ },
+ {
+ "epoch": 3.72,
+ "learning_rate": 4.3695652173913046e-05,
+ "loss": 1.4008,
+ "step": 138
+ },
+ {
+ "epoch": 3.75,
+ "learning_rate": 4.362318840579711e-05,
+ "loss": 1.3598,
+ "step": 139
+ },
+ {
+ "epoch": 3.77,
+ "learning_rate": 4.355072463768116e-05,
+ "loss": 1.3402,
+ "step": 140
+ },
+ {
+ "epoch": 3.8,
+ "learning_rate": 4.347826086956522e-05,
+ "loss": 1.3282,
+ "step": 141
+ },
+ {
+ "epoch": 3.83,
+ "learning_rate": 4.3405797101449274e-05,
+ "loss": 1.4,
+ "step": 142
+ },
+ {
+ "epoch": 3.85,
+ "learning_rate": 4.3333333333333334e-05,
+ "loss": 1.276,
+ "step": 143
+ },
+ {
+ "epoch": 3.88,
+ "learning_rate": 4.3260869565217394e-05,
+ "loss": 1.3375,
+ "step": 144
+ },
+ {
+ "epoch": 3.91,
+ "learning_rate": 4.318840579710145e-05,
+ "loss": 1.3169,
+ "step": 145
+ },
+ {
+ "epoch": 3.93,
+ "learning_rate": 4.3115942028985515e-05,
+ "loss": 1.3835,
+ "step": 146
+ },
+ {
+ "epoch": 3.96,
+ "learning_rate": 4.304347826086957e-05,
+ "loss": 1.3977,
+ "step": 147
+ },
+ {
+ "epoch": 3.99,
+ "learning_rate": 4.297101449275363e-05,
+ "loss": 1.3391,
+ "step": 148
+ },
+ {
+ "epoch": 3.99,
+ "eval_accuracy": 0.7251519097222222,
+ "eval_loss": 1.2903902530670166,
+ "eval_runtime": 13.26,
+ "eval_samples_per_second": 37.406,
+ "eval_steps_per_second": 4.676,
+ "step": 148
+ },
+ {
+ "epoch": 4.03,
+ "learning_rate": 4.289855072463768e-05,
+ "loss": 2.05,
+ "step": 149
+ },
+ {
+ "epoch": 4.05,
+ "learning_rate": 4.282608695652174e-05,
+ "loss": 1.3425,
+ "step": 150
+ },
+ {
+ "epoch": 4.08,
+ "learning_rate": 4.27536231884058e-05,
+ "loss": 1.3443,
+ "step": 151
+ },
+ {
+ "epoch": 4.11,
+ "learning_rate": 4.2681159420289856e-05,
+ "loss": 1.3464,
+ "step": 152
+ },
+ {
+ "epoch": 4.13,
+ "learning_rate": 4.2608695652173916e-05,
+ "loss": 1.355,
+ "step": 153
+ },
+ {
+ "epoch": 4.16,
+ "learning_rate": 4.2536231884057976e-05,
+ "loss": 1.3285,
+ "step": 154
+ },
+ {
+ "epoch": 4.19,
+ "learning_rate": 4.246376811594203e-05,
+ "loss": 1.3246,
+ "step": 155
+ },
+ {
+ "epoch": 4.21,
+ "learning_rate": 4.239130434782609e-05,
+ "loss": 1.3213,
+ "step": 156
+ },
+ {
+ "epoch": 4.24,
+ "learning_rate": 4.2318840579710143e-05,
+ "loss": 1.297,
+ "step": 157
+ },
+ {
+ "epoch": 4.27,
+ "learning_rate": 4.224637681159421e-05,
+ "loss": 1.3569,
+ "step": 158
+ },
+ {
+ "epoch": 4.29,
+ "learning_rate": 4.2173913043478264e-05,
+ "loss": 1.3392,
+ "step": 159
+ },
+ {
+ "epoch": 4.32,
+ "learning_rate": 4.210144927536232e-05,
+ "loss": 1.2817,
+ "step": 160
+ },
+ {
+ "epoch": 4.35,
+ "learning_rate": 4.202898550724638e-05,
+ "loss": 1.3187,
+ "step": 161
+ },
+ {
+ "epoch": 4.37,
+ "learning_rate": 4.195652173913044e-05,
+ "loss": 1.3094,
+ "step": 162
+ },
+ {
+ "epoch": 4.4,
+ "learning_rate": 4.18840579710145e-05,
+ "loss": 1.4001,
+ "step": 163
+ },
+ {
+ "epoch": 4.43,
+ "learning_rate": 4.181159420289855e-05,
+ "loss": 1.3204,
+ "step": 164
+ },
+ {
+ "epoch": 4.45,
+ "learning_rate": 4.1739130434782605e-05,
+ "loss": 1.3482,
+ "step": 165
+ },
+ {
+ "epoch": 4.48,
+ "learning_rate": 4.166666666666667e-05,
+ "loss": 1.3674,
+ "step": 166
+ },
+ {
+ "epoch": 4.51,
+ "learning_rate": 4.1594202898550726e-05,
+ "loss": 1.3334,
+ "step": 167
+ },
+ {
+ "epoch": 4.53,
+ "learning_rate": 4.1521739130434786e-05,
+ "loss": 1.3333,
+ "step": 168
+ },
+ {
+ "epoch": 4.56,
+ "learning_rate": 4.144927536231884e-05,
+ "loss": 1.3723,
+ "step": 169
+ },
+ {
+ "epoch": 4.59,
+ "learning_rate": 4.13768115942029e-05,
+ "loss": 1.3502,
+ "step": 170
+ },
+ {
+ "epoch": 4.61,
+ "learning_rate": 4.130434782608696e-05,
+ "loss": 1.3256,
+ "step": 171
+ },
+ {
+ "epoch": 4.64,
+ "learning_rate": 4.123188405797101e-05,
+ "loss": 1.4001,
+ "step": 172
+ },
+ {
+ "epoch": 4.67,
+ "learning_rate": 4.115942028985507e-05,
+ "loss": 1.3288,
+ "step": 173
+ },
+ {
+ "epoch": 4.69,
+ "learning_rate": 4.1086956521739134e-05,
+ "loss": 1.31,
+ "step": 174
+ },
+ {
+ "epoch": 4.72,
+ "learning_rate": 4.101449275362319e-05,
+ "loss": 1.3456,
+ "step": 175
+ },
+ {
+ "epoch": 4.75,
+ "learning_rate": 4.094202898550725e-05,
+ "loss": 1.3317,
+ "step": 176
+ },
+ {
+ "epoch": 4.77,
+ "learning_rate": 4.086956521739131e-05,
+ "loss": 1.3362,
+ "step": 177
+ },
+ {
+ "epoch": 4.8,
+ "learning_rate": 4.079710144927537e-05,
+ "loss": 1.3092,
+ "step": 178
+ },
+ {
+ "epoch": 4.83,
+ "learning_rate": 4.072463768115942e-05,
+ "loss": 1.2915,
+ "step": 179
+ },
+ {
+ "epoch": 4.85,
+ "learning_rate": 4.065217391304348e-05,
+ "loss": 1.3801,
+ "step": 180
+ },
+ {
+ "epoch": 4.88,
+ "learning_rate": 4.057971014492754e-05,
+ "loss": 1.2969,
+ "step": 181
+ },
1132
+ {
1133
+ "epoch": 4.91,
1134
+ "learning_rate": 4.0507246376811595e-05,
1135
+ "loss": 1.3184,
1136
+ "step": 182
1137
+ },
1138
+ {
1139
+ "epoch": 4.93,
1140
+ "learning_rate": 4.0434782608695655e-05,
1141
+ "loss": 1.2744,
1142
+ "step": 183
1143
+ },
1144
+ {
1145
+ "epoch": 4.96,
1146
+ "learning_rate": 4.036231884057971e-05,
1147
+ "loss": 1.3823,
1148
+ "step": 184
1149
+ },
1150
+ {
1151
+ "epoch": 4.99,
1152
+ "learning_rate": 4.028985507246377e-05,
1153
+ "loss": 1.3741,
1154
+ "step": 185
1155
+ },
1156
+ {
1157
+ "epoch": 4.99,
1158
+ "eval_accuracy": 0.7290188004551119,
1159
+ "eval_loss": 1.2620676755905151,
1160
+ "eval_runtime": 13.2557,
1161
+ "eval_samples_per_second": 37.418,
1162
+ "eval_steps_per_second": 4.677,
1163
+ "step": 185
1164
+ },
1165
+ {
1166
+ "epoch": 5.03,
1167
+ "learning_rate": 4.021739130434783e-05,
1168
+ "loss": 1.9576,
1169
+ "step": 186
1170
+ },
1171
+ {
1172
+ "epoch": 5.05,
1173
+ "learning_rate": 4.014492753623188e-05,
1174
+ "loss": 1.4131,
1175
+ "step": 187
1176
+ },
1177
+ {
1178
+ "epoch": 5.08,
1179
+ "learning_rate": 4.007246376811594e-05,
1180
+ "loss": 1.308,
1181
+ "step": 188
1182
+ },
1183
+ {
1184
+ "epoch": 5.11,
1185
+ "learning_rate": 4e-05,
1186
+ "loss": 1.3495,
1187
+ "step": 189
1188
+ },
1189
+ {
1190
+ "epoch": 5.13,
1191
+ "learning_rate": 3.9927536231884064e-05,
1192
+ "loss": 1.2944,
1193
+ "step": 190
1194
+ },
1195
+ {
1196
+ "epoch": 5.16,
1197
+ "learning_rate": 3.985507246376812e-05,
1198
+ "loss": 1.3534,
1199
+ "step": 191
1200
+ },
1201
+ {
1202
+ "epoch": 5.19,
1203
+ "learning_rate": 3.978260869565217e-05,
1204
+ "loss": 1.3448,
1205
+ "step": 192
1206
+ },
1207
+ {
1208
+ "epoch": 5.21,
1209
+ "learning_rate": 3.971014492753624e-05,
1210
+ "loss": 1.3493,
1211
+ "step": 193
1212
+ },
1213
+ {
1214
+ "epoch": 5.24,
1215
+ "learning_rate": 3.963768115942029e-05,
1216
+ "loss": 1.3033,
1217
+ "step": 194
1218
+ },
1219
+ {
1220
+ "epoch": 5.27,
1221
+ "learning_rate": 3.956521739130435e-05,
1222
+ "loss": 1.3327,
1223
+ "step": 195
1224
+ },
1225
+ {
1226
+ "epoch": 5.29,
1227
+ "learning_rate": 3.9492753623188405e-05,
1228
+ "loss": 1.3037,
1229
+ "step": 196
1230
+ },
1231
+ {
1232
+ "epoch": 5.32,
1233
+ "learning_rate": 3.9420289855072465e-05,
1234
+ "loss": 1.2676,
1235
+ "step": 197
1236
+ },
1237
+ {
1238
+ "epoch": 5.35,
1239
+ "learning_rate": 3.9347826086956525e-05,
1240
+ "loss": 1.277,
1241
+ "step": 198
1242
+ },
1243
+ {
1244
+ "epoch": 5.37,
1245
+ "learning_rate": 3.927536231884058e-05,
1246
+ "loss": 1.306,
1247
+ "step": 199
1248
+ },
1249
+ {
1250
+ "epoch": 5.4,
1251
+ "learning_rate": 3.920289855072464e-05,
1252
+ "loss": 1.3317,
1253
+ "step": 200
1254
+ },
1255
+ {
1256
+ "epoch": 5.43,
1257
+ "learning_rate": 3.91304347826087e-05,
1258
+ "loss": 1.2872,
1259
+ "step": 201
1260
+ },
1261
+ {
1262
+ "epoch": 5.45,
1263
+ "learning_rate": 3.905797101449275e-05,
1264
+ "loss": 1.2878,
1265
+ "step": 202
1266
+ },
1267
+ {
1268
+ "epoch": 5.48,
1269
+ "learning_rate": 3.898550724637681e-05,
1270
+ "loss": 1.3232,
1271
+ "step": 203
1272
+ },
1273
+ {
1274
+ "epoch": 5.51,
1275
+ "learning_rate": 3.8913043478260866e-05,
1276
+ "loss": 1.2771,
1277
+ "step": 204
1278
+ },
1279
+ {
1280
+ "epoch": 5.53,
1281
+ "learning_rate": 3.884057971014493e-05,
1282
+ "loss": 1.3035,
1283
+ "step": 205
1284
+ },
1285
+ {
1286
+ "epoch": 5.56,
1287
+ "learning_rate": 3.876811594202899e-05,
1288
+ "loss": 1.3175,
1289
+ "step": 206
1290
+ },
1291
+ {
1292
+ "epoch": 5.59,
1293
+ "learning_rate": 3.869565217391305e-05,
1294
+ "loss": 1.2661,
1295
+ "step": 207
1296
+ },
1297
+ {
1298
+ "epoch": 5.61,
1299
+ "learning_rate": 3.862318840579711e-05,
1300
+ "loss": 1.3559,
1301
+ "step": 208
1302
+ },
1303
+ {
1304
+ "epoch": 5.64,
1305
+ "learning_rate": 3.855072463768116e-05,
1306
+ "loss": 1.3458,
1307
+ "step": 209
1308
+ },
1309
+ {
1310
+ "epoch": 5.67,
1311
+ "learning_rate": 3.847826086956522e-05,
1312
+ "loss": 1.2947,
1313
+ "step": 210
1314
+ },
1315
+ {
1316
+ "epoch": 5.69,
1317
+ "learning_rate": 3.8405797101449274e-05,
1318
+ "loss": 1.2938,
1319
+ "step": 211
1320
+ },
1321
+ {
1322
+ "epoch": 5.72,
1323
+ "learning_rate": 3.8333333333333334e-05,
1324
+ "loss": 1.3266,
1325
+ "step": 212
1326
+ },
1327
+ {
1328
+ "epoch": 5.75,
1329
+ "learning_rate": 3.8260869565217395e-05,
1330
+ "loss": 1.2853,
1331
+ "step": 213
1332
+ },
1333
+ {
1334
+ "epoch": 5.77,
1335
+ "learning_rate": 3.818840579710145e-05,
1336
+ "loss": 1.3009,
1337
+ "step": 214
1338
+ },
1339
+ {
1340
+ "epoch": 5.8,
1341
+ "learning_rate": 3.811594202898551e-05,
1342
+ "loss": 1.3023,
1343
+ "step": 215
1344
+ },
1345
+ {
1346
+ "epoch": 5.83,
1347
+ "learning_rate": 3.804347826086957e-05,
1348
+ "loss": 1.3105,
1349
+ "step": 216
1350
+ },
1351
+ {
1352
+ "epoch": 5.85,
1353
+ "learning_rate": 3.797101449275363e-05,
1354
+ "loss": 1.343,
1355
+ "step": 217
1356
+ },
1357
+ {
1358
+ "epoch": 5.88,
1359
+ "learning_rate": 3.789855072463768e-05,
1360
+ "loss": 1.2957,
1361
+ "step": 218
1362
+ },
1363
+ {
1364
+ "epoch": 5.91,
1365
+ "learning_rate": 3.7826086956521736e-05,
1366
+ "loss": 1.307,
1367
+ "step": 219
1368
+ },
1369
+ {
1370
+ "epoch": 5.93,
1371
+ "learning_rate": 3.77536231884058e-05,
1372
+ "loss": 1.3477,
1373
+ "step": 220
1374
+ },
1375
+ {
1376
+ "epoch": 5.96,
1377
+ "learning_rate": 3.7681159420289856e-05,
1378
+ "loss": 1.347,
1379
+ "step": 221
1380
+ },
1381
+ {
1382
+ "epoch": 5.99,
1383
+ "learning_rate": 3.7608695652173917e-05,
1384
+ "loss": 1.2771,
1385
+ "step": 222
1386
+ },
1387
+ {
1388
+ "epoch": 5.99,
1389
+ "eval_accuracy": 0.7353204415394212,
1390
+ "eval_loss": 1.2311729192733765,
1391
+ "eval_runtime": 13.2727,
1392
+ "eval_samples_per_second": 37.37,
1393
+ "eval_steps_per_second": 4.671,
1394
+ "step": 222
1395
+ },
1396
+ {
1397
+ "epoch": 6.03,
1398
+ "learning_rate": 3.753623188405797e-05,
1399
+ "loss": 1.9712,
1400
+ "step": 223
1401
+ },
1402
+ {
1403
+ "epoch": 6.05,
1404
+ "learning_rate": 3.746376811594203e-05,
1405
+ "loss": 1.2942,
1406
+ "step": 224
1407
+ },
1408
+ {
1409
+ "epoch": 6.08,
1410
+ "learning_rate": 3.739130434782609e-05,
1411
+ "loss": 1.3117,
1412
+ "step": 225
1413
+ },
1414
+ {
1415
+ "epoch": 6.11,
1416
+ "learning_rate": 3.7318840579710144e-05,
1417
+ "loss": 1.3395,
1418
+ "step": 226
1419
+ },
1420
+ {
1421
+ "epoch": 6.13,
1422
+ "learning_rate": 3.7246376811594204e-05,
1423
+ "loss": 1.3447,
1424
+ "step": 227
1425
+ },
1426
+ {
1427
+ "epoch": 6.16,
1428
+ "learning_rate": 3.7173913043478264e-05,
1429
+ "loss": 1.2895,
1430
+ "step": 228
1431
+ },
1432
+ {
1433
+ "epoch": 6.19,
1434
+ "learning_rate": 3.710144927536232e-05,
1435
+ "loss": 1.2623,
1436
+ "step": 229
1437
+ },
1438
+ {
1439
+ "epoch": 6.21,
1440
+ "learning_rate": 3.702898550724638e-05,
1441
+ "loss": 1.2987,
1442
+ "step": 230
1443
+ },
1444
+ {
1445
+ "epoch": 6.24,
1446
+ "learning_rate": 3.695652173913043e-05,
1447
+ "loss": 1.3021,
1448
+ "step": 231
1449
+ },
1450
+ {
1451
+ "epoch": 6.27,
1452
+ "learning_rate": 3.68840579710145e-05,
1453
+ "loss": 1.3305,
1454
+ "step": 232
1455
+ },
1456
+ {
1457
+ "epoch": 6.29,
1458
+ "learning_rate": 3.681159420289855e-05,
1459
+ "loss": 1.2959,
1460
+ "step": 233
1461
+ },
1462
+ {
1463
+ "epoch": 6.32,
1464
+ "learning_rate": 3.673913043478261e-05,
1465
+ "loss": 1.2879,
1466
+ "step": 234
1467
+ },
1468
+ {
1469
+ "epoch": 6.35,
1470
+ "learning_rate": 3.6666666666666666e-05,
1471
+ "loss": 1.348,
1472
+ "step": 235
1473
+ },
1474
+ {
1475
+ "epoch": 6.37,
1476
+ "learning_rate": 3.6594202898550726e-05,
1477
+ "loss": 1.3434,
1478
+ "step": 236
1479
+ },
1480
+ {
1481
+ "epoch": 6.4,
1482
+ "learning_rate": 3.6521739130434786e-05,
1483
+ "loss": 1.3034,
1484
+ "step": 237
1485
+ },
1486
+ {
1487
+ "epoch": 6.43,
1488
+ "learning_rate": 3.644927536231884e-05,
1489
+ "loss": 1.3372,
1490
+ "step": 238
1491
+ },
1492
+ {
1493
+ "epoch": 6.45,
1494
+ "learning_rate": 3.63768115942029e-05,
1495
+ "loss": 1.2488,
1496
+ "step": 239
1497
+ },
1498
+ {
1499
+ "epoch": 6.48,
1500
+ "learning_rate": 3.630434782608696e-05,
1501
+ "loss": 1.2779,
1502
+ "step": 240
1503
+ },
1504
+ {
1505
+ "epoch": 6.51,
1506
+ "learning_rate": 3.6231884057971014e-05,
1507
+ "loss": 1.3044,
1508
+ "step": 241
1509
+ },
1510
+ {
1511
+ "epoch": 6.53,
1512
+ "learning_rate": 3.6159420289855074e-05,
1513
+ "loss": 1.3089,
1514
+ "step": 242
1515
+ },
1516
+ {
1517
+ "epoch": 6.56,
1518
+ "learning_rate": 3.6086956521739134e-05,
1519
+ "loss": 1.3803,
1520
+ "step": 243
1521
+ },
1522
+ {
1523
+ "epoch": 6.59,
1524
+ "learning_rate": 3.6014492753623194e-05,
1525
+ "loss": 1.3559,
1526
+ "step": 244
1527
+ },
1528
+ {
1529
+ "epoch": 6.61,
1530
+ "learning_rate": 3.594202898550725e-05,
1531
+ "loss": 1.3486,
1532
+ "step": 245
1533
+ },
1534
+ {
1535
+ "epoch": 6.64,
1536
+ "learning_rate": 3.58695652173913e-05,
1537
+ "loss": 1.2817,
1538
+ "step": 246
1539
+ },
1540
+ {
1541
+ "epoch": 6.67,
1542
+ "learning_rate": 3.579710144927537e-05,
1543
+ "loss": 1.2304,
1544
+ "step": 247
1545
+ },
1546
+ {
1547
+ "epoch": 6.69,
1548
+ "learning_rate": 3.572463768115942e-05,
1549
+ "loss": 1.3637,
1550
+ "step": 248
1551
+ },
1552
+ {
1553
+ "epoch": 6.72,
1554
+ "learning_rate": 3.565217391304348e-05,
1555
+ "loss": 1.3177,
1556
+ "step": 249
1557
+ },
1558
+ {
1559
+ "epoch": 6.75,
1560
+ "learning_rate": 3.5579710144927535e-05,
1561
+ "loss": 1.3273,
1562
+ "step": 250
1563
+ },
1564
+ {
1565
+ "epoch": 6.77,
1566
+ "learning_rate": 3.5507246376811596e-05,
1567
+ "loss": 1.3522,
1568
+ "step": 251
1569
+ },
1570
+ {
1571
+ "epoch": 6.8,
1572
+ "learning_rate": 3.5434782608695656e-05,
1573
+ "loss": 1.3314,
1574
+ "step": 252
1575
+ },
1576
+ {
1577
+ "epoch": 6.83,
1578
+ "learning_rate": 3.536231884057971e-05,
1579
+ "loss": 1.2812,
1580
+ "step": 253
1581
+ },
1582
+ {
1583
+ "epoch": 6.85,
1584
+ "learning_rate": 3.528985507246377e-05,
1585
+ "loss": 1.2961,
1586
+ "step": 254
1587
+ },
1588
+ {
1589
+ "epoch": 6.88,
1590
+ "learning_rate": 3.521739130434783e-05,
1591
+ "loss": 1.358,
1592
+ "step": 255
1593
+ },
1594
+ {
1595
+ "epoch": 6.91,
1596
+ "learning_rate": 3.514492753623188e-05,
1597
+ "loss": 1.2733,
1598
+ "step": 256
1599
+ },
1600
+ {
1601
+ "epoch": 6.93,
1602
+ "learning_rate": 3.5072463768115943e-05,
1603
+ "loss": 1.2509,
1604
+ "step": 257
1605
+ },
1606
+ {
1607
+ "epoch": 6.96,
1608
+ "learning_rate": 3.5e-05,
1609
+ "loss": 1.2686,
1610
+ "step": 258
1611
+ },
1612
+ {
1613
+ "epoch": 6.99,
1614
+ "learning_rate": 3.4927536231884064e-05,
1615
+ "loss": 1.287,
1616
+ "step": 259
1617
+ },
1618
+ {
1619
+ "epoch": 6.99,
1620
+ "eval_accuracy": 0.7288652772101893,
1621
+ "eval_loss": 1.2542475461959839,
1622
+ "eval_runtime": 13.3151,
1623
+ "eval_samples_per_second": 37.251,
1624
+ "eval_steps_per_second": 4.656,
1625
+ "step": 259
1626
+ },
1627
+ {
1628
+ "epoch": 7.03,
1629
+ "learning_rate": 3.485507246376812e-05,
1630
+ "loss": 2.0198,
1631
+ "step": 260
1632
+ },
1633
+ {
1634
+ "epoch": 7.05,
1635
+ "learning_rate": 3.478260869565218e-05,
1636
+ "loss": 1.3153,
1637
+ "step": 261
1638
+ },
1639
+ {
1640
+ "epoch": 7.08,
1641
+ "learning_rate": 3.471014492753623e-05,
1642
+ "loss": 1.2692,
1643
+ "step": 262
1644
+ },
1645
+ {
1646
+ "epoch": 7.11,
1647
+ "learning_rate": 3.463768115942029e-05,
1648
+ "loss": 1.327,
1649
+ "step": 263
1650
+ },
1651
+ {
1652
+ "epoch": 7.13,
1653
+ "learning_rate": 3.456521739130435e-05,
1654
+ "loss": 1.2767,
1655
+ "step": 264
1656
+ },
1657
+ {
1658
+ "epoch": 7.16,
1659
+ "learning_rate": 3.4492753623188405e-05,
1660
+ "loss": 1.3097,
1661
+ "step": 265
1662
+ },
1663
+ {
1664
+ "epoch": 7.19,
1665
+ "learning_rate": 3.4420289855072465e-05,
1666
+ "loss": 1.2951,
1667
+ "step": 266
1668
+ },
1669
+ {
1670
+ "epoch": 7.21,
1671
+ "learning_rate": 3.4347826086956526e-05,
1672
+ "loss": 1.2827,
1673
+ "step": 267
1674
+ },
1675
+ {
1676
+ "epoch": 7.24,
1677
+ "learning_rate": 3.427536231884058e-05,
1678
+ "loss": 1.2769,
1679
+ "step": 268
1680
+ },
1681
+ {
1682
+ "epoch": 7.27,
1683
+ "learning_rate": 3.420289855072464e-05,
1684
+ "loss": 1.3052,
1685
+ "step": 269
1686
+ },
1687
+ {
1688
+ "epoch": 7.29,
1689
+ "learning_rate": 3.413043478260869e-05,
1690
+ "loss": 1.3424,
1691
+ "step": 270
1692
+ },
1693
+ {
1694
+ "epoch": 7.32,
1695
+ "learning_rate": 3.405797101449276e-05,
1696
+ "loss": 1.3514,
1697
+ "step": 271
1698
+ },
1699
+ {
1700
+ "epoch": 7.35,
1701
+ "learning_rate": 3.398550724637681e-05,
1702
+ "loss": 1.3662,
1703
+ "step": 272
1704
+ },
1705
+ {
1706
+ "epoch": 7.37,
1707
+ "learning_rate": 3.3913043478260867e-05,
1708
+ "loss": 1.3694,
1709
+ "step": 273
1710
+ },
1711
+ {
1712
+ "epoch": 7.4,
1713
+ "learning_rate": 3.3840579710144934e-05,
1714
+ "loss": 1.2747,
1715
+ "step": 274
1716
+ },
1717
+ {
1718
+ "epoch": 7.43,
1719
+ "learning_rate": 3.376811594202899e-05,
1720
+ "loss": 1.3502,
1721
+ "step": 275
1722
+ },
1723
+ {
1724
+ "epoch": 7.45,
1725
+ "learning_rate": 3.369565217391305e-05,
1726
+ "loss": 1.2687,
1727
+ "step": 276
1728
+ },
1729
+ {
1730
+ "epoch": 7.48,
1731
+ "learning_rate": 3.36231884057971e-05,
1732
+ "loss": 1.2702,
1733
+ "step": 277
1734
+ },
1735
+ {
1736
+ "epoch": 7.51,
1737
+ "learning_rate": 3.355072463768116e-05,
1738
+ "loss": 1.2983,
1739
+ "step": 278
1740
+ },
1741
+ {
1742
+ "epoch": 7.53,
1743
+ "learning_rate": 3.347826086956522e-05,
1744
+ "loss": 1.3027,
1745
+ "step": 279
1746
+ },
1747
+ {
1748
+ "epoch": 7.56,
1749
+ "learning_rate": 3.3405797101449275e-05,
1750
+ "loss": 1.2854,
1751
+ "step": 280
1752
+ },
1753
+ {
1754
+ "epoch": 7.59,
1755
+ "learning_rate": 3.3333333333333335e-05,
1756
+ "loss": 1.2679,
1757
+ "step": 281
1758
+ },
1759
+ {
1760
+ "epoch": 7.61,
1761
+ "learning_rate": 3.3260869565217395e-05,
1762
+ "loss": 1.379,
1763
+ "step": 282
1764
+ },
1765
+ {
1766
+ "epoch": 7.64,
1767
+ "learning_rate": 3.318840579710145e-05,
1768
+ "loss": 1.3008,
1769
+ "step": 283
1770
+ },
1771
+ {
1772
+ "epoch": 7.67,
1773
+ "learning_rate": 3.311594202898551e-05,
1774
+ "loss": 1.343,
1775
+ "step": 284
1776
+ },
1777
+ {
1778
+ "epoch": 7.69,
1779
+ "learning_rate": 3.304347826086956e-05,
1780
+ "loss": 1.266,
1781
+ "step": 285
1782
+ },
1783
+ {
1784
+ "epoch": 7.72,
1785
+ "learning_rate": 3.297101449275363e-05,
1786
+ "loss": 1.3153,
1787
+ "step": 286
1788
+ },
1789
+ {
1790
+ "epoch": 7.75,
1791
+ "learning_rate": 3.289855072463768e-05,
1792
+ "loss": 1.2899,
1793
+ "step": 287
1794
+ },
1795
+ {
1796
+ "epoch": 7.77,
1797
+ "learning_rate": 3.282608695652174e-05,
1798
+ "loss": 1.2609,
1799
+ "step": 288
1800
+ },
1801
+ {
1802
+ "epoch": 7.8,
1803
+ "learning_rate": 3.2753623188405796e-05,
1804
+ "loss": 1.3002,
1805
+ "step": 289
1806
+ },
1807
+ {
1808
+ "epoch": 7.83,
1809
+ "learning_rate": 3.268115942028986e-05,
1810
+ "loss": 1.2858,
1811
+ "step": 290
1812
+ },
1813
+ {
1814
+ "epoch": 7.85,
1815
+ "learning_rate": 3.260869565217392e-05,
1816
+ "loss": 1.3049,
1817
+ "step": 291
1818
+ },
1819
+ {
1820
+ "epoch": 7.88,
1821
+ "learning_rate": 3.253623188405797e-05,
1822
+ "loss": 1.2891,
1823
+ "step": 292
1824
+ },
1825
+ {
1826
+ "epoch": 7.91,
1827
+ "learning_rate": 3.246376811594203e-05,
1828
+ "loss": 1.209,
1829
+ "step": 293
1830
+ },
1831
+ {
1832
+ "epoch": 7.93,
1833
+ "learning_rate": 3.239130434782609e-05,
1834
+ "loss": 1.2867,
1835
+ "step": 294
1836
+ },
1837
+ {
1838
+ "epoch": 7.96,
1839
+ "learning_rate": 3.2318840579710144e-05,
1840
+ "loss": 1.2934,
1841
+ "step": 295
1842
+ },
1843
+ {
1844
+ "epoch": 7.99,
1845
+ "learning_rate": 3.2246376811594205e-05,
1846
+ "loss": 1.29,
1847
+ "step": 296
1848
+ },
1849
+ {
1850
+ "epoch": 7.99,
1851
+ "eval_accuracy": 0.7345346311640254,
1852
+ "eval_loss": 1.2290480136871338,
1853
+ "eval_runtime": 13.2843,
1854
+ "eval_samples_per_second": 37.337,
1855
+ "eval_steps_per_second": 4.667,
1856
+ "step": 296
1857
+ },
1858
+ {
1859
+ "epoch": 8.03,
1860
+ "learning_rate": 3.217391304347826e-05,
1861
+ "loss": 1.9383,
1862
+ "step": 297
1863
+ },
1864
+ {
1865
+ "epoch": 8.05,
1866
+ "learning_rate": 3.2101449275362325e-05,
1867
+ "loss": 1.2926,
1868
+ "step": 298
1869
+ },
1870
+ {
1871
+ "epoch": 8.08,
1872
+ "learning_rate": 3.202898550724638e-05,
1873
+ "loss": 1.3316,
1874
+ "step": 299
1875
+ },
1876
+ {
1877
+ "epoch": 8.11,
1878
+ "learning_rate": 3.195652173913043e-05,
1879
+ "loss": 1.2614,
1880
+ "step": 300
1881
+ },
1882
+ {
1883
+ "epoch": 8.13,
1884
+ "learning_rate": 3.188405797101449e-05,
1885
+ "loss": 1.316,
1886
+ "step": 301
1887
+ },
1888
+ {
1889
+ "epoch": 8.16,
1890
+ "learning_rate": 3.181159420289855e-05,
1891
+ "loss": 1.2777,
1892
+ "step": 302
1893
+ },
1894
+ {
1895
+ "epoch": 8.19,
1896
+ "learning_rate": 3.173913043478261e-05,
1897
+ "loss": 1.3079,
1898
+ "step": 303
1899
+ },
1900
+ {
1901
+ "epoch": 8.21,
1902
+ "learning_rate": 3.1666666666666666e-05,
1903
+ "loss": 1.3451,
1904
+ "step": 304
1905
+ },
1906
+ {
1907
+ "epoch": 8.24,
1908
+ "learning_rate": 3.1594202898550726e-05,
1909
+ "loss": 1.2871,
1910
+ "step": 305
1911
+ },
1912
+ {
1913
+ "epoch": 8.27,
1914
+ "learning_rate": 3.152173913043479e-05,
1915
+ "loss": 1.3431,
1916
+ "step": 306
1917
+ },
1918
+ {
1919
+ "epoch": 8.29,
1920
+ "learning_rate": 3.144927536231884e-05,
1921
+ "loss": 1.2507,
1922
+ "step": 307
1923
+ },
1924
+ {
1925
+ "epoch": 8.32,
1926
+ "learning_rate": 3.13768115942029e-05,
1927
+ "loss": 1.292,
1928
+ "step": 308
1929
+ },
1930
+ {
1931
+ "epoch": 8.35,
1932
+ "learning_rate": 3.130434782608696e-05,
1933
+ "loss": 1.2764,
1934
+ "step": 309
1935
+ },
1936
+ {
1937
+ "epoch": 8.37,
1938
+ "learning_rate": 3.1231884057971014e-05,
1939
+ "loss": 1.3385,
1940
+ "step": 310
1941
+ },
1942
+ {
1943
+ "epoch": 8.4,
1944
+ "learning_rate": 3.1159420289855074e-05,
1945
+ "loss": 1.3285,
1946
+ "step": 311
1947
+ },
1948
+ {
1949
+ "epoch": 8.43,
1950
+ "learning_rate": 3.108695652173913e-05,
1951
+ "loss": 1.2385,
1952
+ "step": 312
1953
+ },
1954
+ {
1955
+ "epoch": 8.45,
1956
+ "learning_rate": 3.1014492753623195e-05,
1957
+ "loss": 1.2528,
1958
+ "step": 313
1959
+ },
1960
+ {
1961
+ "epoch": 8.48,
1962
+ "learning_rate": 3.094202898550725e-05,
1963
+ "loss": 1.3026,
1964
+ "step": 314
1965
+ },
1966
+ {
1967
+ "epoch": 8.51,
1968
+ "learning_rate": 3.086956521739131e-05,
1969
+ "loss": 1.3108,
1970
+ "step": 315
1971
+ },
1972
+ {
1973
+ "epoch": 8.53,
1974
+ "learning_rate": 3.079710144927536e-05,
1975
+ "loss": 1.2307,
1976
+ "step": 316
1977
+ },
1978
+ {
1979
+ "epoch": 8.56,
1980
+ "learning_rate": 3.072463768115942e-05,
1981
+ "loss": 1.2586,
1982
+ "step": 317
1983
+ },
1984
+ {
1985
+ "epoch": 8.59,
1986
+ "learning_rate": 3.065217391304348e-05,
1987
+ "loss": 1.3263,
1988
+ "step": 318
1989
+ },
1990
+ {
1991
+ "epoch": 8.61,
1992
+ "learning_rate": 3.0579710144927536e-05,
1993
+ "loss": 1.2522,
1994
+ "step": 319
1995
+ },
1996
+ {
1997
+ "epoch": 8.64,
1998
+ "learning_rate": 3.0507246376811593e-05,
1999
+ "loss": 1.2695,
2000
+ "step": 320
2001
+ },
2002
+ {
2003
+ "epoch": 8.67,
2004
+ "learning_rate": 3.0434782608695656e-05,
2005
+ "loss": 1.2588,
2006
+ "step": 321
2007
+ },
2008
+ {
2009
+ "epoch": 8.69,
2010
+ "learning_rate": 3.0362318840579713e-05,
2011
+ "loss": 1.2759,
2012
+ "step": 322
2013
+ },
2014
+ {
2015
+ "epoch": 8.72,
2016
+ "learning_rate": 3.028985507246377e-05,
2017
+ "loss": 1.3229,
2018
+ "step": 323
2019
+ },
2020
+ {
2021
+ "epoch": 8.75,
2022
+ "learning_rate": 3.0217391304347827e-05,
2023
+ "loss": 1.2949,
2024
+ "step": 324
2025
+ },
2026
+ {
2027
+ "epoch": 8.77,
2028
+ "learning_rate": 3.0144927536231887e-05,
2029
+ "loss": 1.3185,
2030
+ "step": 325
2031
+ },
2032
+ {
2033
+ "epoch": 8.8,
2034
+ "learning_rate": 3.0072463768115944e-05,
2035
+ "loss": 1.2303,
2036
+ "step": 326
2037
+ },
2038
+ {
2039
+ "epoch": 8.83,
2040
+ "learning_rate": 3e-05,
2041
+ "loss": 1.2309,
2042
+ "step": 327
2043
+ },
2044
+ {
2045
+ "epoch": 8.85,
2046
+ "learning_rate": 2.9927536231884058e-05,
2047
+ "loss": 1.2804,
2048
+ "step": 328
2049
+ },
2050
+ {
2051
+ "epoch": 8.88,
2052
+ "learning_rate": 2.9855072463768118e-05,
2053
+ "loss": 1.2964,
2054
+ "step": 329
2055
+ },
2056
+ {
2057
+ "epoch": 8.91,
2058
+ "learning_rate": 2.9782608695652175e-05,
2059
+ "loss": 1.2547,
2060
+ "step": 330
2061
+ },
2062
+ {
2063
+ "epoch": 8.93,
2064
+ "learning_rate": 2.971014492753623e-05,
2065
+ "loss": 1.2711,
2066
+ "step": 331
2067
+ },
2068
+ {
2069
+ "epoch": 8.96,
2070
+ "learning_rate": 2.963768115942029e-05,
2071
+ "loss": 1.2578,
2072
+ "step": 332
2073
+ },
2074
+ {
2075
+ "epoch": 8.99,
2076
+ "learning_rate": 2.9565217391304352e-05,
2077
+ "loss": 1.2948,
2078
+ "step": 333
2079
+ },
2080
+ {
2081
+ "epoch": 8.99,
2082
+ "eval_accuracy": 0.7286482668694991,
2083
+ "eval_loss": 1.2536934614181519,
2084
+ "eval_runtime": 13.2456,
2085
+ "eval_samples_per_second": 37.446,
2086
+ "eval_steps_per_second": 4.681,
2087
+ "step": 333
2088
+ },
2089
+ {
2090
+ "epoch": 9.03,
2091
+ "learning_rate": 2.949275362318841e-05,
2092
+ "loss": 1.9494,
2093
+ "step": 334
2094
+ },
2095
+ {
2096
+ "epoch": 9.05,
2097
+ "learning_rate": 2.9420289855072462e-05,
2098
+ "loss": 1.2457,
2099
+ "step": 335
2100
+ },
2101
+ {
2102
+ "epoch": 9.08,
2103
+ "learning_rate": 2.9347826086956526e-05,
2104
+ "loss": 1.2984,
2105
+ "step": 336
2106
+ },
2107
+ {
2108
+ "epoch": 9.11,
2109
+ "learning_rate": 2.9275362318840583e-05,
2110
+ "loss": 1.315,
2111
+ "step": 337
2112
+ },
2113
+ {
2114
+ "epoch": 9.13,
2115
+ "learning_rate": 2.920289855072464e-05,
2116
+ "loss": 1.2701,
2117
+ "step": 338
2118
+ },
2119
+ {
2120
+ "epoch": 9.16,
2121
+ "learning_rate": 2.9130434782608696e-05,
2122
+ "loss": 1.2902,
2123
+ "step": 339
2124
+ },
2125
+ {
2126
+ "epoch": 9.19,
2127
+ "learning_rate": 2.9057971014492757e-05,
2128
+ "loss": 1.3615,
2129
+ "step": 340
2130
+ },
2131
+ {
2132
+ "epoch": 9.21,
2133
+ "learning_rate": 2.8985507246376814e-05,
2134
+ "loss": 1.2761,
2135
+ "step": 341
2136
+ },
2137
+ {
2138
+ "epoch": 9.24,
2139
+ "learning_rate": 2.891304347826087e-05,
2140
+ "loss": 1.2678,
2141
+ "step": 342
2142
+ },
2143
+ {
2144
+ "epoch": 9.27,
2145
+ "learning_rate": 2.8840579710144927e-05,
2146
+ "loss": 1.2982,
2147
+ "step": 343
2148
+ },
2149
+ {
2150
+ "epoch": 9.29,
2151
+ "learning_rate": 2.8768115942028988e-05,
2152
+ "loss": 1.2626,
2153
+ "step": 344
2154
+ },
2155
+ {
2156
+ "epoch": 9.32,
2157
+ "learning_rate": 2.8695652173913044e-05,
2158
+ "loss": 1.2179,
2159
+ "step": 345
2160
+ },
2161
+ {
2162
+ "epoch": 9.35,
2163
+ "learning_rate": 2.86231884057971e-05,
2164
+ "loss": 1.2992,
2165
+ "step": 346
2166
+ },
2167
+ {
2168
+ "epoch": 9.37,
2169
+ "learning_rate": 2.8550724637681158e-05,
2170
+ "loss": 1.3423,
2171
+ "step": 347
2172
+ },
2173
+ {
2174
+ "epoch": 9.4,
2175
+ "learning_rate": 2.847826086956522e-05,
2176
+ "loss": 1.2727,
2177
+ "step": 348
2178
+ },
2179
+ {
2180
+ "epoch": 9.43,
2181
+ "learning_rate": 2.840579710144928e-05,
2182
+ "loss": 1.2516,
2183
+ "step": 349
2184
+ },
2185
+ {
2186
+ "epoch": 9.45,
2187
+ "learning_rate": 2.8333333333333335e-05,
2188
+ "loss": 1.259,
2189
+ "step": 350
2190
+ },
2191
+ {
2192
+ "epoch": 9.48,
2193
+ "learning_rate": 2.826086956521739e-05,
2194
+ "loss": 1.28,
2195
+ "step": 351
2196
+ },
2197
+ {
2198
+ "epoch": 9.51,
2199
+ "learning_rate": 2.8188405797101452e-05,
2200
+ "loss": 1.2764,
2201
+ "step": 352
2202
+ },
2203
+ {
2204
+ "epoch": 9.53,
2205
+ "learning_rate": 2.811594202898551e-05,
2206
+ "loss": 1.2954,
2207
+ "step": 353
2208
+ },
2209
+ {
2210
+ "epoch": 9.56,
2211
+ "learning_rate": 2.8043478260869566e-05,
2212
+ "loss": 1.3081,
2213
+ "step": 354
2214
+ },
2215
+ {
2216
+ "epoch": 9.59,
2217
+ "learning_rate": 2.7971014492753623e-05,
2218
+ "loss": 1.3355,
2219
+ "step": 355
2220
+ },
2221
+ {
2222
+ "epoch": 9.61,
2223
+ "learning_rate": 2.7898550724637683e-05,
2224
+ "loss": 1.3269,
2225
+ "step": 356
2226
+ },
2227
+ {
2228
+ "epoch": 9.64,
2229
+ "learning_rate": 2.782608695652174e-05,
2230
+ "loss": 1.2538,
2231
+ "step": 357
2232
+ },
2233
+ {
2234
+ "epoch": 9.67,
2235
+ "learning_rate": 2.7753623188405797e-05,
2236
+ "loss": 1.2652,
2237
+ "step": 358
2238
+ },
2239
+ {
2240
+ "epoch": 9.69,
2241
+ "learning_rate": 2.7681159420289854e-05,
2242
+ "loss": 1.2239,
2243
+ "step": 359
2244
+ },
2245
+ {
2246
+ "epoch": 9.72,
2247
+ "learning_rate": 2.7608695652173917e-05,
2248
+ "loss": 1.2764,
2249
+ "step": 360
2250
+ },
2251
+ {
2252
+ "epoch": 9.75,
2253
+ "learning_rate": 2.753623188405797e-05,
2254
+ "loss": 1.2484,
2255
+ "step": 361
2256
+ },
2257
+ {
2258
+ "epoch": 9.77,
2259
+ "learning_rate": 2.7463768115942028e-05,
2260
+ "loss": 1.3045,
2261
+ "step": 362
2262
+ },
2263
+ {
2264
+ "epoch": 9.8,
2265
+ "learning_rate": 2.7391304347826085e-05,
2266
+ "loss": 1.2679,
2267
+ "step": 363
2268
+ },
2269
+ {
2270
+ "epoch": 9.83,
2271
+ "learning_rate": 2.7318840579710148e-05,
2272
+ "loss": 1.3149,
2273
+ "step": 364
2274
+ },
2275
+ {
2276
+ "epoch": 9.85,
2277
+ "learning_rate": 2.7246376811594205e-05,
2278
+ "loss": 1.2347,
2279
+ "step": 365
2280
+ },
2281
+ {
2282
+ "epoch": 9.88,
2283
+ "learning_rate": 2.7173913043478262e-05,
2284
+ "loss": 1.2483,
2285
+ "step": 366
2286
+ },
2287
+ {
2288
+ "epoch": 9.91,
2289
+ "learning_rate": 2.7101449275362322e-05,
2290
+ "loss": 1.2693,
2291
+ "step": 367
2292
+ },
2293
+ {
2294
+ "epoch": 9.93,
2295
+ "learning_rate": 2.702898550724638e-05,
2296
+ "loss": 1.2849,
2297
+ "step": 368
2298
+ },
2299
+ {
2300
+ "epoch": 9.96,
2301
+ "learning_rate": 2.6956521739130436e-05,
2302
+ "loss": 1.2808,
2303
+ "step": 369
2304
+ },
2305
+ {
2306
+ "epoch": 9.99,
2307
+ "learning_rate": 2.6884057971014493e-05,
2308
+ "loss": 1.2741,
2309
+ "step": 370
2310
+ },
2311
+ {
2312
+ "epoch": 9.99,
2313
+ "eval_accuracy": 0.7354103508012126,
2314
+ "eval_loss": 1.2199150323867798,
2315
+ "eval_runtime": 13.2633,
2316
+ "eval_samples_per_second": 37.397,
2317
+ "eval_steps_per_second": 4.675,
2318
+ "step": 370
2319
+ },
2320
+ {
2321
+ "epoch": 10.03,
2322
+ "learning_rate": 2.6811594202898553e-05,
2323
+ "loss": 1.8651,
2324
+ "step": 371
2325
+ },
2326
+ {
2327
+ "epoch": 10.05,
2328
+ "learning_rate": 2.673913043478261e-05,
2329
+ "loss": 1.3189,
2330
+ "step": 372
2331
+ },
2332
+ {
2333
+ "epoch": 10.08,
2334
+ "learning_rate": 2.6666666666666667e-05,
2335
+ "loss": 1.2528,
2336
+ "step": 373
2337
+ },
2338
+ {
2339
+ "epoch": 10.11,
2340
+ "learning_rate": 2.6594202898550723e-05,
2341
+ "loss": 1.319,
2342
+ "step": 374
2343
+ },
2344
+ {
2345
+ "epoch": 10.13,
2346
+ "learning_rate": 2.6521739130434787e-05,
2347
+ "loss": 1.2792,
2348
+ "step": 375
2349
+ },
2350
+ {
2351
+ "epoch": 10.16,
2352
+ "learning_rate": 2.6449275362318844e-05,
2353
+ "loss": 1.2567,
2354
+ "step": 376
2355
+ },
2356
+ {
2357
+ "epoch": 10.19,
2358
+ "learning_rate": 2.63768115942029e-05,
2359
+ "loss": 1.2522,
2360
+ "step": 377
2361
+ },
2362
+ {
2363
+ "epoch": 10.21,
2364
+ "learning_rate": 2.6304347826086954e-05,
2365
+ "loss": 1.2899,
2366
+ "step": 378
2367
+ },
2368
+ {
2369
+ "epoch": 10.24,
2370
+ "learning_rate": 2.6231884057971018e-05,
2371
+ "loss": 1.2804,
2372
+ "step": 379
2373
+ },
2374
+ {
2375
+ "epoch": 10.27,
2376
+ "learning_rate": 2.6159420289855075e-05,
2377
+ "loss": 1.2721,
2378
+ "step": 380
2379
+ },
2380
+ {
2381
+ "epoch": 10.29,
2382
+ "learning_rate": 2.608695652173913e-05,
2383
+ "loss": 1.2901,
2384
+ "step": 381
2385
+ },
2386
+ {
2387
+ "epoch": 10.32,
2388
+ "learning_rate": 2.601449275362319e-05,
2389
+ "loss": 1.3134,
2390
+ "step": 382
2391
+ },
2392
+ {
2393
+ "epoch": 10.35,
2394
+ "learning_rate": 2.594202898550725e-05,
2395
+ "loss": 1.2504,
2396
+ "step": 383
2397
+ },
2398
+ {
2399
+ "epoch": 10.37,
2400
+ "learning_rate": 2.5869565217391305e-05,
2401
+ "loss": 1.3061,
2402
+ "step": 384
2403
+ },
2404
+ {
2405
+ "epoch": 10.4,
2406
+ "learning_rate": 2.5797101449275362e-05,
2407
+ "loss": 1.2553,
2408
+ "step": 385
2409
+ },
2410
+ {
2411
+ "epoch": 10.43,
2412
+ "learning_rate": 2.572463768115942e-05,
2413
+ "loss": 1.2534,
2414
+ "step": 386
2415
+ },
2416
+ {
2417
+ "epoch": 10.45,
2418
+ "learning_rate": 2.5652173913043483e-05,
2419
+ "loss": 1.2761,
2420
+ "step": 387
2421
+ },
2422
+ {
2423
+ "epoch": 10.48,
2424
+ "learning_rate": 2.5579710144927536e-05,
2425
+ "loss": 1.3338,
2426
+ "step": 388
2427
+ },
2428
+ {
2429
+ "epoch": 10.51,
2430
+ "learning_rate": 2.5507246376811593e-05,
2431
+ "loss": 1.256,
2432
+ "step": 389
2433
+ },
2434
+ {
2435
+ "epoch": 10.53,
2436
+ "learning_rate": 2.543478260869565e-05,
2437
+ "loss": 1.1987,
2438
+ "step": 390
2439
+ },
2440
+ {
2441
+ "epoch": 10.56,
2442
+ "learning_rate": 2.5362318840579714e-05,
2443
+ "loss": 1.2809,
2444
+ "step": 391
2445
+ },
2446
+ {
2447
+ "epoch": 10.59,
2448
+ "learning_rate": 2.528985507246377e-05,
2449
+ "loss": 1.238,
2450
+ "step": 392
2451
+ },
2452
+ {
2453
+ "epoch": 10.61,
2454
+ "learning_rate": 2.5217391304347827e-05,
2455
+ "loss": 1.2326,
2456
+ "step": 393
2457
+ },
2458
+ {
2459
+ "epoch": 10.64,
2460
+ "learning_rate": 2.5144927536231884e-05,
2461
+ "loss": 1.2594,
2462
+ "step": 394
2463
+ },
2464
+ {
2465
+ "epoch": 10.67,
2466
+ "learning_rate": 2.5072463768115944e-05,
2467
+ "loss": 1.278,
2468
+ "step": 395
2469
+ },
2470
+ {
2471
+ "epoch": 10.69,
2472
+ "learning_rate": 2.5e-05,
2473
+ "loss": 1.2594,
2474
+ "step": 396
2475
+ },
2476
+ {
2477
+ "epoch": 10.72,
2478
+ "learning_rate": 2.492753623188406e-05,
2479
+ "loss": 1.2381,
2480
+ "step": 397
2481
+ },
2482
+ {
2483
+ "epoch": 10.75,
2484
+ "learning_rate": 2.4855072463768118e-05,
2485
+ "loss": 1.2416,
2486
+ "step": 398
2487
+ },
2488
+ {
2489
+ "epoch": 10.77,
2490
+ "learning_rate": 2.4782608695652175e-05,
2491
+ "loss": 1.2395,
2492
+ "step": 399
2493
+ },
2494
+ {
2495
+ "epoch": 10.8,
2496
+ "learning_rate": 2.4710144927536232e-05,
2497
+ "loss": 1.2736,
2498
+ "step": 400
2499
+ },
2500
+ {
2501
+ "epoch": 10.83,
2502
+ "learning_rate": 2.4637681159420292e-05,
2503
+ "loss": 1.2334,
2504
+ "step": 401
2505
+ },
2506
+ {
2507
+ "epoch": 10.85,
2508
+ "learning_rate": 2.456521739130435e-05,
2509
+ "loss": 1.2722,
2510
+ "step": 402
2511
+ },
2512
+ {
2513
+ "epoch": 10.88,
2514
+ "learning_rate": 2.449275362318841e-05,
2515
+ "loss": 1.2749,
2516
+ "step": 403
2517
+ },
2518
+ {
2519
+ "epoch": 10.91,
2520
+ "learning_rate": 2.4420289855072466e-05,
2521
+ "loss": 1.3086,
2522
+ "step": 404
2523
+ },
2524
+ {
2525
+ "epoch": 10.93,
2526
+ "learning_rate": 2.4347826086956523e-05,
2527
+ "loss": 1.3194,
2528
+ "step": 405
2529
+ },
2530
+ {
2531
+ "epoch": 10.96,
2532
+ "learning_rate": 2.427536231884058e-05,
2533
+ "loss": 1.2443,
2534
+ "step": 406
2535
+ },
2536
+ {
2537
+ "epoch": 10.99,
2538
+ "learning_rate": 2.420289855072464e-05,
2539
+ "loss": 1.2342,
2540
+ "step": 407
2541
+ },
2542
+ {
2543
+ "epoch": 10.99,
2544
+ "eval_accuracy": 0.730909385375334,
2545
+ "eval_loss": 1.2519582509994507,
2546
+ "eval_runtime": 13.3427,
2547
+ "eval_samples_per_second": 37.174,
2548
+ "eval_steps_per_second": 4.647,
2549
+ "step": 407
2550
+ },
2551
+ {
2552
+ "epoch": 11.03,
2553
+ "learning_rate": 2.4130434782608697e-05,
2554
+ "loss": 1.9108,
2555
+ "step": 408
2556
+ },
2557
+ {
2558
+ "epoch": 11.05,
2559
+ "learning_rate": 2.4057971014492757e-05,
2560
+ "loss": 1.2788,
2561
+ "step": 409
2562
+ },
2563
+ {
2564
+ "epoch": 11.08,
2565
+ "learning_rate": 2.398550724637681e-05,
2566
+ "loss": 1.2999,
2567
+ "step": 410
2568
+ },
2569
+ {
2570
+ "epoch": 11.11,
2571
+ "learning_rate": 2.391304347826087e-05,
2572
+ "loss": 1.2402,
2573
+ "step": 411
2574
+ },
2575
+ {
2576
+ "epoch": 11.13,
2577
+ "learning_rate": 2.3840579710144928e-05,
2578
+ "loss": 1.2587,
2579
+ "step": 412
2580
+ },
2581
+ {
2582
+ "epoch": 11.16,
2583
+ "learning_rate": 2.3768115942028988e-05,
2584
+ "loss": 1.3018,
2585
+ "step": 413
2586
+ },
2587
+ {
2588
+ "epoch": 11.19,
2589
+ "learning_rate": 2.3695652173913045e-05,
2590
+ "loss": 1.2506,
2591
+ "step": 414
2592
+ },
2593
+ {
2594
+ "epoch": 11.21,
2595
+ "learning_rate": 2.36231884057971e-05,
2596
+ "loss": 1.2885,
2597
+ "step": 415
2598
+ },
2599
+ {
2600
+ "epoch": 11.24,
2601
+ "learning_rate": 2.355072463768116e-05,
2602
+ "loss": 1.331,
2603
+ "step": 416
2604
+ },
2605
+ {
2606
+ "epoch": 11.27,
2607
+ "learning_rate": 2.347826086956522e-05,
2608
+ "loss": 1.2617,
2609
+ "step": 417
2610
+ },
2611
+ {
2612
+ "epoch": 11.29,
2613
+ "learning_rate": 2.3405797101449276e-05,
2614
+ "loss": 1.3315,
2615
+ "step": 418
2616
+ },
2617
+ {
2618
+ "epoch": 11.32,
2619
+ "learning_rate": 2.3333333333333336e-05,
2620
+ "loss": 1.2644,
2621
+ "step": 419
2622
+ },
2623
+ {
2624
+ "epoch": 11.35,
2625
+ "learning_rate": 2.3260869565217393e-05,
2626
+ "loss": 1.2276,
2627
+ "step": 420
2628
+ },
2629
+ {
2630
+ "epoch": 11.37,
2631
+ "learning_rate": 2.318840579710145e-05,
2632
+ "loss": 1.2159,
2633
+ "step": 421
2634
+ },
2635
+ {
2636
+ "epoch": 11.4,
2637
+ "learning_rate": 2.3115942028985506e-05,
2638
+ "loss": 1.2182,
2639
+ "step": 422
2640
+ },
2641
+ {
2642
+ "epoch": 11.43,
2643
+ "learning_rate": 2.3043478260869567e-05,
2644
+ "loss": 1.2267,
2645
+ "step": 423
2646
+ },
2647
+ {
2648
+ "epoch": 11.45,
2649
+ "learning_rate": 2.2971014492753623e-05,
2650
+ "loss": 1.3009,
2651
+ "step": 424
2652
+ },
2653
+ {
2654
+ "epoch": 11.48,
2655
+ "learning_rate": 2.2898550724637684e-05,
2656
+ "loss": 1.2403,
2657
+ "step": 425
2658
+ },
2659
+ {
2660
+ "epoch": 11.51,
2661
+ "learning_rate": 2.282608695652174e-05,
2662
+ "loss": 1.2692,
2663
+ "step": 426
2664
+ },
2665
+ {
2666
+ "epoch": 11.53,
2667
+ "learning_rate": 2.2753623188405797e-05,
2668
+ "loss": 1.2098,
2669
+ "step": 427
2670
+ },
2671
+ {
2672
+ "epoch": 11.56,
2673
+ "learning_rate": 2.2681159420289858e-05,
2674
+ "loss": 1.2321,
2675
+ "step": 428
2676
+ },
2677
+ {
2678
+ "epoch": 11.59,
2679
+ "learning_rate": 2.2608695652173914e-05,
2680
+ "loss": 1.2206,
2681
+ "step": 429
2682
+ },
2683
+ {
2684
+ "epoch": 11.61,
2685
+ "learning_rate": 2.2536231884057975e-05,
2686
+ "loss": 1.2684,
2687
+ "step": 430
2688
+ },
2689
+ {
2690
+ "epoch": 11.64,
2691
+ "learning_rate": 2.246376811594203e-05,
2692
+ "loss": 1.1974,
2693
+ "step": 431
2694
+ },
2695
+ {
2696
+ "epoch": 11.67,
2697
+ "learning_rate": 2.239130434782609e-05,
2698
+ "loss": 1.2054,
2699
+ "step": 432
2700
+ },
2701
+ {
2702
+ "epoch": 11.69,
2703
+ "learning_rate": 2.2318840579710145e-05,
2704
+ "loss": 1.2492,
2705
+ "step": 433
2706
+ },
2707
+ {
2708
+ "epoch": 11.72,
2709
+ "learning_rate": 2.2246376811594205e-05,
2710
+ "loss": 1.2817,
2711
+ "step": 434
2712
+ },
2713
+ {
2714
+ "epoch": 11.75,
2715
+ "learning_rate": 2.2173913043478262e-05,
2716
+ "loss": 1.25,
2717
+ "step": 435
2718
+ },
2719
+ {
2720
+ "epoch": 11.77,
2721
+ "learning_rate": 2.2101449275362323e-05,
2722
+ "loss": 1.2499,
2723
+ "step": 436
2724
+ },
2725
+ {
2726
+ "epoch": 11.8,
2727
+ "learning_rate": 2.2028985507246376e-05,
2728
+ "loss": 1.2213,
2729
+ "step": 437
2730
+ },
2731
+ {
2732
+ "epoch": 11.83,
2733
+ "learning_rate": 2.1956521739130436e-05,
2734
+ "loss": 1.2857,
2735
+ "step": 438
2736
+ },
2737
+ {
2738
+ "epoch": 11.85,
2739
+ "learning_rate": 2.1884057971014493e-05,
2740
+ "loss": 1.2746,
2741
+ "step": 439
2742
+ },
2743
+ {
2744
+ "epoch": 11.88,
2745
+ "learning_rate": 2.1811594202898553e-05,
2746
+ "loss": 1.2603,
2747
+ "step": 440
2748
+ },
2749
+ {
2750
+ "epoch": 11.91,
2751
+ "learning_rate": 2.173913043478261e-05,
2752
+ "loss": 1.2651,
2753
+ "step": 441
2754
+ },
2755
+ {
2756
+ "epoch": 11.93,
2757
+ "learning_rate": 2.1666666666666667e-05,
2758
+ "loss": 1.2772,
2759
+ "step": 442
2760
+ },
2761
+ {
2762
+ "epoch": 11.96,
2763
+ "learning_rate": 2.1594202898550724e-05,
2764
+ "loss": 1.2843,
2765
+ "step": 443
2766
+ },
2767
+ {
2768
+ "epoch": 11.99,
2769
+ "learning_rate": 2.1521739130434784e-05,
2770
+ "loss": 1.2199,
2771
+ "step": 444
2772
+ },
2773
+ {
2774
+ "epoch": 11.99,
2775
+ "eval_accuracy": 0.7259977842029887,
2776
+ "eval_loss": 1.273816704750061,
2777
+ "eval_runtime": 13.2756,
2778
+ "eval_samples_per_second": 37.362,
2779
+ "eval_steps_per_second": 4.67,
2780
+ "step": 444
2781
+ },
2782
+ {
2783
+ "epoch": 12.03,
2784
+ "learning_rate": 2.144927536231884e-05,
2785
+ "loss": 1.8625,
2786
+ "step": 445
2787
+ },
2788
+ {
2789
+ "epoch": 12.05,
2790
+ "learning_rate": 2.13768115942029e-05,
2791
+ "loss": 1.2381,
2792
+ "step": 446
2793
+ },
2794
+ {
2795
+ "epoch": 12.08,
2796
+ "learning_rate": 2.1304347826086958e-05,
2797
+ "loss": 1.2111,
2798
+ "step": 447
2799
+ },
2800
+ {
2801
+ "epoch": 12.11,
2802
+ "learning_rate": 2.1231884057971015e-05,
2803
+ "loss": 1.2404,
2804
+ "step": 448
2805
+ },
2806
+ {
2807
+ "epoch": 12.13,
2808
+ "learning_rate": 2.1159420289855072e-05,
2809
+ "loss": 1.2764,
2810
+ "step": 449
2811
+ },
2812
+ {
2813
+ "epoch": 12.16,
2814
+ "learning_rate": 2.1086956521739132e-05,
2815
+ "loss": 1.2617,
2816
+ "step": 450
2817
+ },
2818
+ {
2819
+ "epoch": 12.19,
2820
+ "learning_rate": 2.101449275362319e-05,
2821
+ "loss": 1.2685,
2822
+ "step": 451
2823
+ },
2824
+ {
2825
+ "epoch": 12.21,
2826
+ "learning_rate": 2.094202898550725e-05,
2827
+ "loss": 1.1995,
2828
+ "step": 452
2829
+ },
2830
+ {
2831
+ "epoch": 12.24,
2832
+ "learning_rate": 2.0869565217391303e-05,
2833
+ "loss": 1.2554,
2834
+ "step": 453
2835
+ },
2836
+ {
2837
+ "epoch": 12.27,
2838
+ "learning_rate": 2.0797101449275363e-05,
2839
+ "loss": 1.2545,
2840
+ "step": 454
2841
+ },
2842
+ {
2843
+ "epoch": 12.29,
2844
+ "learning_rate": 2.072463768115942e-05,
2845
+ "loss": 1.2315,
2846
+ "step": 455
2847
+ },
2848
+ {
2849
+ "epoch": 12.32,
2850
+ "learning_rate": 2.065217391304348e-05,
2851
+ "loss": 1.2373,
2852
+ "step": 456
2853
+ },
2854
+ {
2855
+ "epoch": 12.35,
2856
+ "learning_rate": 2.0579710144927537e-05,
2857
+ "loss": 1.2745,
2858
+ "step": 457
2859
+ },
2860
+ {
2861
+ "epoch": 12.37,
2862
+ "learning_rate": 2.0507246376811594e-05,
2863
+ "loss": 1.2405,
2864
+ "step": 458
2865
+ },
2866
+ {
2867
+ "epoch": 12.4,
2868
+ "learning_rate": 2.0434782608695654e-05,
2869
+ "loss": 1.2467,
2870
+ "step": 459
2871
+ },
2872
+ {
2873
+ "epoch": 12.43,
2874
+ "learning_rate": 2.036231884057971e-05,
2875
+ "loss": 1.213,
2876
+ "step": 460
2877
+ },
2878
+ {
2879
+ "epoch": 12.45,
2880
+ "learning_rate": 2.028985507246377e-05,
2881
+ "loss": 1.2499,
2882
+ "step": 461
2883
+ },
2884
+ {
2885
+ "epoch": 12.48,
2886
+ "learning_rate": 2.0217391304347828e-05,
2887
+ "loss": 1.2601,
2888
+ "step": 462
2889
+ },
2890
+ {
2891
+ "epoch": 12.51,
2892
+ "learning_rate": 2.0144927536231885e-05,
2893
+ "loss": 1.2704,
2894
+ "step": 463
2895
+ },
2896
+ {
2897
+ "epoch": 12.53,
2898
+ "learning_rate": 2.007246376811594e-05,
2899
+ "loss": 1.2739,
2900
+ "step": 464
2901
+ },
2902
+ {
2903
+ "epoch": 12.56,
2904
+ "learning_rate": 2e-05,
2905
+ "loss": 1.1849,
2906
+ "step": 465
2907
+ },
2908
+ {
2909
+ "epoch": 12.59,
2910
+ "learning_rate": 1.992753623188406e-05,
2911
+ "loss": 1.1746,
2912
+ "step": 466
2913
+ },
2914
+ {
2915
+ "epoch": 12.61,
2916
+ "learning_rate": 1.985507246376812e-05,
2917
+ "loss": 1.2887,
2918
+ "step": 467
2919
+ },
2920
+ {
2921
+ "epoch": 12.64,
2922
+ "learning_rate": 1.9782608695652176e-05,
2923
+ "loss": 1.2067,
2924
+ "step": 468
2925
+ },
2926
+ {
2927
+ "epoch": 12.67,
2928
+ "learning_rate": 1.9710144927536232e-05,
2929
+ "loss": 1.2601,
2930
+ "step": 469
2931
+ },
2932
+ {
2933
+ "epoch": 12.69,
2934
+ "learning_rate": 1.963768115942029e-05,
2935
+ "loss": 1.2632,
2936
+ "step": 470
2937
+ },
2938
+ {
2939
+ "epoch": 12.72,
2940
+ "learning_rate": 1.956521739130435e-05,
2941
+ "loss": 1.2633,
2942
+ "step": 471
2943
+ },
2944
+ {
2945
+ "epoch": 12.75,
2946
+ "learning_rate": 1.9492753623188406e-05,
2947
+ "loss": 1.216,
2948
+ "step": 472
2949
+ },
2950
+ {
2951
+ "epoch": 12.77,
2952
+ "learning_rate": 1.9420289855072467e-05,
2953
+ "loss": 1.2608,
2954
+ "step": 473
2955
+ },
2956
+ {
2957
+ "epoch": 12.8,
2958
+ "learning_rate": 1.9347826086956523e-05,
2959
+ "loss": 1.2219,
2960
+ "step": 474
2961
+ },
2962
+ {
2963
+ "epoch": 12.83,
2964
+ "learning_rate": 1.927536231884058e-05,
2965
+ "loss": 1.2864,
2966
+ "step": 475
2967
+ },
2968
+ {
2969
+ "epoch": 12.85,
2970
+ "learning_rate": 1.9202898550724637e-05,
2971
+ "loss": 1.3147,
2972
+ "step": 476
2973
+ },
2974
+ {
2975
+ "epoch": 12.88,
2976
+ "learning_rate": 1.9130434782608697e-05,
2977
+ "loss": 1.2553,
2978
+ "step": 477
2979
+ },
2980
+ {
2981
+ "epoch": 12.91,
2982
+ "learning_rate": 1.9057971014492754e-05,
2983
+ "loss": 1.2906,
2984
+ "step": 478
2985
+ },
2986
+ {
2987
+ "epoch": 12.93,
2988
+ "learning_rate": 1.8985507246376814e-05,
2989
+ "loss": 1.2831,
2990
+ "step": 479
2991
+ },
2992
+ {
2993
+ "epoch": 12.96,
2994
+ "learning_rate": 1.8913043478260868e-05,
2995
+ "loss": 1.2721,
2996
+ "step": 480
2997
+ },
2998
+ {
2999
+ "epoch": 12.99,
3000
+ "learning_rate": 1.8840579710144928e-05,
3001
+ "loss": 1.206,
3002
+ "step": 481
3003
+ },
3004
+ {
3005
+ "epoch": 12.99,
3006
+ "eval_accuracy": 0.7335030880918842,
3007
+ "eval_loss": 1.2285895347595215,
3008
+ "eval_runtime": 13.2581,
3009
+ "eval_samples_per_second": 37.411,
3010
+ "eval_steps_per_second": 4.676,
3011
+ "step": 481
3012
+ },
3013
+ {
3014
+ "epoch": 13.03,
3015
+ "learning_rate": 1.8768115942028985e-05,
3016
+ "loss": 1.8759,
3017
+ "step": 482
3018
+ },
3019
+ {
3020
+ "epoch": 13.05,
3021
+ "learning_rate": 1.8695652173913045e-05,
3022
+ "loss": 1.2655,
3023
+ "step": 483
3024
+ },
3025
+ {
3026
+ "epoch": 13.08,
3027
+ "learning_rate": 1.8623188405797102e-05,
3028
+ "loss": 1.218,
3029
+ "step": 484
3030
+ },
3031
+ {
3032
+ "epoch": 13.11,
3033
+ "learning_rate": 1.855072463768116e-05,
3034
+ "loss": 1.2632,
3035
+ "step": 485
3036
+ },
3037
+ {
3038
+ "epoch": 13.13,
3039
+ "learning_rate": 1.8478260869565216e-05,
3040
+ "loss": 1.2984,
3041
+ "step": 486
3042
+ },
3043
+ {
3044
+ "epoch": 13.16,
3045
+ "learning_rate": 1.8405797101449276e-05,
3046
+ "loss": 1.2791,
3047
+ "step": 487
3048
+ },
3049
+ {
3050
+ "epoch": 13.19,
3051
+ "learning_rate": 1.8333333333333333e-05,
3052
+ "loss": 1.2126,
3053
+ "step": 488
3054
+ },
3055
+ {
3056
+ "epoch": 13.21,
3057
+ "learning_rate": 1.8260869565217393e-05,
3058
+ "loss": 1.2503,
3059
+ "step": 489
3060
+ },
3061
+ {
3062
+ "epoch": 13.24,
3063
+ "learning_rate": 1.818840579710145e-05,
3064
+ "loss": 1.2168,
3065
+ "step": 490
3066
+ },
3067
+ {
3068
+ "epoch": 13.27,
3069
+ "learning_rate": 1.8115942028985507e-05,
3070
+ "loss": 1.3218,
3071
+ "step": 491
3072
+ },
3073
+ {
3074
+ "epoch": 13.29,
3075
+ "learning_rate": 1.8043478260869567e-05,
3076
+ "loss": 1.2605,
3077
+ "step": 492
3078
+ },
3079
+ {
3080
+ "epoch": 13.32,
3081
+ "learning_rate": 1.7971014492753624e-05,
3082
+ "loss": 1.2497,
3083
+ "step": 493
3084
+ },
3085
+ {
3086
+ "epoch": 13.35,
3087
+ "learning_rate": 1.7898550724637684e-05,
3088
+ "loss": 1.2989,
3089
+ "step": 494
3090
+ },
3091
+ {
3092
+ "epoch": 13.37,
3093
+ "learning_rate": 1.782608695652174e-05,
3094
+ "loss": 1.2328,
3095
+ "step": 495
3096
+ },
3097
+ {
3098
+ "epoch": 13.4,
3099
+ "learning_rate": 1.7753623188405798e-05,
3100
+ "loss": 1.2262,
3101
+ "step": 496
3102
+ },
3103
+ {
3104
+ "epoch": 13.43,
3105
+ "learning_rate": 1.7681159420289855e-05,
3106
+ "loss": 1.2803,
3107
+ "step": 497
3108
+ },
3109
+ {
3110
+ "epoch": 13.45,
3111
+ "learning_rate": 1.7608695652173915e-05,
3112
+ "loss": 1.273,
3113
+ "step": 498
3114
+ },
3115
+ {
3116
+ "epoch": 13.48,
3117
+ "learning_rate": 1.7536231884057972e-05,
3118
+ "loss": 1.2922,
3119
+ "step": 499
3120
+ },
3121
+ {
3122
+ "epoch": 13.51,
3123
+ "learning_rate": 1.7463768115942032e-05,
3124
+ "loss": 1.1986,
3125
+ "step": 500
3126
+ },
3127
+ {
3128
+ "epoch": 13.53,
3129
+ "learning_rate": 1.739130434782609e-05,
3130
+ "loss": 1.267,
3131
+ "step": 501
3132
+ },
3133
+ {
3134
+ "epoch": 13.56,
3135
+ "learning_rate": 1.7318840579710146e-05,
3136
+ "loss": 1.2887,
3137
+ "step": 502
3138
+ },
3139
+ {
3140
+ "epoch": 13.59,
3141
+ "learning_rate": 1.7246376811594203e-05,
3142
+ "loss": 1.1824,
3143
+ "step": 503
3144
+ },
3145
+ {
3146
+ "epoch": 13.61,
3147
+ "learning_rate": 1.7173913043478263e-05,
3148
+ "loss": 1.2688,
3149
+ "step": 504
3150
+ },
3151
+ {
3152
+ "epoch": 13.64,
3153
+ "learning_rate": 1.710144927536232e-05,
3154
+ "loss": 1.2429,
3155
+ "step": 505
3156
+ },
3157
+ {
3158
+ "epoch": 13.67,
3159
+ "learning_rate": 1.702898550724638e-05,
3160
+ "loss": 1.2028,
3161
+ "step": 506
3162
+ },
3163
+ {
3164
+ "epoch": 13.69,
3165
+ "learning_rate": 1.6956521739130433e-05,
3166
+ "loss": 1.2543,
3167
+ "step": 507
3168
+ },
3169
+ {
3170
+ "epoch": 13.72,
3171
+ "learning_rate": 1.6884057971014494e-05,
3172
+ "loss": 1.2463,
3173
+ "step": 508
3174
+ },
3175
+ {
3176
+ "epoch": 13.75,
3177
+ "learning_rate": 1.681159420289855e-05,
3178
+ "loss": 1.2003,
3179
+ "step": 509
3180
+ },
3181
+ {
3182
+ "epoch": 13.77,
3183
+ "learning_rate": 1.673913043478261e-05,
3184
+ "loss": 1.2947,
3185
+ "step": 510
3186
+ },
3187
+ {
3188
+ "epoch": 13.8,
3189
+ "learning_rate": 1.6666666666666667e-05,
3190
+ "loss": 1.2499,
3191
+ "step": 511
3192
+ },
3193
+ {
3194
+ "epoch": 13.83,
3195
+ "learning_rate": 1.6594202898550724e-05,
3196
+ "loss": 1.2518,
3197
+ "step": 512
3198
+ },
3199
+ {
3200
+ "epoch": 13.85,
3201
+ "learning_rate": 1.652173913043478e-05,
3202
+ "loss": 1.2409,
3203
+ "step": 513
3204
+ },
3205
+ {
3206
+ "epoch": 13.88,
3207
+ "learning_rate": 1.644927536231884e-05,
3208
+ "loss": 1.295,
3209
+ "step": 514
3210
+ },
3211
+ {
3212
+ "epoch": 13.91,
3213
+ "learning_rate": 1.6376811594202898e-05,
3214
+ "loss": 1.2678,
3215
+ "step": 515
3216
+ },
3217
+ {
3218
+ "epoch": 13.93,
3219
+ "learning_rate": 1.630434782608696e-05,
3220
+ "loss": 1.217,
3221
+ "step": 516
3222
+ },
3223
+ {
3224
+ "epoch": 13.96,
3225
+ "learning_rate": 1.6231884057971015e-05,
3226
+ "loss": 1.2197,
3227
+ "step": 517
3228
+ },
3229
+ {
3230
+ "epoch": 13.99,
3231
+ "learning_rate": 1.6159420289855072e-05,
3232
+ "loss": 1.221,
3233
+ "step": 518
3234
+ },
3235
+ {
3236
+ "epoch": 13.99,
3237
+ "eval_accuracy": 0.7327190178115681,
3238
+ "eval_loss": 1.2421268224716187,
3239
+ "eval_runtime": 13.279,
3240
+ "eval_samples_per_second": 37.352,
3241
+ "eval_steps_per_second": 4.669,
3242
+ "step": 518
3243
+ },
3244
+ {
3245
+ "epoch": 14.03,
3246
+ "learning_rate": 1.608695652173913e-05,
3247
+ "loss": 1.834,
3248
+ "step": 519
3249
+ },
3250
+ {
3251
+ "epoch": 14.05,
3252
+ "learning_rate": 1.601449275362319e-05,
3253
+ "loss": 1.2727,
3254
+ "step": 520
3255
+ },
3256
+ {
3257
+ "epoch": 14.08,
3258
+ "learning_rate": 1.5942028985507246e-05,
3259
+ "loss": 1.2753,
3260
+ "step": 521
3261
+ },
3262
+ {
3263
+ "epoch": 14.11,
3264
+ "learning_rate": 1.5869565217391306e-05,
3265
+ "loss": 1.2137,
3266
+ "step": 522
3267
+ },
3268
+ {
3269
+ "epoch": 14.13,
3270
+ "learning_rate": 1.5797101449275363e-05,
3271
+ "loss": 1.2626,
3272
+ "step": 523
3273
+ },
3274
+ {
3275
+ "epoch": 14.16,
3276
+ "learning_rate": 1.572463768115942e-05,
3277
+ "loss": 1.2518,
3278
+ "step": 524
3279
+ },
3280
+ {
3281
+ "epoch": 14.19,
3282
+ "learning_rate": 1.565217391304348e-05,
3283
+ "loss": 1.2167,
3284
+ "step": 525
3285
+ },
3286
+ {
3287
+ "epoch": 14.21,
3288
+ "learning_rate": 1.5579710144927537e-05,
3289
+ "loss": 1.1935,
3290
+ "step": 526
3291
+ },
3292
+ {
3293
+ "epoch": 14.24,
3294
+ "learning_rate": 1.5507246376811597e-05,
3295
+ "loss": 1.2374,
3296
+ "step": 527
3297
+ },
3298
+ {
3299
+ "epoch": 14.27,
3300
+ "learning_rate": 1.5434782608695654e-05,
3301
+ "loss": 1.2236,
3302
+ "step": 528
3303
+ },
3304
+ {
3305
+ "epoch": 14.29,
3306
+ "learning_rate": 1.536231884057971e-05,
3307
+ "loss": 1.2117,
3308
+ "step": 529
3309
+ },
3310
+ {
3311
+ "epoch": 14.32,
3312
+ "learning_rate": 1.5289855072463768e-05,
3313
+ "loss": 1.3012,
3314
+ "step": 530
3315
+ },
3316
+ {
3317
+ "epoch": 14.35,
3318
+ "learning_rate": 1.5217391304347828e-05,
3319
+ "loss": 1.3001,
3320
+ "step": 531
3321
+ },
3322
+ {
3323
+ "epoch": 14.37,
3324
+ "learning_rate": 1.5144927536231885e-05,
3325
+ "loss": 1.198,
3326
+ "step": 532
3327
+ },
3328
+ {
3329
+ "epoch": 14.4,
3330
+ "learning_rate": 1.5072463768115944e-05,
3331
+ "loss": 1.2198,
3332
+ "step": 533
3333
+ },
3334
+ {
3335
+ "epoch": 14.43,
3336
+ "learning_rate": 1.5e-05,
3337
+ "loss": 1.2423,
3338
+ "step": 534
3339
+ },
3340
+ {
3341
+ "epoch": 14.45,
3342
+ "learning_rate": 1.4927536231884059e-05,
3343
+ "loss": 1.2475,
3344
+ "step": 535
3345
+ },
3346
+ {
3347
+ "epoch": 14.48,
3348
+ "learning_rate": 1.4855072463768116e-05,
3349
+ "loss": 1.2687,
3350
+ "step": 536
3351
+ },
3352
+ {
3353
+ "epoch": 14.51,
3354
+ "learning_rate": 1.4782608695652176e-05,
3355
+ "loss": 1.2282,
3356
+ "step": 537
3357
+ },
3358
+ {
3359
+ "epoch": 14.53,
3360
+ "learning_rate": 1.4710144927536231e-05,
3361
+ "loss": 1.2676,
3362
+ "step": 538
3363
+ },
3364
+ {
3365
+ "epoch": 14.56,
3366
+ "learning_rate": 1.4637681159420291e-05,
3367
+ "loss": 1.3138,
3368
+ "step": 539
3369
+ },
3370
+ {
3371
+ "epoch": 14.59,
3372
+ "learning_rate": 1.4565217391304348e-05,
3373
+ "loss": 1.2945,
3374
+ "step": 540
3375
+ },
3376
+ {
3377
+ "epoch": 14.61,
3378
+ "learning_rate": 1.4492753623188407e-05,
3379
+ "loss": 1.2479,
3380
+ "step": 541
3381
+ },
3382
+ {
3383
+ "epoch": 14.64,
3384
+ "learning_rate": 1.4420289855072464e-05,
3385
+ "loss": 1.2788,
3386
+ "step": 542
3387
+ },
3388
+ {
3389
+ "epoch": 14.67,
3390
+ "learning_rate": 1.4347826086956522e-05,
3391
+ "loss": 1.2951,
3392
+ "step": 543
3393
+ },
3394
+ {
3395
+ "epoch": 14.69,
3396
+ "learning_rate": 1.4275362318840579e-05,
3397
+ "loss": 1.2326,
3398
+ "step": 544
3399
+ },
3400
+ {
3401
+ "epoch": 14.72,
3402
+ "learning_rate": 1.420289855072464e-05,
3403
+ "loss": 1.2256,
3404
+ "step": 545
3405
+ },
3406
+ {
3407
+ "epoch": 14.75,
3408
+ "learning_rate": 1.4130434782608694e-05,
3409
+ "loss": 1.213,
3410
+ "step": 546
3411
+ },
3412
+ {
3413
+ "epoch": 14.77,
3414
+ "learning_rate": 1.4057971014492755e-05,
3415
+ "loss": 1.2928,
3416
+ "step": 547
3417
+ },
3418
+ {
3419
+ "epoch": 14.8,
3420
+ "learning_rate": 1.3985507246376811e-05,
3421
+ "loss": 1.2372,
3422
+ "step": 548
3423
+ },
3424
+ {
3425
+ "epoch": 14.83,
3426
+ "learning_rate": 1.391304347826087e-05,
3427
+ "loss": 1.2561,
3428
+ "step": 549
3429
+ },
3430
+ {
3431
+ "epoch": 14.85,
3432
+ "learning_rate": 1.3840579710144927e-05,
3433
+ "loss": 1.23,
3434
+ "step": 550
3435
+ },
3436
+ {
3437
+ "epoch": 14.88,
3438
+ "learning_rate": 1.3768115942028985e-05,
3439
+ "loss": 1.2413,
3440
+ "step": 551
3441
+ },
3442
+ {
3443
+ "epoch": 14.91,
3444
+ "learning_rate": 1.3695652173913042e-05,
3445
+ "loss": 1.2263,
3446
+ "step": 552
3447
+ },
3448
+ {
3449
+ "epoch": 14.93,
3450
+ "learning_rate": 1.3623188405797103e-05,
3451
+ "loss": 1.2865,
3452
+ "step": 553
3453
+ },
3454
+ {
3455
+ "epoch": 14.96,
3456
+ "learning_rate": 1.3550724637681161e-05,
3457
+ "loss": 1.2619,
3458
+ "step": 554
3459
+ },
3460
+ {
3461
+ "epoch": 14.99,
3462
+ "learning_rate": 1.3478260869565218e-05,
3463
+ "loss": 1.2062,
3464
+ "step": 555
3465
+ },
3466
+ {
3467
+ "epoch": 14.99,
3468
+ "eval_accuracy": 0.732803299595691,
3469
+ "eval_loss": 1.2402293682098389,
3470
+ "eval_runtime": 13.2984,
3471
+ "eval_samples_per_second": 37.298,
3472
+ "eval_steps_per_second": 4.662,
3473
+ "step": 555
3474
+ },
3475
+ {
3476
+ "epoch": 15.03,
3477
+ "learning_rate": 1.3405797101449276e-05,
3478
+ "loss": 1.8577,
3479
+ "step": 556
3480
+ },
3481
+ {
3482
+ "epoch": 15.05,
3483
+ "learning_rate": 1.3333333333333333e-05,
3484
+ "loss": 1.2946,
3485
+ "step": 557
3486
+ },
3487
+ {
3488
+ "epoch": 15.08,
3489
+ "learning_rate": 1.3260869565217394e-05,
3490
+ "loss": 1.2822,
3491
+ "step": 558
3492
+ },
3493
+ {
3494
+ "epoch": 15.11,
3495
+ "learning_rate": 1.318840579710145e-05,
3496
+ "loss": 1.2716,
3497
+ "step": 559
3498
+ },
3499
+ {
3500
+ "epoch": 15.13,
3501
+ "learning_rate": 1.3115942028985509e-05,
3502
+ "loss": 1.2188,
3503
+ "step": 560
3504
+ },
3505
+ {
3506
+ "epoch": 15.16,
3507
+ "learning_rate": 1.3043478260869566e-05,
3508
+ "loss": 1.2308,
3509
+ "step": 561
3510
+ },
3511
+ {
3512
+ "epoch": 15.19,
3513
+ "learning_rate": 1.2971014492753624e-05,
3514
+ "loss": 1.2465,
3515
+ "step": 562
3516
+ },
3517
+ {
3518
+ "epoch": 15.21,
3519
+ "learning_rate": 1.2898550724637681e-05,
3520
+ "loss": 1.2039,
3521
+ "step": 563
3522
+ },
3523
+ {
3524
+ "epoch": 15.24,
3525
+ "learning_rate": 1.2826086956521741e-05,
3526
+ "loss": 1.2685,
3527
+ "step": 564
3528
+ },
3529
+ {
3530
+ "epoch": 15.27,
3531
+ "learning_rate": 1.2753623188405797e-05,
3532
+ "loss": 1.224,
3533
+ "step": 565
3534
+ },
3535
+ {
3536
+ "epoch": 15.29,
3537
+ "learning_rate": 1.2681159420289857e-05,
3538
+ "loss": 1.2462,
3539
+ "step": 566
3540
+ },
3541
+ {
3542
+ "epoch": 15.32,
3543
+ "learning_rate": 1.2608695652173914e-05,
3544
+ "loss": 1.2162,
3545
+ "step": 567
3546
+ },
3547
+ {
3548
+ "epoch": 15.35,
3549
+ "learning_rate": 1.2536231884057972e-05,
3550
+ "loss": 1.1778,
3551
+ "step": 568
3552
+ },
3553
+ {
3554
+ "epoch": 15.37,
3555
+ "learning_rate": 1.246376811594203e-05,
3556
+ "loss": 1.2065,
3557
+ "step": 569
3558
+ },
3559
+ {
3560
+ "epoch": 15.4,
3561
+ "learning_rate": 1.2391304347826088e-05,
3562
+ "loss": 1.2536,
3563
+ "step": 570
3564
+ },
3565
+ {
3566
+ "epoch": 15.43,
3567
+ "learning_rate": 1.2318840579710146e-05,
3568
+ "loss": 1.2198,
3569
+ "step": 571
3570
+ },
3571
+ {
3572
+ "epoch": 15.45,
3573
+ "learning_rate": 1.2246376811594205e-05,
3574
+ "loss": 1.242,
3575
+ "step": 572
3576
+ },
3577
+ {
3578
+ "epoch": 15.48,
3579
+ "learning_rate": 1.2173913043478261e-05,
3580
+ "loss": 1.2376,
3581
+ "step": 573
3582
+ },
3583
+ {
3584
+ "epoch": 15.51,
3585
+ "learning_rate": 1.210144927536232e-05,
3586
+ "loss": 1.2419,
3587
+ "step": 574
3588
+ },
3589
+ {
3590
+ "epoch": 15.53,
3591
+ "learning_rate": 1.2028985507246379e-05,
3592
+ "loss": 1.2673,
3593
+ "step": 575
3594
+ },
3595
+ {
3596
+ "epoch": 15.56,
3597
+ "learning_rate": 1.1956521739130435e-05,
3598
+ "loss": 1.2265,
3599
+ "step": 576
3600
+ },
3601
+ {
3602
+ "epoch": 15.59,
3603
+ "learning_rate": 1.1884057971014494e-05,
3604
+ "loss": 1.2708,
3605
+ "step": 577
3606
+ },
3607
+ {
3608
+ "epoch": 15.61,
3609
+ "learning_rate": 1.181159420289855e-05,
3610
+ "loss": 1.2264,
3611
+ "step": 578
3612
+ },
3613
+ {
3614
+ "epoch": 15.64,
3615
+ "learning_rate": 1.173913043478261e-05,
3616
+ "loss": 1.2257,
3617
+ "step": 579
3618
+ },
3619
+ {
3620
+ "epoch": 15.67,
3621
+ "learning_rate": 1.1666666666666668e-05,
3622
+ "loss": 1.1863,
3623
+ "step": 580
3624
+ },
3625
+ {
3626
+ "epoch": 15.69,
3627
+ "learning_rate": 1.1594202898550725e-05,
3628
+ "loss": 1.1775,
3629
+ "step": 581
3630
+ },
3631
+ {
3632
+ "epoch": 15.72,
3633
+ "learning_rate": 1.1521739130434783e-05,
3634
+ "loss": 1.2742,
3635
+ "step": 582
3636
+ },
3637
+ {
3638
+ "epoch": 15.75,
3639
+ "learning_rate": 1.1449275362318842e-05,
3640
+ "loss": 1.2685,
3641
+ "step": 583
3642
+ },
3643
+ {
3644
+ "epoch": 15.77,
3645
+ "learning_rate": 1.1376811594202899e-05,
3646
+ "loss": 1.2752,
3647
+ "step": 584
3648
+ },
3649
+ {
3650
+ "epoch": 15.8,
3651
+ "learning_rate": 1.1304347826086957e-05,
3652
+ "loss": 1.2163,
3653
+ "step": 585
3654
+ },
3655
+ {
3656
+ "epoch": 15.83,
3657
+ "learning_rate": 1.1231884057971016e-05,
3658
+ "loss": 1.279,
3659
+ "step": 586
3660
+ },
3661
+ {
3662
+ "epoch": 15.85,
3663
+ "learning_rate": 1.1159420289855073e-05,
3664
+ "loss": 1.2633,
3665
+ "step": 587
3666
+ },
3667
+ {
3668
+ "epoch": 15.88,
3669
+ "learning_rate": 1.1086956521739131e-05,
3670
+ "loss": 1.2338,
3671
+ "step": 588
3672
+ },
3673
+ {
3674
+ "epoch": 15.91,
3675
+ "learning_rate": 1.1014492753623188e-05,
3676
+ "loss": 1.2283,
3677
+ "step": 589
3678
+ },
3679
+ {
3680
+ "epoch": 15.93,
3681
+ "learning_rate": 1.0942028985507247e-05,
3682
+ "loss": 1.2701,
3683
+ "step": 590
3684
+ },
3685
+ {
3686
+ "epoch": 15.96,
3687
+ "learning_rate": 1.0869565217391305e-05,
3688
+ "loss": 1.2949,
3689
+ "step": 591
3690
+ },
3691
+ {
3692
+ "epoch": 15.99,
3693
+ "learning_rate": 1.0797101449275362e-05,
3694
+ "loss": 1.2305,
3695
+ "step": 592
3696
+ },
3697
+ {
3698
+ "epoch": 15.99,
3699
+ "eval_accuracy": 0.7307723434675875,
3700
+ "eval_loss": 1.247312307357788,
3701
+ "eval_runtime": 13.2755,
3702
+ "eval_samples_per_second": 37.362,
3703
+ "eval_steps_per_second": 4.67,
3704
+ "step": 592
3705
+ },
3706
+ {
3707
+ "epoch": 16.03,
3708
+ "learning_rate": 1.072463768115942e-05,
3709
+ "loss": 1.8458,
3710
+ "step": 593
3711
+ },
3712
+ {
3713
+ "epoch": 16.05,
3714
+ "learning_rate": 1.0652173913043479e-05,
3715
+ "loss": 1.1802,
3716
+ "step": 594
3717
+ },
3718
+ {
3719
+ "epoch": 16.08,
3720
+ "learning_rate": 1.0579710144927536e-05,
3721
+ "loss": 1.2814,
3722
+ "step": 595
3723
+ },
3724
+ {
3725
+ "epoch": 16.11,
3726
+ "learning_rate": 1.0507246376811594e-05,
3727
+ "loss": 1.2367,
3728
+ "step": 596
3729
+ },
3730
+ {
3731
+ "epoch": 16.13,
3732
+ "learning_rate": 1.0434782608695651e-05,
3733
+ "loss": 1.2374,
3734
+ "step": 597
3735
+ },
3736
+ {
3737
+ "epoch": 16.16,
3738
+ "learning_rate": 1.036231884057971e-05,
3739
+ "loss": 1.3441,
3740
+ "step": 598
3741
+ },
3742
+ {
3743
+ "epoch": 16.19,
3744
+ "learning_rate": 1.0289855072463768e-05,
3745
+ "loss": 1.2469,
3746
+ "step": 599
3747
+ },
3748
+ {
3749
+ "epoch": 16.21,
3750
+ "learning_rate": 1.0217391304347827e-05,
3751
+ "loss": 1.2178,
3752
+ "step": 600
3753
+ },
3754
+ {
3755
+ "epoch": 16.24,
3756
+ "learning_rate": 1.0144927536231885e-05,
3757
+ "loss": 1.2353,
3758
+ "step": 601
3759
+ },
3760
+ {
3761
+ "epoch": 16.27,
3762
+ "learning_rate": 1.0072463768115942e-05,
3763
+ "loss": 1.2735,
3764
+ "step": 602
3765
+ },
3766
+ {
3767
+ "epoch": 16.29,
3768
+ "learning_rate": 1e-05,
3769
+ "loss": 1.2322,
3770
+ "step": 603
3771
+ },
3772
+ {
3773
+ "epoch": 16.32,
3774
+ "learning_rate": 9.92753623188406e-06,
3775
+ "loss": 1.2349,
3776
+ "step": 604
3777
+ },
3778
+ {
3779
+ "epoch": 16.35,
3780
+ "learning_rate": 9.855072463768116e-06,
3781
+ "loss": 1.2346,
3782
+ "step": 605
3783
+ },
3784
+ {
3785
+ "epoch": 16.37,
3786
+ "learning_rate": 9.782608695652175e-06,
3787
+ "loss": 1.2412,
3788
+ "step": 606
3789
+ },
3790
+ {
3791
+ "epoch": 16.4,
3792
+ "learning_rate": 9.710144927536233e-06,
3793
+ "loss": 1.2781,
3794
+ "step": 607
3795
+ },
3796
+ {
3797
+ "epoch": 16.43,
3798
+ "learning_rate": 9.63768115942029e-06,
3799
+ "loss": 1.2516,
3800
+ "step": 608
3801
+ },
3802
+ {
3803
+ "epoch": 16.45,
3804
+ "learning_rate": 9.565217391304349e-06,
3805
+ "loss": 1.1951,
3806
+ "step": 609
3807
+ },
3808
+ {
3809
+ "epoch": 16.48,
3810
+ "learning_rate": 9.492753623188407e-06,
3811
+ "loss": 1.2746,
3812
+ "step": 610
3813
+ },
3814
+ {
3815
+ "epoch": 16.51,
3816
+ "learning_rate": 9.420289855072464e-06,
3817
+ "loss": 1.2231,
3818
+ "step": 611
3819
+ },
3820
+ {
3821
+ "epoch": 16.53,
3822
+ "learning_rate": 9.347826086956523e-06,
3823
+ "loss": 1.2761,
3824
+ "step": 612
3825
+ },
3826
+ {
3827
+ "epoch": 16.56,
3828
+ "learning_rate": 9.27536231884058e-06,
3829
+ "loss": 1.1835,
3830
+ "step": 613
3831
+ },
3832
+ {
3833
+ "epoch": 16.59,
3834
+ "learning_rate": 9.202898550724638e-06,
3835
+ "loss": 1.2439,
3836
+ "step": 614
3837
+ },
3838
+ {
3839
+ "epoch": 16.61,
3840
+ "learning_rate": 9.130434782608697e-06,
3841
+ "loss": 1.2909,
3842
+ "step": 615
3843
+ },
3844
+ {
3845
+ "epoch": 16.64,
3846
+ "learning_rate": 9.057971014492753e-06,
3847
+ "loss": 1.2473,
3848
+ "step": 616
3849
+ },
3850
+ {
3851
+ "epoch": 16.67,
3852
+ "learning_rate": 8.985507246376812e-06,
3853
+ "loss": 1.2172,
3854
+ "step": 617
3855
+ },
3856
+ {
3857
+ "epoch": 16.69,
3858
+ "learning_rate": 8.91304347826087e-06,
3859
+ "loss": 1.2669,
3860
+ "step": 618
3861
+ },
3862
+ {
3863
+ "epoch": 16.72,
3864
+ "learning_rate": 8.840579710144927e-06,
3865
+ "loss": 1.2826,
3866
+ "step": 619
3867
+ },
3868
+ {
3869
+ "epoch": 16.75,
3870
+ "learning_rate": 8.768115942028986e-06,
3871
+ "loss": 1.1676,
3872
+ "step": 620
3873
+ },
3874
+ {
3875
+ "epoch": 16.77,
3876
+ "learning_rate": 8.695652173913044e-06,
3877
+ "loss": 1.2068,
3878
+ "step": 621
3879
+ },
3880
+ {
3881
+ "epoch": 16.8,
3882
+ "learning_rate": 8.623188405797101e-06,
3883
+ "loss": 1.2571,
3884
+ "step": 622
3885
+ },
3886
+ {
3887
+ "epoch": 16.83,
3888
+ "learning_rate": 8.55072463768116e-06,
3889
+ "loss": 1.2138,
3890
+ "step": 623
3891
+ },
3892
+ {
3893
+ "epoch": 16.85,
3894
+ "learning_rate": 8.478260869565217e-06,
3895
+ "loss": 1.1783,
3896
+ "step": 624
3897
+ },
3898
+ {
3899
+ "epoch": 16.88,
3900
+ "learning_rate": 8.405797101449275e-06,
3901
+ "loss": 1.2801,
3902
+ "step": 625
3903
+ },
3904
+ {
3905
+ "epoch": 16.91,
3906
+ "learning_rate": 8.333333333333334e-06,
3907
+ "loss": 1.2122,
3908
+ "step": 626
3909
+ },
3910
+ {
3911
+ "epoch": 16.93,
3912
+ "learning_rate": 8.26086956521739e-06,
3913
+ "loss": 1.2112,
3914
+ "step": 627
3915
+ },
3916
+ {
3917
+ "epoch": 16.96,
3918
+ "learning_rate": 8.188405797101449e-06,
3919
+ "loss": 1.2542,
3920
+ "step": 628
3921
+ },
3922
+ {
3923
+ "epoch": 16.99,
3924
+ "learning_rate": 8.115942028985508e-06,
3925
+ "loss": 1.2426,
3926
+ "step": 629
3927
+ },
3928
+ {
3929
+ "epoch": 16.99,
3930
+ "eval_accuracy": 0.7318157181571816,
3931
+ "eval_loss": 1.2249630689620972,
3932
+ "eval_runtime": 13.2585,
3933
+ "eval_samples_per_second": 37.41,
3934
+ "eval_steps_per_second": 4.676,
3935
+ "step": 629
3936
+ },
+ {
+ "epoch": 17.03,
+ "learning_rate": 8.043478260869565e-06,
+ "loss": 1.8211,
+ "step": 630
+ },
+ {
+ "epoch": 17.05,
+ "learning_rate": 7.971014492753623e-06,
+ "loss": 1.2233,
+ "step": 631
+ },
+ {
+ "epoch": 17.08,
+ "learning_rate": 7.898550724637682e-06,
+ "loss": 1.246,
+ "step": 632
+ },
+ {
+ "epoch": 17.11,
+ "learning_rate": 7.82608695652174e-06,
+ "loss": 1.2303,
+ "step": 633
+ },
+ {
+ "epoch": 17.13,
+ "learning_rate": 7.753623188405799e-06,
+ "loss": 1.2127,
+ "step": 634
+ },
+ {
+ "epoch": 17.16,
+ "learning_rate": 7.681159420289856e-06,
+ "loss": 1.1923,
+ "step": 635
+ },
+ {
+ "epoch": 17.19,
+ "learning_rate": 7.608695652173914e-06,
+ "loss": 1.2247,
+ "step": 636
+ },
+ {
+ "epoch": 17.21,
+ "learning_rate": 7.536231884057972e-06,
+ "loss": 1.2365,
+ "step": 637
+ },
+ {
+ "epoch": 17.24,
+ "learning_rate": 7.4637681159420295e-06,
+ "loss": 1.1985,
+ "step": 638
+ },
+ {
+ "epoch": 17.27,
+ "learning_rate": 7.391304347826088e-06,
+ "loss": 1.2298,
+ "step": 639
+ },
+ {
+ "epoch": 17.29,
+ "learning_rate": 7.318840579710146e-06,
+ "loss": 1.1936,
+ "step": 640
+ },
+ {
+ "epoch": 17.32,
+ "learning_rate": 7.246376811594203e-06,
+ "loss": 1.2108,
+ "step": 641
+ },
+ {
+ "epoch": 17.35,
+ "learning_rate": 7.173913043478261e-06,
+ "loss": 1.3043,
+ "step": 642
+ },
+ {
+ "epoch": 17.37,
+ "learning_rate": 7.10144927536232e-06,
+ "loss": 1.2297,
+ "step": 643
+ },
+ {
+ "epoch": 17.4,
+ "learning_rate": 7.028985507246377e-06,
+ "loss": 1.2587,
+ "step": 644
+ },
+ {
+ "epoch": 17.43,
+ "learning_rate": 6.956521739130435e-06,
+ "loss": 1.2347,
+ "step": 645
+ },
+ {
+ "epoch": 17.45,
+ "learning_rate": 6.884057971014493e-06,
+ "loss": 1.2355,
+ "step": 646
+ },
+ {
+ "epoch": 17.48,
+ "learning_rate": 6.811594202898551e-06,
+ "loss": 1.2061,
+ "step": 647
+ },
+ {
+ "epoch": 17.51,
+ "learning_rate": 6.739130434782609e-06,
+ "loss": 1.2036,
+ "step": 648
+ },
+ {
+ "epoch": 17.53,
+ "learning_rate": 6.666666666666667e-06,
+ "loss": 1.2495,
+ "step": 649
+ },
+ {
+ "epoch": 17.56,
+ "learning_rate": 6.594202898550725e-06,
+ "loss": 1.216,
+ "step": 650
+ },
+ {
+ "epoch": 17.59,
+ "learning_rate": 6.521739130434783e-06,
+ "loss": 1.223,
+ "step": 651
+ },
+ {
+ "epoch": 17.61,
+ "learning_rate": 6.449275362318841e-06,
+ "loss": 1.2645,
+ "step": 652
+ },
+ {
+ "epoch": 17.64,
+ "learning_rate": 6.376811594202898e-06,
+ "loss": 1.1786,
+ "step": 653
+ },
+ {
+ "epoch": 17.67,
+ "learning_rate": 6.304347826086957e-06,
+ "loss": 1.2447,
+ "step": 654
+ },
+ {
+ "epoch": 17.69,
+ "learning_rate": 6.231884057971015e-06,
+ "loss": 1.2604,
+ "step": 655
+ },
+ {
+ "epoch": 17.72,
+ "learning_rate": 6.159420289855073e-06,
+ "loss": 1.2098,
+ "step": 656
+ },
+ {
+ "epoch": 17.75,
+ "learning_rate": 6.086956521739131e-06,
+ "loss": 1.2294,
+ "step": 657
+ },
+ {
+ "epoch": 17.77,
+ "learning_rate": 6.014492753623189e-06,
+ "loss": 1.2241,
+ "step": 658
+ },
+ {
+ "epoch": 17.8,
+ "learning_rate": 5.942028985507247e-06,
+ "loss": 1.2454,
+ "step": 659
+ },
+ {
+ "epoch": 17.83,
+ "learning_rate": 5.869565217391305e-06,
+ "loss": 1.2278,
+ "step": 660
+ },
+ {
+ "epoch": 17.85,
+ "learning_rate": 5.797101449275362e-06,
+ "loss": 1.2871,
+ "step": 661
+ },
+ {
+ "epoch": 17.88,
+ "learning_rate": 5.724637681159421e-06,
+ "loss": 1.2772,
+ "step": 662
+ },
+ {
+ "epoch": 17.91,
+ "learning_rate": 5.652173913043479e-06,
+ "loss": 1.2415,
+ "step": 663
+ },
+ {
+ "epoch": 17.93,
+ "learning_rate": 5.579710144927536e-06,
+ "loss": 1.1829,
+ "step": 664
+ },
+ {
+ "epoch": 17.96,
+ "learning_rate": 5.507246376811594e-06,
+ "loss": 1.228,
+ "step": 665
+ },
+ {
+ "epoch": 17.99,
+ "learning_rate": 5.4347826086956525e-06,
+ "loss": 1.2096,
+ "step": 666
+ },
+ {
+ "epoch": 17.99,
+ "eval_accuracy": 0.7352733398543636,
+ "eval_loss": 1.2186108827590942,
+ "eval_runtime": 13.2382,
+ "eval_samples_per_second": 37.467,
+ "eval_steps_per_second": 4.683,
+ "step": 666
+ },
+ {
+ "epoch": 18.03,
+ "learning_rate": 5.36231884057971e-06,
+ "loss": 1.812,
+ "step": 667
+ },
+ {
+ "epoch": 18.05,
+ "learning_rate": 5.289855072463768e-06,
+ "loss": 1.1976,
+ "step": 668
+ },
+ {
+ "epoch": 18.08,
+ "learning_rate": 5.217391304347826e-06,
+ "loss": 1.2782,
+ "step": 669
+ },
+ {
+ "epoch": 18.11,
+ "learning_rate": 5.144927536231884e-06,
+ "loss": 1.2505,
+ "step": 670
+ },
+ {
+ "epoch": 18.13,
+ "learning_rate": 5.072463768115943e-06,
+ "loss": 1.2352,
+ "step": 671
+ },
+ {
+ "epoch": 18.16,
+ "learning_rate": 5e-06,
+ "loss": 1.2544,
+ "step": 672
+ },
+ {
+ "epoch": 18.19,
+ "learning_rate": 4.927536231884058e-06,
+ "loss": 1.1739,
+ "step": 673
+ },
+ {
+ "epoch": 18.21,
+ "learning_rate": 4.855072463768117e-06,
+ "loss": 1.2626,
+ "step": 674
+ },
+ {
+ "epoch": 18.24,
+ "learning_rate": 4.782608695652174e-06,
+ "loss": 1.2034,
+ "step": 675
+ },
+ {
+ "epoch": 18.27,
+ "learning_rate": 4.710144927536232e-06,
+ "loss": 1.2818,
+ "step": 676
+ },
+ {
+ "epoch": 18.29,
+ "learning_rate": 4.63768115942029e-06,
+ "loss": 1.2388,
+ "step": 677
+ },
+ {
+ "epoch": 18.32,
+ "learning_rate": 4.565217391304348e-06,
+ "loss": 1.239,
+ "step": 678
+ },
+ {
+ "epoch": 18.35,
+ "learning_rate": 4.492753623188406e-06,
+ "loss": 1.1827,
+ "step": 679
+ },
+ {
+ "epoch": 18.37,
+ "learning_rate": 4.420289855072464e-06,
+ "loss": 1.1712,
+ "step": 680
+ },
+ {
+ "epoch": 18.4,
+ "learning_rate": 4.347826086956522e-06,
+ "loss": 1.2147,
+ "step": 681
+ },
+ {
+ "epoch": 18.43,
+ "learning_rate": 4.27536231884058e-06,
+ "loss": 1.2106,
+ "step": 682
+ },
+ {
+ "epoch": 18.45,
+ "learning_rate": 4.202898550724638e-06,
+ "loss": 1.2296,
+ "step": 683
+ },
+ {
+ "epoch": 18.48,
+ "learning_rate": 4.130434782608695e-06,
+ "loss": 1.2551,
+ "step": 684
+ },
+ {
+ "epoch": 18.51,
+ "learning_rate": 4.057971014492754e-06,
+ "loss": 1.235,
+ "step": 685
+ },
+ {
+ "epoch": 18.53,
+ "learning_rate": 3.9855072463768115e-06,
+ "loss": 1.2341,
+ "step": 686
+ },
+ {
+ "epoch": 18.56,
+ "learning_rate": 3.91304347826087e-06,
+ "loss": 1.189,
+ "step": 687
+ },
+ {
+ "epoch": 18.59,
+ "learning_rate": 3.840579710144928e-06,
+ "loss": 1.2241,
+ "step": 688
+ },
+ {
+ "epoch": 18.61,
+ "learning_rate": 3.768115942028986e-06,
+ "loss": 1.2627,
+ "step": 689
+ },
+ {
+ "epoch": 18.64,
+ "learning_rate": 3.695652173913044e-06,
+ "loss": 1.2259,
+ "step": 690
+ },
+ {
+ "epoch": 18.67,
+ "learning_rate": 3.6231884057971017e-06,
+ "loss": 1.2247,
+ "step": 691
+ },
+ {
+ "epoch": 18.69,
+ "learning_rate": 3.55072463768116e-06,
+ "loss": 1.2493,
+ "step": 692
+ },
+ {
+ "epoch": 18.72,
+ "learning_rate": 3.4782608695652175e-06,
+ "loss": 1.1931,
+ "step": 693
+ },
+ {
+ "epoch": 18.75,
+ "learning_rate": 3.4057971014492756e-06,
+ "loss": 1.2441,
+ "step": 694
+ },
+ {
+ "epoch": 18.77,
+ "learning_rate": 3.3333333333333333e-06,
+ "loss": 1.1884,
+ "step": 695
+ },
+ {
+ "epoch": 18.8,
+ "learning_rate": 3.2608695652173914e-06,
+ "loss": 1.1858,
+ "step": 696
+ },
+ {
+ "epoch": 18.83,
+ "learning_rate": 3.188405797101449e-06,
+ "loss": 1.2114,
+ "step": 697
+ },
+ {
+ "epoch": 18.85,
+ "learning_rate": 3.1159420289855077e-06,
+ "loss": 1.2482,
+ "step": 698
+ },
+ {
+ "epoch": 18.88,
+ "learning_rate": 3.0434782608695654e-06,
+ "loss": 1.2253,
+ "step": 699
+ },
+ {
+ "epoch": 18.91,
+ "learning_rate": 2.9710144927536235e-06,
+ "loss": 1.2292,
+ "step": 700
+ },
+ {
+ "epoch": 18.93,
+ "learning_rate": 2.898550724637681e-06,
+ "loss": 1.2396,
+ "step": 701
+ },
+ {
+ "epoch": 18.96,
+ "learning_rate": 2.8260869565217393e-06,
+ "loss": 1.1881,
+ "step": 702
+ },
+ {
+ "epoch": 18.99,
+ "learning_rate": 2.753623188405797e-06,
+ "loss": 1.1961,
+ "step": 703
+ },
+ {
+ "epoch": 18.99,
+ "eval_accuracy": 0.7360741986223355,
+ "eval_loss": 1.2214456796646118,
+ "eval_runtime": 13.2148,
+ "eval_samples_per_second": 37.534,
+ "eval_steps_per_second": 4.692,
+ "step": 703
+ },
+ {
+ "epoch": 19.03,
+ "learning_rate": 2.681159420289855e-06,
+ "loss": 1.8177,
+ "step": 704
+ },
+ {
+ "epoch": 19.05,
+ "learning_rate": 2.608695652173913e-06,
+ "loss": 1.2484,
+ "step": 705
+ },
+ {
+ "epoch": 19.08,
+ "learning_rate": 2.5362318840579714e-06,
+ "loss": 1.2075,
+ "step": 706
+ },
+ {
+ "epoch": 19.11,
+ "learning_rate": 2.463768115942029e-06,
+ "loss": 1.244,
+ "step": 707
+ },
+ {
+ "epoch": 19.13,
+ "learning_rate": 2.391304347826087e-06,
+ "loss": 1.2501,
+ "step": 708
+ },
+ {
+ "epoch": 19.16,
+ "learning_rate": 2.318840579710145e-06,
+ "loss": 1.2372,
+ "step": 709
+ },
+ {
+ "epoch": 19.19,
+ "learning_rate": 2.246376811594203e-06,
+ "loss": 1.212,
+ "step": 710
+ },
+ {
+ "epoch": 19.21,
+ "learning_rate": 2.173913043478261e-06,
+ "loss": 1.2428,
+ "step": 711
+ },
+ {
+ "epoch": 19.24,
+ "learning_rate": 2.101449275362319e-06,
+ "loss": 1.2565,
+ "step": 712
+ },
+ {
+ "epoch": 19.27,
+ "learning_rate": 2.028985507246377e-06,
+ "loss": 1.2166,
+ "step": 713
+ },
+ {
+ "epoch": 19.29,
+ "learning_rate": 1.956521739130435e-06,
+ "loss": 1.267,
+ "step": 714
+ },
+ {
+ "epoch": 19.32,
+ "learning_rate": 1.884057971014493e-06,
+ "loss": 1.1961,
+ "step": 715
+ },
+ {
+ "epoch": 19.35,
+ "learning_rate": 1.8115942028985508e-06,
+ "loss": 1.2229,
+ "step": 716
+ },
+ {
+ "epoch": 19.37,
+ "learning_rate": 1.7391304347826088e-06,
+ "loss": 1.246,
+ "step": 717
+ },
+ {
+ "epoch": 19.4,
+ "learning_rate": 1.6666666666666667e-06,
+ "loss": 1.2767,
+ "step": 718
+ },
+ {
+ "epoch": 19.43,
+ "learning_rate": 1.5942028985507246e-06,
+ "loss": 1.1875,
+ "step": 719
+ },
+ {
+ "epoch": 19.45,
+ "learning_rate": 1.5217391304347827e-06,
+ "loss": 1.2265,
+ "step": 720
+ },
+ {
+ "epoch": 19.48,
+ "learning_rate": 1.4492753623188406e-06,
+ "loss": 1.2228,
+ "step": 721
+ },
+ {
+ "epoch": 19.51,
+ "learning_rate": 1.3768115942028985e-06,
+ "loss": 1.2094,
+ "step": 722
+ },
+ {
+ "epoch": 19.53,
+ "learning_rate": 1.3043478260869564e-06,
+ "loss": 1.216,
+ "step": 723
+ },
+ {
+ "epoch": 19.56,
+ "learning_rate": 1.2318840579710145e-06,
+ "loss": 1.249,
+ "step": 724
+ },
+ {
+ "epoch": 19.59,
+ "learning_rate": 1.1594202898550724e-06,
+ "loss": 1.2173,
+ "step": 725
+ },
+ {
+ "epoch": 19.61,
+ "learning_rate": 1.0869565217391306e-06,
+ "loss": 1.2106,
+ "step": 726
+ },
+ {
+ "epoch": 19.64,
+ "learning_rate": 1.0144927536231885e-06,
+ "loss": 1.2393,
+ "step": 727
+ },
+ {
+ "epoch": 19.67,
+ "learning_rate": 9.420289855072465e-07,
+ "loss": 1.2367,
+ "step": 728
+ },
+ {
+ "epoch": 19.69,
+ "learning_rate": 8.695652173913044e-07,
+ "loss": 1.2037,
+ "step": 729
+ },
+ {
+ "epoch": 19.72,
+ "learning_rate": 7.971014492753623e-07,
+ "loss": 1.2176,
+ "step": 730
+ },
+ {
+ "epoch": 19.75,
+ "learning_rate": 7.246376811594203e-07,
+ "loss": 1.1534,
+ "step": 731
+ },
+ {
+ "epoch": 19.77,
+ "learning_rate": 6.521739130434782e-07,
+ "loss": 1.2168,
+ "step": 732
+ },
+ {
+ "epoch": 19.8,
+ "learning_rate": 5.797101449275362e-07,
+ "loss": 1.203,
+ "step": 733
+ },
+ {
+ "epoch": 19.83,
+ "learning_rate": 5.072463768115942e-07,
+ "loss": 1.2523,
+ "step": 734
+ },
+ {
+ "epoch": 19.85,
+ "learning_rate": 4.347826086956522e-07,
+ "loss": 1.2665,
+ "step": 735
+ },
+ {
+ "epoch": 19.88,
+ "learning_rate": 3.6231884057971015e-07,
+ "loss": 1.2114,
+ "step": 736
+ },
+ {
+ "epoch": 19.91,
+ "learning_rate": 2.898550724637681e-07,
+ "loss": 1.2425,
+ "step": 737
+ },
+ {
+ "epoch": 19.93,
+ "learning_rate": 2.173913043478261e-07,
+ "loss": 1.3178,
+ "step": 738
+ },
+ {
+ "epoch": 19.96,
+ "learning_rate": 1.4492753623188405e-07,
+ "loss": 1.2353,
+ "step": 739
+ },
+ {
+ "epoch": 19.99,
+ "learning_rate": 7.246376811594203e-08,
+ "loss": 1.2136,
+ "step": 740
+ },
+ {
+ "epoch": 19.99,
+ "eval_accuracy": 0.7311184760057123,
+ "eval_loss": 1.250640869140625,
+ "eval_runtime": 13.2938,
+ "eval_samples_per_second": 37.311,
+ "eval_steps_per_second": 4.664,
+ "step": 740
+ },
+ {
+ "epoch": 19.99,
+ "step": 740,
+ "total_flos": 2.524663139915981e+16,
+ "train_loss": 1.3203882475157043,
+ "train_runtime": 9311.1992,
+ "train_samples_per_second": 10.306,
+ "train_steps_per_second": 0.079
+ }
+ ],
+ "max_steps": 740,
+ "num_train_epochs": 20,
+ "total_flos": 2.524663139915981e+16,
+ "trial_name": null,
+ "trial_params": null
+ }