anderloh committed
Commit 8e6d934 · verified · 1 parent: ea59d9e

End of training

README.md CHANGED
@@ -1,6 +1,7 @@
  ---
  base_model: anderloh/Hugginhface-master-wav2vec-pretreined-5-class-train-test
  tags:
+ - audio-classification
  - generated_from_trainer
  metrics:
  - accuracy
@@ -14,9 +15,9 @@ should probably proofread and complete it, then remove this comment. -->

  # wav2vec2-5Class-Validation-Mobil

- This model is a fine-tuned version of [anderloh/Hugginhface-master-wav2vec-pretreined-5-class-train-test](https://huggingface.co/anderloh/Hugginhface-master-wav2vec-pretreined-5-class-train-test) on an unknown dataset.
+ This model is a fine-tuned version of [anderloh/Hugginhface-master-wav2vec-pretreined-5-class-train-test](https://huggingface.co/anderloh/Hugginhface-master-wav2vec-pretreined-5-class-train-test) on the anderloh/ValidateRes dataset.
  It achieves the following results on the evaluation set:
- - Loss: 1.2506
+ - Loss: 1.2514
  - Accuracy: 0.5836

  ## Model description
all_results.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "epoch": 276.92,
+   "eval_accuracy": 0.5836298932384342,
+   "eval_loss": 1.2513927221298218,
+   "eval_runtime": 4.3951,
+   "eval_samples_per_second": 63.934,
+   "eval_steps_per_second": 0.683,
+   "train_loss": 0.8810926691691081,
+   "train_runtime": 3759.0782,
+   "train_samples_per_second": 123.541,
+   "train_steps_per_second": 0.239
+ }
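The evaluation numbers above can be cross-checked with a little arithmetic. The sketch below is not part of the commit; it copies the values from all_results.json and, assuming that runtime multiplied by throughput recovers the sample count, estimates the size of the evaluation set and the number of correct predictions. The helper names `estimate_eval_set_size` and `correct_predictions` are hypothetical.

```python
import json

# Values copied verbatim from all_results.json in this commit.
all_results = json.loads("""
{
  "eval_accuracy": 0.5836298932384342,
  "eval_loss": 1.2513927221298218,
  "eval_runtime": 4.3951,
  "eval_samples_per_second": 63.934,
  "eval_steps_per_second": 0.683
}
""")


def estimate_eval_set_size(results):
    """Assumption: samples = runtime (s) * throughput (samples/s), rounded."""
    return round(results["eval_runtime"] * results["eval_samples_per_second"])


def correct_predictions(results, n_samples):
    """Accuracy is correct / total, so correct = accuracy * total, rounded."""
    return round(results["eval_accuracy"] * n_samples)


n_eval = estimate_eval_set_size(all_results)
n_correct = correct_predictions(all_results, n_eval)
n_batches = round(all_results["eval_runtime"] * all_results["eval_steps_per_second"])

# The README reports the same metrics rounded to four decimals.
readme_loss = round(all_results["eval_loss"], 4)
readme_accuracy = round(all_results["eval_accuracy"], 4)

print(n_eval, n_correct, n_batches, readme_loss, readme_accuracy)
```

Under these assumptions the rounded loss matches the value in the updated README (1.2514) rather than the previous one (1.2506), which is consistent with the README change in this commit.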
eval_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "epoch": 276.92,
+   "eval_accuracy": 0.5836298932384342,
+   "eval_loss": 1.2513927221298218,
+   "eval_runtime": 4.3951,
+   "eval_samples_per_second": 63.934,
+   "eval_steps_per_second": 0.683
+ }
runs/Jul13_20-32-33_ml6.hpc.uio.no/events.out.tfevents.1720899362.ml6.hpc.uio.no.4128034.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3778173a98ea9f7d3b60f4423177e2c14ca910de0ab74eb538c18836879e8e40
+ size 411
train_results.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "epoch": 276.92,
+   "train_loss": 0.8810926691691081,
+   "train_runtime": 3759.0782,
+   "train_samples_per_second": 123.541,
+   "train_steps_per_second": 0.239
+ }
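The fractional epoch count and the learning rates logged in trainer_state.json in this commit (at steps 100, 200 and 300) fit a simple pattern. The sketch below is an inference, not something the commit states: it assumes a linear warmup followed by linear decay with a peak learning rate of 3e-5 and 90 warmup steps, both values chosen only because they reproduce the logged numbers. The function names `linear_schedule_lr` and `epoch_at_step` are hypothetical.

```python
# Values copied from trainer_state.json in this commit.
GLOBAL_STEP = 900
FINAL_EPOCH = 276.9230769230769
LOGGED_LR = {
    100: 2.962962962962963e-05,
    200: 2.5925925925925925e-05,
    300: 2.222222222222222e-05,
}

# Optimizer steps per epoch implied by the final step and epoch counters.
STEPS_PER_EPOCH = GLOBAL_STEP / FINAL_EPOCH


def epoch_at_step(step):
    """Fractional epoch reached after `step` optimizer steps."""
    return step / STEPS_PER_EPOCH


def linear_schedule_lr(step, peak_lr=3e-5, warmup_steps=90, total_steps=GLOBAL_STEP):
    """Linear warmup then linear decay to zero.

    peak_lr and warmup_steps are ASSUMED values inferred from the log,
    not hyperparameters stated anywhere in the commit.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


for step, logged in LOGGED_LR.items():
    predicted = linear_schedule_lr(step)
    print(step, round(epoch_at_step(step), 2), predicted, logged)
```

If the assumed schedule is right, each predicted value agrees with the logged one to within floating-point error, and the implied rate of 3.25 steps per epoch explains why evaluations land at epochs such as 0.92 (step 3) and 4.0 (step 13).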
trainer_state.json ADDED
@@ -0,0 +1,2586 @@
1
+ {
2
+ "best_metric": 0.5836298932384342,
3
+ "best_model_checkpoint": "wav2vec2-5Class-Validation-Mobil/checkpoint-773",
4
+ "epoch": 276.9230769230769,
5
+ "eval_steps": 500,
6
+ "global_step": 900,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.92,
13
+ "eval_accuracy": 0.3202846975088968,
14
+ "eval_loss": 1.602386713027954,
15
+ "eval_runtime": 4.3468,
16
+ "eval_samples_per_second": 64.645,
17
+ "eval_steps_per_second": 0.69,
18
+ "step": 3
19
+ },
20
+ {
21
+ "epoch": 1.85,
22
+ "eval_accuracy": 0.3167259786476868,
23
+ "eval_loss": 1.6022303104400635,
24
+ "eval_runtime": 3.573,
25
+ "eval_samples_per_second": 78.645,
26
+ "eval_steps_per_second": 0.84,
27
+ "step": 6
28
+ },
29
+ {
30
+ "epoch": 2.77,
31
+ "eval_accuracy": 0.3167259786476868,
32
+ "eval_loss": 1.601974368095398,
33
+ "eval_runtime": 4.6151,
34
+ "eval_samples_per_second": 60.887,
35
+ "eval_steps_per_second": 0.65,
36
+ "step": 9
37
+ },
38
+ {
39
+ "epoch": 4.0,
40
+ "eval_accuracy": 0.3167259786476868,
41
+ "eval_loss": 1.6014597415924072,
42
+ "eval_runtime": 5.3659,
43
+ "eval_samples_per_second": 52.368,
44
+ "eval_steps_per_second": 0.559,
45
+ "step": 13
46
+ },
47
+ {
48
+ "epoch": 4.92,
49
+ "eval_accuracy": 0.3167259786476868,
50
+ "eval_loss": 1.6009386777877808,
51
+ "eval_runtime": 3.4504,
52
+ "eval_samples_per_second": 81.439,
53
+ "eval_steps_per_second": 0.869,
54
+ "step": 16
55
+ },
56
+ {
57
+ "epoch": 5.85,
58
+ "eval_accuracy": 0.31316725978647686,
59
+ "eval_loss": 1.6003268957138062,
60
+ "eval_runtime": 4.2937,
61
+ "eval_samples_per_second": 65.445,
62
+ "eval_steps_per_second": 0.699,
63
+ "step": 19
64
+ },
65
+ {
66
+ "epoch": 6.77,
67
+ "eval_accuracy": 0.30604982206405695,
68
+ "eval_loss": 1.5995941162109375,
69
+ "eval_runtime": 3.7057,
70
+ "eval_samples_per_second": 75.828,
71
+ "eval_steps_per_second": 0.81,
72
+ "step": 22
73
+ },
74
+ {
75
+ "epoch": 8.0,
76
+ "eval_accuracy": 0.298932384341637,
77
+ "eval_loss": 1.5984183549880981,
78
+ "eval_runtime": 4.6458,
79
+ "eval_samples_per_second": 60.484,
80
+ "eval_steps_per_second": 0.646,
81
+ "step": 26
82
+ },
83
+ {
84
+ "epoch": 8.92,
85
+ "eval_accuracy": 0.2918149466192171,
86
+ "eval_loss": 1.5974235534667969,
87
+ "eval_runtime": 5.0303,
88
+ "eval_samples_per_second": 55.861,
89
+ "eval_steps_per_second": 0.596,
90
+ "step": 29
91
+ },
92
+ {
93
+ "epoch": 9.85,
94
+ "eval_accuracy": 0.27402135231316727,
95
+ "eval_loss": 1.596360445022583,
96
+ "eval_runtime": 3.3268,
97
+ "eval_samples_per_second": 84.465,
98
+ "eval_steps_per_second": 0.902,
99
+ "step": 32
100
+ },
101
+ {
102
+ "epoch": 10.77,
103
+ "eval_accuracy": 0.2597864768683274,
104
+ "eval_loss": 1.5951836109161377,
105
+ "eval_runtime": 3.1882,
106
+ "eval_samples_per_second": 88.138,
107
+ "eval_steps_per_second": 0.941,
108
+ "step": 35
109
+ },
110
+ {
111
+ "epoch": 12.0,
112
+ "eval_accuracy": 0.26334519572953735,
113
+ "eval_loss": 1.593432903289795,
114
+ "eval_runtime": 4.2078,
115
+ "eval_samples_per_second": 66.78,
116
+ "eval_steps_per_second": 0.713,
117
+ "step": 39
118
+ },
119
+ {
120
+ "epoch": 12.92,
121
+ "eval_accuracy": 0.27402135231316727,
122
+ "eval_loss": 1.5920255184173584,
123
+ "eval_runtime": 4.9074,
124
+ "eval_samples_per_second": 57.261,
125
+ "eval_steps_per_second": 0.611,
126
+ "step": 42
127
+ },
128
+ {
129
+ "epoch": 13.85,
130
+ "eval_accuracy": 0.298932384341637,
131
+ "eval_loss": 1.5904992818832397,
132
+ "eval_runtime": 5.4737,
133
+ "eval_samples_per_second": 51.336,
134
+ "eval_steps_per_second": 0.548,
135
+ "step": 45
136
+ },
137
+ {
138
+ "epoch": 14.77,
139
+ "eval_accuracy": 0.298932384341637,
140
+ "eval_loss": 1.5889027118682861,
141
+ "eval_runtime": 5.4844,
142
+ "eval_samples_per_second": 51.236,
143
+ "eval_steps_per_second": 0.547,
144
+ "step": 48
145
+ },
146
+ {
147
+ "epoch": 16.0,
148
+ "eval_accuracy": 0.2846975088967972,
149
+ "eval_loss": 1.5867795944213867,
150
+ "eval_runtime": 4.8027,
151
+ "eval_samples_per_second": 58.508,
152
+ "eval_steps_per_second": 0.625,
153
+ "step": 52
154
+ },
155
+ {
156
+ "epoch": 16.92,
157
+ "eval_accuracy": 0.2846975088967972,
158
+ "eval_loss": 1.5850844383239746,
159
+ "eval_runtime": 4.5938,
160
+ "eval_samples_per_second": 61.169,
161
+ "eval_steps_per_second": 0.653,
162
+ "step": 55
163
+ },
164
+ {
165
+ "epoch": 17.85,
166
+ "eval_accuracy": 0.2846975088967972,
167
+ "eval_loss": 1.5833449363708496,
168
+ "eval_runtime": 3.4722,
169
+ "eval_samples_per_second": 80.929,
170
+ "eval_steps_per_second": 0.864,
171
+ "step": 58
172
+ },
173
+ {
174
+ "epoch": 18.77,
175
+ "eval_accuracy": 0.26334519572953735,
176
+ "eval_loss": 1.58156418800354,
177
+ "eval_runtime": 3.9515,
178
+ "eval_samples_per_second": 71.112,
179
+ "eval_steps_per_second": 0.759,
180
+ "step": 61
181
+ },
182
+ {
183
+ "epoch": 20.0,
184
+ "eval_accuracy": 0.24555160142348753,
185
+ "eval_loss": 1.579047441482544,
186
+ "eval_runtime": 4.2125,
187
+ "eval_samples_per_second": 66.707,
188
+ "eval_steps_per_second": 0.712,
189
+ "step": 65
190
+ },
191
+ {
192
+ "epoch": 20.92,
193
+ "eval_accuracy": 0.24199288256227758,
194
+ "eval_loss": 1.576985478401184,
195
+ "eval_runtime": 4.6275,
196
+ "eval_samples_per_second": 60.724,
197
+ "eval_steps_per_second": 0.648,
198
+ "step": 68
199
+ },
200
+ {
201
+ "epoch": 21.85,
202
+ "eval_accuracy": 0.23487544483985764,
203
+ "eval_loss": 1.574812650680542,
204
+ "eval_runtime": 4.9061,
205
+ "eval_samples_per_second": 57.275,
206
+ "eval_steps_per_second": 0.611,
207
+ "step": 71
208
+ },
209
+ {
210
+ "epoch": 22.77,
211
+ "eval_accuracy": 0.2313167259786477,
212
+ "eval_loss": 1.5727591514587402,
213
+ "eval_runtime": 5.6003,
214
+ "eval_samples_per_second": 50.176,
215
+ "eval_steps_per_second": 0.536,
216
+ "step": 74
217
+ },
218
+ {
219
+ "epoch": 24.0,
220
+ "eval_accuracy": 0.2277580071174377,
221
+ "eval_loss": 1.5699430704116821,
222
+ "eval_runtime": 4.5057,
223
+ "eval_samples_per_second": 62.365,
224
+ "eval_steps_per_second": 0.666,
225
+ "step": 78
226
+ },
227
+ {
228
+ "epoch": 24.92,
229
+ "eval_accuracy": 0.2313167259786477,
230
+ "eval_loss": 1.567823052406311,
231
+ "eval_runtime": 4.5731,
232
+ "eval_samples_per_second": 61.446,
233
+ "eval_steps_per_second": 0.656,
234
+ "step": 81
235
+ },
236
+ {
237
+ "epoch": 25.85,
238
+ "eval_accuracy": 0.2313167259786477,
239
+ "eval_loss": 1.5657496452331543,
240
+ "eval_runtime": 4.3556,
241
+ "eval_samples_per_second": 64.515,
242
+ "eval_steps_per_second": 0.689,
243
+ "step": 84
244
+ },
245
+ {
246
+ "epoch": 26.77,
247
+ "eval_accuracy": 0.2313167259786477,
248
+ "eval_loss": 1.5637929439544678,
249
+ "eval_runtime": 5.9441,
250
+ "eval_samples_per_second": 47.274,
251
+ "eval_steps_per_second": 0.505,
252
+ "step": 87
253
+ },
254
+ {
255
+ "epoch": 28.0,
256
+ "eval_accuracy": 0.2313167259786477,
257
+ "eval_loss": 1.5613017082214355,
258
+ "eval_runtime": 4.5762,
259
+ "eval_samples_per_second": 61.404,
260
+ "eval_steps_per_second": 0.656,
261
+ "step": 91
262
+ },
263
+ {
264
+ "epoch": 28.92,
265
+ "eval_accuracy": 0.2313167259786477,
266
+ "eval_loss": 1.5597190856933594,
267
+ "eval_runtime": 4.1813,
268
+ "eval_samples_per_second": 67.204,
269
+ "eval_steps_per_second": 0.717,
270
+ "step": 94
271
+ },
272
+ {
273
+ "epoch": 29.85,
274
+ "eval_accuracy": 0.2313167259786477,
275
+ "eval_loss": 1.5587605237960815,
276
+ "eval_runtime": 4.6749,
277
+ "eval_samples_per_second": 60.108,
278
+ "eval_steps_per_second": 0.642,
279
+ "step": 97
280
+ },
281
+ {
282
+ "epoch": 30.77,
283
+ "grad_norm": 66708.1953125,
284
+ "learning_rate": 2.962962962962963e-05,
285
+ "loss": 1.561,
286
+ "step": 100
287
+ },
288
+ {
289
+ "epoch": 30.77,
290
+ "eval_accuracy": 0.2313167259786477,
291
+ "eval_loss": 1.5586402416229248,
292
+ "eval_runtime": 5.2059,
293
+ "eval_samples_per_second": 53.977,
294
+ "eval_steps_per_second": 0.576,
295
+ "step": 100
296
+ },
297
+ {
298
+ "epoch": 32.0,
299
+ "eval_accuracy": 0.2313167259786477,
300
+ "eval_loss": 1.5596789121627808,
301
+ "eval_runtime": 4.428,
302
+ "eval_samples_per_second": 63.46,
303
+ "eval_steps_per_second": 0.678,
304
+ "step": 104
305
+ },
306
+ {
307
+ "epoch": 32.92,
308
+ "eval_accuracy": 0.2313167259786477,
309
+ "eval_loss": 1.5619100332260132,
310
+ "eval_runtime": 3.3009,
311
+ "eval_samples_per_second": 85.128,
312
+ "eval_steps_per_second": 0.909,
313
+ "step": 107
314
+ },
315
+ {
316
+ "epoch": 33.85,
317
+ "eval_accuracy": 0.2313167259786477,
318
+ "eval_loss": 1.5660569667816162,
319
+ "eval_runtime": 3.371,
320
+ "eval_samples_per_second": 83.357,
321
+ "eval_steps_per_second": 0.89,
322
+ "step": 110
323
+ },
324
+ {
325
+ "epoch": 34.77,
326
+ "eval_accuracy": 0.2313167259786477,
327
+ "eval_loss": 1.5720349550247192,
328
+ "eval_runtime": 3.9013,
329
+ "eval_samples_per_second": 72.028,
330
+ "eval_steps_per_second": 0.769,
331
+ "step": 113
332
+ },
333
+ {
334
+ "epoch": 36.0,
335
+ "eval_accuracy": 0.2313167259786477,
336
+ "eval_loss": 1.5833308696746826,
337
+ "eval_runtime": 4.7161,
338
+ "eval_samples_per_second": 59.583,
339
+ "eval_steps_per_second": 0.636,
340
+ "step": 117
341
+ },
342
+ {
343
+ "epoch": 36.92,
344
+ "eval_accuracy": 0.2313167259786477,
345
+ "eval_loss": 1.5957212448120117,
346
+ "eval_runtime": 4.1977,
347
+ "eval_samples_per_second": 66.942,
348
+ "eval_steps_per_second": 0.715,
349
+ "step": 120
350
+ },
351
+ {
352
+ "epoch": 37.85,
353
+ "eval_accuracy": 0.2313167259786477,
354
+ "eval_loss": 1.6119521856307983,
355
+ "eval_runtime": 3.034,
356
+ "eval_samples_per_second": 92.618,
357
+ "eval_steps_per_second": 0.989,
358
+ "step": 123
359
+ },
360
+ {
361
+ "epoch": 38.77,
362
+ "eval_accuracy": 0.2313167259786477,
363
+ "eval_loss": 1.631814956665039,
364
+ "eval_runtime": 3.0252,
365
+ "eval_samples_per_second": 92.887,
366
+ "eval_steps_per_second": 0.992,
367
+ "step": 126
368
+ },
369
+ {
370
+ "epoch": 40.0,
371
+ "eval_accuracy": 0.2313167259786477,
372
+ "eval_loss": 1.663757085800171,
373
+ "eval_runtime": 3.243,
374
+ "eval_samples_per_second": 86.648,
375
+ "eval_steps_per_second": 0.925,
376
+ "step": 130
377
+ },
378
+ {
379
+ "epoch": 40.92,
380
+ "eval_accuracy": 0.2313167259786477,
381
+ "eval_loss": 1.6904593706130981,
382
+ "eval_runtime": 3.1943,
383
+ "eval_samples_per_second": 87.97,
384
+ "eval_steps_per_second": 0.939,
385
+ "step": 133
386
+ },
387
+ {
388
+ "epoch": 41.85,
389
+ "eval_accuracy": 0.2313167259786477,
390
+ "eval_loss": 1.7196571826934814,
391
+ "eval_runtime": 3.4764,
392
+ "eval_samples_per_second": 80.832,
393
+ "eval_steps_per_second": 0.863,
394
+ "step": 136
395
+ },
396
+ {
397
+ "epoch": 42.77,
398
+ "eval_accuracy": 0.2313167259786477,
399
+ "eval_loss": 1.750288724899292,
400
+ "eval_runtime": 3.415,
401
+ "eval_samples_per_second": 82.283,
402
+ "eval_steps_per_second": 0.878,
403
+ "step": 139
404
+ },
405
+ {
406
+ "epoch": 44.0,
407
+ "eval_accuracy": 0.2313167259786477,
408
+ "eval_loss": 1.7802847623825073,
409
+ "eval_runtime": 3.0779,
410
+ "eval_samples_per_second": 91.295,
411
+ "eval_steps_per_second": 0.975,
412
+ "step": 143
413
+ },
414
+ {
415
+ "epoch": 44.92,
416
+ "eval_accuracy": 0.2313167259786477,
417
+ "eval_loss": 1.7917312383651733,
418
+ "eval_runtime": 3.6229,
419
+ "eval_samples_per_second": 77.562,
420
+ "eval_steps_per_second": 0.828,
421
+ "step": 146
422
+ },
423
+ {
424
+ "epoch": 45.85,
425
+ "eval_accuracy": 0.2313167259786477,
426
+ "eval_loss": 1.7919948101043701,
427
+ "eval_runtime": 3.2733,
428
+ "eval_samples_per_second": 85.845,
429
+ "eval_steps_per_second": 0.916,
430
+ "step": 149
431
+ },
432
+ {
433
+ "epoch": 46.77,
434
+ "eval_accuracy": 0.2313167259786477,
435
+ "eval_loss": 1.7869282960891724,
436
+ "eval_runtime": 3.1081,
437
+ "eval_samples_per_second": 90.408,
438
+ "eval_steps_per_second": 0.965,
439
+ "step": 152
440
+ },
441
+ {
442
+ "epoch": 48.0,
443
+ "eval_accuracy": 0.2597864768683274,
444
+ "eval_loss": 1.7699986696243286,
445
+ "eval_runtime": 3.2526,
446
+ "eval_samples_per_second": 86.392,
447
+ "eval_steps_per_second": 0.922,
448
+ "step": 156
449
+ },
450
+ {
451
+ "epoch": 48.92,
452
+ "eval_accuracy": 0.27402135231316727,
453
+ "eval_loss": 1.7525370121002197,
454
+ "eval_runtime": 2.789,
455
+ "eval_samples_per_second": 100.754,
456
+ "eval_steps_per_second": 1.076,
457
+ "step": 159
458
+ },
459
+ {
460
+ "epoch": 49.85,
461
+ "eval_accuracy": 0.2775800711743772,
462
+ "eval_loss": 1.7406829595565796,
463
+ "eval_runtime": 3.5203,
464
+ "eval_samples_per_second": 79.822,
465
+ "eval_steps_per_second": 0.852,
466
+ "step": 162
467
+ },
468
+ {
469
+ "epoch": 50.77,
470
+ "eval_accuracy": 0.2918149466192171,
471
+ "eval_loss": 1.7306878566741943,
472
+ "eval_runtime": 3.4092,
473
+ "eval_samples_per_second": 82.424,
474
+ "eval_steps_per_second": 0.88,
475
+ "step": 165
476
+ },
477
+ {
478
+ "epoch": 52.0,
479
+ "eval_accuracy": 0.3096085409252669,
480
+ "eval_loss": 1.7241473197937012,
481
+ "eval_runtime": 3.4771,
482
+ "eval_samples_per_second": 80.815,
483
+ "eval_steps_per_second": 0.863,
484
+ "step": 169
485
+ },
486
+ {
487
+ "epoch": 52.92,
488
+ "eval_accuracy": 0.3167259786476868,
489
+ "eval_loss": 1.7242671251296997,
490
+ "eval_runtime": 3.338,
491
+ "eval_samples_per_second": 84.182,
492
+ "eval_steps_per_second": 0.899,
493
+ "step": 172
494
+ },
495
+ {
496
+ "epoch": 53.85,
497
+ "eval_accuracy": 0.3167259786476868,
498
+ "eval_loss": 1.7253814935684204,
499
+ "eval_runtime": 3.037,
500
+ "eval_samples_per_second": 92.524,
501
+ "eval_steps_per_second": 0.988,
502
+ "step": 175
503
+ },
504
+ {
505
+ "epoch": 54.77,
506
+ "eval_accuracy": 0.3238434163701068,
507
+ "eval_loss": 1.7232733964920044,
508
+ "eval_runtime": 3.3453,
509
+ "eval_samples_per_second": 84.0,
510
+ "eval_steps_per_second": 0.897,
511
+ "step": 178
512
+ },
513
+ {
514
+ "epoch": 56.0,
515
+ "eval_accuracy": 0.3238434163701068,
516
+ "eval_loss": 1.7224737405776978,
517
+ "eval_runtime": 4.1856,
518
+ "eval_samples_per_second": 67.135,
519
+ "eval_steps_per_second": 0.717,
520
+ "step": 182
521
+ },
522
+ {
523
+ "epoch": 56.92,
524
+ "eval_accuracy": 0.3274021352313167,
525
+ "eval_loss": 1.7187089920043945,
526
+ "eval_runtime": 4.0825,
527
+ "eval_samples_per_second": 68.831,
528
+ "eval_steps_per_second": 0.735,
529
+ "step": 185
530
+ },
531
+ {
532
+ "epoch": 57.85,
533
+ "eval_accuracy": 0.3274021352313167,
534
+ "eval_loss": 1.7172435522079468,
535
+ "eval_runtime": 4.3988,
536
+ "eval_samples_per_second": 63.881,
537
+ "eval_steps_per_second": 0.682,
538
+ "step": 188
539
+ },
540
+ {
541
+ "epoch": 58.77,
542
+ "eval_accuracy": 0.33451957295373663,
543
+ "eval_loss": 1.7145518064498901,
544
+ "eval_runtime": 3.5886,
545
+ "eval_samples_per_second": 78.303,
546
+ "eval_steps_per_second": 0.836,
547
+ "step": 191
548
+ },
549
+ {
550
+ "epoch": 60.0,
551
+ "eval_accuracy": 0.3487544483985765,
552
+ "eval_loss": 1.711957573890686,
553
+ "eval_runtime": 3.0988,
554
+ "eval_samples_per_second": 90.681,
555
+ "eval_steps_per_second": 0.968,
556
+ "step": 195
557
+ },
558
+ {
559
+ "epoch": 60.92,
560
+ "eval_accuracy": 0.35587188612099646,
561
+ "eval_loss": 1.7048858404159546,
562
+ "eval_runtime": 3.3244,
563
+ "eval_samples_per_second": 84.526,
564
+ "eval_steps_per_second": 0.902,
565
+ "step": 198
566
+ },
567
+ {
568
+ "epoch": 61.54,
569
+ "grad_norm": 26972.24609375,
570
+ "learning_rate": 2.5925925925925925e-05,
571
+ "loss": 1.3094,
572
+ "step": 200
573
+ },
574
+ {
575
+ "epoch": 61.85,
576
+ "eval_accuracy": 0.3594306049822064,
577
+ "eval_loss": 1.702221155166626,
578
+ "eval_runtime": 2.9103,
579
+ "eval_samples_per_second": 96.553,
580
+ "eval_steps_per_second": 1.031,
581
+ "step": 201
582
+ },
583
+ {
584
+ "epoch": 62.77,
585
+ "eval_accuracy": 0.3736654804270463,
586
+ "eval_loss": 1.6912201642990112,
587
+ "eval_runtime": 3.4935,
588
+ "eval_samples_per_second": 80.435,
589
+ "eval_steps_per_second": 0.859,
590
+ "step": 204
591
+ },
592
+ {
593
+ "epoch": 64.0,
594
+ "eval_accuracy": 0.37722419928825623,
595
+ "eval_loss": 1.6797984838485718,
596
+ "eval_runtime": 3.0757,
597
+ "eval_samples_per_second": 91.361,
598
+ "eval_steps_per_second": 0.975,
599
+ "step": 208
600
+ },
601
+ {
602
+ "epoch": 64.92,
603
+ "eval_accuracy": 0.3807829181494662,
604
+ "eval_loss": 1.6687328815460205,
605
+ "eval_runtime": 3.281,
606
+ "eval_samples_per_second": 85.645,
607
+ "eval_steps_per_second": 0.914,
608
+ "step": 211
609
+ },
610
+ {
611
+ "epoch": 65.85,
612
+ "eval_accuracy": 0.38434163701067614,
613
+ "eval_loss": 1.6568727493286133,
614
+ "eval_runtime": 3.0158,
615
+ "eval_samples_per_second": 93.174,
616
+ "eval_steps_per_second": 0.995,
617
+ "step": 214
618
+ },
619
+ {
620
+ "epoch": 66.77,
621
+ "eval_accuracy": 0.3914590747330961,
622
+ "eval_loss": 1.642698049545288,
623
+ "eval_runtime": 2.9377,
624
+ "eval_samples_per_second": 95.654,
625
+ "eval_steps_per_second": 1.021,
626
+ "step": 217
627
+ },
628
+ {
629
+ "epoch": 68.0,
630
+ "eval_accuracy": 0.3914590747330961,
631
+ "eval_loss": 1.6301021575927734,
632
+ "eval_runtime": 2.9188,
633
+ "eval_samples_per_second": 96.272,
634
+ "eval_steps_per_second": 1.028,
635
+ "step": 221
636
+ },
637
+ {
638
+ "epoch": 68.92,
639
+ "eval_accuracy": 0.39501779359430605,
640
+ "eval_loss": 1.6217372417449951,
641
+ "eval_runtime": 3.1297,
642
+ "eval_samples_per_second": 89.784,
643
+ "eval_steps_per_second": 0.959,
644
+ "step": 224
645
+ },
646
+ {
647
+ "epoch": 69.85,
648
+ "eval_accuracy": 0.39501779359430605,
649
+ "eval_loss": 1.6203086376190186,
650
+ "eval_runtime": 3.3261,
651
+ "eval_samples_per_second": 84.482,
652
+ "eval_steps_per_second": 0.902,
653
+ "step": 227
654
+ },
655
+ {
656
+ "epoch": 70.77,
657
+ "eval_accuracy": 0.39501779359430605,
658
+ "eval_loss": 1.6257439851760864,
659
+ "eval_runtime": 3.1941,
660
+ "eval_samples_per_second": 87.974,
661
+ "eval_steps_per_second": 0.939,
662
+ "step": 230
663
+ },
664
+ {
665
+ "epoch": 72.0,
666
+ "eval_accuracy": 0.40213523131672596,
667
+ "eval_loss": 1.6192444562911987,
668
+ "eval_runtime": 2.8716,
669
+ "eval_samples_per_second": 97.855,
670
+ "eval_steps_per_second": 1.045,
671
+ "step": 234
672
+ },
673
+ {
674
+ "epoch": 72.92,
675
+ "eval_accuracy": 0.4092526690391459,
676
+ "eval_loss": 1.6044347286224365,
677
+ "eval_runtime": 3.3231,
678
+ "eval_samples_per_second": 84.559,
679
+ "eval_steps_per_second": 0.903,
680
+ "step": 237
681
+ },
682
+ {
683
+ "epoch": 73.85,
684
+ "eval_accuracy": 0.4306049822064057,
685
+ "eval_loss": 1.5868154764175415,
686
+ "eval_runtime": 3.0078,
687
+ "eval_samples_per_second": 93.422,
688
+ "eval_steps_per_second": 0.997,
689
+ "step": 240
690
+ },
691
+ {
692
+ "epoch": 74.77,
693
+ "eval_accuracy": 0.4377224199288256,
694
+ "eval_loss": 1.5786783695220947,
695
+ "eval_runtime": 3.1108,
696
+ "eval_samples_per_second": 90.332,
697
+ "eval_steps_per_second": 0.964,
698
+ "step": 243
699
+ },
700
+ {
701
+ "epoch": 76.0,
702
+ "eval_accuracy": 0.43416370106761565,
703
+ "eval_loss": 1.5762073993682861,
704
+ "eval_runtime": 4.8033,
705
+ "eval_samples_per_second": 58.501,
706
+ "eval_steps_per_second": 0.625,
707
+ "step": 247
708
+ },
709
+ {
710
+ "epoch": 76.92,
711
+ "eval_accuracy": 0.4377224199288256,
712
+ "eval_loss": 1.5717052221298218,
713
+ "eval_runtime": 4.9388,
714
+ "eval_samples_per_second": 56.896,
715
+ "eval_steps_per_second": 0.607,
716
+ "step": 250
717
+ },
718
+ {
719
+ "epoch": 77.85,
720
+ "eval_accuracy": 0.43416370106761565,
721
+ "eval_loss": 1.5673516988754272,
722
+ "eval_runtime": 3.5439,
723
+ "eval_samples_per_second": 79.29,
724
+ "eval_steps_per_second": 0.847,
725
+ "step": 253
726
+ },
727
+ {
728
+ "epoch": 78.77,
729
+ "eval_accuracy": 0.42704626334519574,
730
+ "eval_loss": 1.5683715343475342,
731
+ "eval_runtime": 2.9479,
732
+ "eval_samples_per_second": 95.323,
733
+ "eval_steps_per_second": 1.018,
734
+ "step": 256
735
+ },
736
+ {
737
+ "epoch": 80.0,
738
+ "eval_accuracy": 0.42704626334519574,
739
+ "eval_loss": 1.5619009733200073,
740
+ "eval_runtime": 3.2494,
741
+ "eval_samples_per_second": 86.478,
742
+ "eval_steps_per_second": 0.923,
743
+ "step": 260
744
+ },
745
+ {
746
+ "epoch": 80.92,
747
+ "eval_accuracy": 0.4306049822064057,
748
+ "eval_loss": 1.5554527044296265,
749
+ "eval_runtime": 3.0649,
750
+ "eval_samples_per_second": 91.683,
751
+ "eval_steps_per_second": 0.979,
752
+ "step": 263
753
+ },
754
+ {
755
+ "epoch": 81.85,
756
+ "eval_accuracy": 0.43416370106761565,
757
+ "eval_loss": 1.550489068031311,
758
+ "eval_runtime": 3.1587,
759
+ "eval_samples_per_second": 88.96,
760
+ "eval_steps_per_second": 0.95,
761
+ "step": 266
762
+ },
763
+ {
764
+ "epoch": 82.77,
765
+ "eval_accuracy": 0.4412811387900356,
766
+ "eval_loss": 1.5385645627975464,
767
+ "eval_runtime": 3.1715,
768
+ "eval_samples_per_second": 88.601,
769
+ "eval_steps_per_second": 0.946,
770
+ "step": 269
771
+ },
772
+ {
773
+ "epoch": 84.0,
774
+ "eval_accuracy": 0.4377224199288256,
775
+ "eval_loss": 1.536201000213623,
776
+ "eval_runtime": 3.2602,
777
+ "eval_samples_per_second": 86.191,
778
+ "eval_steps_per_second": 0.92,
779
+ "step": 273
780
+ },
781
+ {
782
+ "epoch": 84.92,
783
+ "eval_accuracy": 0.43416370106761565,
784
+ "eval_loss": 1.5410619974136353,
785
+ "eval_runtime": 2.9845,
786
+ "eval_samples_per_second": 94.153,
787
+ "eval_steps_per_second": 1.005,
788
+ "step": 276
789
+ },
790
+ {
791
+ "epoch": 85.85,
792
+ "eval_accuracy": 0.43416370106761565,
793
+ "eval_loss": 1.5452691316604614,
794
+ "eval_runtime": 3.4013,
795
+ "eval_samples_per_second": 82.616,
796
+ "eval_steps_per_second": 0.882,
797
+ "step": 279
798
+ },
799
+ {
800
+ "epoch": 86.77,
801
+ "eval_accuracy": 0.42704626334519574,
802
+ "eval_loss": 1.5611252784729004,
803
+ "eval_runtime": 2.9135,
804
+ "eval_samples_per_second": 96.447,
805
+ "eval_steps_per_second": 1.03,
806
+ "step": 282
807
+ },
808
+ {
809
+ "epoch": 88.0,
810
+ "eval_accuracy": 0.4199288256227758,
811
+ "eval_loss": 1.5766078233718872,
812
+ "eval_runtime": 2.8634,
813
+ "eval_samples_per_second": 98.135,
814
+ "eval_steps_per_second": 1.048,
815
+ "step": 286
816
+ },
817
+ {
818
+ "epoch": 88.92,
819
+ "eval_accuracy": 0.4199288256227758,
820
+ "eval_loss": 1.5781065225601196,
821
+ "eval_runtime": 3.1014,
822
+ "eval_samples_per_second": 90.606,
823
+ "eval_steps_per_second": 0.967,
824
+ "step": 289
825
+ },
826
+ {
827
+ "epoch": 89.85,
828
+ "eval_accuracy": 0.4234875444839858,
829
+ "eval_loss": 1.5674538612365723,
830
+ "eval_runtime": 3.5418,
831
+ "eval_samples_per_second": 79.339,
832
+ "eval_steps_per_second": 0.847,
833
+ "step": 292
834
+ },
835
+ {
836
+ "epoch": 90.77,
837
+ "eval_accuracy": 0.42704626334519574,
838
+ "eval_loss": 1.558840036392212,
839
+ "eval_runtime": 4.5717,
840
+ "eval_samples_per_second": 61.464,
841
+ "eval_steps_per_second": 0.656,
842
+ "step": 295
843
+ },
844
+ {
845
+ "epoch": 92.0,
846
+ "eval_accuracy": 0.42704626334519574,
847
+ "eval_loss": 1.5495978593826294,
848
+ "eval_runtime": 2.971,
849
+ "eval_samples_per_second": 94.581,
850
+ "eval_steps_per_second": 1.01,
851
+ "step": 299
852
+ },
853
+ {
854
+ "epoch": 92.31,
855
+ "grad_norm": 27984.919921875,
856
+ "learning_rate": 2.222222222222222e-05,
857
+ "loss": 1.0538,
858
+ "step": 300
859
+ },
860
+ {
861
+ "epoch": 92.92,
862
+ "eval_accuracy": 0.42704626334519574,
863
+ "eval_loss": 1.5492929220199585,
864
+ "eval_runtime": 3.229,
865
+ "eval_samples_per_second": 87.023,
866
+ "eval_steps_per_second": 0.929,
867
+ "step": 302
868
+ },
869
+ {
870
+ "epoch": 93.85,
871
+ "eval_accuracy": 0.4234875444839858,
872
+ "eval_loss": 1.5539740324020386,
873
+ "eval_runtime": 2.993,
874
+ "eval_samples_per_second": 93.886,
875
+ "eval_steps_per_second": 1.002,
876
+ "step": 305
877
+ },
878
+ {
879
+ "epoch": 94.77,
880
+ "eval_accuracy": 0.41637010676156583,
881
+ "eval_loss": 1.5620365142822266,
882
+ "eval_runtime": 3.5102,
883
+ "eval_samples_per_second": 80.052,
884
+ "eval_steps_per_second": 0.855,
885
+ "step": 308
886
+ },
887
+ {
888
+ "epoch": 96.0,
889
+ "eval_accuracy": 0.41637010676156583,
890
+ "eval_loss": 1.564751148223877,
891
+ "eval_runtime": 3.7132,
892
+ "eval_samples_per_second": 75.677,
893
+ "eval_steps_per_second": 0.808,
894
+ "step": 312
895
+ },
896
+ {
897
+ "epoch": 96.92,
898
+ "eval_accuracy": 0.41637010676156583,
899
+ "eval_loss": 1.561686396598816,
900
+ "eval_runtime": 4.9316,
901
+ "eval_samples_per_second": 56.98,
902
+ "eval_steps_per_second": 0.608,
903
+ "step": 315
904
+ },
905
+ {
906
+ "epoch": 97.85,
907
+ "eval_accuracy": 0.4234875444839858,
908
+ "eval_loss": 1.5461145639419556,
909
+ "eval_runtime": 3.1512,
910
+ "eval_samples_per_second": 89.173,
911
+ "eval_steps_per_second": 0.952,
912
+ "step": 318
913
+ },
914
+ {
915
+ "epoch": 98.77,
916
+ "eval_accuracy": 0.4306049822064057,
917
+ "eval_loss": 1.5348182916641235,
918
+ "eval_runtime": 4.3294,
919
+ "eval_samples_per_second": 64.906,
920
+ "eval_steps_per_second": 0.693,
921
+ "step": 321
922
+ },
923
+ {
924
+ "epoch": 100.0,
925
+ "eval_accuracy": 0.4306049822064057,
926
+ "eval_loss": 1.5345805883407593,
927
+ "eval_runtime": 3.3762,
928
+ "eval_samples_per_second": 83.23,
929
+ "eval_steps_per_second": 0.889,
930
+ "step": 325
931
+ },
932
+ {
933
+ "epoch": 100.92,
934
+ "eval_accuracy": 0.41637010676156583,
935
+ "eval_loss": 1.5465843677520752,
936
+ "eval_runtime": 3.8288,
937
+ "eval_samples_per_second": 73.391,
938
+ "eval_steps_per_second": 0.784,
939
+ "step": 328
940
+ },
941
+ {
942
+ "epoch": 101.85,
943
+ "eval_accuracy": 0.4128113879003559,
944
+ "eval_loss": 1.5547189712524414,
945
+ "eval_runtime": 4.3332,
946
+ "eval_samples_per_second": 64.848,
947
+ "eval_steps_per_second": 0.692,
948
+ "step": 331
949
+ },
950
+ {
951
+ "epoch": 102.77,
952
+ "eval_accuracy": 0.4128113879003559,
953
+ "eval_loss": 1.5559605360031128,
954
+ "eval_runtime": 3.2588,
955
+ "eval_samples_per_second": 86.229,
956
+ "eval_steps_per_second": 0.921,
957
+ "step": 334
958
+ },
959
+ {
960
+ "epoch": 104.0,
961
+ "eval_accuracy": 0.4306049822064057,
962
+ "eval_loss": 1.5315039157867432,
963
+ "eval_runtime": 4.5744,
964
+ "eval_samples_per_second": 61.429,
965
+ "eval_steps_per_second": 0.656,
966
+ "step": 338
967
+ },
968
+ {
969
+ "epoch": 104.92,
970
+ "eval_accuracy": 0.44483985765124556,
971
+ "eval_loss": 1.5124022960662842,
972
+ "eval_runtime": 3.3067,
973
+ "eval_samples_per_second": 84.979,
974
+ "eval_steps_per_second": 0.907,
975
+ "step": 341
976
+ },
977
+ {
978
+ "epoch": 105.85,
979
+ "eval_accuracy": 0.44483985765124556,
980
+ "eval_loss": 1.5044087171554565,
981
+ "eval_runtime": 3.9949,
982
+ "eval_samples_per_second": 70.341,
983
+ "eval_steps_per_second": 0.751,
984
+ "step": 344
985
+ },
986
+ {
987
+ "epoch": 106.77,
988
+ "eval_accuracy": 0.4483985765124555,
989
+ "eval_loss": 1.5010027885437012,
990
+ "eval_runtime": 3.5698,
991
+ "eval_samples_per_second": 78.716,
992
+ "eval_steps_per_second": 0.84,
993
+ "step": 347
994
+ },
995
+ {
996
+ "epoch": 108.0,
997
+ "eval_accuracy": 0.44483985765124556,
998
+ "eval_loss": 1.5004721879959106,
999
+ "eval_runtime": 2.9807,
1000
+ "eval_samples_per_second": 94.273,
1001
+ "eval_steps_per_second": 1.006,
1002
+ "step": 351
1003
+ },
1004
+ {
1005
+ "epoch": 108.92,
1006
+ "eval_accuracy": 0.44483985765124556,
1007
+ "eval_loss": 1.499153971672058,
1008
+ "eval_runtime": 2.8868,
1009
+ "eval_samples_per_second": 97.339,
1010
+ "eval_steps_per_second": 1.039,
1011
+ "step": 354
1012
+ },
1013
+ {
1014
+ "epoch": 109.85,
1015
+ "eval_accuracy": 0.4483985765124555,
1016
+ "eval_loss": 1.4993938207626343,
1017
+ "eval_runtime": 3.2052,
1018
+ "eval_samples_per_second": 87.67,
1019
+ "eval_steps_per_second": 0.936,
1020
+ "step": 357
1021
+ },
1022
+ {
1023
+ "epoch": 110.77,
1024
+ "eval_accuracy": 0.45195729537366547,
1025
+ "eval_loss": 1.4987653493881226,
1026
+ "eval_runtime": 3.3473,
1027
+ "eval_samples_per_second": 83.949,
1028
+ "eval_steps_per_second": 0.896,
1029
+ "step": 360
1030
+ },
1031
+ {
1032
+ "epoch": 112.0,
1033
+ "eval_accuracy": 0.46619217081850534,
1034
+ "eval_loss": 1.5004514455795288,
1035
+ "eval_runtime": 2.8714,
1036
+ "eval_samples_per_second": 97.862,
1037
+ "eval_steps_per_second": 1.045,
1038
+ "step": 364
1039
+ },
1040
+ {
1041
+ "epoch": 112.92,
1042
+ "eval_accuracy": 0.47330960854092524,
1043
+ "eval_loss": 1.5010361671447754,
1044
+ "eval_runtime": 3.6886,
1045
+ "eval_samples_per_second": 76.182,
1046
+ "eval_steps_per_second": 0.813,
1047
+ "step": 367
1048
+ },
1049
+ {
1050
+ "epoch": 113.85,
1051
+ "eval_accuracy": 0.4697508896797153,
1052
+ "eval_loss": 1.4968541860580444,
1053
+ "eval_runtime": 3.5621,
1054
+ "eval_samples_per_second": 78.886,
1055
+ "eval_steps_per_second": 0.842,
1056
+ "step": 370
1057
+ },
1058
+ {
1059
+ "epoch": 114.77,
1060
+ "eval_accuracy": 0.47330960854092524,
1061
+ "eval_loss": 1.4775702953338623,
1062
+ "eval_runtime": 4.3842,
1063
+ "eval_samples_per_second": 64.093,
1064
+ "eval_steps_per_second": 0.684,
1065
+ "step": 373
1066
+ },
1067
+ {
1068
+ "epoch": 116.0,
1069
+ "eval_accuracy": 0.47686832740213525,
1070
+ "eval_loss": 1.4527899026870728,
1071
+ "eval_runtime": 4.7808,
1072
+ "eval_samples_per_second": 58.777,
1073
+ "eval_steps_per_second": 0.628,
1074
+ "step": 377
1075
+ },
1076
+ {
1077
+ "epoch": 116.92,
1078
+ "eval_accuracy": 0.49466192170818507,
1079
+ "eval_loss": 1.4394866228103638,
1080
+ "eval_runtime": 5.0753,
1081
+ "eval_samples_per_second": 55.366,
1082
+ "eval_steps_per_second": 0.591,
1083
+ "step": 380
1084
+ },
1085
+ {
1086
+ "epoch": 117.85,
1087
+ "eval_accuracy": 0.498220640569395,
1088
+ "eval_loss": 1.4310173988342285,
1089
+ "eval_runtime": 4.758,
1090
+ "eval_samples_per_second": 59.058,
1091
+ "eval_steps_per_second": 0.631,
1092
+ "step": 383
1093
+ },
1094
+ {
1095
+ "epoch": 118.77,
1096
+ "eval_accuracy": 0.49466192170818507,
1097
+ "eval_loss": 1.4314603805541992,
1098
+ "eval_runtime": 3.9673,
1099
+ "eval_samples_per_second": 70.829,
1100
+ "eval_steps_per_second": 0.756,
1101
+ "step": 386
1102
+ },
1103
+ {
1104
+ "epoch": 120.0,
1105
+ "eval_accuracy": 0.49466192170818507,
1106
+ "eval_loss": 1.4388599395751953,
1107
+ "eval_runtime": 4.1069,
1108
+ "eval_samples_per_second": 68.422,
1109
+ "eval_steps_per_second": 0.73,
1110
+ "step": 390
1111
+ },
1112
+ {
1113
+ "epoch": 120.92,
1114
+ "eval_accuracy": 0.498220640569395,
1115
+ "eval_loss": 1.4374699592590332,
1116
+ "eval_runtime": 5.1154,
1117
+ "eval_samples_per_second": 54.933,
1118
+ "eval_steps_per_second": 0.586,
1119
+ "step": 393
1120
+ },
1121
+ {
1122
+ "epoch": 121.85,
1123
+ "eval_accuracy": 0.498220640569395,
1124
+ "eval_loss": 1.4381343126296997,
1125
+ "eval_runtime": 4.1133,
1126
+ "eval_samples_per_second": 68.315,
1127
+ "eval_steps_per_second": 0.729,
1128
+ "step": 396
1129
+ },
1130
+ {
1131
+ "epoch": 122.77,
1132
+ "eval_accuracy": 0.498220640569395,
1133
+ "eval_loss": 1.4246776103973389,
1134
+ "eval_runtime": 3.9833,
1135
+ "eval_samples_per_second": 70.544,
1136
+ "eval_steps_per_second": 0.753,
1137
+ "step": 399
1138
+ },
1139
+ {
1140
+ "epoch": 123.08,
1141
+ "grad_norm": 31388.482421875,
1142
+ "learning_rate": 1.8518518518518518e-05,
1143
+ "loss": 0.8509,
1144
+ "step": 400
1145
+ },
1146
+ {
1147
+ "epoch": 124.0,
1148
+ "eval_accuracy": 0.498220640569395,
1149
+ "eval_loss": 1.4195659160614014,
1150
+ "eval_runtime": 4.1654,
1151
+ "eval_samples_per_second": 67.461,
1152
+ "eval_steps_per_second": 0.72,
1153
+ "step": 403
1154
+ },
1155
+ {
1156
+ "epoch": 124.92,
1157
+ "eval_accuracy": 0.505338078291815,
1158
+ "eval_loss": 1.4178649187088013,
1159
+ "eval_runtime": 5.0869,
1160
+ "eval_samples_per_second": 55.239,
1161
+ "eval_steps_per_second": 0.59,
1162
+ "step": 406
1163
+ },
1164
+ {
1165
+ "epoch": 125.85,
1166
+ "eval_accuracy": 0.505338078291815,
1167
+ "eval_loss": 1.40910804271698,
1168
+ "eval_runtime": 4.5242,
1169
+ "eval_samples_per_second": 62.11,
1170
+ "eval_steps_per_second": 0.663,
1171
+ "step": 409
1172
+ },
1173
+ {
1174
+ "epoch": 126.77,
1175
+ "eval_accuracy": 0.505338078291815,
1176
+ "eval_loss": 1.3957635164260864,
1177
+ "eval_runtime": 4.5377,
1178
+ "eval_samples_per_second": 61.926,
1179
+ "eval_steps_per_second": 0.661,
1180
+ "step": 412
1181
+ },
1182
+ {
1183
+ "epoch": 128.0,
1184
+ "eval_accuracy": 0.5088967971530249,
1185
+ "eval_loss": 1.3736003637313843,
1186
+ "eval_runtime": 3.6994,
1187
+ "eval_samples_per_second": 75.958,
1188
+ "eval_steps_per_second": 0.811,
1189
+ "step": 416
1190
+ },
1191
+ {
1192
+ "epoch": 128.92,
1193
+ "eval_accuracy": 0.5088967971530249,
1194
+ "eval_loss": 1.3661431074142456,
1195
+ "eval_runtime": 4.0248,
1196
+ "eval_samples_per_second": 69.817,
1197
+ "eval_steps_per_second": 0.745,
1198
+ "step": 419
1199
+ },
1200
+ {
1201
+ "epoch": 129.85,
1202
+ "eval_accuracy": 0.5124555160142349,
1203
+ "eval_loss": 1.369443416595459,
1204
+ "eval_runtime": 4.9876,
1205
+ "eval_samples_per_second": 56.34,
1206
+ "eval_steps_per_second": 0.601,
1207
+ "step": 422
1208
+ },
1209
+ {
1210
+ "epoch": 130.77,
1211
+ "eval_accuracy": 0.5124555160142349,
1212
+ "eval_loss": 1.3807623386383057,
1213
+ "eval_runtime": 3.5494,
1214
+ "eval_samples_per_second": 79.169,
1215
+ "eval_steps_per_second": 0.845,
1216
+ "step": 425
1217
+ },
1218
+ {
1219
+ "epoch": 132.0,
1220
+ "eval_accuracy": 0.5124555160142349,
1221
+ "eval_loss": 1.3818711042404175,
1222
+ "eval_runtime": 3.9503,
1223
+ "eval_samples_per_second": 71.134,
1224
+ "eval_steps_per_second": 0.759,
1225
+ "step": 429
1226
+ },
1227
+ {
1228
+ "epoch": 132.92,
1229
+ "eval_accuracy": 0.5124555160142349,
1230
+ "eval_loss": 1.3859163522720337,
1231
+ "eval_runtime": 4.2041,
1232
+ "eval_samples_per_second": 66.84,
1233
+ "eval_steps_per_second": 0.714,
1234
+ "step": 432
1235
+ },
1236
+ {
1237
+ "epoch": 133.85,
1238
+ "eval_accuracy": 0.5231316725978647,
1239
+ "eval_loss": 1.378004789352417,
1240
+ "eval_runtime": 3.8384,
1241
+ "eval_samples_per_second": 73.208,
1242
+ "eval_steps_per_second": 0.782,
1243
+ "step": 435
1244
+ },
1245
+ {
1246
+ "epoch": 134.77,
1247
+ "eval_accuracy": 0.5231316725978647,
1248
+ "eval_loss": 1.3696413040161133,
1249
+ "eval_runtime": 4.6334,
1250
+ "eval_samples_per_second": 60.646,
1251
+ "eval_steps_per_second": 0.647,
1252
+ "step": 438
1253
+ },
1254
+ {
1255
+ "epoch": 136.0,
1256
+ "eval_accuracy": 0.5302491103202847,
1257
+ "eval_loss": 1.3564013242721558,
1258
+ "eval_runtime": 4.002,
1259
+ "eval_samples_per_second": 70.215,
1260
+ "eval_steps_per_second": 0.75,
1261
+ "step": 442
1262
+ },
1263
+ {
1264
+ "epoch": 136.92,
1265
+ "eval_accuracy": 0.5338078291814946,
1266
+ "eval_loss": 1.3421210050582886,
1267
+ "eval_runtime": 4.0161,
1268
+ "eval_samples_per_second": 69.968,
1269
+ "eval_steps_per_second": 0.747,
1270
+ "step": 445
1271
+ },
1272
+ {
1273
+ "epoch": 137.85,
1274
+ "eval_accuracy": 0.5373665480427047,
1275
+ "eval_loss": 1.325627326965332,
1276
+ "eval_runtime": 4.156,
1277
+ "eval_samples_per_second": 67.613,
1278
+ "eval_steps_per_second": 0.722,
1279
+ "step": 448
1280
+ },
1281
+ {
1282
+ "epoch": 138.77,
1283
+ "eval_accuracy": 0.5373665480427047,
1284
+ "eval_loss": 1.3274290561676025,
1285
+ "eval_runtime": 3.9911,
1286
+ "eval_samples_per_second": 70.407,
1287
+ "eval_steps_per_second": 0.752,
1288
+ "step": 451
1289
+ },
1290
+ {
1291
+ "epoch": 140.0,
1292
+ "eval_accuracy": 0.5409252669039146,
1293
+ "eval_loss": 1.3401566743850708,
1294
+ "eval_runtime": 4.4088,
1295
+ "eval_samples_per_second": 63.736,
1296
+ "eval_steps_per_second": 0.68,
1297
+ "step": 455
1298
+ },
1299
+ {
1300
+ "epoch": 140.92,
1301
+ "eval_accuracy": 0.5409252669039146,
1302
+ "eval_loss": 1.351689338684082,
1303
+ "eval_runtime": 4.4409,
1304
+ "eval_samples_per_second": 63.276,
1305
+ "eval_steps_per_second": 0.676,
1306
+ "step": 458
1307
+ },
1308
+ {
1309
+ "epoch": 141.85,
1310
+ "eval_accuracy": 0.5409252669039146,
1311
+ "eval_loss": 1.3585495948791504,
1312
+ "eval_runtime": 3.7955,
1313
+ "eval_samples_per_second": 74.035,
1314
+ "eval_steps_per_second": 0.79,
1315
+ "step": 461
1316
+ },
1317
+ {
1318
+ "epoch": 142.77,
1319
+ "eval_accuracy": 0.5373665480427047,
1320
+ "eval_loss": 1.3592112064361572,
1321
+ "eval_runtime": 3.3552,
1322
+ "eval_samples_per_second": 83.75,
1323
+ "eval_steps_per_second": 0.894,
1324
+ "step": 464
1325
+ },
1326
+ {
1327
+ "epoch": 144.0,
1328
+ "eval_accuracy": 0.5480427046263345,
1329
+ "eval_loss": 1.3329293727874756,
1330
+ "eval_runtime": 5.3044,
1331
+ "eval_samples_per_second": 52.975,
1332
+ "eval_steps_per_second": 0.566,
1333
+ "step": 468
1334
+ },
1335
+ {
1336
+ "epoch": 144.92,
1337
+ "eval_accuracy": 0.5480427046263345,
1338
+ "eval_loss": 1.312560796737671,
1339
+ "eval_runtime": 4.319,
1340
+ "eval_samples_per_second": 65.061,
1341
+ "eval_steps_per_second": 0.695,
1342
+ "step": 471
1343
+ },
1344
+ {
1345
+ "epoch": 145.85,
1346
+ "eval_accuracy": 0.5444839857651246,
1347
+ "eval_loss": 1.3075566291809082,
1348
+ "eval_runtime": 3.9528,
1349
+ "eval_samples_per_second": 71.09,
1350
+ "eval_steps_per_second": 0.759,
1351
+ "step": 474
1352
+ },
1353
+ {
1354
+ "epoch": 146.77,
1355
+ "eval_accuracy": 0.5480427046263345,
1356
+ "eval_loss": 1.3146412372589111,
1357
+ "eval_runtime": 4.3249,
1358
+ "eval_samples_per_second": 64.973,
1359
+ "eval_steps_per_second": 0.694,
1360
+ "step": 477
1361
+ },
1362
+ {
1363
+ "epoch": 148.0,
1364
+ "eval_accuracy": 0.5444839857651246,
1365
+ "eval_loss": 1.3345069885253906,
1366
+ "eval_runtime": 3.9127,
1367
+ "eval_samples_per_second": 71.817,
1368
+ "eval_steps_per_second": 0.767,
1369
+ "step": 481
1370
+ },
1371
+ {
1372
+ "epoch": 148.92,
1373
+ "eval_accuracy": 0.5444839857651246,
1374
+ "eval_loss": 1.3408929109573364,
1375
+ "eval_runtime": 4.1463,
1376
+ "eval_samples_per_second": 67.771,
1377
+ "eval_steps_per_second": 0.724,
1378
+ "step": 484
1379
+ },
1380
+ {
1381
+ "epoch": 149.85,
1382
+ "eval_accuracy": 0.5444839857651246,
1383
+ "eval_loss": 1.3374032974243164,
1384
+ "eval_runtime": 4.2345,
1385
+ "eval_samples_per_second": 66.359,
1386
+ "eval_steps_per_second": 0.708,
1387
+ "step": 487
1388
+ },
1389
+ {
1390
+ "epoch": 150.77,
1391
+ "eval_accuracy": 0.5480427046263345,
1392
+ "eval_loss": 1.3227189779281616,
1393
+ "eval_runtime": 4.3006,
1394
+ "eval_samples_per_second": 65.339,
1395
+ "eval_steps_per_second": 0.698,
1396
+ "step": 490
1397
+ },
1398
+ {
1399
+ "epoch": 152.0,
1400
+ "eval_accuracy": 0.5444839857651246,
1401
+ "eval_loss": 1.3200651407241821,
1402
+ "eval_runtime": 4.4216,
1403
+ "eval_samples_per_second": 63.551,
1404
+ "eval_steps_per_second": 0.678,
1405
+ "step": 494
1406
+ },
1407
+ {
1408
+ "epoch": 152.92,
1409
+ "eval_accuracy": 0.5444839857651246,
1410
+ "eval_loss": 1.3174102306365967,
1411
+ "eval_runtime": 4.4898,
1412
+ "eval_samples_per_second": 62.586,
1413
+ "eval_steps_per_second": 0.668,
1414
+ "step": 497
1415
+ },
1416
+ {
1417
+ "epoch": 153.85,
1418
+ "grad_norm": 24984.7734375,
1419
+ "learning_rate": 1.4814814814814815e-05,
1420
+ "loss": 0.7118,
1421
+ "step": 500
1422
+ },
1423
+ {
1424
+ "epoch": 153.85,
1425
+ "eval_accuracy": 0.5444839857651246,
1426
+ "eval_loss": 1.3073471784591675,
1427
+ "eval_runtime": 4.2385,
1428
+ "eval_samples_per_second": 66.297,
1429
+ "eval_steps_per_second": 0.708,
1430
+ "step": 500
1431
+ },
1432
+ {
1433
+ "epoch": 154.77,
1434
+ "eval_accuracy": 0.5551601423487544,
1435
+ "eval_loss": 1.2983657121658325,
1436
+ "eval_runtime": 3.4569,
1437
+ "eval_samples_per_second": 81.286,
1438
+ "eval_steps_per_second": 0.868,
1439
+ "step": 503
1440
+ },
1441
+ {
1442
+ "epoch": 156.0,
1443
+ "eval_accuracy": 0.5516014234875445,
1444
+ "eval_loss": 1.2974605560302734,
1445
+ "eval_runtime": 4.3467,
1446
+ "eval_samples_per_second": 64.647,
1447
+ "eval_steps_per_second": 0.69,
1448
+ "step": 507
1449
+ },
1450
+ {
1451
+ "epoch": 156.92,
1452
+ "eval_accuracy": 0.5516014234875445,
1453
+ "eval_loss": 1.3027478456497192,
1454
+ "eval_runtime": 4.5106,
1455
+ "eval_samples_per_second": 62.297,
1456
+ "eval_steps_per_second": 0.665,
1457
+ "step": 510
1458
+ },
1459
+ {
1460
+ "epoch": 157.85,
1461
+ "eval_accuracy": 0.5480427046263345,
1462
+ "eval_loss": 1.3088507652282715,
1463
+ "eval_runtime": 4.2508,
1464
+ "eval_samples_per_second": 66.105,
1465
+ "eval_steps_per_second": 0.706,
1466
+ "step": 513
1467
+ },
1468
+ {
1469
+ "epoch": 158.77,
1470
+ "eval_accuracy": 0.5480427046263345,
1471
+ "eval_loss": 1.3138750791549683,
1472
+ "eval_runtime": 4.4205,
1473
+ "eval_samples_per_second": 63.567,
1474
+ "eval_steps_per_second": 0.679,
1475
+ "step": 516
1476
+ },
1477
+ {
1478
+ "epoch": 160.0,
1479
+ "eval_accuracy": 0.5551601423487544,
1480
+ "eval_loss": 1.3067928552627563,
1481
+ "eval_runtime": 4.2488,
1482
+ "eval_samples_per_second": 66.136,
1483
+ "eval_steps_per_second": 0.706,
1484
+ "step": 520
1485
+ },
1486
+ {
1487
+ "epoch": 160.92,
1488
+ "eval_accuracy": 0.5551601423487544,
1489
+ "eval_loss": 1.3011025190353394,
1490
+ "eval_runtime": 4.3025,
1491
+ "eval_samples_per_second": 65.31,
1492
+ "eval_steps_per_second": 0.697,
1493
+ "step": 523
1494
+ },
1495
+ {
1496
+ "epoch": 161.85,
1497
+ "eval_accuracy": 0.5551601423487544,
1498
+ "eval_loss": 1.2957364320755005,
1499
+ "eval_runtime": 4.3812,
1500
+ "eval_samples_per_second": 64.137,
1501
+ "eval_steps_per_second": 0.685,
1502
+ "step": 526
1503
+ },
1504
+ {
1505
+ "epoch": 162.77,
1506
+ "eval_accuracy": 0.5551601423487544,
1507
+ "eval_loss": 1.296021819114685,
1508
+ "eval_runtime": 4.6921,
1509
+ "eval_samples_per_second": 59.887,
1510
+ "eval_steps_per_second": 0.639,
1511
+ "step": 529
1512
+ },
1513
+ {
1514
+ "epoch": 164.0,
1515
+ "eval_accuracy": 0.5516014234875445,
1516
+ "eval_loss": 1.3158953189849854,
1517
+ "eval_runtime": 4.7452,
1518
+ "eval_samples_per_second": 59.218,
1519
+ "eval_steps_per_second": 0.632,
1520
+ "step": 533
1521
+ },
1522
+ {
1523
+ "epoch": 164.92,
1524
+ "eval_accuracy": 0.5516014234875445,
1525
+ "eval_loss": 1.3257168531417847,
1526
+ "eval_runtime": 4.4128,
1527
+ "eval_samples_per_second": 63.678,
1528
+ "eval_steps_per_second": 0.68,
1529
+ "step": 536
1530
+ },
1531
+ {
1532
+ "epoch": 165.85,
1533
+ "eval_accuracy": 0.5516014234875445,
1534
+ "eval_loss": 1.3312301635742188,
1535
+ "eval_runtime": 3.6447,
1536
+ "eval_samples_per_second": 77.099,
1537
+ "eval_steps_per_second": 0.823,
1538
+ "step": 539
1539
+ },
1540
+ {
1541
+ "epoch": 166.77,
1542
+ "eval_accuracy": 0.5516014234875445,
1543
+ "eval_loss": 1.322218418121338,
1544
+ "eval_runtime": 4.2773,
1545
+ "eval_samples_per_second": 65.695,
1546
+ "eval_steps_per_second": 0.701,
1547
+ "step": 542
1548
+ },
1549
+ {
1550
+ "epoch": 168.0,
1551
+ "eval_accuracy": 0.5551601423487544,
1552
+ "eval_loss": 1.298622488975525,
1553
+ "eval_runtime": 4.5789,
1554
+ "eval_samples_per_second": 61.369,
1555
+ "eval_steps_per_second": 0.655,
1556
+ "step": 546
1557
+ },
1558
+ {
1559
+ "epoch": 168.92,
1560
+ "eval_accuracy": 0.5587188612099644,
1561
+ "eval_loss": 1.289797306060791,
1562
+ "eval_runtime": 4.5328,
1563
+ "eval_samples_per_second": 61.993,
1564
+ "eval_steps_per_second": 0.662,
1565
+ "step": 549
1566
+ },
1567
+ {
1568
+ "epoch": 169.85,
1569
+ "eval_accuracy": 0.5551601423487544,
1570
+ "eval_loss": 1.2937852144241333,
1571
+ "eval_runtime": 3.7509,
1572
+ "eval_samples_per_second": 74.915,
1573
+ "eval_steps_per_second": 0.8,
1574
+ "step": 552
1575
+ },
1576
+ {
1577
+ "epoch": 170.77,
1578
+ "eval_accuracy": 0.5551601423487544,
1579
+ "eval_loss": 1.290231704711914,
1580
+ "eval_runtime": 4.1153,
1581
+ "eval_samples_per_second": 68.282,
1582
+ "eval_steps_per_second": 0.729,
1583
+ "step": 555
1584
+ },
1585
+ {
1586
+ "epoch": 172.0,
1587
+ "eval_accuracy": 0.5658362989323843,
1588
+ "eval_loss": 1.287913203239441,
1589
+ "eval_runtime": 4.7912,
1590
+ "eval_samples_per_second": 58.649,
1591
+ "eval_steps_per_second": 0.626,
1592
+ "step": 559
1593
+ },
1594
+ {
1595
+ "epoch": 172.92,
1596
+ "eval_accuracy": 0.5658362989323843,
1597
+ "eval_loss": 1.283803939819336,
1598
+ "eval_runtime": 4.5456,
1599
+ "eval_samples_per_second": 61.818,
1600
+ "eval_steps_per_second": 0.66,
1601
+ "step": 562
1602
+ },
1603
+ {
1604
+ "epoch": 173.85,
1605
+ "eval_accuracy": 0.5658362989323843,
1606
+ "eval_loss": 1.2811965942382812,
1607
+ "eval_runtime": 4.4869,
1608
+ "eval_samples_per_second": 62.627,
1609
+ "eval_steps_per_second": 0.669,
1610
+ "step": 565
1611
+ },
1612
+ {
1613
+ "epoch": 174.77,
1614
+ "eval_accuracy": 0.5658362989323843,
1615
+ "eval_loss": 1.2863661050796509,
1616
+ "eval_runtime": 4.2715,
1617
+ "eval_samples_per_second": 65.785,
1618
+ "eval_steps_per_second": 0.702,
1619
+ "step": 568
1620
+ },
1621
+ {
1622
+ "epoch": 176.0,
1623
+ "eval_accuracy": 0.5551601423487544,
1624
+ "eval_loss": 1.2934131622314453,
1625
+ "eval_runtime": 4.643,
1626
+ "eval_samples_per_second": 60.522,
1627
+ "eval_steps_per_second": 0.646,
1628
+ "step": 572
1629
+ },
1630
+ {
1631
+ "epoch": 176.92,
1632
+ "eval_accuracy": 0.5587188612099644,
1633
+ "eval_loss": 1.2940202951431274,
1634
+ "eval_runtime": 4.2681,
1635
+ "eval_samples_per_second": 65.837,
1636
+ "eval_steps_per_second": 0.703,
1637
+ "step": 575
1638
+ },
1639
+ {
1640
+ "epoch": 177.85,
1641
+ "eval_accuracy": 0.5587188612099644,
1642
+ "eval_loss": 1.298832654953003,
1643
+ "eval_runtime": 4.2991,
1644
+ "eval_samples_per_second": 65.363,
1645
+ "eval_steps_per_second": 0.698,
1646
+ "step": 578
1647
+ },
1648
+ {
1649
+ "epoch": 178.77,
1650
+ "eval_accuracy": 0.5622775800711743,
1651
+ "eval_loss": 1.295286774635315,
1652
+ "eval_runtime": 4.1989,
1653
+ "eval_samples_per_second": 66.922,
1654
+ "eval_steps_per_second": 0.714,
1655
+ "step": 581
1656
+ },
1657
+ {
1658
+ "epoch": 180.0,
1659
+ "eval_accuracy": 0.5587188612099644,
1660
+ "eval_loss": 1.2971975803375244,
1661
+ "eval_runtime": 4.7188,
1662
+ "eval_samples_per_second": 59.549,
1663
+ "eval_steps_per_second": 0.636,
1664
+ "step": 585
1665
+ },
1666
+ {
1667
+ "epoch": 180.92,
1668
+ "eval_accuracy": 0.5658362989323843,
1669
+ "eval_loss": 1.2936004400253296,
1670
+ "eval_runtime": 4.813,
1671
+ "eval_samples_per_second": 58.383,
1672
+ "eval_steps_per_second": 0.623,
1673
+ "step": 588
1674
+ },
1675
+ {
1676
+ "epoch": 181.85,
1677
+ "eval_accuracy": 0.5658362989323843,
1678
+ "eval_loss": 1.2928047180175781,
1679
+ "eval_runtime": 4.1735,
1680
+ "eval_samples_per_second": 67.33,
1681
+ "eval_steps_per_second": 0.719,
1682
+ "step": 591
1683
+ },
1684
+ {
1685
+ "epoch": 182.77,
1686
+ "eval_accuracy": 0.5658362989323843,
1687
+ "eval_loss": 1.291295051574707,
1688
+ "eval_runtime": 4.4694,
1689
+ "eval_samples_per_second": 62.872,
1690
+ "eval_steps_per_second": 0.671,
1691
+ "step": 594
1692
+ },
1693
+ {
1694
+ "epoch": 184.0,
1695
+ "eval_accuracy": 0.5658362989323843,
1696
+ "eval_loss": 1.2824889421463013,
1697
+ "eval_runtime": 4.0765,
1698
+ "eval_samples_per_second": 68.932,
1699
+ "eval_steps_per_second": 0.736,
1700
+ "step": 598
1701
+ },
1702
+ {
1703
+ "epoch": 184.62,
1704
+ "grad_norm": 29892.71484375,
1705
+ "learning_rate": 1.111111111111111e-05,
1706
+ "loss": 0.6473,
1707
+ "step": 600
1708
+ },
1709
+ {
1710
+ "epoch": 184.92,
1711
+ "eval_accuracy": 0.5693950177935944,
1712
+ "eval_loss": 1.2735832929611206,
1713
+ "eval_runtime": 4.6704,
1714
+ "eval_samples_per_second": 60.166,
1715
+ "eval_steps_per_second": 0.642,
1716
+ "step": 601
1717
+ },
1718
+ {
1719
+ "epoch": 185.85,
1720
+ "eval_accuracy": 0.5693950177935944,
1721
+ "eval_loss": 1.2714898586273193,
1722
+ "eval_runtime": 4.6432,
1723
+ "eval_samples_per_second": 60.519,
1724
+ "eval_steps_per_second": 0.646,
1725
+ "step": 604
1726
+ },
1727
+ {
1728
+ "epoch": 186.77,
1729
+ "eval_accuracy": 0.5693950177935944,
1730
+ "eval_loss": 1.2703534364700317,
1731
+ "eval_runtime": 4.1853,
1732
+ "eval_samples_per_second": 67.139,
1733
+ "eval_steps_per_second": 0.717,
1734
+ "step": 607
1735
+ },
1736
+ {
1737
+ "epoch": 188.0,
1738
+ "eval_accuracy": 0.5693950177935944,
1739
+ "eval_loss": 1.2716755867004395,
1740
+ "eval_runtime": 4.1775,
1741
+ "eval_samples_per_second": 67.265,
1742
+ "eval_steps_per_second": 0.718,
1743
+ "step": 611
1744
+ },
1745
+ {
1746
+ "epoch": 188.92,
1747
+ "eval_accuracy": 0.5658362989323843,
1748
+ "eval_loss": 1.2724348306655884,
1749
+ "eval_runtime": 4.9312,
1750
+ "eval_samples_per_second": 56.984,
1751
+ "eval_steps_per_second": 0.608,
1752
+ "step": 614
1753
+ },
1754
+ {
1755
+ "epoch": 189.85,
1756
+ "eval_accuracy": 0.5658362989323843,
1757
+ "eval_loss": 1.2763242721557617,
1758
+ "eval_runtime": 3.5712,
1759
+ "eval_samples_per_second": 78.685,
1760
+ "eval_steps_per_second": 0.84,
1761
+ "step": 617
1762
+ },
1763
+ {
1764
+ "epoch": 190.77,
1765
+ "eval_accuracy": 0.5658362989323843,
1766
+ "eval_loss": 1.2811599969863892,
1767
+ "eval_runtime": 4.2324,
1768
+ "eval_samples_per_second": 66.393,
1769
+ "eval_steps_per_second": 0.709,
1770
+ "step": 620
1771
+ },
1772
+ {
1773
+ "epoch": 192.0,
1774
+ "eval_accuracy": 0.5658362989323843,
1775
+ "eval_loss": 1.2791301012039185,
1776
+ "eval_runtime": 4.4625,
1777
+ "eval_samples_per_second": 62.97,
1778
+ "eval_steps_per_second": 0.672,
1779
+ "step": 624
1780
+ },
1781
+ {
1782
+ "epoch": 192.92,
1783
+ "eval_accuracy": 0.5693950177935944,
1784
+ "eval_loss": 1.2697654962539673,
1785
+ "eval_runtime": 4.2766,
1786
+ "eval_samples_per_second": 65.707,
1787
+ "eval_steps_per_second": 0.701,
1788
+ "step": 627
1789
+ },
1790
+ {
1791
+ "epoch": 193.85,
1792
+ "eval_accuracy": 0.5693950177935944,
1793
+ "eval_loss": 1.269476294517517,
1794
+ "eval_runtime": 4.2862,
1795
+ "eval_samples_per_second": 65.56,
1796
+ "eval_steps_per_second": 0.7,
1797
+ "step": 630
1798
+ },
1799
+ {
1800
+ "epoch": 194.77,
1801
+ "eval_accuracy": 0.5693950177935944,
1802
+ "eval_loss": 1.2703962326049805,
1803
+ "eval_runtime": 4.2135,
1804
+ "eval_samples_per_second": 66.69,
1805
+ "eval_steps_per_second": 0.712,
1806
+ "step": 633
1807
+ },
1808
+ {
1809
+ "epoch": 196.0,
1810
+ "eval_accuracy": 0.5658362989323843,
1811
+ "eval_loss": 1.2736749649047852,
1812
+ "eval_runtime": 4.1666,
1813
+ "eval_samples_per_second": 67.441,
1814
+ "eval_steps_per_second": 0.72,
1815
+ "step": 637
1816
+ },
1817
+ {
1818
+ "epoch": 196.92,
1819
+ "eval_accuracy": 0.5658362989323843,
1820
+ "eval_loss": 1.2782082557678223,
1821
+ "eval_runtime": 4.3682,
1822
+ "eval_samples_per_second": 64.329,
1823
+ "eval_steps_per_second": 0.687,
1824
+ "step": 640
1825
+ },
1826
+ {
1827
+ "epoch": 197.85,
1828
+ "eval_accuracy": 0.5622775800711743,
1829
+ "eval_loss": 1.2813825607299805,
1830
+ "eval_runtime": 5.6488,
1831
+ "eval_samples_per_second": 49.745,
1832
+ "eval_steps_per_second": 0.531,
1833
+ "step": 643
1834
+ },
1835
+ {
1836
+ "epoch": 198.77,
1837
+ "eval_accuracy": 0.5622775800711743,
1838
+ "eval_loss": 1.2819089889526367,
1839
+ "eval_runtime": 5.1916,
1840
+ "eval_samples_per_second": 54.126,
1841
+ "eval_steps_per_second": 0.578,
1842
+ "step": 646
1843
+ },
1844
+ {
1845
+ "epoch": 200.0,
1846
+ "eval_accuracy": 0.5658362989323843,
1847
+ "eval_loss": 1.274595022201538,
1848
+ "eval_runtime": 4.4378,
1849
+ "eval_samples_per_second": 63.32,
1850
+ "eval_steps_per_second": 0.676,
1851
+ "step": 650
1852
+ },
1853
+ {
1854
+ "epoch": 200.92,
1855
+ "eval_accuracy": 0.5658362989323843,
1856
+ "eval_loss": 1.2694467306137085,
1857
+ "eval_runtime": 4.797,
1858
+ "eval_samples_per_second": 58.579,
1859
+ "eval_steps_per_second": 0.625,
1860
+ "step": 653
1861
+ },
1862
+ {
1863
+ "epoch": 201.85,
1864
+ "eval_accuracy": 0.5765124555160143,
1865
+ "eval_loss": 1.262547254562378,
1866
+ "eval_runtime": 4.6991,
1867
+ "eval_samples_per_second": 59.798,
1868
+ "eval_steps_per_second": 0.638,
1869
+ "step": 656
1870
+ },
1871
+ {
1872
+ "epoch": 202.77,
1873
+ "eval_accuracy": 0.5800711743772242,
1874
+ "eval_loss": 1.2575123310089111,
1875
+ "eval_runtime": 4.9663,
1876
+ "eval_samples_per_second": 56.582,
1877
+ "eval_steps_per_second": 0.604,
1878
+ "step": 659
1879
+ },
1880
+ {
1881
+ "epoch": 204.0,
1882
+ "eval_accuracy": 0.5800711743772242,
1883
+ "eval_loss": 1.2548755407333374,
1884
+ "eval_runtime": 5.2012,
1885
+ "eval_samples_per_second": 54.026,
1886
+ "eval_steps_per_second": 0.577,
1887
+ "step": 663
1888
+ },
1889
+ {
1890
+ "epoch": 204.92,
1891
+ "eval_accuracy": 0.5729537366548043,
1892
+ "eval_loss": 1.2623133659362793,
1893
+ "eval_runtime": 4.5347,
1894
+ "eval_samples_per_second": 61.967,
1895
+ "eval_steps_per_second": 0.662,
1896
+ "step": 666
1897
+ },
1898
+ {
1899
+ "epoch": 205.85,
1900
+ "eval_accuracy": 0.5658362989323843,
1901
+ "eval_loss": 1.2665455341339111,
1902
+ "eval_runtime": 3.1603,
1903
+ "eval_samples_per_second": 88.917,
1904
+ "eval_steps_per_second": 0.949,
1905
+ "step": 669
1906
+ },
1907
+ {
1908
+ "epoch": 206.77,
1909
+ "eval_accuracy": 0.5658362989323843,
1910
+ "eval_loss": 1.2684026956558228,
1911
+ "eval_runtime": 4.2009,
1912
+ "eval_samples_per_second": 66.89,
1913
+ "eval_steps_per_second": 0.714,
1914
+ "step": 672
1915
+ },
1916
+ {
1917
+ "epoch": 208.0,
1918
+ "eval_accuracy": 0.5622775800711743,
1919
+ "eval_loss": 1.277047038078308,
1920
+ "eval_runtime": 4.3489,
1921
+ "eval_samples_per_second": 64.613,
1922
+ "eval_steps_per_second": 0.69,
1923
+ "step": 676
1924
+ },
1925
+ {
1926
+ "epoch": 208.92,
1927
+ "eval_accuracy": 0.5622775800711743,
1928
+ "eval_loss": 1.2807551622390747,
1929
+ "eval_runtime": 3.8563,
1930
+ "eval_samples_per_second": 72.867,
1931
+ "eval_steps_per_second": 0.778,
1932
+ "step": 679
1933
+ },
1934
+ {
1935
+ "epoch": 209.85,
1936
+ "eval_accuracy": 0.5729537366548043,
1937
+ "eval_loss": 1.2761532068252563,
1938
+ "eval_runtime": 4.7161,
1939
+ "eval_samples_per_second": 59.583,
1940
+ "eval_steps_per_second": 0.636,
1941
+ "step": 682
1942
+ },
1943
+ {
1944
+ "epoch": 210.77,
1945
+ "eval_accuracy": 0.5729537366548043,
1946
+ "eval_loss": 1.2759194374084473,
1947
+ "eval_runtime": 5.0376,
1948
+ "eval_samples_per_second": 55.781,
1949
+ "eval_steps_per_second": 0.596,
1950
+ "step": 685
1951
+ },
1952
+ {
1953
+ "epoch": 212.0,
1954
+ "eval_accuracy": 0.5729537366548043,
1955
+ "eval_loss": 1.2752187252044678,
1956
+ "eval_runtime": 4.5842,
1957
+ "eval_samples_per_second": 61.297,
1958
+ "eval_steps_per_second": 0.654,
1959
+ "step": 689
1960
+ },
1961
+ {
1962
+ "epoch": 212.92,
1963
+ "eval_accuracy": 0.5729537366548043,
1964
+ "eval_loss": 1.275394082069397,
1965
+ "eval_runtime": 4.2209,
1966
+ "eval_samples_per_second": 66.573,
1967
+ "eval_steps_per_second": 0.711,
1968
+ "step": 692
1969
+ },
1970
+ {
1971
+ "epoch": 213.85,
1972
+ "eval_accuracy": 0.5765124555160143,
1973
+ "eval_loss": 1.272161602973938,
1974
+ "eval_runtime": 4.7348,
1975
+ "eval_samples_per_second": 59.347,
1976
+ "eval_steps_per_second": 0.634,
1977
+ "step": 695
1978
+ },
1979
+ {
1980
+ "epoch": 214.77,
1981
+ "eval_accuracy": 0.5765124555160143,
1982
+ "eval_loss": 1.273858904838562,
1983
+ "eval_runtime": 4.0254,
1984
+ "eval_samples_per_second": 69.808,
1985
+ "eval_steps_per_second": 0.745,
1986
+ "step": 698
1987
+ },
1988
+ {
1989
+ "epoch": 215.38,
1990
+ "grad_norm": 28098.056640625,
1991
+ "learning_rate": 7.4074074074074075e-06,
1992
+ "loss": 0.613,
1993
+ "step": 700
1994
+ },
+ {
+ "epoch": 216.0,
+ "eval_accuracy": 0.5765124555160143,
+ "eval_loss": 1.2782981395721436,
+ "eval_runtime": 4.6723,
+ "eval_samples_per_second": 60.142,
+ "eval_steps_per_second": 0.642,
+ "step": 702
+ },
+ {
+ "epoch": 216.92,
+ "eval_accuracy": 0.5765124555160143,
+ "eval_loss": 1.2774933576583862,
+ "eval_runtime": 4.576,
+ "eval_samples_per_second": 61.407,
+ "eval_steps_per_second": 0.656,
+ "step": 705
+ },
+ {
+ "epoch": 217.85,
+ "eval_accuracy": 0.5765124555160143,
+ "eval_loss": 1.2740654945373535,
+ "eval_runtime": 4.8253,
+ "eval_samples_per_second": 58.234,
+ "eval_steps_per_second": 0.622,
+ "step": 708
+ },
+ {
+ "epoch": 218.77,
+ "eval_accuracy": 0.5765124555160143,
+ "eval_loss": 1.2705509662628174,
+ "eval_runtime": 4.386,
+ "eval_samples_per_second": 64.067,
+ "eval_steps_per_second": 0.684,
+ "step": 711
+ },
+ {
+ "epoch": 220.0,
+ "eval_accuracy": 0.5765124555160143,
+ "eval_loss": 1.2627956867218018,
+ "eval_runtime": 4.2817,
+ "eval_samples_per_second": 65.628,
+ "eval_steps_per_second": 0.701,
+ "step": 715
+ },
+ {
+ "epoch": 220.92,
+ "eval_accuracy": 0.5800711743772242,
+ "eval_loss": 1.2580970525741577,
+ "eval_runtime": 3.9386,
+ "eval_samples_per_second": 71.344,
+ "eval_steps_per_second": 0.762,
+ "step": 718
+ },
+ {
+ "epoch": 221.85,
+ "eval_accuracy": 0.5765124555160143,
+ "eval_loss": 1.2567566633224487,
+ "eval_runtime": 4.3353,
+ "eval_samples_per_second": 64.817,
+ "eval_steps_per_second": 0.692,
+ "step": 721
+ },
+ {
+ "epoch": 222.77,
+ "eval_accuracy": 0.5729537366548043,
+ "eval_loss": 1.2558982372283936,
+ "eval_runtime": 3.7135,
+ "eval_samples_per_second": 75.67,
+ "eval_steps_per_second": 0.808,
+ "step": 724
+ },
+ {
+ "epoch": 224.0,
+ "eval_accuracy": 0.5765124555160143,
+ "eval_loss": 1.2502700090408325,
+ "eval_runtime": 5.0636,
+ "eval_samples_per_second": 55.494,
+ "eval_steps_per_second": 0.592,
+ "step": 728
+ },
+ {
+ "epoch": 224.92,
+ "eval_accuracy": 0.5765124555160143,
+ "eval_loss": 1.2497973442077637,
+ "eval_runtime": 4.7669,
+ "eval_samples_per_second": 58.948,
+ "eval_steps_per_second": 0.629,
+ "step": 731
+ },
+ {
+ "epoch": 225.85,
+ "eval_accuracy": 0.5765124555160143,
+ "eval_loss": 1.2500195503234863,
+ "eval_runtime": 3.9522,
+ "eval_samples_per_second": 71.099,
+ "eval_steps_per_second": 0.759,
+ "step": 734
+ },
+ {
+ "epoch": 226.77,
+ "eval_accuracy": 0.5765124555160143,
+ "eval_loss": 1.2490234375,
+ "eval_runtime": 4.1869,
+ "eval_samples_per_second": 67.114,
+ "eval_steps_per_second": 0.717,
+ "step": 737
+ },
+ {
+ "epoch": 228.0,
+ "eval_accuracy": 0.5765124555160143,
+ "eval_loss": 1.2531741857528687,
+ "eval_runtime": 3.9865,
+ "eval_samples_per_second": 70.489,
+ "eval_steps_per_second": 0.753,
+ "step": 741
+ },
+ {
+ "epoch": 228.92,
+ "eval_accuracy": 0.5765124555160143,
+ "eval_loss": 1.2572293281555176,
+ "eval_runtime": 5.2241,
+ "eval_samples_per_second": 53.789,
+ "eval_steps_per_second": 0.574,
+ "step": 744
+ },
+ {
+ "epoch": 229.85,
+ "eval_accuracy": 0.5765124555160143,
+ "eval_loss": 1.2598803043365479,
+ "eval_runtime": 4.1402,
+ "eval_samples_per_second": 67.87,
+ "eval_steps_per_second": 0.725,
+ "step": 747
+ },
+ {
+ "epoch": 230.77,
+ "eval_accuracy": 0.5729537366548043,
+ "eval_loss": 1.2600898742675781,
+ "eval_runtime": 3.9785,
+ "eval_samples_per_second": 70.63,
+ "eval_steps_per_second": 0.754,
+ "step": 750
+ },
+ {
+ "epoch": 232.0,
+ "eval_accuracy": 0.5729537366548043,
+ "eval_loss": 1.2625129222869873,
+ "eval_runtime": 4.1458,
+ "eval_samples_per_second": 67.779,
+ "eval_steps_per_second": 0.724,
+ "step": 754
+ },
+ {
+ "epoch": 232.92,
+ "eval_accuracy": 0.5765124555160143,
+ "eval_loss": 1.2635974884033203,
+ "eval_runtime": 4.5032,
+ "eval_samples_per_second": 62.401,
+ "eval_steps_per_second": 0.666,
+ "step": 757
+ },
+ {
+ "epoch": 233.85,
+ "eval_accuracy": 0.5765124555160143,
+ "eval_loss": 1.2629433870315552,
+ "eval_runtime": 4.1399,
+ "eval_samples_per_second": 67.876,
+ "eval_steps_per_second": 0.725,
+ "step": 760
+ },
+ {
+ "epoch": 234.77,
+ "eval_accuracy": 0.5765124555160143,
+ "eval_loss": 1.2600425481796265,
+ "eval_runtime": 4.2407,
+ "eval_samples_per_second": 66.263,
+ "eval_steps_per_second": 0.707,
+ "step": 763
+ },
+ {
+ "epoch": 236.0,
+ "eval_accuracy": 0.5800711743772242,
+ "eval_loss": 1.2558783292770386,
+ "eval_runtime": 4.1208,
+ "eval_samples_per_second": 68.19,
+ "eval_steps_per_second": 0.728,
+ "step": 767
+ },
+ {
+ "epoch": 236.92,
+ "eval_accuracy": 0.5800711743772242,
+ "eval_loss": 1.2534478902816772,
+ "eval_runtime": 3.8139,
+ "eval_samples_per_second": 73.678,
+ "eval_steps_per_second": 0.787,
+ "step": 770
+ },
+ {
+ "epoch": 237.85,
+ "eval_accuracy": 0.5836298932384342,
+ "eval_loss": 1.2513927221298218,
+ "eval_runtime": 4.6813,
+ "eval_samples_per_second": 60.026,
+ "eval_steps_per_second": 0.641,
+ "step": 773
+ },
+ {
+ "epoch": 238.77,
+ "eval_accuracy": 0.5836298932384342,
+ "eval_loss": 1.2508091926574707,
+ "eval_runtime": 4.2671,
+ "eval_samples_per_second": 65.852,
+ "eval_steps_per_second": 0.703,
+ "step": 776
+ },
+ {
+ "epoch": 240.0,
+ "eval_accuracy": 0.5836298932384342,
+ "eval_loss": 1.2487518787384033,
+ "eval_runtime": 3.7642,
+ "eval_samples_per_second": 74.651,
+ "eval_steps_per_second": 0.797,
+ "step": 780
+ },
+ {
+ "epoch": 240.92,
+ "eval_accuracy": 0.5836298932384342,
+ "eval_loss": 1.2483351230621338,
+ "eval_runtime": 4.8941,
+ "eval_samples_per_second": 57.416,
+ "eval_steps_per_second": 0.613,
+ "step": 783
+ },
+ {
+ "epoch": 241.85,
+ "eval_accuracy": 0.5836298932384342,
+ "eval_loss": 1.2500139474868774,
+ "eval_runtime": 4.274,
+ "eval_samples_per_second": 65.746,
+ "eval_steps_per_second": 0.702,
+ "step": 786
+ },
+ {
+ "epoch": 242.77,
+ "eval_accuracy": 0.5800711743772242,
+ "eval_loss": 1.2503968477249146,
+ "eval_runtime": 4.4982,
+ "eval_samples_per_second": 62.469,
+ "eval_steps_per_second": 0.667,
+ "step": 789
+ },
+ {
+ "epoch": 244.0,
+ "eval_accuracy": 0.5800711743772242,
+ "eval_loss": 1.2521419525146484,
+ "eval_runtime": 4.0413,
+ "eval_samples_per_second": 69.532,
+ "eval_steps_per_second": 0.742,
+ "step": 793
+ },
+ {
+ "epoch": 244.92,
+ "eval_accuracy": 0.5800711743772242,
+ "eval_loss": 1.2532862424850464,
+ "eval_runtime": 4.1262,
+ "eval_samples_per_second": 68.101,
+ "eval_steps_per_second": 0.727,
+ "step": 796
+ },
+ {
+ "epoch": 245.85,
+ "eval_accuracy": 0.5800711743772242,
+ "eval_loss": 1.251287817955017,
+ "eval_runtime": 3.9321,
+ "eval_samples_per_second": 71.463,
+ "eval_steps_per_second": 0.763,
+ "step": 799
+ },
+ {
+ "epoch": 246.15,
+ "grad_norm": 63046.29296875,
+ "learning_rate": 3.7037037037037037e-06,
+ "loss": 0.5946,
+ "step": 800
+ },
+ {
+ "epoch": 246.77,
+ "eval_accuracy": 0.5800711743772242,
+ "eval_loss": 1.2513457536697388,
+ "eval_runtime": 4.1155,
+ "eval_samples_per_second": 68.279,
+ "eval_steps_per_second": 0.729,
+ "step": 802
+ },
+ {
+ "epoch": 248.0,
+ "eval_accuracy": 0.5800711743772242,
+ "eval_loss": 1.2507133483886719,
+ "eval_runtime": 4.3807,
+ "eval_samples_per_second": 64.145,
+ "eval_steps_per_second": 0.685,
+ "step": 806
+ },
+ {
+ "epoch": 248.92,
+ "eval_accuracy": 0.5836298932384342,
+ "eval_loss": 1.2491704225540161,
+ "eval_runtime": 4.0611,
+ "eval_samples_per_second": 69.193,
+ "eval_steps_per_second": 0.739,
+ "step": 809
+ },
+ {
+ "epoch": 249.85,
+ "eval_accuracy": 0.5800711743772242,
+ "eval_loss": 1.2499818801879883,
+ "eval_runtime": 3.9673,
+ "eval_samples_per_second": 70.828,
+ "eval_steps_per_second": 0.756,
+ "step": 812
+ },
+ {
+ "epoch": 250.77,
+ "eval_accuracy": 0.5800711743772242,
+ "eval_loss": 1.2505466938018799,
+ "eval_runtime": 4.5211,
+ "eval_samples_per_second": 62.153,
+ "eval_steps_per_second": 0.664,
+ "step": 815
+ },
+ {
+ "epoch": 252.0,
+ "eval_accuracy": 0.5800711743772242,
+ "eval_loss": 1.2519145011901855,
+ "eval_runtime": 5.2859,
+ "eval_samples_per_second": 53.16,
+ "eval_steps_per_second": 0.568,
+ "step": 819
+ },
+ {
+ "epoch": 252.92,
+ "eval_accuracy": 0.5800711743772242,
+ "eval_loss": 1.253113865852356,
+ "eval_runtime": 4.0658,
+ "eval_samples_per_second": 69.113,
+ "eval_steps_per_second": 0.738,
+ "step": 822
+ },
+ {
+ "epoch": 253.85,
+ "eval_accuracy": 0.5800711743772242,
+ "eval_loss": 1.2538248300552368,
+ "eval_runtime": 4.1084,
+ "eval_samples_per_second": 68.396,
+ "eval_steps_per_second": 0.73,
+ "step": 825
+ },
+ {
+ "epoch": 254.77,
+ "eval_accuracy": 0.5800711743772242,
+ "eval_loss": 1.2532281875610352,
+ "eval_runtime": 4.0615,
+ "eval_samples_per_second": 69.186,
+ "eval_steps_per_second": 0.739,
+ "step": 828
+ },
+ {
+ "epoch": 256.0,
+ "eval_accuracy": 0.5800711743772242,
+ "eval_loss": 1.2527676820755005,
+ "eval_runtime": 4.6892,
+ "eval_samples_per_second": 59.925,
+ "eval_steps_per_second": 0.64,
+ "step": 832
+ },
+ {
+ "epoch": 256.92,
+ "eval_accuracy": 0.5800711743772242,
+ "eval_loss": 1.252835988998413,
+ "eval_runtime": 3.7759,
+ "eval_samples_per_second": 74.42,
+ "eval_steps_per_second": 0.795,
+ "step": 835
+ },
+ {
+ "epoch": 257.85,
+ "eval_accuracy": 0.5836298932384342,
+ "eval_loss": 1.2521347999572754,
+ "eval_runtime": 3.5788,
+ "eval_samples_per_second": 78.519,
+ "eval_steps_per_second": 0.838,
+ "step": 838
+ },
+ {
+ "epoch": 258.77,
+ "eval_accuracy": 0.5836298932384342,
+ "eval_loss": 1.252551555633545,
+ "eval_runtime": 3.9885,
+ "eval_samples_per_second": 70.452,
+ "eval_steps_per_second": 0.752,
+ "step": 841
+ },
+ {
+ "epoch": 260.0,
+ "eval_accuracy": 0.5836298932384342,
+ "eval_loss": 1.2527978420257568,
+ "eval_runtime": 3.7855,
+ "eval_samples_per_second": 74.231,
+ "eval_steps_per_second": 0.792,
+ "step": 845
+ },
+ {
+ "epoch": 260.92,
+ "eval_accuracy": 0.5836298932384342,
+ "eval_loss": 1.2529038190841675,
+ "eval_runtime": 3.8445,
+ "eval_samples_per_second": 73.091,
+ "eval_steps_per_second": 0.78,
+ "step": 848
+ },
+ {
+ "epoch": 261.85,
+ "eval_accuracy": 0.5836298932384342,
+ "eval_loss": 1.2528387308120728,
+ "eval_runtime": 3.7891,
+ "eval_samples_per_second": 74.16,
+ "eval_steps_per_second": 0.792,
+ "step": 851
+ },
+ {
+ "epoch": 262.77,
+ "eval_accuracy": 0.5836298932384342,
+ "eval_loss": 1.2516640424728394,
+ "eval_runtime": 4.2868,
+ "eval_samples_per_second": 65.55,
+ "eval_steps_per_second": 0.7,
+ "step": 854
+ },
+ {
+ "epoch": 264.0,
+ "eval_accuracy": 0.5836298932384342,
+ "eval_loss": 1.251232385635376,
+ "eval_runtime": 3.7886,
+ "eval_samples_per_second": 74.169,
+ "eval_steps_per_second": 0.792,
+ "step": 858
+ },
+ {
+ "epoch": 264.92,
+ "eval_accuracy": 0.5836298932384342,
+ "eval_loss": 1.251160979270935,
+ "eval_runtime": 4.4131,
+ "eval_samples_per_second": 63.674,
+ "eval_steps_per_second": 0.68,
+ "step": 861
+ },
+ {
+ "epoch": 265.85,
+ "eval_accuracy": 0.5836298932384342,
+ "eval_loss": 1.2503511905670166,
+ "eval_runtime": 3.8431,
+ "eval_samples_per_second": 73.118,
+ "eval_steps_per_second": 0.781,
+ "step": 864
+ },
+ {
+ "epoch": 266.77,
+ "eval_accuracy": 0.5836298932384342,
+ "eval_loss": 1.2499034404754639,
+ "eval_runtime": 3.7109,
+ "eval_samples_per_second": 75.723,
+ "eval_steps_per_second": 0.808,
+ "step": 867
+ },
+ {
+ "epoch": 268.0,
+ "eval_accuracy": 0.5836298932384342,
+ "eval_loss": 1.2496285438537598,
+ "eval_runtime": 4.0777,
+ "eval_samples_per_second": 68.912,
+ "eval_steps_per_second": 0.736,
+ "step": 871
+ },
+ {
+ "epoch": 268.92,
+ "eval_accuracy": 0.5836298932384342,
+ "eval_loss": 1.2497419118881226,
+ "eval_runtime": 4.4854,
+ "eval_samples_per_second": 62.648,
+ "eval_steps_per_second": 0.669,
+ "step": 874
+ },
+ {
+ "epoch": 269.85,
+ "eval_accuracy": 0.5836298932384342,
+ "eval_loss": 1.2500321865081787,
+ "eval_runtime": 5.2487,
+ "eval_samples_per_second": 53.537,
+ "eval_steps_per_second": 0.572,
+ "step": 877
+ },
+ {
+ "epoch": 270.77,
+ "eval_accuracy": 0.5836298932384342,
+ "eval_loss": 1.250011682510376,
+ "eval_runtime": 4.3951,
+ "eval_samples_per_second": 63.935,
+ "eval_steps_per_second": 0.683,
+ "step": 880
+ },
+ {
+ "epoch": 272.0,
+ "eval_accuracy": 0.5836298932384342,
+ "eval_loss": 1.2498865127563477,
+ "eval_runtime": 4.1755,
+ "eval_samples_per_second": 67.297,
+ "eval_steps_per_second": 0.718,
+ "step": 884
+ },
+ {
+ "epoch": 272.92,
+ "eval_accuracy": 0.5836298932384342,
+ "eval_loss": 1.2500803470611572,
+ "eval_runtime": 4.6562,
+ "eval_samples_per_second": 60.349,
+ "eval_steps_per_second": 0.644,
+ "step": 887
+ },
+ {
+ "epoch": 273.85,
+ "eval_accuracy": 0.5836298932384342,
+ "eval_loss": 1.2503583431243896,
+ "eval_runtime": 4.5464,
+ "eval_samples_per_second": 61.807,
+ "eval_steps_per_second": 0.66,
+ "step": 890
+ },
+ {
+ "epoch": 274.77,
+ "eval_accuracy": 0.5836298932384342,
+ "eval_loss": 1.2506159543991089,
+ "eval_runtime": 3.9624,
+ "eval_samples_per_second": 70.917,
+ "eval_steps_per_second": 0.757,
+ "step": 893
+ },
+ {
+ "epoch": 276.0,
+ "eval_accuracy": 0.5836298932384342,
+ "eval_loss": 1.2505924701690674,
+ "eval_runtime": 4.0033,
+ "eval_samples_per_second": 70.192,
+ "eval_steps_per_second": 0.749,
+ "step": 897
+ },
+ {
+ "epoch": 276.92,
+ "grad_norm": 30700.47265625,
+ "learning_rate": 0.0,
+ "loss": 0.588,
+ "step": 900
+ },
+ {
+ "epoch": 276.92,
+ "eval_accuracy": 0.5836298932384342,
+ "eval_loss": 1.2505559921264648,
+ "eval_runtime": 4.7936,
+ "eval_samples_per_second": 58.62,
+ "eval_steps_per_second": 0.626,
+ "step": 900
+ },
+ {
+ "epoch": 276.92,
+ "step": 900,
+ "total_flos": 3.755576946691584e+18,
+ "train_loss": 0.8810926691691081,
+ "train_runtime": 3759.0782,
+ "train_samples_per_second": 123.541,
+ "train_steps_per_second": 0.239
+ }
+ ],
+ "logging_steps": 100,
+ "max_steps": 900,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 300,
+ "save_steps": 500,
+ "total_flos": 3.755576946691584e+18,
+ "train_batch_size": 128,
+ "trial_name": null,
+ "trial_params": null
+ }