hrezaei committed
Commit 371b0a4 · verified · 1 Parent(s): f62d1a5

End of training

Files changed (5)
  1. README.md +16 -3
  2. all_results.json +16 -0
  3. eval_results.json +10 -0
  4. train_results.json +9 -0
  5. trainer_state.json +2342 -0
README.md CHANGED
@@ -2,11 +2,24 @@
2
  library_name: transformers
3
  tags:
4
  - generated_from_trainer
5
  metrics:
6
  - accuracy
7
  model-index:
8
  - name: T5LA
9
- results: []
10
  ---
11
 
12
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -15,10 +28,10 @@ should probably proofread and complete it, then remove this comment. -->
15
  [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/uoy/llm_training/runs/elf928gg)
16
  # T5LA
17
 
18
- This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
19
  It achieves the following results on the evaluation set:
20
- - Accuracy: 0.0322
21
  - Loss: 5.5470
 
22
 
23
  ## Model description
24
 
 
2
  library_name: transformers
3
  tags:
4
  - generated_from_trainer
5
+ datasets:
6
+ - HuggingFaceFW/fineweb
7
  metrics:
8
  - accuracy
9
  model-index:
10
  - name: T5LA
11
+ results:
12
+ - task:
13
+ name: Causal Language Modeling
14
+ type: text-generation
15
+ dataset:
16
+ name: HuggingFaceFW/fineweb sample-10BT
17
+ type: HuggingFaceFW/fineweb
18
+ args: sample-10BT
19
+ metrics:
20
+ - name: Accuracy
21
+ type: accuracy
22
+ value: 0.03222989830774154
23
  ---
24
 
25
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 
28
  [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/uoy/llm_training/runs/elf928gg)
29
  # T5LA
30
 
31
+ This model is a fine-tuned version of [](https://huggingface.co/) on the HuggingFaceFW/fineweb sample-10BT dataset.
32
  It achieves the following results on the evaluation set:
 
33
  - Loss: 5.5470
34
+ - Accuracy: 0.0322
35
 
36
  ## Model description
37
 
all_results.json ADDED
@@ -0,0 +1,16 @@
1
+ {
2
+ "epoch": 1.00001,
3
+ "eval_accuracy": 0.03222989830774154,
4
+ "eval_loss": 5.5469770431518555,
5
+ "eval_runtime": 110.5546,
6
+ "eval_samples": 10000,
7
+ "eval_samples_per_second": 32.491,
8
+ "eval_steps_per_second": 2.035,
9
+ "perplexity": 256.46111204397334,
10
+ "total_flos": 9.182126159167488e+17,
11
+ "train_loss": 5.625401986489168e-05,
12
+ "train_runtime": 26.8473,
13
+ "train_samples": 1000000,
14
+ "train_samples_per_second": 59596.412,
15
+ "train_steps_per_second": 3724.776
16
+ }
eval_results.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "epoch": 1.00001,
3
+ "eval_accuracy": 0.03222989830774154,
4
+ "eval_loss": 5.5469770431518555,
5
+ "eval_runtime": 110.5546,
6
+ "eval_samples": 10000,
7
+ "eval_samples_per_second": 32.491,
8
+ "eval_steps_per_second": 2.035,
9
+ "perplexity": 256.46111204397334
10
+ }
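Review note on eval_results.json: the "perplexity" field is consistent with the exponential of "eval_loss". The sketch below rechecks that relationship using the two values copied from the file. It assumes eval_loss is a mean cross-entropy in nats; the variable names are illustrative and not taken from the training script.

```python
import math

# Values copied from eval_results.json in this commit.
eval_loss = 5.5469770431518555
reported_perplexity = 256.46111204397334

# Assumption: eval_loss is a mean cross-entropy in nats,
# so perplexity is recovered as exp(loss).
perplexity = math.exp(eval_loss)

print(f"recomputed perplexity: {perplexity:.4f}")
print("agrees with reported value:",
      math.isclose(perplexity, reported_perplexity, rel_tol=1e-6))
```

A perplexity near 256 alongside an accuracy of about 0.032 suggests the model is still far from a useful language model at this checkpoint.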
train_results.json ADDED
@@ -0,0 +1,9 @@
1
+ {
2
+ "epoch": 1.00001,
3
+ "total_flos": 9.182126159167488e+17,
4
+ "train_loss": 5.625401986489168e-05,
5
+ "train_runtime": 26.8473,
6
+ "train_samples": 1000000,
7
+ "train_samples_per_second": 59596.412,
8
+ "train_steps_per_second": 3724.776
9
+ }
trainer_state.json ADDED
@@ -0,0 +1,2342 @@
1
+ {
2
+ "best_metric": 5.546974182128906,
3
+ "best_model_checkpoint": "/users/hr1171/scratch/T5LA/checkpoint-100000",
4
+ "epoch": 1.00001,
5
+ "eval_steps": 1000,
6
+ "global_step": 100001,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.005,
13
+ "grad_norm": 0.06483318656682968,
14
+ "learning_rate": 4.975e-05,
15
+ "loss": 10.0248,
16
+ "step": 500
17
+ },
18
+ {
19
+ "epoch": 0.01,
20
+ "grad_norm": 0.07472355663776398,
21
+ "learning_rate": 4.9500000000000004e-05,
22
+ "loss": 9.4056,
23
+ "step": 1000
24
+ },
25
+ {
26
+ "epoch": 0.01,
27
+ "eval_accuracy": 0.043492843878108624,
28
+ "eval_loss": 9.121541023254395,
29
+ "eval_runtime": 110.4799,
30
+ "eval_samples_per_second": 32.513,
31
+ "eval_steps_per_second": 2.037,
32
+ "step": 1000
33
+ },
34
+ {
35
+ "epoch": 0.015,
36
+ "grad_norm": 0.08399348706007004,
37
+ "learning_rate": 4.9250000000000004e-05,
38
+ "loss": 8.8698,
39
+ "step": 1500
40
+ },
41
+ {
42
+ "epoch": 0.02,
43
+ "grad_norm": 0.08570190519094467,
44
+ "learning_rate": 4.9e-05,
45
+ "loss": 8.4062,
46
+ "step": 2000
47
+ },
48
+ {
49
+ "epoch": 0.02,
50
+ "eval_accuracy": 0.04429728167514647,
51
+ "eval_loss": 8.193856239318848,
52
+ "eval_runtime": 112.0891,
53
+ "eval_samples_per_second": 32.046,
54
+ "eval_steps_per_second": 2.007,
55
+ "step": 2000
56
+ },
57
+ {
58
+ "epoch": 0.025,
59
+ "grad_norm": 0.08781777322292328,
60
+ "learning_rate": 4.875e-05,
61
+ "loss": 8.0175,
62
+ "step": 2500
63
+ },
64
+ {
65
+ "epoch": 0.03,
66
+ "grad_norm": 0.07474139332771301,
67
+ "learning_rate": 4.85e-05,
68
+ "loss": 7.7307,
69
+ "step": 3000
70
+ },
71
+ {
72
+ "epoch": 0.03,
73
+ "eval_accuracy": 0.04438749518317016,
74
+ "eval_loss": 7.602397918701172,
75
+ "eval_runtime": 112.6714,
76
+ "eval_samples_per_second": 31.88,
77
+ "eval_steps_per_second": 1.997,
78
+ "step": 3000
79
+ },
80
+ {
81
+ "epoch": 0.035,
82
+ "grad_norm": 0.07133983075618744,
83
+ "learning_rate": 4.825e-05,
84
+ "loss": 7.5203,
85
+ "step": 3500
86
+ },
87
+ {
88
+ "epoch": 0.04,
89
+ "grad_norm": 0.04410657659173012,
90
+ "learning_rate": 4.8e-05,
91
+ "loss": 7.39,
92
+ "step": 4000
93
+ },
94
+ {
95
+ "epoch": 0.04,
96
+ "eval_accuracy": 0.0443981085370553,
97
+ "eval_loss": 7.333784103393555,
98
+ "eval_runtime": 111.8336,
99
+ "eval_samples_per_second": 32.119,
100
+ "eval_steps_per_second": 2.012,
101
+ "step": 4000
102
+ },
103
+ {
104
+ "epoch": 0.045,
105
+ "grad_norm": 0.03503479063510895,
106
+ "learning_rate": 4.775e-05,
107
+ "loss": 7.3043,
108
+ "step": 4500
109
+ },
110
+ {
111
+ "epoch": 0.05,
112
+ "grad_norm": 0.039906181395053864,
113
+ "learning_rate": 4.75e-05,
114
+ "loss": 7.2546,
115
+ "step": 5000
116
+ },
117
+ {
118
+ "epoch": 0.05,
119
+ "eval_accuracy": 0.04405276632986957,
120
+ "eval_loss": 7.245168685913086,
121
+ "eval_runtime": 113.4504,
122
+ "eval_samples_per_second": 31.661,
123
+ "eval_steps_per_second": 1.983,
124
+ "step": 5000
125
+ },
126
+ {
127
+ "epoch": 0.055,
128
+ "grad_norm": 0.04767516627907753,
129
+ "learning_rate": 4.7249999999999997e-05,
130
+ "loss": 7.2409,
131
+ "step": 5500
132
+ },
133
+ {
134
+ "epoch": 0.06,
135
+ "grad_norm": 0.07150296866893768,
136
+ "learning_rate": 4.7e-05,
137
+ "loss": 7.1985,
138
+ "step": 6000
139
+ },
140
+ {
141
+ "epoch": 0.06,
142
+ "eval_accuracy": 0.03687500952480477,
143
+ "eval_loss": 7.168196201324463,
144
+ "eval_runtime": 110.8688,
145
+ "eval_samples_per_second": 32.399,
146
+ "eval_steps_per_second": 2.029,
147
+ "step": 6000
148
+ },
149
+ {
150
+ "epoch": 0.065,
151
+ "grad_norm": 0.10620597004890442,
152
+ "learning_rate": 4.6750000000000005e-05,
153
+ "loss": 7.1475,
154
+ "step": 6500
155
+ },
156
+ {
157
+ "epoch": 0.07,
158
+ "grad_norm": 0.09856470674276352,
159
+ "learning_rate": 4.6500000000000005e-05,
160
+ "loss": 7.1009,
161
+ "step": 7000
162
+ },
163
+ {
164
+ "epoch": 0.07,
165
+ "eval_accuracy": 0.03457599379091584,
166
+ "eval_loss": 7.071776866912842,
167
+ "eval_runtime": 115.3179,
168
+ "eval_samples_per_second": 31.149,
169
+ "eval_steps_per_second": 1.951,
170
+ "step": 7000
171
+ },
172
+ {
173
+ "epoch": 0.075,
174
+ "grad_norm": 0.12105035036802292,
175
+ "learning_rate": 4.6250000000000006e-05,
176
+ "loss": 7.0493,
177
+ "step": 7500
178
+ },
179
+ {
180
+ "epoch": 0.08,
181
+ "grad_norm": 0.1322629749774933,
182
+ "learning_rate": 4.600000000000001e-05,
183
+ "loss": 7.004,
184
+ "step": 8000
185
+ },
186
+ {
187
+ "epoch": 0.08,
188
+ "eval_accuracy": 0.03315856677269135,
189
+ "eval_loss": 6.977820873260498,
190
+ "eval_runtime": 110.4899,
191
+ "eval_samples_per_second": 32.51,
192
+ "eval_steps_per_second": 2.036,
193
+ "step": 8000
194
+ },
195
+ {
196
+ "epoch": 0.085,
197
+ "grad_norm": 0.16380475461483002,
198
+ "learning_rate": 4.575e-05,
199
+ "loss": 6.9535,
200
+ "step": 8500
201
+ },
202
+ {
203
+ "epoch": 0.09,
204
+ "grad_norm": 0.1642669439315796,
205
+ "learning_rate": 4.55e-05,
206
+ "loss": 6.9159,
207
+ "step": 9000
208
+ },
209
+ {
210
+ "epoch": 0.09,
211
+ "eval_accuracy": 0.032452642670689945,
212
+ "eval_loss": 6.896429061889648,
213
+ "eval_runtime": 112.221,
214
+ "eval_samples_per_second": 32.008,
215
+ "eval_steps_per_second": 2.005,
216
+ "step": 9000
217
+ },
218
+ {
219
+ "epoch": 0.095,
220
+ "grad_norm": 0.2253541350364685,
221
+ "learning_rate": 4.525e-05,
222
+ "loss": 6.8866,
223
+ "step": 9500
224
+ },
225
+ {
226
+ "epoch": 0.1,
227
+ "grad_norm": 0.20757266879081726,
228
+ "learning_rate": 4.5e-05,
229
+ "loss": 6.8548,
230
+ "step": 10000
231
+ },
232
+ {
233
+ "epoch": 0.1,
234
+ "eval_accuracy": 0.03252571153012995,
235
+ "eval_loss": 6.830728054046631,
236
+ "eval_runtime": 112.9087,
237
+ "eval_samples_per_second": 31.813,
238
+ "eval_steps_per_second": 1.993,
239
+ "step": 10000
240
+ },
241
+ {
242
+ "epoch": 0.105,
243
+ "grad_norm": 0.21346400678157806,
244
+ "learning_rate": 4.4750000000000004e-05,
245
+ "loss": 6.8302,
246
+ "step": 10500
247
+ },
248
+ {
249
+ "epoch": 0.11,
250
+ "grad_norm": 0.24308614432811737,
251
+ "learning_rate": 4.4500000000000004e-05,
252
+ "loss": 6.7833,
253
+ "step": 11000
254
+ },
255
+ {
256
+ "epoch": 0.11,
257
+ "eval_accuracy": 0.032619326754142475,
258
+ "eval_loss": 6.77016544342041,
259
+ "eval_runtime": 112.3436,
260
+ "eval_samples_per_second": 31.973,
261
+ "eval_steps_per_second": 2.003,
262
+ "step": 11000
263
+ },
264
+ {
265
+ "epoch": 0.115,
266
+ "grad_norm": 0.27737605571746826,
267
+ "learning_rate": 4.4250000000000005e-05,
268
+ "loss": 6.7775,
269
+ "step": 11500
270
+ },
271
+ {
272
+ "epoch": 0.12,
273
+ "grad_norm": 0.3234957158565521,
274
+ "learning_rate": 4.4000000000000006e-05,
275
+ "loss": 6.7376,
276
+ "step": 12000
277
+ },
278
+ {
279
+ "epoch": 0.12,
280
+ "eval_accuracy": 0.03369168370246034,
281
+ "eval_loss": 6.716267108917236,
282
+ "eval_runtime": 111.859,
283
+ "eval_samples_per_second": 32.112,
284
+ "eval_steps_per_second": 2.011,
285
+ "step": 12000
286
+ },
287
+ {
288
+ "epoch": 0.125,
289
+ "grad_norm": 0.27591678500175476,
290
+ "learning_rate": 4.375e-05,
291
+ "loss": 6.7188,
292
+ "step": 12500
293
+ },
294
+ {
295
+ "epoch": 0.13,
296
+ "grad_norm": 0.31338784098625183,
297
+ "learning_rate": 4.35005e-05,
298
+ "loss": 6.6821,
299
+ "step": 13000
300
+ },
301
+ {
302
+ "epoch": 0.13,
303
+ "eval_accuracy": 0.03455980162280902,
304
+ "eval_loss": 6.661470413208008,
305
+ "eval_runtime": 120.7993,
306
+ "eval_samples_per_second": 29.735,
307
+ "eval_steps_per_second": 1.863,
308
+ "step": 13000
309
+ },
310
+ {
311
+ "epoch": 0.135,
312
+ "grad_norm": 0.513124406337738,
313
+ "learning_rate": 4.32505e-05,
314
+ "loss": 6.6565,
315
+ "step": 13500
316
+ },
317
+ {
318
+ "epoch": 0.14,
319
+ "grad_norm": 0.40841469168663025,
320
+ "learning_rate": 4.30005e-05,
321
+ "loss": 6.6373,
322
+ "step": 14000
323
+ },
324
+ {
325
+ "epoch": 0.14,
326
+ "eval_accuracy": 0.03493249362654492,
327
+ "eval_loss": 6.608607292175293,
328
+ "eval_runtime": 111.8763,
329
+ "eval_samples_per_second": 32.107,
330
+ "eval_steps_per_second": 2.011,
331
+ "step": 14000
332
+ },
333
+ {
334
+ "epoch": 0.145,
335
+ "grad_norm": 0.3946859538555145,
336
+ "learning_rate": 4.2750500000000003e-05,
337
+ "loss": 6.6188,
338
+ "step": 14500
339
+ },
340
+ {
341
+ "epoch": 0.15,
342
+ "grad_norm": 0.48516473174095154,
343
+ "learning_rate": 4.2501000000000005e-05,
344
+ "loss": 6.5895,
345
+ "step": 15000
346
+ },
347
+ {
348
+ "epoch": 0.15,
349
+ "eval_accuracy": 0.0343823681168318,
350
+ "eval_loss": 6.556947231292725,
351
+ "eval_runtime": 112.7205,
352
+ "eval_samples_per_second": 31.866,
353
+ "eval_steps_per_second": 1.996,
354
+ "step": 15000
355
+ },
356
+ {
357
+ "epoch": 0.155,
358
+ "grad_norm": 0.543297529220581,
359
+ "learning_rate": 4.22515e-05,
360
+ "loss": 6.5555,
361
+ "step": 15500
362
+ },
363
+ {
364
+ "epoch": 0.16,
365
+ "grad_norm": 0.46242478489875793,
366
+ "learning_rate": 4.200150000000001e-05,
367
+ "loss": 6.5421,
368
+ "step": 16000
369
+ },
370
+ {
371
+ "epoch": 0.16,
372
+ "eval_accuracy": 0.035396487687420944,
373
+ "eval_loss": 6.511915683746338,
374
+ "eval_runtime": 111.3995,
375
+ "eval_samples_per_second": 32.244,
376
+ "eval_steps_per_second": 2.02,
377
+ "step": 16000
378
+ },
379
+ {
380
+ "epoch": 0.165,
381
+ "grad_norm": 0.5675227642059326,
382
+ "learning_rate": 4.17515e-05,
383
+ "loss": 6.5233,
384
+ "step": 16500
385
+ },
386
+ {
387
+ "epoch": 0.17,
388
+ "grad_norm": 0.5619395971298218,
389
+ "learning_rate": 4.15015e-05,
390
+ "loss": 6.5051,
391
+ "step": 17000
392
+ },
393
+ {
394
+ "epoch": 0.17,
395
+ "eval_accuracy": 0.035478945282990115,
396
+ "eval_loss": 6.467820644378662,
397
+ "eval_runtime": 111.7569,
398
+ "eval_samples_per_second": 32.141,
399
+ "eval_steps_per_second": 2.013,
400
+ "step": 17000
401
+ },
402
+ {
403
+ "epoch": 0.175,
404
+ "grad_norm": 0.4741845726966858,
405
+ "learning_rate": 4.1252000000000004e-05,
406
+ "loss": 6.491,
407
+ "step": 17500
408
+ },
409
+ {
410
+ "epoch": 0.18,
411
+ "grad_norm": 0.5824158191680908,
412
+ "learning_rate": 4.1002000000000005e-05,
413
+ "loss": 6.4391,
414
+ "step": 18000
415
+ },
416
+ {
417
+ "epoch": 0.18,
418
+ "eval_accuracy": 0.03599301260322167,
419
+ "eval_loss": 6.432408332824707,
420
+ "eval_runtime": 113.2507,
421
+ "eval_samples_per_second": 31.717,
422
+ "eval_steps_per_second": 1.987,
423
+ "step": 18000
424
+ },
425
+ {
426
+ "epoch": 0.185,
427
+ "grad_norm": 0.740391194820404,
428
+ "learning_rate": 4.0752e-05,
429
+ "loss": 6.4602,
430
+ "step": 18500
431
+ },
432
+ {
433
+ "epoch": 0.19,
434
+ "grad_norm": 0.55320805311203,
435
+ "learning_rate": 4.0502e-05,
436
+ "loss": 6.4242,
437
+ "step": 19000
438
+ },
439
+ {
440
+ "epoch": 0.19,
441
+ "eval_accuracy": 0.03552126262989112,
442
+ "eval_loss": 6.40146541595459,
443
+ "eval_runtime": 111.8361,
444
+ "eval_samples_per_second": 32.118,
445
+ "eval_steps_per_second": 2.012,
446
+ "step": 19000
447
+ },
448
+ {
449
+ "epoch": 0.195,
450
+ "grad_norm": 0.7065308094024658,
451
+ "learning_rate": 4.0252e-05,
452
+ "loss": 6.4047,
453
+ "step": 19500
454
+ },
455
+ {
456
+ "epoch": 0.2,
457
+ "grad_norm": 0.6466010212898254,
458
+ "learning_rate": 4.0003e-05,
459
+ "loss": 6.3889,
460
+ "step": 20000
461
+ },
462
+ {
463
+ "epoch": 0.2,
464
+ "eval_accuracy": 0.03732308355485308,
465
+ "eval_loss": 6.355311393737793,
466
+ "eval_runtime": 112.8526,
467
+ "eval_samples_per_second": 31.829,
468
+ "eval_steps_per_second": 1.994,
469
+ "step": 20000
470
+ },
471
+ {
472
+ "epoch": 0.205,
473
+ "grad_norm": 0.5905252695083618,
474
+ "learning_rate": 3.9753000000000004e-05,
475
+ "loss": 6.3693,
476
+ "step": 20500
477
+ },
478
+ {
479
+ "epoch": 0.21,
480
+ "grad_norm": 0.5333964824676514,
481
+ "learning_rate": 3.9503000000000004e-05,
482
+ "loss": 6.3631,
483
+ "step": 21000
484
+ },
485
+ {
486
+ "epoch": 0.21,
487
+ "eval_accuracy": 0.036739757297089004,
488
+ "eval_loss": 6.328461647033691,
489
+ "eval_runtime": 112.8636,
490
+ "eval_samples_per_second": 31.826,
491
+ "eval_steps_per_second": 1.994,
492
+ "step": 21000
493
+ },
494
+ {
495
+ "epoch": 0.215,
496
+ "grad_norm": 0.5893262028694153,
497
+ "learning_rate": 3.9253e-05,
498
+ "loss": 6.3469,
499
+ "step": 21500
500
+ },
501
+ {
502
+ "epoch": 0.22,
503
+ "grad_norm": 0.596720278263092,
504
+ "learning_rate": 3.9003e-05,
505
+ "loss": 6.3296,
506
+ "step": 22000
507
+ },
508
+ {
509
+ "epoch": 0.22,
510
+ "eval_accuracy": 0.03686983891650175,
511
+ "eval_loss": 6.3014631271362305,
512
+ "eval_runtime": 111.598,
513
+ "eval_samples_per_second": 32.187,
514
+ "eval_steps_per_second": 2.016,
515
+ "step": 22000
516
+ },
517
+ {
518
+ "epoch": 0.225,
519
+ "grad_norm": 0.49506717920303345,
520
+ "learning_rate": 3.8753e-05,
521
+ "loss": 6.3161,
522
+ "step": 22500
523
+ },
524
+ {
525
+ "epoch": 0.23,
526
+ "grad_norm": 0.6392377614974976,
527
+ "learning_rate": 3.8503e-05,
528
+ "loss": 6.3081,
529
+ "step": 23000
530
+ },
531
+ {
532
+ "epoch": 0.23,
533
+ "eval_accuracy": 0.03635019278204852,
534
+ "eval_loss": 6.269931316375732,
535
+ "eval_runtime": 112.6212,
536
+ "eval_samples_per_second": 31.895,
537
+ "eval_steps_per_second": 1.998,
538
+ "step": 23000
539
+ },
540
+ {
541
+ "epoch": 0.235,
542
+ "grad_norm": 0.5753453969955444,
543
+ "learning_rate": 3.8253e-05,
544
+ "loss": 6.2806,
545
+ "step": 23500
546
+ },
547
+ {
548
+ "epoch": 0.24,
549
+ "grad_norm": 0.5371689796447754,
550
+ "learning_rate": 3.80035e-05,
551
+ "loss": 6.2784,
552
+ "step": 24000
553
+ },
554
+ {
555
+ "epoch": 0.24,
556
+ "eval_accuracy": 0.036991620348901764,
557
+ "eval_loss": 6.2454352378845215,
558
+ "eval_runtime": 115.1822,
559
+ "eval_samples_per_second": 31.185,
560
+ "eval_steps_per_second": 1.953,
561
+ "step": 24000
562
+ },
563
+ {
564
+ "epoch": 0.245,
565
+ "grad_norm": 0.5090768337249756,
566
+ "learning_rate": 3.7753500000000004e-05,
567
+ "loss": 6.2632,
568
+ "step": 24500
569
+ },
570
+ {
571
+ "epoch": 0.25,
572
+ "grad_norm": 0.5666926503181458,
573
+ "learning_rate": 3.7503500000000004e-05,
574
+ "loss": 6.2589,
575
+ "step": 25000
576
+ },
577
+ {
578
+ "epoch": 0.25,
579
+ "eval_accuracy": 0.03744921918371879,
580
+ "eval_loss": 6.216660499572754,
581
+ "eval_runtime": 112.1843,
582
+ "eval_samples_per_second": 32.019,
583
+ "eval_steps_per_second": 2.006,
584
+ "step": 25000
585
+ },
586
+ {
587
+ "epoch": 0.255,
588
+ "grad_norm": 0.5514094233512878,
589
+ "learning_rate": 3.7253500000000005e-05,
590
+ "loss": 6.2441,
591
+ "step": 25500
592
+ },
593
+ {
594
+ "epoch": 0.26,
595
+ "grad_norm": 0.5683887600898743,
596
+ "learning_rate": 3.7004e-05,
597
+ "loss": 6.2371,
598
+ "step": 26000
599
+ },
600
+ {
601
+ "epoch": 0.26,
602
+ "eval_accuracy": 0.03695529002214109,
603
+ "eval_loss": 6.189018726348877,
604
+ "eval_runtime": 110.9014,
605
+ "eval_samples_per_second": 32.389,
606
+ "eval_steps_per_second": 2.029,
607
+ "step": 26000
608
+ },
609
+ {
610
+ "epoch": 0.265,
611
+ "grad_norm": 0.6400378346443176,
612
+ "learning_rate": 3.6754e-05,
613
+ "loss": 6.2257,
614
+ "step": 26500
615
+ },
616
+ {
617
+ "epoch": 0.27,
618
+ "grad_norm": 0.5867771506309509,
619
+ "learning_rate": 3.6504e-05,
620
+ "loss": 6.1978,
621
+ "step": 27000
622
+ },
623
+ {
624
+ "epoch": 0.27,
625
+ "eval_accuracy": 0.0376371299749416,
626
+ "eval_loss": 6.165986061096191,
627
+ "eval_runtime": 112.4076,
628
+ "eval_samples_per_second": 31.955,
629
+ "eval_steps_per_second": 2.002,
630
+ "step": 27000
631
+ },
632
+ {
633
+ "epoch": 0.275,
634
+ "grad_norm": 0.7539021372795105,
635
+ "learning_rate": 3.62545e-05,
636
+ "loss": 6.2078,
637
+ "step": 27500
638
+ },
639
+ {
640
+ "epoch": 0.28,
641
+ "grad_norm": 0.5702797174453735,
642
+ "learning_rate": 3.6004500000000004e-05,
643
+ "loss": 6.1895,
644
+ "step": 28000
645
+ },
646
+ {
647
+ "epoch": 0.28,
648
+ "eval_accuracy": 0.03748677412823544,
649
+ "eval_loss": 6.1377997398376465,
650
+ "eval_runtime": 111.0948,
651
+ "eval_samples_per_second": 32.333,
652
+ "eval_steps_per_second": 2.025,
653
+ "step": 28000
654
+ },
655
+ {
656
+ "epoch": 0.285,
657
+ "grad_norm": 0.5679642558097839,
658
+ "learning_rate": 3.5754500000000005e-05,
659
+ "loss": 6.1921,
660
+ "step": 28500
661
+ },
662
+ {
663
+ "epoch": 0.29,
664
+ "grad_norm": 0.6666756868362427,
665
+ "learning_rate": 3.55045e-05,
666
+ "loss": 6.1636,
667
+ "step": 29000
668
+ },
669
+ {
670
+ "epoch": 0.29,
671
+ "eval_accuracy": 0.03662328254163156,
672
+ "eval_loss": 6.121272087097168,
673
+ "eval_runtime": 111.2726,
674
+ "eval_samples_per_second": 32.281,
675
+ "eval_steps_per_second": 2.022,
676
+ "step": 29000
677
+ },
678
+ {
679
+ "epoch": 0.295,
680
+ "grad_norm": 0.5925132632255554,
681
+ "learning_rate": 3.52545e-05,
682
+ "loss": 6.1639,
683
+ "step": 29500
684
+ },
685
+ {
686
+ "epoch": 0.3,
687
+ "grad_norm": 0.5706653594970703,
688
+ "learning_rate": 3.50045e-05,
689
+ "loss": 6.1262,
690
+ "step": 30000
691
+ },
692
+ {
693
+ "epoch": 0.3,
694
+ "eval_accuracy": 0.03699611061400702,
695
+ "eval_loss": 6.096695899963379,
696
+ "eval_runtime": 112.2709,
697
+ "eval_samples_per_second": 31.994,
698
+ "eval_steps_per_second": 2.004,
699
+ "step": 30000
700
+ },
701
+ {
702
+ "epoch": 0.305,
703
+ "grad_norm": 0.6464671492576599,
704
+ "learning_rate": 3.47545e-05,
705
+ "loss": 6.1365,
706
+ "step": 30500
707
+ },
708
+ {
709
+ "epoch": 0.31,
710
+ "grad_norm": 0.5408177375793457,
711
+ "learning_rate": 3.4505e-05,
712
+ "loss": 6.1345,
713
+ "step": 31000
714
+ },
715
+ {
716
+ "epoch": 0.31,
717
+ "eval_accuracy": 0.03612323029127397,
718
+ "eval_loss": 6.074455738067627,
719
+ "eval_runtime": 113.9504,
720
+ "eval_samples_per_second": 31.522,
721
+ "eval_steps_per_second": 1.975,
722
+ "step": 31000
723
+ },
724
+ {
725
+ "epoch": 0.315,
726
+ "grad_norm": 0.5881854891777039,
727
+ "learning_rate": 3.4255e-05,
728
+ "loss": 6.1113,
729
+ "step": 31500
730
+ },
731
+ {
732
+ "epoch": 0.32,
733
+ "grad_norm": 0.6097517609596252,
734
+ "learning_rate": 3.4005000000000004e-05,
735
+ "loss": 6.1096,
736
+ "step": 32000
737
+ },
738
+ {
739
+ "epoch": 0.32,
740
+ "eval_accuracy": 0.03600621126125832,
741
+ "eval_loss": 6.055612087249756,
742
+ "eval_runtime": 111.7549,
743
+ "eval_samples_per_second": 32.142,
744
+ "eval_steps_per_second": 2.013,
745
+ "step": 32000
746
+ },
747
+ {
748
+ "epoch": 0.325,
749
+ "grad_norm": 0.6643844246864319,
750
+ "learning_rate": 3.3755000000000005e-05,
751
+ "loss": 6.1049,
752
+ "step": 32500
753
+ },
754
+ {
755
+ "epoch": 0.33,
756
+ "grad_norm": 0.6951597929000854,
757
+ "learning_rate": 3.3505000000000005e-05,
758
+ "loss": 6.0794,
759
+ "step": 33000
760
+ },
761
+ {
762
+ "epoch": 0.33,
763
+ "eval_accuracy": 0.03573761176678053,
764
+ "eval_loss": 6.041288375854492,
765
+ "eval_runtime": 112.9798,
766
+ "eval_samples_per_second": 31.793,
767
+ "eval_steps_per_second": 1.992,
768
+ "step": 33000
769
+ },
770
+ {
771
+ "epoch": 0.335,
772
+ "grad_norm": 0.8083083033561707,
773
+ "learning_rate": 3.3255000000000006e-05,
774
+ "loss": 6.1019,
775
+ "step": 33500
776
+ },
777
+ {
778
+ "epoch": 0.34,
779
+ "grad_norm": 0.6118751764297485,
780
+ "learning_rate": 3.3005e-05,
781
+ "loss": 6.0643,
782
+ "step": 34000
783
+ },
784
+ {
785
+ "epoch": 0.34,
786
+ "eval_accuracy": 0.03625793824443153,
787
+ "eval_loss": 6.013612270355225,
788
+ "eval_runtime": 111.1363,
789
+ "eval_samples_per_second": 32.321,
790
+ "eval_steps_per_second": 2.025,
791
+ "step": 34000
792
+ },
793
+ {
794
+ "epoch": 0.345,
795
+ "grad_norm": 0.6557193398475647,
796
+ "learning_rate": 3.27555e-05,
797
+ "loss": 6.0551,
798
+ "step": 34500
799
+ },
800
+ {
801
+ "epoch": 0.35,
802
+ "grad_norm": 0.5440818667411804,
803
+ "learning_rate": 3.25055e-05,
804
+ "loss": 6.057,
805
+ "step": 35000
806
+ },
807
+ {
808
+ "epoch": 0.35,
809
+ "eval_accuracy": 0.03619099247377141,
810
+ "eval_loss": 5.996499538421631,
811
+ "eval_runtime": 114.4917,
812
+ "eval_samples_per_second": 31.373,
813
+ "eval_steps_per_second": 1.965,
814
+ "step": 35000
815
+ },
816
+ {
817
+ "epoch": 0.355,
818
+ "grad_norm": 0.5865129828453064,
819
+ "learning_rate": 3.2255499999999996e-05,
820
+ "loss": 6.0495,
821
+ "step": 35500
822
+ },
823
+ {
824
+ "epoch": 0.36,
825
+ "grad_norm": 0.598119854927063,
826
+ "learning_rate": 3.20055e-05,
827
+ "loss": 6.0337,
828
+ "step": 36000
829
+ },
830
+ {
831
+ "epoch": 0.36,
832
+ "eval_accuracy": 0.03535702778195055,
833
+ "eval_loss": 5.980613708496094,
834
+ "eval_runtime": 112.638,
835
+ "eval_samples_per_second": 31.89,
836
+ "eval_steps_per_second": 1.998,
837
+ "step": 36000
838
+ },
839
+ {
840
+ "epoch": 0.365,
841
+ "grad_norm": 0.5838174819946289,
842
+ "learning_rate": 3.1756000000000006e-05,
843
+ "loss": 6.0167,
844
+ "step": 36500
845
+ },
846
+ {
847
+ "epoch": 0.37,
848
+ "grad_norm": 0.7071039080619812,
849
+ "learning_rate": 3.1506e-05,
850
+ "loss": 6.0217,
851
+ "step": 37000
852
+ },
853
+ {
854
+ "epoch": 0.37,
855
+ "eval_accuracy": 0.03629073078656382,
856
+ "eval_loss": 5.958355903625488,
857
+ "eval_runtime": 112.2627,
858
+ "eval_samples_per_second": 31.996,
859
+ "eval_steps_per_second": 2.004,
860
+ "step": 37000
861
+ },
862
+ {
863
+ "epoch": 0.375,
864
+ "grad_norm": 0.606131374835968,
865
+ "learning_rate": 3.1256e-05,
866
+ "loss": 6.0086,
867
+ "step": 37500
868
+ },
869
+ {
870
+ "epoch": 0.38,
871
+ "grad_norm": 0.6173387169837952,
872
+ "learning_rate": 3.1006e-05,
873
+ "loss": 6.0045,
874
+ "step": 38000
875
+ },
876
+ {
877
+ "epoch": 0.38,
878
+ "eval_accuracy": 0.035918719126025685,
879
+ "eval_loss": 5.952617168426514,
880
+ "eval_runtime": 116.0654,
881
+ "eval_samples_per_second": 30.948,
882
+ "eval_steps_per_second": 1.939,
883
+ "step": 38000
884
+ },
885
+ {
886
+ "epoch": 0.385,
887
+ "grad_norm": 0.6699332594871521,
888
+ "learning_rate": 3.0756e-05,
889
+ "loss": 5.9996,
890
+ "step": 38500
891
+ },
892
+ {
893
+ "epoch": 0.39,
894
+ "grad_norm": 0.6398167610168457,
895
+ "learning_rate": 3.0506000000000002e-05,
896
+ "loss": 5.9896,
897
+ "step": 39000
898
+ },
899
+ {
900
+ "epoch": 0.39,
901
+ "eval_accuracy": 0.0355204462180538,
902
+ "eval_loss": 5.9288482666015625,
903
+ "eval_runtime": 110.5071,
904
+ "eval_samples_per_second": 32.505,
905
+ "eval_steps_per_second": 2.036,
906
+ "step": 39000
907
+ },
908
+ {
909
+ "epoch": 0.395,
910
+ "grad_norm": 0.7196072936058044,
911
+ "learning_rate": 3.0256e-05,
912
+ "loss": 5.9879,
913
+ "step": 39500
914
+ },
915
+ {
916
+ "epoch": 0.4,
917
+ "grad_norm": 0.6065074801445007,
918
+ "learning_rate": 3.0006e-05,
919
+ "loss": 5.9711,
920
+ "step": 40000
921
+ },
922
+ {
923
+ "epoch": 0.4,
924
+ "eval_accuracy": 0.03516381031378517,
925
+ "eval_loss": 5.915222644805908,
926
+ "eval_runtime": 110.7645,
927
+ "eval_samples_per_second": 32.429,
928
+ "eval_steps_per_second": 2.031,
929
+ "step": 40000
930
+ },
931
+ {
932
+ "epoch": 0.405,
933
+ "grad_norm": 0.6913782954216003,
934
+ "learning_rate": 2.9757e-05,
935
+ "loss": 5.9671,
936
+ "step": 40500
937
+ },
938
+ {
939
+ "epoch": 0.41,
940
+ "grad_norm": 0.6546822190284729,
941
+ "learning_rate": 2.9507e-05,
942
+ "loss": 5.9629,
943
+ "step": 41000
944
+ },
945
+ {
946
+ "epoch": 0.41,
947
+ "eval_accuracy": 0.03489303372107453,
948
+ "eval_loss": 5.896161079406738,
949
+ "eval_runtime": 111.3465,
950
+ "eval_samples_per_second": 32.26,
951
+ "eval_steps_per_second": 2.021,
952
+ "step": 41000
953
+ },
954
+ {
955
+ "epoch": 0.415,
956
+ "grad_norm": 0.6649766564369202,
957
+ "learning_rate": 2.9257e-05,
958
+ "loss": 5.9631,
959
+ "step": 41500
960
+ },
961
+ {
962
+ "epoch": 0.42,
963
+ "grad_norm": 0.7166668176651001,
964
+ "learning_rate": 2.9007000000000002e-05,
965
+ "loss": 5.9465,
966
+ "step": 42000
967
+ },
968
+ {
969
+ "epoch": 0.42,
970
+ "eval_accuracy": 0.035887695476207584,
971
+ "eval_loss": 5.882122993469238,
972
+ "eval_runtime": 110.5783,
973
+ "eval_samples_per_second": 32.484,
974
+ "eval_steps_per_second": 2.035,
975
+ "step": 42000
976
+ },
977
+ {
978
+ "epoch": 0.425,
979
+ "grad_norm": 0.6148796081542969,
980
+ "learning_rate": 2.8757e-05,
981
+ "loss": 5.9383,
982
+ "step": 42500
983
+ },
984
+ {
985
+ "epoch": 0.43,
986
+ "grad_norm": 0.712359607219696,
987
+ "learning_rate": 2.8507e-05,
988
+ "loss": 5.9463,
989
+ "step": 43000
990
+ },
991
+ {
992
+ "epoch": 0.43,
993
+ "eval_accuracy": 0.034527145149316284,
994
+ "eval_loss": 5.869154930114746,
995
+ "eval_runtime": 113.9173,
996
+ "eval_samples_per_second": 31.532,
997
+ "eval_steps_per_second": 1.975,
998
+ "step": 43000
999
+ },
1000
+ {
1001
+ "epoch": 0.435,
1002
+ "grad_norm": 0.6819197535514832,
1003
+ "learning_rate": 2.8257e-05,
1004
+ "loss": 5.9363,
1005
+ "step": 43500
1006
+ },
1007
+ {
1008
+ "epoch": 0.44,
1009
+ "grad_norm": 0.7428186535835266,
1010
+ "learning_rate": 2.8007e-05,
1011
+ "loss": 5.9317,
1012
+ "step": 44000
1013
+ },
1014
+ {
1015
+ "epoch": 0.44,
1016
+ "eval_accuracy": 0.034260450615792234,
1017
+ "eval_loss": 5.869858741760254,
1018
+ "eval_runtime": 111.9079,
1019
+ "eval_samples_per_second": 32.098,
1020
+ "eval_steps_per_second": 2.011,
1021
+ "step": 44000
1022
+ },
1023
+ {
1024
+ "epoch": 0.445,
1025
+ "grad_norm": 0.6326854825019836,
1026
+ "learning_rate": 2.7757500000000003e-05,
1027
+ "loss": 5.9114,
1028
+ "step": 44500
1029
+ },
1030
+ {
1031
+ "epoch": 1.00345,
1032
+ "grad_norm": 0.6722562313079834,
1033
+ "learning_rate": 2.7507500000000004e-05,
1034
+ "loss": 5.9097,
1035
+ "step": 45000
1036
+ },
1037
+ {
1038
+ "epoch": 1.00345,
1039
+ "eval_accuracy": 0.034564291887914274,
1040
+ "eval_loss": 5.8482561111450195,
1041
+ "eval_runtime": 111.3739,
1042
+ "eval_samples_per_second": 32.252,
1043
+ "eval_steps_per_second": 2.02,
1044
+ "step": 45000
1045
+ },
1046
+ {
1047
+ "epoch": 1.00845,
1048
+ "grad_norm": 0.6160574555397034,
1049
+ "learning_rate": 2.72575e-05,
1050
+ "loss": 5.8948,
1051
+ "step": 45500
1052
+ },
1053
+ {
1054
+ "epoch": 1.01345,
1055
+ "grad_norm": 0.6146988868713379,
1056
+ "learning_rate": 2.7007500000000002e-05,
1057
+ "loss": 5.9107,
1058
+ "step": 46000
1059
+ },
1060
+ {
1061
+ "epoch": 1.01345,
1062
+ "eval_accuracy": 0.03477846392657083,
1063
+ "eval_loss": 5.8351731300354,
1064
+ "eval_runtime": 112.8785,
1065
+ "eval_samples_per_second": 31.822,
1066
+ "eval_steps_per_second": 1.993,
1067
+ "step": 46000
1068
+ },
1069
+ {
1070
+ "epoch": 1.01845,
1071
+ "grad_norm": 0.6342368721961975,
1072
+ "learning_rate": 2.6758e-05,
1073
+ "loss": 5.893,
1074
+ "step": 46500
1075
+ },
1076
+ {
1077
+ "epoch": 1.02345,
1078
+ "grad_norm": 0.7240073680877686,
1079
+ "learning_rate": 2.6508e-05,
1080
+ "loss": 5.8838,
1081
+ "step": 47000
1082
+ },
1083
+ {
1084
+ "epoch": 1.02345,
1085
+ "eval_accuracy": 0.034256504625245196,
1086
+ "eval_loss": 5.818812847137451,
1087
+ "eval_runtime": 112.1297,
1088
+ "eval_samples_per_second": 32.034,
1089
+ "eval_steps_per_second": 2.007,
1090
+ "step": 47000
1091
+ },
1092
+ {
1093
+ "epoch": 1.02845,
1094
+ "grad_norm": 0.6988590955734253,
1095
+ "learning_rate": 2.6257999999999998e-05,
1096
+ "loss": 5.8907,
1097
+ "step": 47500
1098
+ },
1099
+ {
1100
+ "epoch": 1.03345,
1101
+ "grad_norm": 0.6514461636543274,
1102
+ "learning_rate": 2.6008e-05,
1103
+ "loss": 5.887,
1104
+ "step": 48000
1105
+ },
1106
+ {
1107
+ "epoch": 1.03345,
1108
+ "eval_accuracy": 0.03395511258863511,
1109
+ "eval_loss": 5.808635234832764,
1110
+ "eval_runtime": 112.3746,
1111
+ "eval_samples_per_second": 31.965,
1112
+ "eval_steps_per_second": 2.002,
1113
+ "step": 48000
1114
+ },
1115
+ {
1116
+ "epoch": 1.03845,
1117
+ "grad_norm": 0.6310043931007385,
1118
+ "learning_rate": 2.5758e-05,
1119
+ "loss": 5.873,
1120
+ "step": 48500
1121
+ },
1122
+ {
1123
+ "epoch": 1.04345,
1124
+ "grad_norm": 0.7601247429847717,
1125
+ "learning_rate": 2.5507999999999997e-05,
1126
+ "loss": 5.8563,
1127
+ "step": 49000
1128
+ },
1129
+ {
1130
+ "epoch": 1.04345,
1131
+ "eval_accuracy": 0.03384244775508516,
1132
+ "eval_loss": 5.797094821929932,
1133
+ "eval_runtime": 112.1133,
1134
+ "eval_samples_per_second": 32.039,
1135
+ "eval_steps_per_second": 2.007,
1136
+ "step": 49000
1137
+ },
1138
+ {
1139
+ "epoch": 1.04845,
1140
+ "grad_norm": 0.6604752540588379,
1141
+ "learning_rate": 2.5258500000000002e-05,
1142
+ "loss": 5.8583,
1143
+ "step": 49500
1144
+ },
1145
+ {
1146
+ "epoch": 1.05345,
1147
+ "grad_norm": 0.681904673576355,
1148
+ "learning_rate": 2.5008500000000003e-05,
1149
+ "loss": 5.8576,
1150
+ "step": 50000
1151
+ },
1152
+ {
1153
+ "epoch": 1.05345,
1154
+ "eval_accuracy": 0.03392898740984092,
1155
+ "eval_loss": 5.796751022338867,
1156
+ "eval_runtime": 113.244,
1157
+ "eval_samples_per_second": 31.719,
1158
+ "eval_steps_per_second": 1.987,
1159
+ "step": 50000
1160
+ },
1161
+ {
1162
+ "epoch": 1.0584500000000001,
1163
+ "grad_norm": 0.6337283849716187,
1164
+ "learning_rate": 2.47585e-05,
1165
+ "loss": 5.8495,
1166
+ "step": 50500
1167
+ },
1168
+ {
1169
+ "epoch": 1.06345,
1170
+ "grad_norm": 0.6972376108169556,
1171
+ "learning_rate": 2.45085e-05,
1172
+ "loss": 5.8567,
1173
+ "step": 51000
1174
+ },
1175
+ {
1176
+ "epoch": 1.06345,
1177
+ "eval_accuracy": 0.034259225998036255,
1178
+ "eval_loss": 5.779718399047852,
1179
+ "eval_runtime": 110.7791,
1180
+ "eval_samples_per_second": 32.425,
1181
+ "eval_steps_per_second": 2.031,
1182
+ "step": 51000
1183
+ },
1184
+ {
1185
+ "epoch": 1.06845,
1186
+ "grad_norm": 0.6530519127845764,
1187
+ "learning_rate": 2.42585e-05,
1188
+ "loss": 5.8519,
1189
+ "step": 51500
1190
+ },
1191
+ {
1192
+ "epoch": 1.07345,
1193
+ "grad_norm": 0.6328603625297546,
1194
+ "learning_rate": 2.4009e-05,
1195
+ "loss": 5.841,
1196
+ "step": 52000
1197
+ },
1198
+ {
1199
+ "epoch": 1.07345,
1200
+ "eval_accuracy": 0.033661476464479555,
1201
+ "eval_loss": 5.767716407775879,
1202
+ "eval_runtime": 114.652,
1203
+ "eval_samples_per_second": 31.33,
1204
+ "eval_steps_per_second": 1.962,
1205
+ "step": 52000
1206
+ },
1207
+ {
1208
+ "epoch": 1.07845,
1209
+ "grad_norm": 0.6603217124938965,
1210
+ "learning_rate": 2.3759e-05,
1211
+ "loss": 5.8284,
1212
+ "step": 52500
1213
+ },
1214
+ {
1215
+ "epoch": 1.08345,
1216
+ "grad_norm": 0.6852928400039673,
1217
+ "learning_rate": 2.3509e-05,
1218
+ "loss": 5.8192,
1219
+ "step": 53000
1220
+ },
1221
+ {
1222
+ "epoch": 1.08345,
1223
+ "eval_accuracy": 0.03317965741182208,
1224
+ "eval_loss": 5.76131534576416,
1225
+ "eval_runtime": 111.561,
1226
+ "eval_samples_per_second": 32.198,
1227
+ "eval_steps_per_second": 2.017,
1228
+ "step": 53000
1229
+ },
1230
+ {
1231
+ "epoch": 1.08845,
1232
+ "grad_norm": 0.6923466324806213,
1233
+ "learning_rate": 2.3259e-05,
1234
+ "loss": 5.8147,
1235
+ "step": 53500
1236
+ },
1237
+ {
1238
+ "epoch": 1.09345,
1239
+ "grad_norm": 0.7163240313529968,
1240
+ "learning_rate": 2.3009e-05,
1241
+ "loss": 5.8214,
1242
+ "step": 54000
1243
+ },
1244
+ {
1245
+ "epoch": 1.09345,
1246
+ "eval_accuracy": 0.03383822962725901,
1247
+ "eval_loss": 5.748566150665283,
1248
+ "eval_runtime": 113.5245,
1249
+ "eval_samples_per_second": 31.641,
1250
+ "eval_steps_per_second": 1.982,
1251
+ "step": 54000
1252
+ },
1253
+ {
1254
+ "epoch": 1.09845,
1255
+ "grad_norm": 0.7053197026252747,
1256
+ "learning_rate": 2.27595e-05,
1257
+ "loss": 5.8176,
1258
+ "step": 54500
1259
+ },
1260
+ {
1261
+ "epoch": 1.10345,
1262
+ "grad_norm": 0.7318024039268494,
1263
+ "learning_rate": 2.25095e-05,
1264
+ "loss": 5.8166,
1265
+ "step": 55000
1266
+ },
1267
+ {
1268
+ "epoch": 1.10345,
1269
+ "eval_accuracy": 0.033826935930176105,
1270
+ "eval_loss": 5.74090576171875,
1271
+ "eval_runtime": 112.6548,
1272
+ "eval_samples_per_second": 31.885,
1273
+ "eval_steps_per_second": 1.997,
1274
+ "step": 55000
1275
+ },
1276
+ {
1277
+ "epoch": 1.10845,
1278
+ "grad_norm": 0.727443516254425,
1279
+ "learning_rate": 2.22595e-05,
1280
+ "loss": 5.8054,
1281
+ "step": 55500
1282
+ },
1283
+ {
1284
+ "epoch": 1.11345,
1285
+ "grad_norm": 0.6675511598587036,
1286
+ "learning_rate": 2.2009500000000003e-05,
1287
+ "loss": 5.806,
1288
+ "step": 56000
1289
+ },
1290
+ {
1291
+ "epoch": 1.11345,
1292
+ "eval_accuracy": 0.03327177588079952,
1293
+ "eval_loss": 5.734244346618652,
1294
+ "eval_runtime": 114.861,
1295
+ "eval_samples_per_second": 31.273,
1296
+ "eval_steps_per_second": 1.959,
1297
+ "step": 56000
1298
+ },
1299
+ {
1300
+ "epoch": 1.11845,
1301
+ "grad_norm": 0.7548192143440247,
1302
+ "learning_rate": 2.17595e-05,
1303
+ "loss": 5.7987,
1304
+ "step": 56500
1305
+ },
1306
+ {
1307
+ "epoch": 1.12345,
1308
+ "grad_norm": 0.6868897676467896,
1309
+ "learning_rate": 2.15095e-05,
1310
+ "loss": 5.7961,
1311
+ "step": 57000
1312
+ },
1313
+ {
1314
+ "epoch": 1.12345,
1315
+ "eval_accuracy": 0.033481321585711266,
1316
+ "eval_loss": 5.723623752593994,
1317
+ "eval_runtime": 113.4262,
1318
+ "eval_samples_per_second": 31.668,
1319
+ "eval_steps_per_second": 1.984,
1320
+ "step": 57000
1321
+ },
1322
+ {
1323
+ "epoch": 1.12845,
1324
+ "grad_norm": 0.6014879941940308,
1325
+ "learning_rate": 2.1259500000000002e-05,
1326
+ "loss": 5.7892,
1327
+ "step": 57500
1328
+ },
1329
+ {
1330
+ "epoch": 1.13345,
1331
+ "grad_norm": 0.6603185534477234,
1332
+ "learning_rate": 2.101e-05,
1333
+ "loss": 5.7847,
1334
+ "step": 58000
1335
+ },
1336
+ {
1337
+ "epoch": 1.13345,
1338
+ "eval_accuracy": 0.033258985428681526,
1339
+ "eval_loss": 5.716407775878906,
1340
+ "eval_runtime": 112.5491,
1341
+ "eval_samples_per_second": 31.915,
1342
+ "eval_steps_per_second": 1.999,
1343
+ "step": 58000
1344
+ },
1345
+ {
1346
+ "epoch": 1.13845,
1347
+ "grad_norm": 0.7514461278915405,
1348
+ "learning_rate": 2.076e-05,
1349
+ "loss": 5.7785,
1350
+ "step": 58500
1351
+ },
1352
+ {
1353
+ "epoch": 1.14345,
1354
+ "grad_norm": 0.9591528177261353,
1355
+ "learning_rate": 2.0510000000000002e-05,
1356
+ "loss": 5.787,
1357
+ "step": 59000
1358
+ },
1359
+ {
1360
+ "epoch": 1.14345,
1361
+ "eval_accuracy": 0.03300685023958966,
1362
+ "eval_loss": 5.7095818519592285,
1363
+ "eval_runtime": 112.4202,
1364
+ "eval_samples_per_second": 31.952,
1365
+ "eval_steps_per_second": 2.001,
1366
+ "step": 59000
1367
+ },
1368
+ {
1369
+ "epoch": 1.14845,
1370
+ "grad_norm": 0.6442362666130066,
1371
+ "learning_rate": 2.0260000000000003e-05,
1372
+ "loss": 5.7765,
1373
+ "step": 59500
1374
+ },
1375
+ {
1376
+ "epoch": 1.15345,
1377
+ "grad_norm": 0.7159613370895386,
1378
+ "learning_rate": 2.001e-05,
1379
+ "loss": 5.7711,
1380
+ "step": 60000
1381
+ },
1382
+ {
1383
+ "epoch": 1.15345,
1384
+ "eval_accuracy": 0.03279621598556148,
1385
+ "eval_loss": 5.703481674194336,
1386
+ "eval_runtime": 112.2964,
1387
+ "eval_samples_per_second": 31.987,
1388
+ "eval_steps_per_second": 2.004,
1389
+ "step": 60000
1390
+ },
1391
+ {
1392
+ "epoch": 1.15845,
1393
+ "grad_norm": 0.6875385642051697,
1394
+ "learning_rate": 1.976e-05,
1395
+ "loss": 5.7593,
1396
+ "step": 60500
1397
+ },
1398
+ {
1399
+ "epoch": 1.16345,
1400
+ "grad_norm": 0.6751101016998291,
1401
+ "learning_rate": 1.951e-05,
1402
+ "loss": 5.7699,
1403
+ "step": 61000
1404
+ },
1405
+ {
1406
+ "epoch": 1.16345,
1407
+ "eval_accuracy": 0.03306223017588777,
1408
+ "eval_loss": 5.6887898445129395,
1409
+ "eval_runtime": 113.7541,
1410
+ "eval_samples_per_second": 31.577,
1411
+ "eval_steps_per_second": 1.978,
1412
+ "step": 61000
1413
+ },
1414
+ {
1415
+ "epoch": 1.16845,
1416
+ "grad_norm": 0.7514244914054871,
1417
+ "learning_rate": 1.92605e-05,
1418
+ "loss": 5.7672,
1419
+ "step": 61500
1420
+ },
1421
+ {
1422
+ "epoch": 1.1734499999999999,
1423
+ "grad_norm": 0.697762131690979,
1424
+ "learning_rate": 1.90105e-05,
1425
+ "loss": 5.763,
1426
+ "step": 62000
1427
+ },
1428
+ {
1429
+ "epoch": 1.1734499999999999,
1430
+ "eval_accuracy": 0.03338308002795394,
1431
+ "eval_loss": 5.687499046325684,
1432
+ "eval_runtime": 111.5862,
1433
+ "eval_samples_per_second": 32.19,
1434
+ "eval_steps_per_second": 2.016,
1435
+ "step": 62000
1436
+ },
1437
+ {
1438
+ "epoch": 1.17845,
1439
+ "grad_norm": 0.6761494874954224,
1440
+ "learning_rate": 1.87605e-05,
1441
+ "loss": 5.7669,
1442
+ "step": 62500
1443
+ },
1444
+ {
1445
+ "epoch": 1.1834500000000001,
1446
+ "grad_norm": 0.6467716693878174,
1447
+ "learning_rate": 1.8510500000000002e-05,
1448
+ "loss": 5.7434,
1449
+ "step": 63000
1450
+ },
1451
+ {
1452
+ "epoch": 1.1834500000000001,
1453
+ "eval_accuracy": 0.03302726053552262,
1454
+ "eval_loss": 5.680927276611328,
1455
+ "eval_runtime": 112.7327,
1456
+ "eval_samples_per_second": 31.863,
1457
+ "eval_steps_per_second": 1.996,
1458
+ "step": 63000
1459
+ },
1460
+ {
1461
+ "epoch": 1.18845,
1462
+ "grad_norm": 0.7104445695877075,
1463
+ "learning_rate": 1.82605e-05,
1464
+ "loss": 5.7506,
1465
+ "step": 63500
1466
+ },
1467
+ {
1468
+ "epoch": 1.19345,
1469
+ "grad_norm": 0.6914287805557251,
1470
+ "learning_rate": 1.8011e-05,
1471
+ "loss": 5.7477,
1472
+ "step": 64000
1473
+ },
1474
+ {
1475
+ "epoch": 1.19345,
1476
+ "eval_accuracy": 0.032914323564693565,
1477
+ "eval_loss": 5.668553829193115,
1478
+ "eval_runtime": 113.1627,
1479
+ "eval_samples_per_second": 31.742,
1480
+ "eval_steps_per_second": 1.988,
1481
+ "step": 64000
1482
+ },
1483
+ {
1484
+ "epoch": 1.19845,
1485
+ "grad_norm": 0.7188768982887268,
1486
+ "learning_rate": 1.7760999999999998e-05,
1487
+ "loss": 5.7329,
1488
+ "step": 64500
1489
+ },
1490
+ {
1491
+ "epoch": 1.20345,
1492
+ "grad_norm": 0.6479863524436951,
1493
+ "learning_rate": 1.7511e-05,
1494
+ "loss": 5.7409,
1495
+ "step": 65000
1496
+ },
1497
+ {
1498
+ "epoch": 1.20345,
1499
+ "eval_accuracy": 0.03304100346811748,
1500
+ "eval_loss": 5.662350654602051,
1501
+ "eval_runtime": 110.8399,
1502
+ "eval_samples_per_second": 32.407,
1503
+ "eval_steps_per_second": 2.03,
1504
+ "step": 65000
1505
+ },
1506
+ {
1507
+ "epoch": 1.20845,
1508
+ "grad_norm": 0.678534209728241,
1509
+ "learning_rate": 1.7261000000000003e-05,
1510
+ "loss": 5.7384,
1511
+ "step": 65500
1512
+ },
1513
+ {
1514
+ "epoch": 1.21345,
1515
+ "grad_norm": 0.8188093900680542,
1516
+ "learning_rate": 1.7011e-05,
1517
+ "loss": 5.737,
1518
+ "step": 66000
1519
+ },
1520
+ {
1521
+ "epoch": 1.21345,
1522
+ "eval_accuracy": 0.03385346931488896,
1523
+ "eval_loss": 5.675750732421875,
1524
+ "eval_runtime": 111.9995,
1525
+ "eval_samples_per_second": 32.072,
1526
+ "eval_steps_per_second": 2.009,
1527
+ "step": 66000
1528
+ },
1529
+ {
1530
+ "epoch": 1.21845,
1531
+ "grad_norm": 0.7928422689437866,
1532
+ "learning_rate": 1.6761e-05,
1533
+ "loss": 5.7299,
1534
+ "step": 66500
1535
+ },
1536
+ {
1537
+ "epoch": 1.22345,
1538
+ "grad_norm": 0.7139099836349487,
1539
+ "learning_rate": 1.6511e-05,
1540
+ "loss": 5.729,
1541
+ "step": 67000
1542
+ },
1543
+ {
1544
+ "epoch": 1.22345,
1545
+ "eval_accuracy": 0.032598372183651296,
1546
+ "eval_loss": 5.654570579528809,
1547
+ "eval_runtime": 111.7834,
1548
+ "eval_samples_per_second": 32.134,
1549
+ "eval_steps_per_second": 2.013,
1550
+ "step": 67000
1551
+ },
1552
+ {
1553
+ "epoch": 1.22845,
1554
+ "grad_norm": 0.6944179534912109,
1555
+ "learning_rate": 1.6261000000000002e-05,
1556
+ "loss": 5.7183,
1557
+ "step": 67500
1558
+ },
1559
+ {
1560
+ "epoch": 1.23345,
1561
+ "grad_norm": 0.7094106078147888,
1562
+ "learning_rate": 1.60115e-05,
1563
+ "loss": 5.7232,
1564
+ "step": 68000
1565
+ },
1566
+ {
1567
+ "epoch": 1.23345,
1568
+ "eval_accuracy": 0.032930243595521276,
1569
+ "eval_loss": 5.646746635437012,
1570
+ "eval_runtime": 117.6522,
1571
+ "eval_samples_per_second": 30.531,
1572
+ "eval_steps_per_second": 1.912,
1573
+ "step": 68000
1574
+ },
1575
+ {
1576
+ "epoch": 1.23845,
1577
+ "grad_norm": 0.6927877068519592,
1578
+ "learning_rate": 1.57615e-05,
1579
+ "loss": 5.7162,
1580
+ "step": 68500
1581
+ },
1582
+ {
1583
+ "epoch": 1.24345,
1584
+ "grad_norm": 0.7557797431945801,
1585
+ "learning_rate": 1.5511500000000002e-05,
1586
+ "loss": 5.7127,
1587
+ "step": 69000
1588
+ },
1589
+ {
1590
+ "epoch": 1.24345,
1591
+ "eval_accuracy": 0.032887245905422496,
1592
+ "eval_loss": 5.644942283630371,
1593
+ "eval_runtime": 111.9326,
1594
+ "eval_samples_per_second": 32.091,
1595
+ "eval_steps_per_second": 2.01,
1596
+ "step": 69000
1597
+ },
1598
+ {
1599
+ "epoch": 1.24845,
1600
+ "grad_norm": 0.673581063747406,
1601
+ "learning_rate": 1.52615e-05,
1602
+ "loss": 5.7094,
1603
+ "step": 69500
1604
+ },
1605
+ {
1606
+ "epoch": 1.25345,
1607
+ "grad_norm": 0.6678490042686462,
1608
+ "learning_rate": 1.50115e-05,
1609
+ "loss": 5.7187,
1610
+ "step": 70000
1611
+ },
1612
+ {
1613
+ "epoch": 1.25345,
1614
+ "eval_accuracy": 0.03288316384623591,
1615
+ "eval_loss": 5.635218143463135,
1616
+ "eval_runtime": 116.4191,
1617
+ "eval_samples_per_second": 30.854,
1618
+ "eval_steps_per_second": 1.933,
1619
+ "step": 70000
1620
+ },
1621
+ {
1622
+ "epoch": 1.25845,
1623
+ "grad_norm": 0.7218269109725952,
1624
+ "learning_rate": 1.4761500000000001e-05,
1625
+ "loss": 5.7138,
1626
+ "step": 70500
1627
+ },
1628
+ {
1629
+ "epoch": 1.26345,
1630
+ "grad_norm": 0.7058696150779724,
1631
+ "learning_rate": 1.4512000000000001e-05,
1632
+ "loss": 5.717,
1633
+ "step": 71000
1634
+ },
1635
+ {
1636
+ "epoch": 1.26345,
1637
+ "eval_accuracy": 0.03260463134107074,
1638
+ "eval_loss": 5.626367568969727,
1639
+ "eval_runtime": 111.4646,
1640
+ "eval_samples_per_second": 32.225,
1641
+ "eval_steps_per_second": 2.019,
1642
+ "step": 71000
1643
+ },
1644
+ {
1645
+ "epoch": 1.26845,
1646
+ "grad_norm": 0.7154064774513245,
1647
+ "learning_rate": 1.4262e-05,
1648
+ "loss": 5.6955,
1649
+ "step": 71500
1650
+ },
1651
+ {
1652
+ "epoch": 1.27345,
1653
+ "grad_norm": 0.6907570362091064,
1654
+ "learning_rate": 1.4012e-05,
1655
+ "loss": 5.714,
1656
+ "step": 72000
1657
+ },
1658
+ {
1659
+ "epoch": 1.27345,
1660
+ "eval_accuracy": 0.03297637086432977,
1661
+ "eval_loss": 5.6219401359558105,
1662
+ "eval_runtime": 113.1231,
1663
+ "eval_samples_per_second": 31.753,
1664
+ "eval_steps_per_second": 1.989,
1665
+ "step": 72000
1666
+ },
1667
+ {
1668
+ "epoch": 1.2784499999999999,
1669
+ "grad_norm": 0.708003580570221,
1670
+ "learning_rate": 1.3762e-05,
1671
+ "loss": 5.6993,
1672
+ "step": 72500
1673
+ },
1674
+ {
1675
+ "epoch": 1.28345,
1676
+ "grad_norm": 0.8350797295570374,
1677
+ "learning_rate": 1.35125e-05,
1678
+ "loss": 5.7079,
1679
+ "step": 73000
1680
+ },
1681
+ {
1682
+ "epoch": 1.28345,
1683
+ "eval_accuracy": 0.032977323344806644,
1684
+ "eval_loss": 5.616916656494141,
1685
+ "eval_runtime": 112.8971,
1686
+ "eval_samples_per_second": 31.817,
1687
+ "eval_steps_per_second": 1.993,
1688
+ "step": 73000
1689
+ },
1690
+ {
1691
+ "epoch": 1.28845,
1692
+ "grad_norm": 0.6496825218200684,
1693
+ "learning_rate": 1.32625e-05,
1694
+ "loss": 5.7047,
1695
+ "step": 73500
1696
+ },
1697
+ {
1698
+ "epoch": 1.29345,
1699
+ "grad_norm": 0.7288230657577515,
1700
+ "learning_rate": 1.30125e-05,
1701
+ "loss": 5.7034,
1702
+ "step": 74000
1703
+ },
1704
+ {
1705
+ "epoch": 1.29345,
1706
+ "eval_accuracy": 0.032624225225166385,
1707
+ "eval_loss": 5.613090991973877,
1708
+ "eval_runtime": 112.7724,
1709
+ "eval_samples_per_second": 31.852,
1710
+ "eval_steps_per_second": 1.995,
1711
+ "step": 74000
1712
+ },
1713
+ {
1714
+ "epoch": 1.2984499999999999,
1715
+ "grad_norm": 0.6955094337463379,
1716
+ "learning_rate": 1.27625e-05,
1717
+ "loss": 5.6884,
1718
+ "step": 74500
1719
+ },
1720
+ {
1721
+ "epoch": 1.30345,
1722
+ "grad_norm": 0.6996705532073975,
1723
+ "learning_rate": 1.25125e-05,
1724
+ "loss": 5.6768,
1725
+ "step": 75000
1726
+ },
1727
+ {
1728
+ "epoch": 1.30345,
1729
+ "eval_accuracy": 0.03247931212404235,
1730
+ "eval_loss": 5.61249303817749,
1731
+ "eval_runtime": 113.0871,
1732
+ "eval_samples_per_second": 31.763,
1733
+ "eval_steps_per_second": 1.99,
1734
+ "step": 75000
1735
+ },
1736
+ {
1737
+ "epoch": 1.3084500000000001,
1738
+ "grad_norm": 0.7164750695228577,
1739
+ "learning_rate": 1.22625e-05,
1740
+ "loss": 5.685,
1741
+ "step": 75500
1742
+ },
1743
+ {
1744
+ "epoch": 1.31345,
1745
+ "grad_norm": 0.6893213391304016,
1746
+ "learning_rate": 1.20125e-05,
1747
+ "loss": 5.6955,
1748
+ "step": 76000
1749
+ },
1750
+ {
1751
+ "epoch": 1.31345,
1752
+ "eval_accuracy": 0.0328188033797273,
1753
+ "eval_loss": 5.6074957847595215,
1754
+ "eval_runtime": 114.2316,
1755
+ "eval_samples_per_second": 31.445,
1756
+ "eval_steps_per_second": 1.97,
1757
+ "step": 76000
1758
+ },
1759
+ {
1760
+ "epoch": 1.31845,
1761
+ "grad_norm": 0.7451682090759277,
1762
+ "learning_rate": 1.17625e-05,
1763
+ "loss": 5.6853,
1764
+ "step": 76500
1765
+ },
1766
+ {
1767
+ "epoch": 1.32345,
1768
+ "grad_norm": 0.7856729626655579,
1769
+ "learning_rate": 1.15125e-05,
1770
+ "loss": 5.6947,
1771
+ "step": 77000
1772
+ },
1773
+ {
1774
+ "epoch": 1.32345,
1775
+ "eval_accuracy": 0.032477407163088605,
1776
+ "eval_loss": 5.601708889007568,
1777
+ "eval_runtime": 111.6415,
1778
+ "eval_samples_per_second": 32.174,
1779
+ "eval_steps_per_second": 2.015,
1780
+ "step": 77000
1781
+ },
1782
+ {
1783
+ "epoch": 1.3284500000000001,
1784
+ "grad_norm": 0.7268499135971069,
1785
+ "learning_rate": 1.1262500000000001e-05,
1786
+ "loss": 5.6676,
1787
+ "step": 77500
1788
+ },
1789
+ {
1790
+ "epoch": 1.33345,
1791
+ "grad_norm": 0.6918110847473145,
1792
+ "learning_rate": 1.1013000000000001e-05,
1793
+ "loss": 5.7056,
1794
+ "step": 78000
1795
+ },
1796
+ {
1797
+ "epoch": 1.33345,
1798
+ "eval_accuracy": 0.032296980147041215,
1799
+ "eval_loss": 5.595623970031738,
1800
+ "eval_runtime": 110.9779,
1801
+ "eval_samples_per_second": 32.367,
1802
+ "eval_steps_per_second": 2.027,
1803
+ "step": 78000
1804
+ },
1805
+ {
1806
+ "epoch": 1.33845,
1807
+ "grad_norm": 0.7075666189193726,
1808
+ "learning_rate": 1.0763e-05,
1809
+ "loss": 5.6793,
1810
+ "step": 78500
1811
+ },
1812
+ {
1813
+ "epoch": 1.34345,
1814
+ "grad_norm": 0.7042005658149719,
1815
+ "learning_rate": 1.0513e-05,
1816
+ "loss": 5.6636,
1817
+ "step": 79000
1818
+ },
1819
+ {
1820
+ "epoch": 1.34345,
1821
+ "eval_accuracy": 0.03255183670892414,
1822
+ "eval_loss": 5.592087268829346,
1823
+ "eval_runtime": 111.6356,
1824
+ "eval_samples_per_second": 32.176,
1825
+ "eval_steps_per_second": 2.015,
1826
+ "step": 79000
1827
+ },
1828
+ {
1829
+ "epoch": 1.34845,
1830
+ "grad_norm": 0.7470083832740784,
1831
+ "learning_rate": 1.0263e-05,
1832
+ "loss": 5.6727,
1833
+ "step": 79500
1834
+ },
1835
+ {
1836
+ "epoch": 1.35345,
1837
+ "grad_norm": 0.7401617169380188,
1838
+ "learning_rate": 1.00135e-05,
1839
+ "loss": 5.6723,
1840
+ "step": 80000
1841
+ },
1842
+ {
1843
+ "epoch": 1.35345,
1844
+ "eval_accuracy": 0.03255442201307565,
1845
+ "eval_loss": 5.588088512420654,
1846
+ "eval_runtime": 112.4329,
1847
+ "eval_samples_per_second": 31.948,
1848
+ "eval_steps_per_second": 2.001,
1849
+ "step": 80000
1850
+ },
1851
+ {
1852
+ "epoch": 1.35845,
1853
+ "grad_norm": 0.7123810648918152,
1854
+ "learning_rate": 9.7635e-06,
1855
+ "loss": 5.6695,
1856
+ "step": 80500
1857
+ },
1858
+ {
1859
+ "epoch": 1.36345,
1860
+ "grad_norm": 0.7234753966331482,
1861
+ "learning_rate": 9.5135e-06,
1862
+ "loss": 5.659,
1863
+ "step": 81000
1864
+ },
1865
+ {
1866
+ "epoch": 1.36345,
1867
+ "eval_accuracy": 0.03243753905169955,
1868
+ "eval_loss": 5.5822529792785645,
1869
+ "eval_runtime": 115.5372,
1870
+ "eval_samples_per_second": 31.09,
1871
+ "eval_steps_per_second": 1.947,
1872
+ "step": 81000
1873
+ },
1874
+ {
1875
+ "epoch": 1.36845,
1876
+ "grad_norm": 0.7145719528198242,
1877
+ "learning_rate": 9.2635e-06,
1878
+ "loss": 5.6632,
1879
+ "step": 81500
1880
+ },
1881
+ {
1882
+ "epoch": 1.37345,
1883
+ "grad_norm": 0.7814493179321289,
1884
+ "learning_rate": 9.013500000000001e-06,
1885
+ "loss": 5.6729,
1886
+ "step": 82000
1887
+ },
1888
+ {
1889
+ "epoch": 1.37345,
1890
+ "eval_accuracy": 0.032616741449990966,
1891
+ "eval_loss": 5.579476833343506,
1892
+ "eval_runtime": 111.3925,
1893
+ "eval_samples_per_second": 32.246,
1894
+ "eval_steps_per_second": 2.02,
1895
+ "step": 82000
1896
+ },
1897
+ {
1898
+ "epoch": 1.37845,
1899
+ "grad_norm": 0.7196946144104004,
1900
+ "learning_rate": 8.7635e-06,
1901
+ "loss": 5.6638,
1902
+ "step": 82500
1903
+ },
1904
+ {
1905
+ "epoch": 1.38345,
1906
+ "grad_norm": 0.7334076762199402,
1907
+ "learning_rate": 8.514e-06,
1908
+ "loss": 5.6595,
1909
+ "step": 83000
1910
+ },
1911
+ {
1912
+ "epoch": 1.38345,
1913
+ "eval_accuracy": 0.03224581833856925,
1914
+ "eval_loss": 5.579442977905273,
1915
+ "eval_runtime": 111.9776,
1916
+ "eval_samples_per_second": 32.078,
1917
+ "eval_steps_per_second": 2.009,
1918
+ "step": 83000
1919
+ },
1920
+ {
1921
+ "epoch": 1.38845,
1922
+ "grad_norm": 0.797461748123169,
1923
+ "learning_rate": 8.264e-06,
1924
+ "loss": 5.66,
1925
+ "step": 83500
1926
+ },
1927
+ {
1928
+ "epoch": 1.39345,
1929
+ "grad_norm": 0.7179501056671143,
1930
+ "learning_rate": 8.014e-06,
1931
+ "loss": 5.6565,
1932
+ "step": 84000
1933
+ },
1934
+ {
1935
+ "epoch": 1.39345,
1936
+ "eval_accuracy": 0.032768049777174,
1937
+ "eval_loss": 5.575778961181641,
1938
+ "eval_runtime": 113.2284,
1939
+ "eval_samples_per_second": 31.723,
1940
+ "eval_steps_per_second": 1.987,
1941
+ "step": 84000
1942
+ },
1943
+ {
1944
+ "epoch": 1.39845,
1945
+ "grad_norm": 0.7444576025009155,
1946
+ "learning_rate": 7.764e-06,
1947
+ "loss": 5.6539,
1948
+ "step": 84500
1949
+ },
1950
+ {
1951
+ "epoch": 1.4034499999999999,
1952
+ "grad_norm": 0.681348979473114,
1953
+ "learning_rate": 7.514500000000001e-06,
1954
+ "loss": 5.6649,
1955
+ "step": 85000
1956
+ },
1957
+ {
1958
+ "epoch": 1.4034499999999999,
1959
+ "eval_accuracy": 0.03250598157739475,
1960
+ "eval_loss": 5.571649074554443,
1961
+ "eval_runtime": 110.7941,
1962
+ "eval_samples_per_second": 32.42,
1963
+ "eval_steps_per_second": 2.031,
1964
+ "step": 85000
1965
+ },
1966
+ {
1967
+ "epoch": 1.40845,
1968
+ "grad_norm": 0.6666921973228455,
1969
+ "learning_rate": 7.2645000000000005e-06,
1970
+ "loss": 5.6487,
1971
+ "step": 85500
1972
+ },
1973
+ {
1974
+ "epoch": 1.41345,
1975
+ "grad_norm": 0.7330692410469055,
1976
+ "learning_rate": 7.0145e-06,
1977
+ "loss": 5.6561,
1978
+ "step": 86000
1979
+ },
1980
+ {
1981
+ "epoch": 1.41345,
1982
+ "eval_accuracy": 0.032128118965355834,
1983
+ "eval_loss": 5.5695481300354,
1984
+ "eval_runtime": 113.9227,
1985
+ "eval_samples_per_second": 31.53,
1986
+ "eval_steps_per_second": 1.975,
1987
+ "step": 86000
1988
+ },
1989
+ {
1990
+ "epoch": 1.41845,
1991
+ "grad_norm": 0.7140607237815857,
1992
+ "learning_rate": 6.7645e-06,
1993
+ "loss": 5.6569,
1994
+ "step": 86500
1995
+ },
1996
+ {
1997
+ "epoch": 1.4234499999999999,
1998
+ "grad_norm": 0.695405125617981,
1999
+ "learning_rate": 6.5145e-06,
2000
+ "loss": 5.6405,
2001
+ "step": 87000
2002
+ },
2003
+ {
2004
+ "epoch": 1.4234499999999999,
2005
+ "eval_accuracy": 0.03226895000729328,
2006
+ "eval_loss": 5.565379619598389,
2007
+ "eval_runtime": 110.9885,
2008
+ "eval_samples_per_second": 32.364,
2009
+ "eval_steps_per_second": 2.027,
2010
+ "step": 87000
2011
+ },
2012
+ {
2013
+ "epoch": 1.42845,
2014
+ "grad_norm": 0.748406708240509,
2015
+ "learning_rate": 6.265e-06,
2016
+ "loss": 5.661,
2017
+ "step": 87500
2018
+ },
2019
+ {
2020
+ "epoch": 1.4334500000000001,
2021
+ "grad_norm": 0.7667502760887146,
2022
+ "learning_rate": 6.015000000000001e-06,
2023
+ "loss": 5.6482,
2024
+ "step": 88000
2025
+ },
2026
+ {
2027
+ "epoch": 1.4334500000000001,
2028
+ "eval_accuracy": 0.03212390083752969,
2029
+ "eval_loss": 5.562798023223877,
2030
+ "eval_runtime": 113.7863,
2031
+ "eval_samples_per_second": 31.568,
2032
+ "eval_steps_per_second": 1.977,
2033
+ "step": 88000
2034
+ },
2035
+ {
2036
+ "epoch": 1.43845,
2037
+ "grad_norm": 0.7025783658027649,
2038
+ "learning_rate": 5.765e-06,
2039
+ "loss": 5.6537,
2040
+ "step": 88500
2041
+ },
2042
+ {
2043
+ "epoch": 1.44345,
2044
+ "grad_norm": 0.7218544483184814,
2045
+ "learning_rate": 5.515e-06,
2046
+ "loss": 5.6425,
2047
+ "step": 89000
2048
+ },
2049
+ {
2050
+ "epoch": 1.44345,
2051
+ "eval_accuracy": 0.03228555038131876,
2052
+ "eval_loss": 5.562201023101807,
2053
+ "eval_runtime": 110.7556,
2054
+ "eval_samples_per_second": 32.432,
2055
+ "eval_steps_per_second": 2.032,
2056
+ "step": 89000
2057
+ },
2058
+ {
2059
+ "epoch": 2.0019,
2060
+ "grad_norm": 0.7732229232788086,
2061
+ "learning_rate": 5.265e-06,
2062
+ "loss": 5.6439,
2063
+ "step": 89500
2064
+ },
2065
+ {
2066
+ "epoch": 2.0069,
2067
+ "grad_norm": 0.7334641814231873,
2068
+ "learning_rate": 5.015e-06,
2069
+ "loss": 5.6379,
2070
+ "step": 90000
2071
+ },
2072
+ {
2073
+ "epoch": 2.0069,
2074
+ "eval_accuracy": 0.032302831098542,
2075
+ "eval_loss": 5.558170318603516,
2076
+ "eval_runtime": 110.8037,
2077
+ "eval_samples_per_second": 32.418,
2078
+ "eval_steps_per_second": 2.031,
2079
+ "step": 90000
2080
+ },
2081
+ {
2082
+ "epoch": 2.0119,
2083
+ "grad_norm": 0.7253163456916809,
2084
+ "learning_rate": 4.765e-06,
2085
+ "loss": 5.6416,
2086
+ "step": 90500
2087
+ },
2088
+ {
2089
+ "epoch": 2.0169,
2090
+ "grad_norm": 0.6948328614234924,
2091
+ "learning_rate": 4.515000000000001e-06,
2092
+ "loss": 5.6357,
2093
+ "step": 91000
2094
+ },
2095
+ {
2096
+ "epoch": 2.0169,
2097
+ "eval_accuracy": 0.03219384011825997,
2098
+ "eval_loss": 5.557282447814941,
2099
+ "eval_runtime": 112.1174,
2100
+ "eval_samples_per_second": 32.038,
2101
+ "eval_steps_per_second": 2.007,
2102
+ "step": 91000
2103
+ },
2104
+ {
2105
+ "epoch": 2.0219,
2106
+ "grad_norm": 0.8627682328224182,
2107
+ "learning_rate": 4.2655e-06,
2108
+ "loss": 5.6417,
2109
+ "step": 91500
2110
+ },
2111
+ {
2112
+ "epoch": 2.0269,
2113
+ "grad_norm": 0.6853375434875488,
2114
+ "learning_rate": 4.015500000000001e-06,
2115
+ "loss": 5.6381,
2116
+ "step": 92000
2117
+ },
2118
+ {
2119
+ "epoch": 2.0269,
2120
+ "eval_accuracy": 0.032045117095228455,
2121
+ "eval_loss": 5.556839942932129,
2122
+ "eval_runtime": 112.6404,
2123
+ "eval_samples_per_second": 31.889,
2124
+ "eval_steps_per_second": 1.998,
2125
+ "step": 92000
2126
+ },
2127
+ {
2128
+ "epoch": 2.0319,
2129
+ "grad_norm": 0.7689598798751831,
2130
+ "learning_rate": 3.7655e-06,
2131
+ "loss": 5.6349,
2132
+ "step": 92500
2133
+ },
2134
+ {
2135
+ "epoch": 2.0369,
2136
+ "grad_norm": 0.714249849319458,
2137
+ "learning_rate": 3.516e-06,
2138
+ "loss": 5.6427,
2139
+ "step": 93000
2140
+ },
2141
+ {
2142
+ "epoch": 2.0369,
2143
+ "eval_accuracy": 0.032379982017168595,
2144
+ "eval_loss": 5.5526018142700195,
2145
+ "eval_runtime": 114.2105,
2146
+ "eval_samples_per_second": 31.451,
2147
+ "eval_steps_per_second": 1.97,
2148
+ "step": 93000
2149
+ },
2150
+ {
2151
+ "epoch": 2.0419,
2152
+ "grad_norm": 0.7157047390937805,
2153
+ "learning_rate": 3.266e-06,
2154
+ "loss": 5.6238,
2155
+ "step": 93500
2156
+ },
2157
+ {
2158
+ "epoch": 2.0469,
2159
+ "grad_norm": 0.702506959438324,
2160
+ "learning_rate": 3.016e-06,
2161
+ "loss": 5.6364,
2162
+ "step": 94000
2163
+ },
2164
+ {
2165
+ "epoch": 2.0469,
2166
+ "eval_accuracy": 0.03230364751037931,
2167
+ "eval_loss": 5.55258321762085,
2168
+ "eval_runtime": 113.8093,
2169
+ "eval_samples_per_second": 31.562,
2170
+ "eval_steps_per_second": 1.977,
2171
+ "step": 94000
2172
+ },
2173
+ {
2174
+ "epoch": 2.0519,
2175
+ "grad_norm": 0.7739561200141907,
2176
+ "learning_rate": 2.7660000000000003e-06,
2177
+ "loss": 5.6289,
2178
+ "step": 94500
2179
+ },
2180
+ {
2181
+ "epoch": 2.0569,
2182
+ "grad_norm": 0.700249195098877,
2183
+ "learning_rate": 2.516e-06,
2184
+ "loss": 5.626,
2185
+ "step": 95000
2186
+ },
+ {
+ "epoch": 2.0569,
+ "eval_accuracy": 0.032115600650516954,
+ "eval_loss": 5.550052642822266,
+ "eval_runtime": 112.9135,
+ "eval_samples_per_second": 31.812,
+ "eval_steps_per_second": 1.993,
+ "step": 95000
+ },
+ {
+ "epoch": 2.0619,
+ "grad_norm": 0.694694995880127,
+ "learning_rate": 2.266e-06,
+ "loss": 5.6414,
+ "step": 95500
+ },
+ {
+ "epoch": 2.0669,
+ "grad_norm": 0.8426607251167297,
+ "learning_rate": 2.0165e-06,
+ "loss": 5.636,
+ "step": 96000
+ },
+ {
+ "epoch": 2.0669,
+ "eval_accuracy": 0.032368688320085694,
+ "eval_loss": 5.549187183380127,
+ "eval_runtime": 111.6948,
+ "eval_samples_per_second": 32.159,
+ "eval_steps_per_second": 2.014,
+ "step": 96000
+ },
+ {
+ "epoch": 2.0719,
+ "grad_norm": 0.7448583841323853,
+ "learning_rate": 1.7665000000000002e-06,
+ "loss": 5.633,
+ "step": 96500
+ },
+ {
+ "epoch": 2.0769,
+ "grad_norm": 0.7047140002250671,
+ "learning_rate": 1.5165e-06,
+ "loss": 5.632,
+ "step": 97000
+ },
+ {
+ "epoch": 2.0769,
+ "eval_accuracy": 0.032333582611081,
+ "eval_loss": 5.548858165740967,
+ "eval_runtime": 111.3384,
+ "eval_samples_per_second": 32.262,
+ "eval_steps_per_second": 2.021,
+ "step": 97000
+ },
+ {
+ "epoch": 2.0819,
+ "grad_norm": 0.7185714244842529,
+ "learning_rate": 1.2665e-06,
+ "loss": 5.6218,
+ "step": 97500
+ },
+ {
+ "epoch": 2.0869,
+ "grad_norm": 0.906790018081665,
+ "learning_rate": 1.0165000000000001e-06,
+ "loss": 5.6133,
+ "step": 98000
+ },
+ {
+ "epoch": 2.0869,
+ "eval_accuracy": 0.03233031696373172,
+ "eval_loss": 5.547926425933838,
+ "eval_runtime": 112.1516,
+ "eval_samples_per_second": 32.028,
+ "eval_steps_per_second": 2.006,
+ "step": 98000
+ },
+ {
+ "epoch": 2.0919,
+ "grad_norm": 0.7219076752662659,
+ "learning_rate": 7.67e-07,
+ "loss": 5.6305,
+ "step": 98500
+ },
+ {
+ "epoch": 2.0969,
+ "grad_norm": 0.7102829217910767,
+ "learning_rate": 5.17e-07,
+ "loss": 5.6291,
+ "step": 99000
+ },
+ {
+ "epoch": 2.0969,
+ "eval_accuracy": 0.032251125015511826,
+ "eval_loss": 5.54772424697876,
+ "eval_runtime": 113.02,
+ "eval_samples_per_second": 31.782,
+ "eval_steps_per_second": 1.991,
+ "step": 99000
+ },
+ {
+ "epoch": 2.1019,
+ "grad_norm": 0.7246349453926086,
+ "learning_rate": 2.67e-07,
+ "loss": 5.6351,
+ "step": 99500
+ },
+ {
+ "epoch": 2.1069,
+ "grad_norm": 0.6908907294273376,
+ "learning_rate": 1.7000000000000003e-08,
+ "loss": 5.6271,
+ "step": 100000
+ },
+ {
+ "epoch": 2.1069,
+ "eval_accuracy": 0.0322300343763811,
+ "eval_loss": 5.546974182128906,
+ "eval_runtime": 114.2737,
+ "eval_samples_per_second": 31.433,
+ "eval_steps_per_second": 1.969,
+ "step": 100000
+ },
+ {
+ "epoch": 1.00001,
+ "step": 100001,
+ "total_flos": 9.182126159167488e+17,
+ "train_loss": 5.625401986489168e-05,
+ "train_runtime": 26.8473,
+ "train_samples_per_second": 59596.412,
+ "train_steps_per_second": 3724.776
+ }
+ ],
+ "logging_steps": 500,
+ "max_steps": 100000,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 9223372036854775807,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 9.182126159167488e+17,
+ "train_batch_size": 8,
+ "trial_name": null,
+ "trial_params": null
+ }