yuccaaa committed
Commit 82f9d88 · verified · 1 Parent(s): 68691b6

Upload BIO/pretrain_output/qwen2.5-7b-instruct-bio/bio_all/v0-20250608-015630/checkpoint-1300/trainer_state.json with huggingface_hub
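The uploaded `trainer_state.json` records the best checkpoint (`best_global_step`, `best_metric`) and a `log_history` list that interleaves training entries (`loss`, `grad_norm`, `learning_rate`) with evaluation entries (`eval_loss`, `eval_token_acc`) every `eval_steps` steps. A minimal sketch of pulling the eval-loss curve out of such a file, assuming only the field names visible in the file below:

```python
import json

def eval_curve(state_path):
    """Return (step, eval_loss) pairs from a Trainer state file.

    Field names ("log_history", "eval_loss", "step") follow the
    trainer_state.json layout shown in this commit; entries without
    an "eval_loss" key are training-step logs and are skipped.
    """
    with open(state_path) as f:
        state = json.load(f)
    return [(entry["step"], entry["eval_loss"])
            for entry in state["log_history"]
            if "eval_loss" in entry]
```

For the file in this commit, this would yield one point per 50 steps (step 50, 100, 150, …), which is a quick way to check that eval loss is still decreasing at checkpoint 1300.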

BIO/pretrain_output/qwen2.5-7b-instruct-bio/bio_all/v0-20250608-015630/checkpoint-1300/trainer_state.json ADDED
@@ -0,0 +1,2878 @@
+ {
+ "best_global_step": 1300,
+ "best_metric": 1.61236322,
+ "best_model_checkpoint": "/oss/wangyujia/BIO/pretrain_output/qwen2.5-7b-instruct-bio/bio_all/v0-20250608-015630/checkpoint-1300",
+ "epoch": 0.974878140232471,
+ "eval_steps": 50,
+ "global_step": 1300,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.0007499062617172854,
+ "grad_norm": 2.5990765920410994,
+ "learning_rate": 1.4925373134328358e-07,
+ "loss": 1.9313178062438965,
+ "memory(GiB)": 33.98,
+ "step": 1,
+ "token_acc": 0.4522775684089792,
+ "train_speed(iter/s)": 0.035384
+ },
+ {
+ "epoch": 0.0037495313085864268,
+ "grad_norm": 2.464822528919476,
+ "learning_rate": 7.462686567164179e-07,
+ "loss": 1.9278267621994019,
+ "memory(GiB)": 47.91,
+ "step": 5,
+ "token_acc": 0.52122773204604,
+ "train_speed(iter/s)": 0.062923
+ },
+ {
+ "epoch": 0.0074990626171728535,
+ "grad_norm": 1.7325416016205337,
+ "learning_rate": 1.4925373134328358e-06,
+ "loss": 1.8955060958862304,
+ "memory(GiB)": 47.91,
+ "step": 10,
+ "token_acc": 0.5187258813526064,
+ "train_speed(iter/s)": 0.069591
+ },
+ {
+ "epoch": 0.01124859392575928,
+ "grad_norm": 1.7497313992660668,
+ "learning_rate": 2.238805970149254e-06,
+ "loss": 1.9058597564697266,
+ "memory(GiB)": 47.91,
+ "step": 15,
+ "token_acc": 0.5233299897068754,
+ "train_speed(iter/s)": 0.072156
+ },
+ {
+ "epoch": 0.014998125234345707,
+ "grad_norm": 2.2036202707744037,
+ "learning_rate": 2.9850746268656716e-06,
+ "loss": 1.8909317016601563,
+ "memory(GiB)": 47.91,
+ "step": 20,
+ "token_acc": 0.5260313022881105,
+ "train_speed(iter/s)": 0.073089
+ },
+ {
+ "epoch": 0.018747656542932135,
+ "grad_norm": 0.940413680068897,
+ "learning_rate": 3.73134328358209e-06,
+ "loss": 1.8793617248535157,
+ "memory(GiB)": 47.91,
+ "step": 25,
+ "token_acc": 0.5194384134524312,
+ "train_speed(iter/s)": 0.074012
+ },
+ {
+ "epoch": 0.02249718785151856,
+ "grad_norm": 0.8391713031013472,
+ "learning_rate": 4.477611940298508e-06,
+ "loss": 1.8508609771728515,
+ "memory(GiB)": 47.91,
+ "step": 30,
+ "token_acc": 0.5043917156593213,
+ "train_speed(iter/s)": 0.074566
+ },
+ {
+ "epoch": 0.026246719160104987,
+ "grad_norm": 0.8841568952278445,
+ "learning_rate": 5.2238805970149255e-06,
+ "loss": 1.824697494506836,
+ "memory(GiB)": 47.91,
+ "step": 35,
+ "token_acc": 0.528556076112075,
+ "train_speed(iter/s)": 0.075038
+ },
+ {
+ "epoch": 0.029996250468691414,
+ "grad_norm": 0.8946312876696884,
+ "learning_rate": 5.970149253731343e-06,
+ "loss": 1.8307044982910157,
+ "memory(GiB)": 47.91,
+ "step": 40,
+ "token_acc": 0.5308448585032051,
+ "train_speed(iter/s)": 0.075325
+ },
+ {
+ "epoch": 0.03374578177727784,
+ "grad_norm": 0.7947638691192278,
+ "learning_rate": 6.7164179104477625e-06,
+ "loss": 1.816474723815918,
+ "memory(GiB)": 47.91,
+ "step": 45,
+ "token_acc": 0.5064540778037032,
+ "train_speed(iter/s)": 0.075663
+ },
+ {
+ "epoch": 0.03749531308586427,
+ "grad_norm": 0.9078284980783232,
+ "learning_rate": 7.46268656716418e-06,
+ "loss": 1.8235408782958984,
+ "memory(GiB)": 47.91,
+ "step": 50,
+ "token_acc": 0.5335809746292172,
+ "train_speed(iter/s)": 0.075798
+ },
+ {
+ "epoch": 0.03749531308586427,
+ "eval_loss": 1.8199940919876099,
+ "eval_runtime": 43.3327,
+ "eval_samples_per_second": 59.632,
+ "eval_steps_per_second": 1.246,
+ "eval_token_acc": 0.5329827582536721,
+ "step": 50
+ },
+ {
+ "epoch": 0.0412448443944507,
+ "grad_norm": 0.9483395054820509,
+ "learning_rate": 8.208955223880599e-06,
+ "loss": 1.820401382446289,
+ "memory(GiB)": 55.02,
+ "step": 55,
+ "token_acc": 0.5211876688973266,
+ "train_speed(iter/s)": 0.067429
+ },
+ {
+ "epoch": 0.04499437570303712,
+ "grad_norm": 1.068284349391847,
+ "learning_rate": 8.955223880597016e-06,
+ "loss": 1.8140316009521484,
+ "memory(GiB)": 55.02,
+ "step": 60,
+ "token_acc": 0.5256004431606853,
+ "train_speed(iter/s)": 0.068168
+ },
+ {
+ "epoch": 0.048743907011623545,
+ "grad_norm": 1.0374195399684023,
+ "learning_rate": 9.701492537313434e-06,
+ "loss": 1.7972326278686523,
+ "memory(GiB)": 55.02,
+ "step": 65,
+ "token_acc": 0.5301121869202227,
+ "train_speed(iter/s)": 0.068826
+ },
+ {
+ "epoch": 0.05249343832020997,
+ "grad_norm": 1.1664573975367183,
+ "learning_rate": 9.999861447984952e-06,
+ "loss": 1.8000951766967774,
+ "memory(GiB)": 55.02,
+ "step": 70,
+ "token_acc": 0.5479384871713907,
+ "train_speed(iter/s)": 0.069381
+ },
+ {
+ "epoch": 0.0562429696287964,
+ "grad_norm": 1.3747388495283737,
+ "learning_rate": 9.99901476903367e-06,
+ "loss": 1.7784345626831055,
+ "memory(GiB)": 55.02,
+ "step": 75,
+ "token_acc": 0.5400628212450028,
+ "train_speed(iter/s)": 0.069877
+ },
+ {
+ "epoch": 0.05999250093738283,
+ "grad_norm": 1.049733513434474,
+ "learning_rate": 9.997398514657146e-06,
+ "loss": 1.7863637924194335,
+ "memory(GiB)": 55.02,
+ "step": 80,
+ "token_acc": 0.543317168313022,
+ "train_speed(iter/s)": 0.070337
+ },
+ {
+ "epoch": 0.06374203224596925,
+ "grad_norm": 1.1176985642823551,
+ "learning_rate": 9.995012933670341e-06,
+ "loss": 1.8001277923583985,
+ "memory(GiB)": 55.02,
+ "step": 85,
+ "token_acc": 0.5290585589203952,
+ "train_speed(iter/s)": 0.070763
+ },
+ {
+ "epoch": 0.06749156355455568,
+ "grad_norm": 1.2869315249626965,
+ "learning_rate": 9.991858393322517e-06,
+ "loss": 1.8029617309570312,
+ "memory(GiB)": 55.02,
+ "step": 90,
+ "token_acc": 0.5462134374959176,
+ "train_speed(iter/s)": 0.071137
+ },
+ {
+ "epoch": 0.0712410948631421,
+ "grad_norm": 1.2836534355867217,
+ "learning_rate": 9.987935379240715e-06,
+ "loss": 1.8010761260986328,
+ "memory(GiB)": 55.02,
+ "step": 95,
+ "token_acc": 0.5229859052198638,
+ "train_speed(iter/s)": 0.071455
+ },
+ {
+ "epoch": 0.07499062617172854,
+ "grad_norm": 0.9256584210711153,
+ "learning_rate": 9.98324449535498e-06,
+ "loss": 1.7873167037963866,
+ "memory(GiB)": 55.02,
+ "step": 100,
+ "token_acc": 0.5686663397712219,
+ "train_speed(iter/s)": 0.071771
+ },
+ {
+ "epoch": 0.07499062617172854,
+ "eval_loss": 1.7935261726379395,
+ "eval_runtime": 42.5716,
+ "eval_samples_per_second": 60.698,
+ "eval_steps_per_second": 1.268,
+ "eval_token_acc": 0.5386120720851657,
+ "step": 100
+ },
+ {
+ "epoch": 0.07874015748031496,
+ "grad_norm": 0.9939350045074621,
+ "learning_rate": 9.977786463805399e-06,
+ "loss": 1.7998859405517578,
+ "memory(GiB)": 55.02,
+ "step": 105,
+ "token_acc": 0.5293935626767713,
+ "train_speed(iter/s)": 0.06774
+ },
+ {
+ "epoch": 0.0824896887889014,
+ "grad_norm": 1.0230340656701442,
+ "learning_rate": 9.97156212483093e-06,
+ "loss": 1.7763248443603517,
+ "memory(GiB)": 55.02,
+ "step": 110,
+ "token_acc": 0.5234773241685314,
+ "train_speed(iter/s)": 0.068148
+ },
+ {
+ "epoch": 0.08623922009748781,
+ "grad_norm": 0.7782735221046598,
+ "learning_rate": 9.964572436640046e-06,
+ "loss": 1.7829460144042968,
+ "memory(GiB)": 55.02,
+ "step": 115,
+ "token_acc": 0.536639456424291,
+ "train_speed(iter/s)": 0.068543
+ },
+ {
+ "epoch": 0.08998875140607424,
+ "grad_norm": 0.7126371800506376,
+ "learning_rate": 9.956818475263228e-06,
+ "loss": 1.7885452270507813,
+ "memory(GiB)": 55.02,
+ "step": 120,
+ "token_acc": 0.5467124795676686,
+ "train_speed(iter/s)": 0.068903
+ },
+ {
+ "epoch": 0.09373828271466067,
+ "grad_norm": 0.903212458549272,
+ "learning_rate": 9.948301434387308e-06,
+ "loss": 1.7893110275268556,
+ "memory(GiB)": 55.02,
+ "step": 125,
+ "token_acc": 0.5382898013748363,
+ "train_speed(iter/s)": 0.069222
+ },
+ {
+ "epoch": 0.09748781402324709,
+ "grad_norm": 0.9564978234402557,
+ "learning_rate": 9.939022625171723e-06,
+ "loss": 1.7851387023925782,
+ "memory(GiB)": 55.02,
+ "step": 130,
+ "token_acc": 0.5361465251716474,
+ "train_speed(iter/s)": 0.069535
+ },
+ {
+ "epoch": 0.10123734533183353,
+ "grad_norm": 0.9221350564014876,
+ "learning_rate": 9.928983476046643e-06,
+ "loss": 1.7801361083984375,
+ "memory(GiB)": 55.02,
+ "step": 135,
+ "token_acc": 0.5709161091447238,
+ "train_speed(iter/s)": 0.069829
+ },
+ {
+ "epoch": 0.10498687664041995,
+ "grad_norm": 0.6552383137988806,
+ "learning_rate": 9.918185532493095e-06,
+ "loss": 1.792304229736328,
+ "memory(GiB)": 55.02,
+ "step": 140,
+ "token_acc": 0.5654260190875665,
+ "train_speed(iter/s)": 0.070115
+ },
+ {
+ "epoch": 0.10873640794900638,
+ "grad_norm": 0.6178739968771222,
+ "learning_rate": 9.906630456805024e-06,
+ "loss": 1.7768011093139648,
+ "memory(GiB)": 55.02,
+ "step": 145,
+ "token_acc": 0.5628114715561824,
+ "train_speed(iter/s)": 0.070369
+ },
+ {
+ "epoch": 0.1124859392575928,
+ "grad_norm": 0.6956981375298377,
+ "learning_rate": 9.894320027833405e-06,
+ "loss": 1.791607666015625,
+ "memory(GiB)": 55.02,
+ "step": 150,
+ "token_acc": 0.5352548522917879,
+ "train_speed(iter/s)": 0.070604
+ },
+ {
+ "epoch": 0.1124859392575928,
+ "eval_loss": 1.7808756828308105,
+ "eval_runtime": 42.4149,
+ "eval_samples_per_second": 60.922,
+ "eval_steps_per_second": 1.273,
+ "eval_token_acc": 0.5413158372963355,
+ "step": 150
+ },
+ {
+ "epoch": 0.11623547056617922,
+ "grad_norm": 0.8022685967483424,
+ "learning_rate": 9.881256140712389e-06,
+ "loss": 1.8093143463134767,
+ "memory(GiB)": 55.02,
+ "step": 155,
+ "token_acc": 0.5188995435520638,
+ "train_speed(iter/s)": 0.068014
+ },
+ {
+ "epoch": 0.11998500187476566,
+ "grad_norm": 1.0203184952354634,
+ "learning_rate": 9.86744080656756e-06,
+ "loss": 1.7841522216796875,
+ "memory(GiB)": 55.02,
+ "step": 160,
+ "token_acc": 0.5357633114226129,
+ "train_speed(iter/s)": 0.06829
+ },
+ {
+ "epoch": 0.12373453318335208,
+ "grad_norm": 0.9081018676373969,
+ "learning_rate": 9.852876152206325e-06,
+ "loss": 1.7839433670043945,
+ "memory(GiB)": 55.02,
+ "step": 165,
+ "token_acc": 0.5591644401908787,
+ "train_speed(iter/s)": 0.068546
+ },
+ {
+ "epoch": 0.1274840644919385,
+ "grad_norm": 0.9907488698487368,
+ "learning_rate": 9.837564419790506e-06,
+ "loss": 1.768780517578125,
+ "memory(GiB)": 55.02,
+ "step": 170,
+ "token_acc": 0.5500457735733904,
+ "train_speed(iter/s)": 0.068794
+ },
+ {
+ "epoch": 0.13123359580052493,
+ "grad_norm": 1.405852122526834,
+ "learning_rate": 9.821507966491178e-06,
+ "loss": 1.7799259185791017,
+ "memory(GiB)": 55.02,
+ "step": 175,
+ "token_acc": 0.5457307133754187,
+ "train_speed(iter/s)": 0.069045
+ },
+ {
+ "epoch": 0.13498312710911137,
+ "grad_norm": 1.2404474949089965,
+ "learning_rate": 9.804709264125772e-06,
+ "loss": 1.7638275146484375,
+ "memory(GiB)": 55.02,
+ "step": 180,
+ "token_acc": 0.5435971411710707,
+ "train_speed(iter/s)": 0.06928
+ },
+ {
+ "epoch": 0.13873265841769777,
+ "grad_norm": 1.2017488538172039,
+ "learning_rate": 9.787170898777571e-06,
+ "loss": 1.7629226684570312,
+ "memory(GiB)": 55.02,
+ "step": 185,
+ "token_acc": 0.5377509092423444,
+ "train_speed(iter/s)": 0.069496
+ },
+ {
+ "epoch": 0.1424821897262842,
+ "grad_norm": 1.4587980981731037,
+ "learning_rate": 9.768895570397586e-06,
+ "loss": 1.782264518737793,
+ "memory(GiB)": 55.02,
+ "step": 190,
+ "token_acc": 0.5378942424306155,
+ "train_speed(iter/s)": 0.069699
+ },
+ {
+ "epoch": 0.14623172103487064,
+ "grad_norm": 0.9636603641214471,
+ "learning_rate": 9.749886092388907e-06,
+ "loss": 1.7805797576904296,
+ "memory(GiB)": 55.02,
+ "step": 195,
+ "token_acc": 0.5591106553685625,
+ "train_speed(iter/s)": 0.069901
+ },
+ {
+ "epoch": 0.14998125234345708,
+ "grad_norm": 0.8643693943688852,
+ "learning_rate": 9.7301453911736e-06,
+ "loss": 1.7657100677490234,
+ "memory(GiB)": 55.02,
+ "step": 200,
+ "token_acc": 0.572268859208869,
+ "train_speed(iter/s)": 0.070095
+ },
+ {
+ "epoch": 0.14998125234345708,
+ "eval_loss": 1.7679647207260132,
+ "eval_runtime": 42.6698,
+ "eval_samples_per_second": 60.558,
+ "eval_steps_per_second": 1.266,
+ "eval_token_acc": 0.5451552593969315,
+ "step": 200
+ },
+ {
+ "epoch": 0.15373078365204348,
+ "grad_norm": 0.8657531020540058,
+ "learning_rate": 9.709676505742194e-06,
+ "loss": 1.7681110382080079,
+ "memory(GiB)": 55.02,
+ "step": 205,
+ "token_acc": 0.539625558780686,
+ "train_speed(iter/s)": 0.068103
+ },
+ {
+ "epoch": 0.15748031496062992,
+ "grad_norm": 0.7555909763046894,
+ "learning_rate": 9.688482587185839e-06,
+ "loss": 1.7621929168701171,
+ "memory(GiB)": 55.02,
+ "step": 210,
+ "token_acc": 0.5633362247982546,
+ "train_speed(iter/s)": 0.068316
+ },
+ {
+ "epoch": 0.16122984626921635,
+ "grad_norm": 0.8576972485284384,
+ "learning_rate": 9.666566898211219e-06,
+ "loss": 1.7515193939208984,
+ "memory(GiB)": 55.02,
+ "step": 215,
+ "token_acc": 0.5416328755467514,
+ "train_speed(iter/s)": 0.068518
+ },
+ {
+ "epoch": 0.1649793775778028,
+ "grad_norm": 0.8321394026566008,
+ "learning_rate": 9.64393281263826e-06,
+ "loss": 1.760385513305664,
+ "memory(GiB)": 55.02,
+ "step": 220,
+ "token_acc": 0.5313785935489345,
+ "train_speed(iter/s)": 0.068709
+ },
+ {
+ "epoch": 0.1687289088863892,
+ "grad_norm": 1.1758097242819285,
+ "learning_rate": 9.620583814880763e-06,
+ "loss": 1.7725101470947267,
+ "memory(GiB)": 55.02,
+ "step": 225,
+ "token_acc": 0.5523288911890283,
+ "train_speed(iter/s)": 0.068896
+ },
+ {
+ "epoch": 0.17247844019497563,
+ "grad_norm": 1.1624264053744016,
+ "learning_rate": 9.59652349940998e-06,
+ "loss": 1.7543354034423828,
+ "memory(GiB)": 55.02,
+ "step": 230,
+ "token_acc": 0.5785625028925812,
+ "train_speed(iter/s)": 0.069071
+ },
+ {
+ "epoch": 0.17622797150356206,
+ "grad_norm": 0.7310036085359608,
+ "learning_rate": 9.571755570201266e-06,
+ "loss": 1.7507953643798828,
+ "memory(GiB)": 55.02,
+ "step": 235,
+ "token_acc": 0.5411558669001751,
+ "train_speed(iter/s)": 0.069246
+ },
+ {
+ "epoch": 0.17997750281214847,
+ "grad_norm": 0.8077849501412758,
+ "learning_rate": 9.54628384016387e-06,
+ "loss": 1.7498720169067383,
+ "memory(GiB)": 55.02,
+ "step": 240,
+ "token_acc": 0.5107944339998463,
+ "train_speed(iter/s)": 0.069418
+ },
+ {
+ "epoch": 0.1837270341207349,
+ "grad_norm": 0.9225839356980445,
+ "learning_rate": 9.520112230553959e-06,
+ "loss": 1.7527772903442382,
+ "memory(GiB)": 55.02,
+ "step": 245,
+ "token_acc": 0.5650339768500239,
+ "train_speed(iter/s)": 0.069581
+ },
+ {
+ "epoch": 0.18747656542932134,
+ "grad_norm": 0.9228761761744316,
+ "learning_rate": 9.493244770370947e-06,
+ "loss": 1.739366340637207,
+ "memory(GiB)": 55.02,
+ "step": 250,
+ "token_acc": 0.5212063259548612,
+ "train_speed(iter/s)": 0.069727
+ },
+ {
+ "epoch": 0.18747656542932134,
+ "eval_loss": 1.7509726285934448,
+ "eval_runtime": 42.5093,
+ "eval_samples_per_second": 60.787,
+ "eval_steps_per_second": 1.27,
+ "eval_token_acc": 0.5502310060336623,
+ "step": 250
+ },
+ {
+ "epoch": 0.19122609673790777,
+ "grad_norm": 0.753239532675668,
+ "learning_rate": 9.465685595737263e-06,
+ "loss": 1.743215560913086,
+ "memory(GiB)": 56.25,
+ "step": 255,
+ "token_acc": 0.5399775807972443,
+ "train_speed(iter/s)": 0.068194
+ },
+ {
+ "epoch": 0.19497562804649418,
+ "grad_norm": 0.8488311405689892,
+ "learning_rate": 9.437438949261602e-06,
+ "loss": 1.7420181274414062,
+ "memory(GiB)": 56.25,
+ "step": 260,
+ "token_acc": 0.5663597161966146,
+ "train_speed(iter/s)": 0.068357
+ },
+ {
+ "epoch": 0.19872515935508062,
+ "grad_norm": 0.6800443905154068,
+ "learning_rate": 9.408509179385806e-06,
+ "loss": 1.7466453552246093,
+ "memory(GiB)": 56.25,
+ "step": 265,
+ "token_acc": 0.5432435786077677,
+ "train_speed(iter/s)": 0.068528
+ },
+ {
+ "epoch": 0.20247469066366705,
+ "grad_norm": 0.87203145697415,
+ "learning_rate": 9.378900739715429e-06,
+ "loss": 1.7354595184326171,
+ "memory(GiB)": 56.25,
+ "step": 270,
+ "token_acc": 0.5630195594992452,
+ "train_speed(iter/s)": 0.068693
+ },
+ {
+ "epoch": 0.20622422197225346,
+ "grad_norm": 0.8095685158538298,
+ "learning_rate": 9.348618188334135e-06,
+ "loss": 1.740234375,
+ "memory(GiB)": 56.25,
+ "step": 275,
+ "token_acc": 0.5514069894056335,
+ "train_speed(iter/s)": 0.068842
+ },
+ {
+ "epoch": 0.2099737532808399,
+ "grad_norm": 1.00206570868519,
+ "learning_rate": 9.317666187101996e-06,
+ "loss": 1.7255264282226563,
+ "memory(GiB)": 56.25,
+ "step": 280,
+ "token_acc": 0.5434367055763205,
+ "train_speed(iter/s)": 0.068989
+ },
+ {
+ "epoch": 0.21372328458942633,
+ "grad_norm": 0.7742674981350653,
+ "learning_rate": 9.286049500937826e-06,
+ "loss": 1.7388019561767578,
+ "memory(GiB)": 56.25,
+ "step": 285,
+ "token_acc": 0.5465616294659914,
+ "train_speed(iter/s)": 0.069143
+ },
+ {
+ "epoch": 0.21747281589801276,
+ "grad_norm": 0.7030518360692997,
+ "learning_rate": 9.253772997085635e-06,
+ "loss": 1.7331905364990234,
+ "memory(GiB)": 56.25,
+ "step": 290,
+ "token_acc": 0.5593017630617072,
+ "train_speed(iter/s)": 0.069286
+ },
+ {
+ "epoch": 0.22122234720659917,
+ "grad_norm": 0.8016063917603425,
+ "learning_rate": 9.220841644365343e-06,
+ "loss": 1.7626869201660156,
+ "memory(GiB)": 56.25,
+ "step": 295,
+ "token_acc": 0.5474898329303144,
+ "train_speed(iter/s)": 0.069424
+ },
+ {
+ "epoch": 0.2249718785151856,
+ "grad_norm": 0.7507835036035847,
+ "learning_rate": 9.18726051240786e-06,
+ "loss": 1.7293975830078125,
+ "memory(GiB)": 56.25,
+ "step": 300,
+ "token_acc": 0.5329011645735721,
+ "train_speed(iter/s)": 0.069556
+ },
+ {
+ "epoch": 0.2249718785151856,
+ "eval_loss": 1.7327964305877686,
+ "eval_runtime": 42.3753,
+ "eval_samples_per_second": 60.979,
+ "eval_steps_per_second": 1.274,
+ "eval_token_acc": 0.5556736652702177,
+ "step": 300
+ },
+ {
+ "epoch": 0.22872140982377204,
+ "grad_norm": 0.6548926760419295,
+ "learning_rate": 9.15303477087463e-06,
+ "loss": 1.7221017837524415,
+ "memory(GiB)": 56.25,
+ "step": 305,
+ "token_acc": 0.5494975649750837,
+ "train_speed(iter/s)": 0.068232
+ },
+ {
+ "epoch": 0.23247094113235844,
+ "grad_norm": 0.6751231265805964,
+ "learning_rate": 9.118169688661785e-06,
+ "loss": 1.7264883041381835,
+ "memory(GiB)": 56.25,
+ "step": 310,
+ "token_acc": 0.5448713201882233,
+ "train_speed(iter/s)": 0.068365
+ },
+ {
+ "epoch": 0.23622047244094488,
+ "grad_norm": 0.605793575125745,
+ "learning_rate": 9.082670633089028e-06,
+ "loss": 1.73472900390625,
+ "memory(GiB)": 56.25,
+ "step": 315,
+ "token_acc": 0.5575434638870522,
+ "train_speed(iter/s)": 0.068493
+ },
+ {
+ "epoch": 0.2399700037495313,
+ "grad_norm": 0.7043926428771533,
+ "learning_rate": 9.046543069073361e-06,
+ "loss": 1.7091964721679687,
+ "memory(GiB)": 56.25,
+ "step": 320,
+ "token_acc": 0.5550820254340313,
+ "train_speed(iter/s)": 0.068631
+ },
+ {
+ "epoch": 0.24371953505811775,
+ "grad_norm": 0.9370678193137794,
+ "learning_rate": 9.009792558287777e-06,
+ "loss": 1.6914459228515626,
+ "memory(GiB)": 56.25,
+ "step": 325,
+ "token_acc": 0.5512165093950109,
+ "train_speed(iter/s)": 0.068756
+ },
+ {
+ "epoch": 0.24746906636670415,
+ "grad_norm": 0.731053265622514,
+ "learning_rate": 8.972424758305073e-06,
+ "loss": 1.7122673034667968,
+ "memory(GiB)": 56.25,
+ "step": 330,
+ "token_acc": 0.5968278226389376,
+ "train_speed(iter/s)": 0.068874
+ },
+ {
+ "epoch": 0.2512185976752906,
+ "grad_norm": 0.7366718319888682,
+ "learning_rate": 8.934445421726888e-06,
+ "loss": 1.7182106018066405,
+ "memory(GiB)": 56.25,
+ "step": 335,
+ "token_acc": 0.5372630837386368,
+ "train_speed(iter/s)": 0.068995
+ },
+ {
+ "epoch": 0.254968128983877,
+ "grad_norm": 0.7982735213900777,
+ "learning_rate": 8.895860395298121e-06,
+ "loss": 1.6983676910400392,
+ "memory(GiB)": 56.25,
+ "step": 340,
+ "token_acc": 0.5665421173385775,
+ "train_speed(iter/s)": 0.06911
+ },
+ {
+ "epoch": 0.25871766029246346,
+ "grad_norm": 0.6678550800092121,
+ "learning_rate": 8.85667561900685e-06,
+ "loss": 1.706036376953125,
+ "memory(GiB)": 56.25,
+ "step": 345,
+ "token_acc": 0.5582484033156679,
+ "train_speed(iter/s)": 0.069229
+ },
+ {
+ "epoch": 0.26246719160104987,
+ "grad_norm": 0.9786171582688546,
+ "learning_rate": 8.816897125169894e-06,
+ "loss": 1.7054430007934571,
+ "memory(GiB)": 56.25,
+ "step": 350,
+ "token_acc": 0.5696195618783663,
+ "train_speed(iter/s)": 0.069341
+ },
+ {
+ "epoch": 0.26246719160104987,
+ "eval_loss": 1.7168406248092651,
+ "eval_runtime": 42.5781,
+ "eval_samples_per_second": 60.688,
+ "eval_steps_per_second": 1.268,
+ "eval_token_acc": 0.5604134336360598,
+ "step": 350
+ },
+ {
+ "epoch": 0.26621672290963627,
+ "grad_norm": 0.6683654189193357,
+ "learning_rate": 8.77653103750417e-06,
+ "loss": 1.7113515853881835,
+ "memory(GiB)": 56.25,
+ "step": 355,
+ "token_acc": 0.5511189438910106,
+ "train_speed(iter/s)": 0.068103
+ },
+ {
+ "epoch": 0.26996625421822273,
+ "grad_norm": 0.5865077122513812,
+ "learning_rate": 8.735583570183974e-06,
+ "loss": 1.692765235900879,
+ "memory(GiB)": 56.25,
+ "step": 360,
+ "token_acc": 0.5464648214223854,
+ "train_speed(iter/s)": 0.068223
+ },
+ {
+ "epoch": 0.27371578552680914,
+ "grad_norm": 0.7402904205476919,
+ "learning_rate": 8.694061026884336e-06,
+ "loss": 1.7002754211425781,
+ "memory(GiB)": 56.25,
+ "step": 365,
+ "token_acc": 0.5705252525252525,
+ "train_speed(iter/s)": 0.068345
+ },
+ {
+ "epoch": 0.27746531683539555,
+ "grad_norm": 0.7571500537288366,
+ "learning_rate": 8.6519697998106e-06,
+ "loss": 1.7284061431884765,
+ "memory(GiB)": 56.25,
+ "step": 370,
+ "token_acc": 0.5748253023425441,
+ "train_speed(iter/s)": 0.068462
+ },
+ {
+ "epoch": 0.281214848143982,
+ "grad_norm": 0.6730846815952083,
+ "learning_rate": 8.609316368714371e-06,
+ "loss": 1.702937126159668,
+ "memory(GiB)": 56.25,
+ "step": 375,
+ "token_acc": 0.5256013175105557,
+ "train_speed(iter/s)": 0.068577
+ },
+ {
+ "epoch": 0.2849643794525684,
+ "grad_norm": 0.8462516862337176,
+ "learning_rate": 8.566107299895988e-06,
+ "loss": 1.7094646453857423,
+ "memory(GiB)": 56.25,
+ "step": 380,
+ "token_acc": 0.5898579849946409,
+ "train_speed(iter/s)": 0.068693
+ },
+ {
+ "epoch": 0.2887139107611549,
+ "grad_norm": 0.7482611525034915,
+ "learning_rate": 8.52234924519367e-06,
+ "loss": 1.7032199859619142,
+ "memory(GiB)": 56.25,
+ "step": 385,
+ "token_acc": 0.5420495124984953,
+ "train_speed(iter/s)": 0.068801
+ },
+ {
+ "epoch": 0.2924634420697413,
+ "grad_norm": 0.696958027931327,
+ "learning_rate": 8.478048940959503e-06,
+ "loss": 1.6919610977172852,
+ "memory(GiB)": 56.25,
+ "step": 390,
+ "token_acc": 0.5534309086430637,
+ "train_speed(iter/s)": 0.068909
+ },
+ {
+ "epoch": 0.2962129733783277,
+ "grad_norm": 0.5696305911831858,
+ "learning_rate": 8.433213207022404e-06,
+ "loss": 1.6833906173706055,
+ "memory(GiB)": 56.25,
+ "step": 395,
+ "token_acc": 0.5494928069022507,
+ "train_speed(iter/s)": 0.069018
+ },
+ {
+ "epoch": 0.29996250468691416,
+ "grad_norm": 1.0078405970909443,
+ "learning_rate": 8.387848945638235e-06,
+ "loss": 1.6984685897827148,
+ "memory(GiB)": 56.25,
+ "step": 400,
+ "token_acc": 0.5672347570979136,
+ "train_speed(iter/s)": 0.069122
+ },
+ {
+ "epoch": 0.29996250468691416,
+ "eval_loss": 1.7021883726119995,
+ "eval_runtime": 42.5994,
+ "eval_samples_per_second": 60.658,
+ "eval_steps_per_second": 1.268,
+ "eval_token_acc": 0.5648435441261425,
+ "step": 400
+ },
+ {
+ "epoch": 0.30371203599550056,
+ "grad_norm": 0.8681641024485509,
+ "learning_rate": 8.341963140427242e-06,
+ "loss": 1.7013641357421876,
+ "memory(GiB)": 56.25,
+ "step": 405,
+ "token_acc": 0.5476675242907254,
+ "train_speed(iter/s)": 0.068148
+ },
+ {
+ "epoch": 0.30746156730408697,
+ "grad_norm": 0.8528961417036697,
+ "learning_rate": 8.295562855298954e-06,
+ "loss": 1.6895477294921875,
+ "memory(GiB)": 56.25,
+ "step": 410,
+ "token_acc": 0.5703600854040318,
+ "train_speed(iter/s)": 0.068257
+ },
+ {
+ "epoch": 0.31121109861267343,
+ "grad_norm": 0.7966281406635305,
+ "learning_rate": 8.248655233364724e-06,
+ "loss": 1.703116226196289,
+ "memory(GiB)": 56.25,
+ "step": 415,
+ "token_acc": 0.5499540331642208,
+ "train_speed(iter/s)": 0.068363
+ },
+ {
+ "epoch": 0.31496062992125984,
+ "grad_norm": 0.7388959481441792,
+ "learning_rate": 8.201247495838087e-06,
+ "loss": 1.7138626098632812,
+ "memory(GiB)": 56.25,
+ "step": 420,
+ "token_acc": 0.5509550893931054,
+ "train_speed(iter/s)": 0.06846
+ },
+ {
+ "epoch": 0.31871016122984624,
+ "grad_norm": 0.6668812721055195,
+ "learning_rate": 8.153346940923076e-06,
+ "loss": 1.7163211822509765,
+ "memory(GiB)": 56.25,
+ "step": 425,
+ "token_acc": 0.5339552948740659,
+ "train_speed(iter/s)": 0.068558
+ },
+ {
+ "epoch": 0.3224596925384327,
+ "grad_norm": 0.8897674249190526,
+ "learning_rate": 8.104960942690709e-06,
948
+ "loss": 1.6957286834716796,
949
+ "memory(GiB)": 56.25,
950
+ "step": 430,
951
+ "token_acc": 0.5618721676102363,
952
+ "train_speed(iter/s)": 0.068655
953
+ },
954
+ {
955
+ "epoch": 0.3262092238470191,
956
+ "grad_norm": 0.6998717792447529,
957
+ "learning_rate": 8.056096949943777e-06,
958
+ "loss": 1.7086776733398437,
959
+ "memory(GiB)": 56.25,
960
+ "step": 435,
961
+ "token_acc": 0.5755440200763176,
962
+ "train_speed(iter/s)": 0.068745
963
+ },
964
+ {
965
+ "epoch": 0.3299587551556056,
966
+ "grad_norm": 0.8395556848202133,
967
+ "learning_rate": 8.006762485070138e-06,
968
+ "loss": 1.6877761840820313,
969
+ "memory(GiB)": 56.25,
970
+ "step": 440,
971
+ "token_acc": 0.574201770399201,
972
+ "train_speed(iter/s)": 0.068841
973
+ },
974
+ {
975
+ "epoch": 0.333708286464192,
976
+ "grad_norm": 0.7641827410573238,
977
+ "learning_rate": 7.956965142884678e-06,
978
+ "loss": 1.6771770477294923,
979
+ "memory(GiB)": 56.25,
980
+ "step": 445,
981
+ "token_acc": 0.5614385821706016,
982
+ "train_speed(iter/s)": 0.068935
983
+ },
984
+ {
985
+ "epoch": 0.3374578177727784,
986
+ "grad_norm": 0.7082113636629225,
987
+ "learning_rate": 7.906712589460124e-06,
988
+ "loss": 1.6755485534667969,
989
+ "memory(GiB)": 56.25,
990
+ "step": 450,
991
+ "token_acc": 0.5444774209358365,
992
+ "train_speed(iter/s)": 0.06903
993
+ },
994
+ {
995
+ "epoch": 0.3374578177727784,
996
+ "eval_loss": 1.6896531581878662,
997
+ "eval_runtime": 42.4554,
998
+ "eval_samples_per_second": 60.864,
999
+ "eval_steps_per_second": 1.272,
1000
+ "eval_token_acc": 0.5685763214385071,
1001
+ "step": 450
1002
+ },
+ {
+ "epoch": 0.34120734908136485,
+ "grad_norm": 0.7807331567305439,
+ "learning_rate": 7.85601256094689e-06,
+ "loss": 1.6793529510498046,
+ "memory(GiB)": 56.25,
+ "step": 455,
+ "token_acc": 0.5562552637973148,
+ "train_speed(iter/s)": 0.068158
+ },
+ {
+ "epoch": 0.34495688038995126,
+ "grad_norm": 0.7642434347013445,
+ "learning_rate": 7.804872862382132e-06,
+ "loss": 1.7076757431030274,
+ "memory(GiB)": 56.25,
+ "step": 460,
+ "token_acc": 0.5478764760999728,
+ "train_speed(iter/s)": 0.068252
+ },
+ {
+ "epoch": 0.34870641169853767,
+ "grad_norm": 0.7018356558770414,
+ "learning_rate": 7.753301366488187e-06,
+ "loss": 1.6917272567749024,
+ "memory(GiB)": 56.25,
+ "step": 465,
+ "token_acc": 0.6022955543080534,
+ "train_speed(iter/s)": 0.068345
+ },
+ {
+ "epoch": 0.35245594300712413,
+ "grad_norm": 0.7600704177992731,
+ "learning_rate": 7.701306012460627e-06,
+ "loss": 1.6827001571655273,
+ "memory(GiB)": 56.25,
+ "step": 470,
+ "token_acc": 0.5491566959780885,
+ "train_speed(iter/s)": 0.068431
+ },
+ {
+ "epoch": 0.35620547431571054,
+ "grad_norm": 0.6673098312724823,
+ "learning_rate": 7.648894804746031e-06,
+ "loss": 1.6575685501098634,
+ "memory(GiB)": 56.25,
+ "step": 475,
+ "token_acc": 0.5520260966055696,
+ "train_speed(iter/s)": 0.068525
+ },
+ {
+ "epoch": 0.35995500562429694,
+ "grad_norm": 0.6492156502549102,
+ "learning_rate": 7.596075811809753e-06,
+ "loss": 1.6792295455932618,
+ "memory(GiB)": 56.25,
+ "step": 480,
+ "token_acc": 0.5631662021022937,
+ "train_speed(iter/s)": 0.068609
+ },
+ {
+ "epoch": 0.3637045369328834,
+ "grad_norm": 0.6958194870105822,
+ "learning_rate": 7.542857164893816e-06,
+ "loss": 1.7015853881835938,
+ "memory(GiB)": 56.25,
+ "step": 485,
+ "token_acc": 0.5768759371536606,
+ "train_speed(iter/s)": 0.06869
+ },
+ {
+ "epoch": 0.3674540682414698,
+ "grad_norm": 0.685535234176806,
+ "learning_rate": 7.489247056765134e-06,
+ "loss": 1.7032451629638672,
+ "memory(GiB)": 56.25,
+ "step": 490,
+ "token_acc": 0.561515250858049,
+ "train_speed(iter/s)": 0.068773
+ },
+ {
+ "epoch": 0.3712035995500562,
+ "grad_norm": 0.7726913261926227,
+ "learning_rate": 7.4352537404542935e-06,
+ "loss": 1.6790489196777343,
+ "memory(GiB)": 56.25,
+ "step": 495,
+ "token_acc": 0.5535088912920483,
+ "train_speed(iter/s)": 0.068855
+ },
+ {
+ "epoch": 0.3749531308586427,
+ "grad_norm": 0.6215971963792798,
+ "learning_rate": 7.380885527985016e-06,
+ "loss": 1.6753128051757813,
+ "memory(GiB)": 56.25,
+ "step": 500,
+ "token_acc": 0.5804309499145921,
+ "train_speed(iter/s)": 0.068939
+ },
+ {
+ "epoch": 0.3749531308586427,
+ "eval_loss": 1.6785035133361816,
+ "eval_runtime": 42.5761,
+ "eval_samples_per_second": 60.691,
+ "eval_steps_per_second": 1.268,
+ "eval_token_acc": 0.5717880388170251,
+ "step": 500
+ },
+ {
+ "epoch": 0.3787026621672291,
+ "grad_norm": 0.7571930333892343,
+ "learning_rate": 7.326150789094571e-06,
+ "loss": 1.6797908782958983,
+ "memory(GiB)": 56.25,
+ "step": 505,
+ "token_acc": 0.5666724708340588,
+ "train_speed(iter/s)": 0.068153
+ },
+ {
+ "epoch": 0.38245219347581555,
+ "grad_norm": 0.5273195202991797,
+ "learning_rate": 7.271057949945297e-06,
+ "loss": 1.6877239227294922,
+ "memory(GiB)": 56.25,
+ "step": 510,
+ "token_acc": 0.5795585438375612,
+ "train_speed(iter/s)": 0.068239
+ },
+ {
+ "epoch": 0.38620172478440196,
+ "grad_norm": 0.644647069123782,
+ "learning_rate": 7.2156154918274194e-06,
+ "loss": 1.6607694625854492,
+ "memory(GiB)": 56.25,
+ "step": 515,
+ "token_acc": 0.5639819575190945,
+ "train_speed(iter/s)": 0.068326
+ },
+ {
+ "epoch": 0.38995125609298836,
+ "grad_norm": 0.8427621851278584,
+ "learning_rate": 7.159831949853409e-06,
+ "loss": 1.6399917602539062,
+ "memory(GiB)": 56.25,
+ "step": 520,
+ "token_acc": 0.5831872255951778,
+ "train_speed(iter/s)": 0.068407
+ },
+ {
+ "epoch": 0.3937007874015748,
+ "grad_norm": 0.7860592513933189,
+ "learning_rate": 7.103715911644029e-06,
+ "loss": 1.6666389465332032,
+ "memory(GiB)": 56.25,
+ "step": 525,
+ "token_acc": 0.6023049439476434,
+ "train_speed(iter/s)": 0.068489
+ },
+ {
+ "epoch": 0.39745031871016123,
+ "grad_norm": 0.8223721237706454,
+ "learning_rate": 7.047276016006318e-06,
+ "loss": 1.695573616027832,
+ "memory(GiB)": 56.25,
+ "step": 530,
+ "token_acc": 0.5720906143643139,
+ "train_speed(iter/s)": 0.068571
+ },
+ {
+ "epoch": 0.40119985001874764,
+ "grad_norm": 0.7877770309362587,
+ "learning_rate": 6.990520951603682e-06,
+ "loss": 1.6714542388916016,
+ "memory(GiB)": 56.25,
+ "step": 535,
+ "token_acc": 0.5545731394354149,
+ "train_speed(iter/s)": 0.068649
+ },
+ {
+ "epoch": 0.4049493813273341,
+ "grad_norm": 0.5849943208055685,
+ "learning_rate": 6.933459455618312e-06,
+ "loss": 1.663173294067383,
+ "memory(GiB)": 56.25,
+ "step": 540,
+ "token_acc": 0.5916160291243944,
+ "train_speed(iter/s)": 0.068729
+ },
+ {
+ "epoch": 0.4086989126359205,
+ "grad_norm": 0.6975030189627052,
+ "learning_rate": 6.876100312406141e-06,
+ "loss": 1.672979736328125,
+ "memory(GiB)": 56.25,
+ "step": 545,
+ "token_acc": 0.5678786183750525,
+ "train_speed(iter/s)": 0.068809
+ },
+ {
+ "epoch": 0.4124484439445069,
+ "grad_norm": 0.8832443309131802,
+ "learning_rate": 6.818452352144527e-06,
+ "loss": 1.663263702392578,
+ "memory(GiB)": 56.25,
+ "step": 550,
+ "token_acc": 0.5777180525291963,
+ "train_speed(iter/s)": 0.068885
+ },
+ {
+ "epoch": 0.4124484439445069,
+ "eval_loss": 1.669090747833252,
+ "eval_runtime": 42.3533,
+ "eval_samples_per_second": 61.011,
+ "eval_steps_per_second": 1.275,
+ "eval_token_acc": 0.5746502507096283,
+ "step": 550
+ },
+ {
+ "epoch": 0.4161979752530934,
+ "grad_norm": 0.7101210397707838,
+ "learning_rate": 6.760524449472889e-06,
+ "loss": 1.653615951538086,
+ "memory(GiB)": 56.25,
+ "step": 555,
+ "token_acc": 0.563550813623823,
+ "train_speed(iter/s)": 0.068165
+ },
+ {
+ "epoch": 0.4199475065616798,
+ "grad_norm": 0.7092912767199613,
+ "learning_rate": 6.702325522126503e-06,
+ "loss": 1.663838768005371,
+ "memory(GiB)": 56.25,
+ "step": 560,
+ "token_acc": 0.5644253522152244,
+ "train_speed(iter/s)": 0.068242
+ },
+ {
+ "epoch": 0.4236970378702662,
+ "grad_norm": 0.8492548795119265,
+ "learning_rate": 6.643864529563644e-06,
+ "loss": 1.670602798461914,
+ "memory(GiB)": 56.25,
+ "step": 565,
+ "token_acc": 0.5895219375017935,
+ "train_speed(iter/s)": 0.068317
+ },
+ {
+ "epoch": 0.42744656917885265,
+ "grad_norm": 0.8995573558948677,
+ "learning_rate": 6.5851504715863345e-06,
+ "loss": 1.661231231689453,
+ "memory(GiB)": 56.25,
+ "step": 570,
+ "token_acc": 0.5656868865305689,
+ "train_speed(iter/s)": 0.068394
+ },
+ {
+ "epoch": 0.43119610048743906,
+ "grad_norm": 0.8124537569507534,
+ "learning_rate": 6.526192386954853e-06,
+ "loss": 1.6499761581420898,
+ "memory(GiB)": 56.25,
+ "step": 575,
+ "token_acc": 0.5715577630764308,
+ "train_speed(iter/s)": 0.06847
+ },
+ {
+ "epoch": 0.4349456317960255,
+ "grad_norm": 0.7462194182878145,
+ "learning_rate": 6.466999351996266e-06,
+ "loss": 1.6548486709594727,
+ "memory(GiB)": 56.25,
+ "step": 580,
+ "token_acc": 0.5821864123070383,
+ "train_speed(iter/s)": 0.068539
+ },
+ {
+ "epoch": 0.43869516310461193,
+ "grad_norm": 0.6235784183842846,
+ "learning_rate": 6.407580479207166e-06,
+ "loss": 1.6605405807495117,
+ "memory(GiB)": 56.25,
+ "step": 585,
+ "token_acc": 0.5868372560515956,
+ "train_speed(iter/s)": 0.068608
+ },
+ {
+ "epoch": 0.44244469441319834,
+ "grad_norm": 0.6346754736055541,
+ "learning_rate": 6.347944915850846e-06,
+ "loss": 1.6546539306640624,
+ "memory(GiB)": 56.25,
+ "step": 590,
+ "token_acc": 0.5951824777399143,
+ "train_speed(iter/s)": 0.068679
+ },
+ {
+ "epoch": 0.4461942257217848,
+ "grad_norm": 0.5320819638022591,
+ "learning_rate": 6.288101842549117e-06,
+ "loss": 1.6418937683105468,
+ "memory(GiB)": 56.25,
+ "step": 595,
+ "token_acc": 0.595732223476298,
+ "train_speed(iter/s)": 0.068751
+ },
+ {
+ "epoch": 0.4499437570303712,
+ "grad_norm": 0.5547338870237839,
+ "learning_rate": 6.228060471868998e-06,
+ "loss": 1.6649009704589843,
+ "memory(GiB)": 56.25,
+ "step": 600,
+ "token_acc": 0.5625491462592676,
+ "train_speed(iter/s)": 0.068817
+ },
+ {
+ "epoch": 0.4499437570303712,
+ "eval_loss": 1.6608483791351318,
+ "eval_runtime": 42.6065,
+ "eval_samples_per_second": 60.648,
+ "eval_steps_per_second": 1.267,
+ "eval_token_acc": 0.5769941290838164,
+ "step": 600
+ },
+ {
+ "epoch": 0.4536932883389576,
+ "grad_norm": 0.5482512110483113,
+ "learning_rate": 6.167830046904481e-06,
+ "loss": 1.656534767150879,
+ "memory(GiB)": 56.25,
+ "step": 605,
+ "token_acc": 0.5782699227350105,
+ "train_speed(iter/s)": 0.068151
+ },
+ {
+ "epoch": 0.4574428196475441,
+ "grad_norm": 0.7099660977474908,
+ "learning_rate": 6.1074198398535964e-06,
+ "loss": 1.6757530212402343,
+ "memory(GiB)": 56.25,
+ "step": 610,
+ "token_acc": 0.5806772839007541,
+ "train_speed(iter/s)": 0.068224
+ },
+ {
+ "epoch": 0.4611923509561305,
+ "grad_norm": 0.6849024601166154,
+ "learning_rate": 6.0468391505910064e-06,
+ "loss": 1.6539661407470703,
+ "memory(GiB)": 56.25,
+ "step": 615,
+ "token_acc": 0.5897343123691958,
+ "train_speed(iter/s)": 0.068293
+ },
+ {
+ "epoch": 0.4649418822647169,
+ "grad_norm": 0.6387929116637958,
+ "learning_rate": 5.986097305236327e-06,
+ "loss": 1.6756690979003905,
+ "memory(GiB)": 56.25,
+ "step": 620,
+ "token_acc": 0.566821454125962,
+ "train_speed(iter/s)": 0.068364
+ },
+ {
+ "epoch": 0.46869141357330335,
+ "grad_norm": 0.5972563894220153,
+ "learning_rate": 5.925203654718416e-06,
+ "loss": 1.647392463684082,
+ "memory(GiB)": 56.25,
+ "step": 625,
+ "token_acc": 0.5551161839909385,
+ "train_speed(iter/s)": 0.068434
+ },
+ {
+ "epoch": 0.47244094488188976,
+ "grad_norm": 0.7595259975476277,
+ "learning_rate": 5.8641675733358415e-06,
+ "loss": 1.6608802795410156,
+ "memory(GiB)": 56.25,
+ "step": 630,
+ "token_acc": 0.5813017902089991,
+ "train_speed(iter/s)": 0.068499
+ },
+ {
+ "epoch": 0.47619047619047616,
+ "grad_norm": 0.6016960158110866,
+ "learning_rate": 5.8029984573137545e-06,
+ "loss": 1.6570241928100586,
+ "memory(GiB)": 56.25,
+ "step": 635,
+ "token_acc": 0.5774235481033102,
+ "train_speed(iter/s)": 0.068565
+ },
+ {
+ "epoch": 0.4799400074990626,
+ "grad_norm": 0.6354804378746023,
+ "learning_rate": 5.741705723357372e-06,
+ "loss": 1.6393739700317382,
+ "memory(GiB)": 56.25,
+ "step": 640,
+ "token_acc": 0.5658296440587923,
+ "train_speed(iter/s)": 0.068634
+ },
+ {
+ "epoch": 0.48368953880764903,
+ "grad_norm": 0.6139385718585026,
+ "learning_rate": 5.680298807202332e-06,
+ "loss": 1.6520938873291016,
+ "memory(GiB)": 56.25,
+ "step": 645,
+ "token_acc": 0.5701600067018514,
+ "train_speed(iter/s)": 0.068699
+ },
+ {
+ "epoch": 0.4874390701162355,
+ "grad_norm": 0.5387776161432817,
+ "learning_rate": 5.618787162162093e-06,
+ "loss": 1.6615371704101562,
+ "memory(GiB)": 56.25,
+ "step": 650,
+ "token_acc": 0.58879061422444,
+ "train_speed(iter/s)": 0.068765
+ },
+ {
+ "epoch": 0.4874390701162355,
+ "eval_loss": 1.6536719799041748,
+ "eval_runtime": 42.5747,
+ "eval_samples_per_second": 60.693,
+ "eval_steps_per_second": 1.268,
+ "eval_token_acc": 0.5791460048943351,
+ "step": 650
+ },
+ {
+ "epoch": 0.4911886014248219,
+ "grad_norm": 0.7530435416345947,
+ "learning_rate": 5.557180257672651e-06,
+ "loss": 1.6289875030517578,
+ "memory(GiB)": 56.25,
+ "step": 655,
+ "token_acc": 0.5682923514329584,
+ "train_speed(iter/s)": 0.06816
+ },
+ {
+ "epoch": 0.4949381327334083,
+ "grad_norm": 0.7717324275253389,
+ "learning_rate": 5.495487577834758e-06,
+ "loss": 1.6309438705444337,
+ "memory(GiB)": 56.25,
+ "step": 660,
+ "token_acc": 0.587423404922088,
+ "train_speed(iter/s)": 0.068227
+ },
+ {
+ "epoch": 0.49868766404199477,
+ "grad_norm": 0.5895405199877466,
+ "learning_rate": 5.433718619953883e-06,
+ "loss": 1.6529617309570312,
+ "memory(GiB)": 56.25,
+ "step": 665,
+ "token_acc": 0.6041080816725761,
+ "train_speed(iter/s)": 0.068294
+ },
+ {
+ "epoch": 0.5024371953505812,
+ "grad_norm": 0.6454461142803576,
+ "learning_rate": 5.3718828930781564e-06,
+ "loss": 1.6488149642944336,
+ "memory(GiB)": 56.25,
+ "step": 670,
+ "token_acc": 0.5847953625785856,
+ "train_speed(iter/s)": 0.068355
+ },
+ {
+ "epoch": 0.5061867266591676,
+ "grad_norm": 0.673823253803827,
+ "learning_rate": 5.309989916534482e-06,
+ "loss": 1.6447179794311524,
+ "memory(GiB)": 56.25,
+ "step": 675,
+ "token_acc": 0.5982730626303525,
+ "train_speed(iter/s)": 0.068416
+ },
+ {
+ "epoch": 0.509936257967754,
+ "grad_norm": 0.5749920984370196,
+ "learning_rate": 5.2480492184630975e-06,
+ "loss": 1.6585128784179688,
+ "memory(GiB)": 56.25,
+ "step": 680,
+ "token_acc": 0.5631895577350786,
+ "train_speed(iter/s)": 0.068475
+ },
+ {
+ "epoch": 0.5136857892763405,
+ "grad_norm": 0.6387955325138472,
+ "learning_rate": 5.1860703343507415e-06,
+ "loss": 1.6461997985839845,
+ "memory(GiB)": 56.25,
+ "step": 685,
+ "token_acc": 0.5752287735058492,
+ "train_speed(iter/s)": 0.068538
+ },
+ {
+ "epoch": 0.5174353205849269,
+ "grad_norm": 0.4715376939278375,
+ "learning_rate": 5.124062805562725e-06,
+ "loss": 1.6515190124511718,
+ "memory(GiB)": 56.25,
+ "step": 690,
+ "token_acc": 0.5873850771494762,
+ "train_speed(iter/s)": 0.068601
+ },
+ {
+ "epoch": 0.5211848518935133,
+ "grad_norm": 0.625438137972778,
+ "learning_rate": 5.062036177874075e-06,
+ "loss": 1.6295310974121093,
+ "memory(GiB)": 56.25,
+ "step": 695,
+ "token_acc": 0.5652246926848928,
+ "train_speed(iter/s)": 0.06866
+ },
+ {
+ "epoch": 0.5249343832020997,
+ "grad_norm": 0.5483253705980401,
+ "learning_rate": 5e-06,
+ "loss": 1.6582225799560546,
+ "memory(GiB)": 56.25,
+ "step": 700,
+ "token_acc": 0.5789502069653619,
+ "train_speed(iter/s)": 0.068716
+ },
+ {
+ "epoch": 0.5249343832020997,
+ "eval_loss": 1.646588921546936,
+ "eval_runtime": 42.7729,
+ "eval_samples_per_second": 60.412,
+ "eval_steps_per_second": 1.262,
+ "eval_token_acc": 0.581152436927787,
+ "step": 700
+ },
+ {
+ "epoch": 0.5286839145106862,
+ "grad_norm": 0.6796052373678554,
+ "learning_rate": 4.937963822125928e-06,
+ "loss": 1.6299430847167968,
+ "memory(GiB)": 56.25,
+ "step": 705,
+ "token_acc": 0.5749007372754903,
+ "train_speed(iter/s)": 0.068145
+ },
+ {
+ "epoch": 0.5324334458192725,
+ "grad_norm": 0.5623730859974818,
+ "learning_rate": 4.875937194437275e-06,
+ "loss": 1.6273029327392579,
+ "memory(GiB)": 56.25,
+ "step": 710,
+ "token_acc": 0.5774565153542408,
+ "train_speed(iter/s)": 0.068206
+ },
+ {
+ "epoch": 0.536182977127859,
+ "grad_norm": 0.5556142061160076,
+ "learning_rate": 4.813929665649261e-06,
+ "loss": 1.6527215957641601,
+ "memory(GiB)": 56.25,
+ "step": 715,
+ "token_acc": 0.5842528134716451,
+ "train_speed(iter/s)": 0.068268
+ },
+ {
+ "epoch": 0.5399325084364455,
+ "grad_norm": 0.6937818396456759,
+ "learning_rate": 4.751950781536905e-06,
+ "loss": 1.6325241088867188,
+ "memory(GiB)": 56.25,
+ "step": 720,
+ "token_acc": 0.550934829059829,
+ "train_speed(iter/s)": 0.068328
+ },
+ {
+ "epoch": 0.5436820397450318,
+ "grad_norm": 0.6616141973881087,
+ "learning_rate": 4.6900100834655185e-06,
+ "loss": 1.6385433197021484,
+ "memory(GiB)": 56.25,
+ "step": 725,
+ "token_acc": 0.6094119122781343,
+ "train_speed(iter/s)": 0.068386
+ },
+ {
+ "epoch": 0.5474315710536183,
+ "grad_norm": 0.6804973836917435,
+ "learning_rate": 4.628117106921845e-06,
+ "loss": 1.64478759765625,
+ "memory(GiB)": 56.25,
+ "step": 730,
+ "token_acc": 0.5699465347192243,
+ "train_speed(iter/s)": 0.068446
+ },
+ {
+ "epoch": 0.5511811023622047,
+ "grad_norm": 0.644058141393178,
+ "learning_rate": 4.566281380046117e-06,
+ "loss": 1.6421289443969727,
+ "memory(GiB)": 56.25,
+ "step": 735,
+ "token_acc": 0.6001181374556146,
+ "train_speed(iter/s)": 0.068503
+ },
+ {
+ "epoch": 0.5549306336707911,
+ "grad_norm": 0.656894451589833,
+ "learning_rate": 4.5045124221652445e-06,
+ "loss": 1.6506340026855468,
+ "memory(GiB)": 56.25,
+ "step": 740,
+ "token_acc": 0.5980861889787159,
+ "train_speed(iter/s)": 0.068564
+ },
+ {
+ "epoch": 0.5586801649793776,
+ "grad_norm": 0.6955052749545766,
+ "learning_rate": 4.44281974232735e-06,
+ "loss": 1.615028190612793,
+ "memory(GiB)": 56.25,
+ "step": 745,
+ "token_acc": 0.5978904615718982,
+ "train_speed(iter/s)": 0.06862
+ },
+ {
+ "epoch": 0.562429696287964,
+ "grad_norm": 0.5828454100208457,
+ "learning_rate": 4.381212837837909e-06,
+ "loss": 1.6183147430419922,
+ "memory(GiB)": 56.25,
+ "step": 750,
+ "token_acc": 0.5896384248921636,
+ "train_speed(iter/s)": 0.068677
+ },
+ {
+ "epoch": 0.562429696287964,
+ "eval_loss": 1.6402919292449951,
+ "eval_runtime": 42.4068,
+ "eval_samples_per_second": 60.934,
+ "eval_steps_per_second": 1.273,
+ "eval_token_acc": 0.5830287350554422,
+ "step": 750
+ },
+ {
+ "epoch": 0.5661792275965505,
+ "grad_norm": 0.6333529913255437,
+ "learning_rate": 4.319701192797671e-06,
+ "loss": 1.6344381332397462,
+ "memory(GiB)": 56.25,
+ "step": 755,
+ "token_acc": 0.5781321414957826,
+ "train_speed(iter/s)": 0.068158
+ },
+ {
+ "epoch": 0.5699287589051368,
+ "grad_norm": 0.7581894334078688,
+ "learning_rate": 4.258294276642629e-06,
+ "loss": 1.6357366561889648,
+ "memory(GiB)": 56.25,
+ "step": 760,
+ "token_acc": 0.5599813650128116,
+ "train_speed(iter/s)": 0.068217
+ },
+ {
+ "epoch": 0.5736782902137233,
+ "grad_norm": 0.6702527130448565,
+ "learning_rate": 4.197001542686248e-06,
+ "loss": 1.623959732055664,
+ "memory(GiB)": 56.25,
+ "step": 765,
+ "token_acc": 0.600753268699251,
+ "train_speed(iter/s)": 0.068276
+ },
+ {
+ "epoch": 0.5774278215223098,
+ "grad_norm": 0.5903205080818366,
+ "learning_rate": 4.135832426664159e-06,
+ "loss": 1.644424819946289,
+ "memory(GiB)": 56.25,
+ "step": 770,
+ "token_acc": 0.5540505934872132,
+ "train_speed(iter/s)": 0.068329
+ },
+ {
+ "epoch": 0.5811773528308961,
+ "grad_norm": 0.6763632334448669,
+ "learning_rate": 4.074796345281586e-06,
+ "loss": 1.637106704711914,
+ "memory(GiB)": 56.25,
+ "step": 775,
+ "token_acc": 0.5886795020069392,
+ "train_speed(iter/s)": 0.068385
+ },
+ {
+ "epoch": 0.5849268841394826,
+ "grad_norm": 0.5776530336437159,
+ "learning_rate": 4.013902694763675e-06,
+ "loss": 1.6276361465454101,
+ "memory(GiB)": 56.25,
+ "step": 780,
+ "token_acc": 0.5987051157048562,
+ "train_speed(iter/s)": 0.068441
+ },
+ {
+ "epoch": 0.588676415448069,
+ "grad_norm": 0.6034625719784629,
+ "learning_rate": 3.953160849408996e-06,
+ "loss": 1.629465103149414,
+ "memory(GiB)": 56.25,
+ "step": 785,
+ "token_acc": 0.618713946271952,
+ "train_speed(iter/s)": 0.068491
+ },
+ {
+ "epoch": 0.5924259467566554,
+ "grad_norm": 0.5419553921726837,
+ "learning_rate": 3.892580160146406e-06,
+ "loss": 1.6470441818237305,
+ "memory(GiB)": 56.25,
+ "step": 790,
+ "token_acc": 0.5608493116499604,
+ "train_speed(iter/s)": 0.068545
+ },
+ {
+ "epoch": 0.5961754780652418,
+ "grad_norm": 0.5829879329157514,
+ "learning_rate": 3.832169953095521e-06,
+ "loss": 1.625351333618164,
+ "memory(GiB)": 56.25,
+ "step": 795,
+ "token_acc": 0.5870836712285511,
+ "train_speed(iter/s)": 0.068598
+ },
+ {
+ "epoch": 0.5999250093738283,
+ "grad_norm": 0.6022627590387872,
+ "learning_rate": 3.771939528131002e-06,
+ "loss": 1.6552854537963868,
+ "memory(GiB)": 56.25,
+ "step": 800,
+ "token_acc": 0.5696196443784868,
+ "train_speed(iter/s)": 0.068653
+ },
+ {
+ "epoch": 0.5999250093738283,
+ "eval_loss": 1.6350337266921997,
+ "eval_runtime": 42.2859,
+ "eval_samples_per_second": 61.108,
+ "eval_steps_per_second": 1.277,
+ "eval_token_acc": 0.5844979583867213,
+ "step": 800
+ },
+ {
+ "epoch": 0.6036745406824147,
+ "grad_norm": 0.5942705099349358,
+ "learning_rate": 3.7118981574508845e-06,
+ "loss": 1.6126163482666016,
+ "memory(GiB)": 69.47,
+ "step": 805,
+ "token_acc": 0.581555841193403,
+ "train_speed(iter/s)": 0.068172
+ },
+ {
+ "epoch": 0.6074240719910011,
+ "grad_norm": 0.6117465912856315,
+ "learning_rate": 3.652055084149155e-06,
+ "loss": 1.6182636260986327,
+ "memory(GiB)": 69.47,
+ "step": 810,
+ "token_acc": 0.5813004785607525,
+ "train_speed(iter/s)": 0.068228
+ },
+ {
+ "epoch": 0.6111736032995876,
+ "grad_norm": 0.575721344187492,
+ "learning_rate": 3.5924195207928353e-06,
+ "loss": 1.6351320266723632,
+ "memory(GiB)": 69.47,
+ "step": 815,
+ "token_acc": 0.5948800021886178,
+ "train_speed(iter/s)": 0.068282
+ },
+ {
+ "epoch": 0.6149231346081739,
+ "grad_norm": 0.6604622423054451,
+ "learning_rate": 3.5330006480037347e-06,
+ "loss": 1.624082374572754,
+ "memory(GiB)": 69.47,
+ "step": 820,
+ "token_acc": 0.592462272265228,
+ "train_speed(iter/s)": 0.068336
+ },
+ {
+ "epoch": 0.6186726659167604,
+ "grad_norm": 0.5815560175144889,
+ "learning_rate": 3.4738076130451486e-06,
+ "loss": 1.611598587036133,
+ "memory(GiB)": 69.47,
+ "step": 825,
+ "token_acc": 0.6089666369873775,
+ "train_speed(iter/s)": 0.068389
+ },
+ {
+ "epoch": 0.6224221972253469,
+ "grad_norm": 0.6364044991983494,
+ "learning_rate": 3.4148495284136667e-06,
+ "loss": 1.6444446563720703,
+ "memory(GiB)": 69.47,
+ "step": 830,
+ "token_acc": 0.5723734347579862,
+ "train_speed(iter/s)": 0.068442
+ },
+ {
+ "epoch": 0.6261717285339332,
+ "grad_norm": 0.6154495413241989,
+ "learning_rate": 3.3561354704363563e-06,
+ "loss": 1.6572345733642577,
+ "memory(GiB)": 69.47,
+ "step": 835,
+ "token_acc": 0.5823670492160711,
+ "train_speed(iter/s)": 0.068497
+ },
+ {
+ "epoch": 0.6299212598425197,
+ "grad_norm": 0.5623546818589049,
+ "learning_rate": 3.2976744778734988e-06,
+ "loss": 1.6390718460083007,
+ "memory(GiB)": 69.47,
+ "step": 840,
+ "token_acc": 0.5890854687950369,
+ "train_speed(iter/s)": 0.068547
+ },
+ {
+ "epoch": 0.6336707911511061,
+ "grad_norm": 0.5272938620213984,
+ "learning_rate": 3.2394755505271125e-06,
+ "loss": 1.6225774765014649,
+ "memory(GiB)": 69.47,
+ "step": 845,
+ "token_acc": 0.5662987770510283,
+ "train_speed(iter/s)": 0.068596
+ },
+ {
+ "epoch": 0.6374203224596925,
+ "grad_norm": 0.6526804036833902,
+ "learning_rate": 3.181547647855475e-06,
+ "loss": 1.613861083984375,
+ "memory(GiB)": 69.47,
+ "step": 850,
+ "token_acc": 0.5893846343231525,
+ "train_speed(iter/s)": 0.068648
+ },
+ {
+ "epoch": 0.6374203224596925,
+ "eval_loss": 1.6293845176696777,
+ "eval_runtime": 42.4837,
+ "eval_samples_per_second": 60.823,
+ "eval_steps_per_second": 1.271,
+ "eval_token_acc": 0.5861385264416682,
+ "step": 850
+ },
+ {
+ "epoch": 0.641169853768279,
+ "grad_norm": 0.6043970243871736,
+ "learning_rate": 3.1238996875938604e-06,
+ "loss": 1.611886215209961,
+ "memory(GiB)": 69.47,
+ "step": 855,
+ "token_acc": 0.5842907996411311,
+ "train_speed(iter/s)": 0.068192
+ },
+ {
+ "epoch": 0.6449193850768654,
+ "grad_norm": 0.5603396498660076,
+ "learning_rate": 3.0665405443816886e-06,
+ "loss": 1.6631004333496093,
+ "memory(GiB)": 69.47,
+ "step": 860,
+ "token_acc": 0.5783650290912643,
+ "train_speed(iter/s)": 0.068245
+ },
+ {
+ "epoch": 0.6486689163854518,
+ "grad_norm": 0.6300453949234538,
+ "learning_rate": 3.009479048396321e-06,
+ "loss": 1.6396780014038086,
+ "memory(GiB)": 69.47,
+ "step": 865,
+ "token_acc": 0.6025950351187603,
+ "train_speed(iter/s)": 0.068295
+ },
+ {
+ "epoch": 0.6524184476940382,
+ "grad_norm": 0.594302672746407,
+ "learning_rate": 2.952723983993684e-06,
+ "loss": 1.6129501342773438,
+ "memory(GiB)": 69.47,
+ "step": 870,
+ "token_acc": 0.5875128078883285,
+ "train_speed(iter/s)": 0.068345
+ },
+ {
+ "epoch": 0.6561679790026247,
+ "grad_norm": 0.5632989526424523,
+ "learning_rate": 2.8962840883559724e-06,
+ "loss": 1.6319787979125977,
+ "memory(GiB)": 69.47,
+ "step": 875,
+ "token_acc": 0.5844086299001264,
+ "train_speed(iter/s)": 0.068395
+ },
+ {
+ "epoch": 0.6599175103112112,
+ "grad_norm": 0.6424604810798464,
+ "learning_rate": 2.840168050146591e-06,
+ "loss": 1.6522933959960937,
+ "memory(GiB)": 69.47,
+ "step": 880,
+ "token_acc": 0.608855717389555,
+ "train_speed(iter/s)": 0.068443
+ },
+ {
+ "epoch": 0.6636670416197975,
+ "grad_norm": 0.6469172755932601,
+ "learning_rate": 2.7843845081725814e-06,
+ "loss": 1.643354606628418,
+ "memory(GiB)": 69.47,
+ "step": 885,
+ "token_acc": 0.5810390809667215,
+ "train_speed(iter/s)": 0.06849
+ },
+ {
+ "epoch": 0.667416572928384,
+ "grad_norm": 0.5183485601959567,
+ "learning_rate": 2.728942050054705e-06,
+ "loss": 1.6205764770507813,
+ "memory(GiB)": 69.47,
+ "step": 890,
+ "token_acc": 0.5878430401103352,
+ "train_speed(iter/s)": 0.068532
+ },
+ {
+ "epoch": 0.6711661042369704,
+ "grad_norm": 0.6352153807462149,
+ "learning_rate": 2.6738492109054305e-06,
+ "loss": 1.6091579437255858,
+ "memory(GiB)": 69.47,
+ "step": 895,
+ "token_acc": 0.5991015059266862,
+ "train_speed(iter/s)": 0.068581
+ },
+ {
+ "epoch": 0.6749156355455568,
+ "grad_norm": 0.5703799504436093,
+ "learning_rate": 2.6191144720149853e-06,
+ "loss": 1.6384305953979492,
+ "memory(GiB)": 69.47,
+ "step": 900,
+ "token_acc": 0.5845274359009385,
+ "train_speed(iter/s)": 0.068623
+ },
+ {
+ "epoch": 0.6749156355455568,
+ "eval_loss": 1.6254901885986328,
+ "eval_runtime": 42.4366,
+ "eval_samples_per_second": 60.891,
+ "eval_steps_per_second": 1.272,
+ "eval_token_acc": 0.5872697791215511,
+ "step": 900
+ },
+ {
+ "epoch": 0.6786651668541432,
+ "grad_norm": 0.5425064005079052,
+ "learning_rate": 2.5647462595457073e-06,
+ "loss": 1.5991607666015626,
+ "memory(GiB)": 69.47,
+ "step": 905,
+ "token_acc": 0.5766468975192909,
+ "train_speed(iter/s)": 0.0682
+ },
+ {
+ "epoch": 0.6824146981627297,
+ "grad_norm": 0.6083526490876717,
+ "learning_rate": 2.5107529432348664e-06,
+ "loss": 1.6035924911499024,
+ "memory(GiB)": 69.47,
+ "step": 910,
+ "token_acc": 0.5805827292561053,
+ "train_speed(iter/s)": 0.06825
+ },
+ {
+ "epoch": 0.6861642294713161,
+ "grad_norm": 0.6382000976807077,
+ "learning_rate": 2.4571428351061872e-06,
+ "loss": 1.610848617553711,
+ "memory(GiB)": 69.47,
+ "step": 915,
+ "token_acc": 0.6075299795955721,
+ "train_speed(iter/s)": 0.068299
+ },
+ {
+ "epoch": 0.6899137607799025,
+ "grad_norm": 0.6269897687508513,
+ "learning_rate": 2.403924188190247e-06,
+ "loss": 1.6187686920166016,
+ "memory(GiB)": 69.47,
+ "step": 920,
+ "token_acc": 0.5857751413329086,
+ "train_speed(iter/s)": 0.068347
+ },
+ {
+ "epoch": 0.693663292088489,
+ "grad_norm": 0.5561791127288412,
+ "learning_rate": 2.3511051952539703e-06,
+ "loss": 1.6138473510742188,
+ "memory(GiB)": 69.47,
+ "step": 925,
+ "token_acc": 0.5752952655172179,
+ "train_speed(iter/s)": 0.068393
+ },
+ {
+ "epoch": 0.6974128233970753,
+ "grad_norm": 0.5157680455833342,
+ "learning_rate": 2.2986939875393753e-06,
+ "loss": 1.6303401947021485,
+ "memory(GiB)": 69.47,
+ "step": 930,
+ "token_acc": 0.5684129567444703,
+ "train_speed(iter/s)": 0.068439
+ },
+ {
+ "epoch": 0.7011623547056618,
+ "grad_norm": 0.4944152577486627,
+ "learning_rate": 2.246698633511813e-06,
+ "loss": 1.6404861450195312,
+ "memory(GiB)": 69.47,
+ "step": 935,
+ "token_acc": 0.6276472253680634,
+ "train_speed(iter/s)": 0.068486
+ },
+ {
+ "epoch": 0.7049118860142483,
+ "grad_norm": 0.5307892695648406,
+ "learning_rate": 2.1951271376178708e-06,
+ "loss": 1.6242366790771485,
+ "memory(GiB)": 69.47,
+ "step": 940,
+ "token_acc": 0.5880176396748096,
+ "train_speed(iter/s)": 0.068532
+ },
+ {
+ "epoch": 0.7086614173228346,
+ "grad_norm": 0.5381438614462183,
+ "learning_rate": 2.143987439053111e-06,
+ "loss": 1.6123714447021484,
+ "memory(GiB)": 69.47,
+ "step": 945,
+ "token_acc": 0.6045954904448173,
+ "train_speed(iter/s)": 0.068578
+ },
+ {
+ "epoch": 0.7124109486314211,
+ "grad_norm": 0.6061036399917892,
+ "learning_rate": 2.0932874105398774e-06,
+ "loss": 1.6138986587524413,
+ "memory(GiB)": 69.47,
+ "step": 950,
+ "token_acc": 0.6026358079705822,
+ "train_speed(iter/s)": 0.068622
+ },
+ {
+ "epoch": 0.7124109486314211,
+ "eval_loss": 1.6221652030944824,
+ "eval_runtime": 42.632,
+ "eval_samples_per_second": 60.612,
+ "eval_steps_per_second": 1.267,
+ "eval_token_acc": 0.58834860073544,
+ "step": 950
+ },
+ {
+ "epoch": 0.7161604799400075,
+ "grad_norm": 0.5014848788091244,
+ "learning_rate": 2.043034857115323e-06,
+ "loss": 1.6557624816894532,
+ "memory(GiB)": 69.47,
+ "step": 955,
+ "token_acc": 0.5814458131233236,
+ "train_speed(iter/s)": 0.068219
+ },
+ {
+ "epoch": 0.7199100112485939,
+ "grad_norm": 0.5129920621567414,
+ "learning_rate": 1.9932375149298628e-06,
+ "loss": 1.621377944946289,
+ "memory(GiB)": 69.47,
+ "step": 960,
+ "token_acc": 0.6076489727869205,
+ "train_speed(iter/s)": 0.068261
+ },
+ {
+ "epoch": 0.7236595425571803,
+ "grad_norm": 0.506502880183358,
+ "learning_rate": 1.9439030500562243e-06,
+ "loss": 1.6196178436279296,
+ "memory(GiB)": 69.47,
+ "step": 965,
+ "token_acc": 0.5846923847966841,
+ "train_speed(iter/s)": 0.068305
+ },
+ {
+ "epoch": 0.7274090738657668,
+ "grad_norm": 0.5526154345489055,
+ "learning_rate": 1.895039057309293e-06,
+ "loss": 1.6083795547485351,
+ "memory(GiB)": 69.47,
+ "step": 970,
+ "token_acc": 0.5995856662175794,
+ "train_speed(iter/s)": 0.068352
+ },
+ {
+ "epoch": 0.7311586051743532,
+ "grad_norm": 0.5152057399852353,
+ "learning_rate": 1.846653059076925e-06,
+ "loss": 1.6139591217041016,
+ "memory(GiB)": 69.47,
+ "step": 975,
+ "token_acc": 0.5797222490647304,
+ "train_speed(iter/s)": 0.068398
+ },
+ {
+ "epoch": 0.7349081364829396,
+ "grad_norm": 0.4824927120599927,
+ "learning_rate": 1.7987525041619147e-06,
+ "loss": 1.6110733032226563,
+ "memory(GiB)": 69.47,
+ "step": 980,
+ "token_acc": 0.5872091501557306,
+ "train_speed(iter/s)": 0.068442
+ },
+ {
+ "epoch": 0.7386576677915261,
+ "grad_norm": 0.5019512033941552,
+ "learning_rate": 1.7513447666352752e-06,
+ "loss": 1.6114360809326171,
+ "memory(GiB)": 69.47,
+ "step": 985,
+ "token_acc": 0.5956581172217651,
+ "train_speed(iter/s)": 0.068486
+ },
+ {
+ "epoch": 0.7424071991001124,
+ "grad_norm": 0.47021109811395784,
+ "learning_rate": 1.7044371447010483e-06,
+ "loss": 1.6211065292358398,
+ "memory(GiB)": 69.47,
+ "step": 990,
+ "token_acc": 0.569245869458531,
+ "train_speed(iter/s)": 0.068527
+ },
+ {
+ "epoch": 0.7461567304086989,
+ "grad_norm": 0.5121836319300844,
+ "learning_rate": 1.6580368595727586e-06,
+ "loss": 1.6363491058349608,
+ "memory(GiB)": 69.47,
+ "step": 995,
+ "token_acc": 0.5638876766133871,
+ "train_speed(iter/s)": 0.06857
+ },
+ {
+ "epoch": 0.7499062617172854,
+ "grad_norm": 0.47784184287569526,
+ "learning_rate": 1.6121510543617668e-06,
+ "loss": 1.642209243774414,
+ "memory(GiB)": 69.47,
+ "step": 1000,
+ "token_acc": 0.5950520124176243,
+ "train_speed(iter/s)": 0.068612
+ },
+ {
+ "epoch": 0.7499062617172854,
+ "eval_loss": 1.6193331480026245,
+ "eval_runtime": 42.5704,
+ "eval_samples_per_second": 60.7,
+ "eval_steps_per_second": 1.268,
+ "eval_token_acc": 0.589054532607981,
+ "step": 1000
+ },
+ {
+ "epoch": 0.7536557930258717,
+ "grad_norm": 0.5585585189121498,
+ "learning_rate": 1.566786792977597e-06,
+ "loss": 1.6204872131347656,
+ "memory(GiB)": 69.47,
+ "step": 1005,
+ "token_acc": 0.5842172093073306,
+ "train_speed(iter/s)": 0.068226
+ },
+ {
+ "epoch": 0.7574053243344582,
+ "grad_norm": 0.48263411156016095,
+ "learning_rate": 1.5219510590404973e-06,
+ "loss": 1.6084114074707032,
+ "memory(GiB)": 69.47,
+ "step": 1010,
+ "token_acc": 0.6060496626072535,
+ "train_speed(iter/s)": 0.068269
+ },
+ {
+ "epoch": 0.7611548556430446,
+ "grad_norm": 0.4497762254615072,
+ "learning_rate": 1.4776507548063319e-06,
+ "loss": 1.599934959411621,
+ "memory(GiB)": 69.47,
+ "step": 1015,
+ "token_acc": 0.5598463390701008,
+ "train_speed(iter/s)": 0.068312
+ },
+ {
+ "epoch": 0.7649043869516311,
+ "grad_norm": 0.5283692793491782,
+ "learning_rate": 1.4338927001040154e-06,
+ "loss": 1.6040872573852538,
+ "memory(GiB)": 69.47,
+ "step": 1020,
+ "token_acc": 0.5726066007732008,
+ "train_speed(iter/s)": 0.068356
+ },
+ {
+ "epoch": 0.7686539182602175,
+ "grad_norm": 0.5067077833706963,
+ "learning_rate": 1.3906836312856304e-06,
+ "loss": 1.6202468872070312,
+ "memory(GiB)": 69.47,
+ "step": 1025,
+ "token_acc": 0.5568129330254041,
+ "train_speed(iter/s)": 0.068396
+ },
+ {
+ "epoch": 0.7724034495688039,
+ "grad_norm": 0.5090445248471341,
+ "learning_rate": 1.3480302001894007e-06,
+ "loss": 1.6604093551635741,
+ "memory(GiB)": 69.47,
+ "step": 1030,
+ "token_acc": 0.5570990237099024,
+ "train_speed(iter/s)": 0.06844
+ },
+ {
+ "epoch": 0.7761529808773904,
+ "grad_norm": 0.5108687306008184,
+ "learning_rate": 1.3059389731156635e-06,
+ "loss": 1.6176353454589845,
+ "memory(GiB)": 69.47,
+ "step": 1035,
+ "token_acc": 0.6020968399214611,
+ "train_speed(iter/s)": 0.06848
+ },
+ {
+ "epoch": 0.7799025121859767,
+ "grad_norm": 0.5073039466614888,
+ "learning_rate": 1.2644164298160278e-06,
+ "loss": 1.6030658721923827,
+ "memory(GiB)": 69.47,
+ "step": 1040,
+ "token_acc": 0.5653390050744758,
+ "train_speed(iter/s)": 0.068521
+ },
+ {
+ "epoch": 0.7836520434945632,
+ "grad_norm": 0.5102888558915185,
+ "learning_rate": 1.2234689624958307e-06,
+ "loss": 1.6093769073486328,
+ "memory(GiB)": 69.47,
+ "step": 1045,
+ "token_acc": 0.5926654563459317,
+ "train_speed(iter/s)": 0.06856
+ },
+ {
+ "epoch": 0.7874015748031497,
+ "grad_norm": 0.5211478805411307,
+ "learning_rate": 1.1831028748301071e-06,
+ "loss": 1.6387947082519532,
+ "memory(GiB)": 69.47,
+ "step": 1050,
+ "token_acc": 0.5662679083094556,
+ "train_speed(iter/s)": 0.068597
+ },
+ {
+ "epoch": 0.7874015748031497,
+ "eval_loss": 1.6168898344039917,
+ "eval_runtime": 43.2438,
+ "eval_samples_per_second": 59.754,
+ "eval_steps_per_second": 1.249,
+ "eval_token_acc": 0.5897800736992036,
+ "step": 1050
+ },
+ {
+ "epoch": 0.791151106111736,
+ "grad_norm": 0.49447345448344293,
+ "learning_rate": 1.14332438099315e-06,
+ "loss": 1.603018569946289,
+ "memory(GiB)": 69.47,
+ "step": 1055,
+ "token_acc": 0.5800278558567661,
+ "train_speed(iter/s)": 0.06823
+ },
+ {
+ "epoch": 0.7949006374203225,
+ "grad_norm": 0.4741061460173634,
+ "learning_rate": 1.1041396047018793e-06,
+ "loss": 1.6282894134521484,
+ "memory(GiB)": 69.47,
+ "step": 1060,
+ "token_acc": 0.5936924273646574,
+ "train_speed(iter/s)": 0.068272
+ },
+ {
+ "epoch": 0.7986501687289089,
+ "grad_norm": 0.5034300959507061,
+ "learning_rate": 1.065554578273113e-06,
+ "loss": 1.606418991088867,
+ "memory(GiB)": 69.47,
+ "step": 1065,
+ "token_acc": 0.5734419887017269,
+ "train_speed(iter/s)": 0.068313
+ },
+ {
+ "epoch": 0.8023997000374953,
+ "grad_norm": 0.45462008885394545,
+ "learning_rate": 1.0275752416949291e-06,
+ "loss": 1.6223371505737305,
+ "memory(GiB)": 69.47,
+ "step": 1070,
+ "token_acc": 0.591089095797518,
+ "train_speed(iter/s)": 0.068354
+ },
+ {
+ "epoch": 0.8061492313460817,
+ "grad_norm": 0.5314761160291909,
+ "learning_rate": 9.902074417122233e-07,
+ "loss": 1.603017807006836,
+ "memory(GiB)": 69.47,
+ "step": 1075,
+ "token_acc": 0.5886109380502208,
+ "train_speed(iter/s)": 0.068393
+ },
+ {
+ "epoch": 0.8098987626546682,
+ "grad_norm": 0.5248943174736966,
+ "learning_rate": 9.534569309266395e-07,
+ "loss": 1.5781710624694825,
+ "memory(GiB)": 69.47,
+ "step": 1080,
+ "token_acc": 0.5995606920789402,
+ "train_speed(iter/s)": 0.068431
+ },
+ {
+ "epoch": 0.8136482939632546,
+ "grad_norm": 0.515744567059207,
+ "learning_rate": 9.173293669109728e-07,
+ "loss": 1.6221351623535156,
+ "memory(GiB)": 69.47,
+ "step": 1085,
+ "token_acc": 0.5510625879767913,
+ "train_speed(iter/s)": 0.068471
+ },
+ {
+ "epoch": 0.817397825271841,
+ "grad_norm": 0.4846549490970001,
+ "learning_rate": 8.818303113382176e-07,
+ "loss": 1.5834909439086915,
+ "memory(GiB)": 69.47,
+ "step": 1090,
+ "token_acc": 0.5919089498189343,
+ "train_speed(iter/s)": 0.06851
+ },
+ {
+ "epoch": 0.8211473565804275,
+ "grad_norm": 0.4513353191128082,
+ "learning_rate": 8.46965229125371e-07,
+ "loss": 1.6127058029174806,
+ "memory(GiB)": 69.47,
+ "step": 1095,
+ "token_acc": 0.6122484158559205,
+ "train_speed(iter/s)": 0.068551
+ },
+ {
+ "epoch": 0.8248968878890138,
+ "grad_norm": 0.4544425755483677,
+ "learning_rate": 8.127394875921401e-07,
+ "loss": 1.607330322265625,
+ "memory(GiB)": 69.47,
+ "step": 1100,
+ "token_acc": 0.6195696090184728,
+ "train_speed(iter/s)": 0.068591
+ },
+ {
+ "epoch": 0.8248968878890138,
+ "eval_loss": 1.6152210235595703,
+ "eval_runtime": 42.4585,
+ "eval_samples_per_second": 60.859,
+ "eval_steps_per_second": 1.272,
+ "eval_token_acc": 0.5902661096809664,
+ "step": 1100
+ },
+ {
+ "epoch": 0.8286464191976003,
+ "grad_norm": 0.4699375444459336,
+ "learning_rate": 7.791583556346577e-07,
+ "loss": 1.6486019134521483,
+ "memory(GiB)": 69.47,
+ "step": 1105,
+ "token_acc": 0.5841665130220646,
+ "train_speed(iter/s)": 0.068244
+ },
+ {
+ "epoch": 0.8323959505061868,
+ "grad_norm": 0.4655325816892184,
+ "learning_rate": 7.46227002914367e-07,
+ "loss": 1.6185653686523438,
+ "memory(GiB)": 69.47,
+ "step": 1110,
+ "token_acc": 0.5508876719131502,
+ "train_speed(iter/s)": 0.068285
+ },
+ {
+ "epoch": 0.8361454818147731,
+ "grad_norm": 0.443815549611747,
+ "learning_rate": 7.139504990621754e-07,
+ "loss": 1.6039731979370118,
+ "memory(GiB)": 69.47,
+ "step": 1115,
+ "token_acc": 0.5742345033607169,
+ "train_speed(iter/s)": 0.068324
+ },
+ {
+ "epoch": 0.8398950131233596,
+ "grad_norm": 0.47193117910563975,
+ "learning_rate": 6.82333812898005e-07,
+ "loss": 1.6046607971191407,
+ "memory(GiB)": 69.47,
+ "step": 1120,
+ "token_acc": 0.5756778822186418,
+ "train_speed(iter/s)": 0.068363
+ },
+ {
+ "epoch": 0.843644544431946,
+ "grad_norm": 0.5016745684818081,
+ "learning_rate": 6.513818116658671e-07,
+ "loss": 1.6025604248046874,
+ "memory(GiB)": 69.47,
+ "step": 1125,
+ "token_acc": 0.5735500039494754,
+ "train_speed(iter/s)": 0.0684
+ },
+ {
+ "epoch": 0.8473940757405324,
+ "grad_norm": 0.5220705510390201,
+ "learning_rate": 6.210992602845722e-07,
+ "loss": 1.5999751091003418,
+ "memory(GiB)": 69.47,
+ "step": 1130,
+ "token_acc": 0.5666828322017459,
+ "train_speed(iter/s)": 0.068438
+ },
+ {
+ "epoch": 0.8511436070491188,
+ "grad_norm": 0.4658872880073119,
+ "learning_rate": 5.914908206141956e-07,
+ "loss": 1.6399084091186524,
+ "memory(GiB)": 69.47,
+ "step": 1135,
+ "token_acc": 0.5555669822893787,
+ "train_speed(iter/s)": 0.068473
+ },
+ {
+ "epoch": 0.8548931383577053,
+ "grad_norm": 0.49112601151308977,
+ "learning_rate": 5.625610507383988e-07,
+ "loss": 1.6437812805175782,
+ "memory(GiB)": 69.47,
+ "step": 1140,
+ "token_acc": 0.571141520165108,
+ "train_speed(iter/s)": 0.068511
+ },
+ {
+ "epoch": 0.8586426696662918,
+ "grad_norm": 0.4822913125455344,
+ "learning_rate": 5.343144042627391e-07,
+ "loss": 1.621147918701172,
+ "memory(GiB)": 69.47,
+ "step": 1145,
+ "token_acc": 0.5925554632407973,
+ "train_speed(iter/s)": 0.068547
+ },
+ {
+ "epoch": 0.8623922009748781,
+ "grad_norm": 0.435848921973967,
+ "learning_rate": 5.06755229629054e-07,
+ "loss": 1.5979487419128418,
+ "memory(GiB)": 69.47,
+ "step": 1150,
+ "token_acc": 0.6014443218658418,
+ "train_speed(iter/s)": 0.068584
+ },
+ {
+ "epoch": 0.8623922009748781,
+ "eval_loss": 1.613835096359253,
+ "eval_runtime": 42.8701,
+ "eval_samples_per_second": 60.275,
+ "eval_steps_per_second": 1.26,
+ "eval_token_acc": 0.5905893996338843,
+ "step": 1150
+ },
+ {
+ "epoch": 0.8661417322834646,
+ "grad_norm": 0.46089404385561655,
+ "learning_rate": 4.798877694460424e-07,
+ "loss": 1.6086740493774414,
+ "memory(GiB)": 69.47,
+ "step": 1155,
+ "token_acc": 0.5848029286936108,
+ "train_speed(iter/s)": 0.068246
+ },
+ {
+ "epoch": 0.869891263592051,
+ "grad_norm": 0.42872964695512505,
+ "learning_rate": 4.5371615983612947e-07,
+ "loss": 1.6226417541503906,
+ "memory(GiB)": 69.47,
+ "step": 1160,
+ "token_acc": 0.590493341547794,
+ "train_speed(iter/s)": 0.068282
+ },
+ {
+ "epoch": 0.8736407949006374,
+ "grad_norm": 0.43701593339026695,
+ "learning_rate": 4.282444297987359e-07,
+ "loss": 1.61004638671875,
+ "memory(GiB)": 69.47,
+ "step": 1165,
+ "token_acc": 0.5865000985543102,
+ "train_speed(iter/s)": 0.06832
+ },
+ {
+ "epoch": 0.8773903262092239,
+ "grad_norm": 0.47705938510535956,
+ "learning_rate": 4.03476500590021e-07,
+ "loss": 1.5912567138671876,
+ "memory(GiB)": 69.47,
+ "step": 1170,
+ "token_acc": 0.612401185770751,
+ "train_speed(iter/s)": 0.068355
+ },
+ {
+ "epoch": 0.8811398575178103,
+ "grad_norm": 0.4892608214978031,
+ "learning_rate": 3.794161851192374e-07,
+ "loss": 1.61513671875,
+ "memory(GiB)": 69.47,
+ "step": 1175,
+ "token_acc": 0.5986275418415744,
+ "train_speed(iter/s)": 0.068391
+ },
+ {
+ "epoch": 0.8848893888263967,
+ "grad_norm": 0.4619591737564222,
+ "learning_rate": 3.560671873617405e-07,
+ "loss": 1.605311965942383,
+ "memory(GiB)": 69.47,
+ "step": 1180,
+ "token_acc": 0.5807709544290082,
+ "train_speed(iter/s)": 0.068428
+ },
+ {
+ "epoch": 0.8886389201349831,
+ "grad_norm": 0.4929784580597144,
+ "learning_rate": 3.334331017887837e-07,
+ "loss": 1.6141300201416016,
+ "memory(GiB)": 69.47,
+ "step": 1185,
+ "token_acc": 0.5794582531848332,
+ "train_speed(iter/s)": 0.068462
+ },
+ {
+ "epoch": 0.8923884514435696,
+ "grad_norm": 0.4359896715792382,
+ "learning_rate": 3.1151741281416236e-07,
+ "loss": 1.6315603256225586,
+ "memory(GiB)": 69.47,
+ "step": 1190,
+ "token_acc": 0.6129624092848794,
+ "train_speed(iter/s)": 0.068499
+ },
+ {
+ "epoch": 0.896137982752156,
+ "grad_norm": 0.4199860900705256,
+ "learning_rate": 2.903234942578081e-07,
+ "loss": 1.6342578887939454,
+ "memory(GiB)": 69.47,
+ "step": 1195,
+ "token_acc": 0.5895935653783188,
+ "train_speed(iter/s)": 0.068535
+ },
+ {
+ "epoch": 0.8998875140607424,
+ "grad_norm": 0.42147511115613456,
+ "learning_rate": 2.698546088264009e-07,
+ "loss": 1.5956247329711915,
+ "memory(GiB)": 69.47,
+ "step": 1200,
+ "token_acc": 0.6071841830830988,
+ "train_speed(iter/s)": 0.068572
+ },
+ {
+ "epoch": 0.8998875140607424,
+ "eval_loss": 1.6130276918411255,
+ "eval_runtime": 42.5246,
+ "eval_samples_per_second": 60.765,
+ "eval_steps_per_second": 1.27,
+ "eval_token_acc": 0.5908747294950227,
+ "step": 1200
+ },
+ {
+ "epoch": 0.9036370453693289,
+ "grad_norm": 0.42547933706404784,
+ "learning_rate": 2.5011390761109424e-07,
+ "loss": 1.6252113342285157,
+ "memory(GiB)": 69.47,
+ "step": 1205,
+ "token_acc": 0.591078917410982,
+ "train_speed(iter/s)": 0.068254
+ },
+ {
+ "epoch": 0.9073865766779152,
+ "grad_norm": 0.4405736679459605,
+ "learning_rate": 2.3110442960241507e-07,
+ "loss": 1.6035770416259765,
+ "memory(GiB)": 69.47,
+ "step": 1210,
+ "token_acc": 0.5825880077861605,
+ "train_speed(iter/s)": 0.068289
+ },
+ {
+ "epoch": 0.9111361079865017,
+ "grad_norm": 0.44012672892851296,
+ "learning_rate": 2.1282910122243038e-07,
+ "loss": 1.59132719039917,
+ "memory(GiB)": 69.47,
+ "step": 1215,
+ "token_acc": 0.5708482310549674,
+ "train_speed(iter/s)": 0.068324
+ },
+ {
+ "epoch": 0.9148856392950881,
+ "grad_norm": 0.436144469439609,
+ "learning_rate": 1.9529073587423008e-07,
+ "loss": 1.6214092254638672,
+ "memory(GiB)": 69.47,
+ "step": 1220,
+ "token_acc": 0.5917138980903125,
+ "train_speed(iter/s)": 0.06836
+ },
+ {
+ "epoch": 0.9186351706036745,
+ "grad_norm": 0.45595845007022057,
+ "learning_rate": 1.7849203350882415e-07,
+ "loss": 1.595216941833496,
+ "memory(GiB)": 69.47,
+ "step": 1225,
+ "token_acc": 0.5995736316066336,
+ "train_speed(iter/s)": 0.068394
+ },
+ {
+ "epoch": 0.922384701912261,
+ "grad_norm": 0.4354484890078837,
+ "learning_rate": 1.6243558020949345e-07,
+ "loss": 1.6196565628051758,
+ "memory(GiB)": 69.47,
+ "step": 1230,
+ "token_acc": 0.5604222422568806,
+ "train_speed(iter/s)": 0.068427
+ },
+ {
+ "epoch": 0.9261342332208474,
+ "grad_norm": 0.43528308145508676,
+ "learning_rate": 1.471238477936765e-07,
+ "loss": 1.5957447052001954,
+ "memory(GiB)": 69.47,
+ "step": 1235,
+ "token_acc": 0.6027920480891877,
+ "train_speed(iter/s)": 0.068463
+ },
+ {
+ "epoch": 0.9298837645294338,
+ "grad_norm": 0.4143657458995663,
+ "learning_rate": 1.3255919343244105e-07,
+ "loss": 1.5905607223510743,
+ "memory(GiB)": 69.47,
+ "step": 1240,
+ "token_acc": 0.5880818473790528,
+ "train_speed(iter/s)": 0.068497
+ },
+ {
+ "epoch": 0.9336332958380202,
+ "grad_norm": 0.4299846489431254,
+ "learning_rate": 1.1874385928761112e-07,
+ "loss": 1.598089599609375,
+ "memory(GiB)": 69.47,
+ "step": 1245,
+ "token_acc": 0.5968487409559421,
+ "train_speed(iter/s)": 0.068531
+ },
+ {
+ "epoch": 0.9373828271466067,
+ "grad_norm": 0.428143089845641,
+ "learning_rate": 1.0567997216659576e-07,
+ "loss": 1.5996816635131836,
+ "memory(GiB)": 69.47,
+ "step": 1250,
+ "token_acc": 0.5944731717144583,
+ "train_speed(iter/s)": 0.068566
+ },
+ {
+ "epoch": 0.9373828271466067,
+ "eval_loss": 1.6125571727752686,
+ "eval_runtime": 42.1918,
+ "eval_samples_per_second": 61.244,
+ "eval_steps_per_second": 1.28,
+ "eval_token_acc": 0.5909382759470073,
+ "step": 1250
+ },
+ {
+ "epoch": 0.941132358455193,
+ "grad_norm": 0.4502688717392391,
+ "learning_rate": 9.336954319497716e-08,
+ "loss": 1.5971858978271485,
+ "memory(GiB)": 69.47,
+ "step": 1255,
+ "token_acc": 0.5895391353585584,
+ "train_speed(iter/s)": 0.068262
+ },
+ {
+ "epoch": 0.9448818897637795,
+ "grad_norm": 0.4344596223428801,
+ "learning_rate": 8.181446750690658e-08,
+ "loss": 1.6137876510620117,
+ "memory(GiB)": 69.47,
+ "step": 1260,
+ "token_acc": 0.5668742586002372,
+ "train_speed(iter/s)": 0.068299
+ },
+ {
+ "epoch": 0.948631421072366,
+ "grad_norm": 0.45590977674430777,
+ "learning_rate": 7.101652395335779e-08,
+ "loss": 1.6049671173095703,
+ "memory(GiB)": 69.47,
+ "step": 1265,
+ "token_acc": 0.6246026139173437,
+ "train_speed(iter/s)": 0.068333
+ },
+ {
+ "epoch": 0.9523809523809523,
+ "grad_norm": 0.4295730317443299,
+ "learning_rate": 6.097737482827826e-08,
+ "loss": 1.60078125,
+ "memory(GiB)": 69.47,
+ "step": 1270,
+ "token_acc": 0.6323105986580678,
+ "train_speed(iter/s)": 0.068365
+ },
+ {
+ "epoch": 0.9561304836895388,
+ "grad_norm": 0.4265511315819114,
+ "learning_rate": 5.169856561269171e-08,
+ "loss": 1.6046745300292968,
+ "memory(GiB)": 69.47,
+ "step": 1275,
+ "token_acc": 0.5845875477479601,
+ "train_speed(iter/s)": 0.0684
+ },
+ {
+ "epoch": 0.9598800149981253,
+ "grad_norm": 0.44161345011158165,
+ "learning_rate": 4.3181524736773394e-08,
+ "loss": 1.6101186752319336,
+ "memory(GiB)": 69.47,
+ "step": 1280,
+ "token_acc": 0.6039562611800572,
+ "train_speed(iter/s)": 0.068433
+ },
+ {
+ "epoch": 0.9636295463067117,
+ "grad_norm": 0.4040990948400036,
+ "learning_rate": 3.54275633599549e-08,
+ "loss": 1.5941335678100585,
+ "memory(GiB)": 69.47,
+ "step": 1285,
+ "token_acc": 0.5708919115077552,
+ "train_speed(iter/s)": 0.068466
+ },
+ {
+ "epoch": 0.9673790776152981,
+ "grad_norm": 0.4137779647306755,
+ "learning_rate": 2.8437875169070595e-08,
+ "loss": 1.5992546081542969,
+ "memory(GiB)": 69.47,
+ "step": 1290,
+ "token_acc": 0.5816298270170738,
+ "train_speed(iter/s)": 0.068501
+ },
+ {
+ "epoch": 0.9711286089238845,
+ "grad_norm": 0.414717051427377,
+ "learning_rate": 2.2213536194601314e-08,
+ "loss": 1.6082378387451173,
+ "memory(GiB)": 69.47,
+ "step": 1295,
+ "token_acc": 0.5427460601272402,
+ "train_speed(iter/s)": 0.068535
+ },
+ {
+ "epoch": 0.974878140232471,
+ "grad_norm": 0.4389048585868221,
+ "learning_rate": 1.6755504645021292e-08,
+ "loss": 1.5763816833496094,
+ "memory(GiB)": 69.47,
+ "step": 1300,
+ "token_acc": 0.5868564953189289,
+ "train_speed(iter/s)": 0.068567
+ },
+ {
+ "epoch": 0.974878140232471,
+ "eval_loss": 1.6123632192611694,
+ "eval_runtime": 42.645,
+ "eval_samples_per_second": 60.593,
+ "eval_steps_per_second": 1.266,
+ "eval_token_acc": 0.5910751210292512,
+ "step": 1300
+ }
+ ],
+ "logging_steps": 5,
+ "max_steps": 1333,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 1,
+ "save_steps": 50,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 1988579831316480.0,
+ "train_batch_size": 2,
+ "trial_name": null,
+ "trial_params": null
+ }