huiting tang committed
Commit 9edca3c · verified · 1 Parent(s): 625720e

Add files using upload-large-folder tool

c2/architecture.txt ADDED
@@ -0,0 +1,300 @@
+ === Raw Model ===
+ GPT(
+   (transformer): ModuleDict(
+     (drop): Dropout(p=0.0, inplace=False)
+     (h): ModuleList(
+       (0-17): 18 x Block(
+         (ln_1): RMSNorm()
+         (attn): CausalSelfAttention(
+           (rotary): RotaryEmbedding()
+           (q_proj): Linear(in_features=320, out_features=320, bias=False)
+           (k_proj): Linear(in_features=320, out_features=64, bias=False)
+           (v_proj): Linear(in_features=320, out_features=64, bias=False)
+           (c_proj): Linear(in_features=320, out_features=320, bias=False)
+           (resid_dropout): Dropout(p=0.0, inplace=False)
+         )
+         (ln_2): RMSNorm()
+         (mlp): MLP(
+           (c_fc): Linear(in_features=320, out_features=2048, bias=False)
+           (c_proj): Linear(in_features=1024, out_features=320, bias=False)
+           (dropout): Dropout(p=0.0, inplace=False)
+         )
+       )
+     )
+     (ln_f): RMSNorm()
+     (wte): Embedding(50304, 320)
+   )
+   (lm_head): Linear(in_features=320, out_features=50304, bias=False)
+ )
+
+ === Forward Summary (torchinfo, uncompiled model) ===
+ ====================================================================================================
+ Layer (type:depth-idx) Output Shape Param #
+ ====================================================================================================
+ GPT [1, 1, 50304] --
+ ├─ModuleDict: 1-1 -- --
+ │ └─Embedding: 2-1 [1, 1024, 320] 16,097,280
+ │ └─Dropout: 2-2 [1, 1024, 320] --
+ │ └─ModuleList: 2-3 -- --
+ │ │ └─Block: 3-1 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-1 [1, 1024, 320] 320
+ │ │ │ └─CausalSelfAttention: 4-2 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-1 [1, 1024, 320] 102,400
+ │ │ │ │ └─Linear: 5-2 [1, 1024, 64] 20,480
+ │ │ │ │ └─Linear: 5-3 [1, 1024, 64] 20,480
+ │ │ │ │ └─RotaryEmbedding: 5-4 [1, 1, 1024, 64] --
+ │ │ │ │ └─Linear: 5-5 [1, 1024, 320] 102,400
+ │ │ │ │ └─Dropout: 5-6 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-3 [1, 1024, 320] 320
+ │ │ │ └─MLP: 4-4 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-7 [1, 1024, 2048] 655,360
+ │ │ │ │ └─Linear: 5-8 [1, 1024, 320] 327,680
+ │ │ │ │ └─Dropout: 5-9 [1, 1024, 320] --
+ │ │ └─Block: 3-2 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-5 [1, 1024, 320] 320
+ │ │ │ └─CausalSelfAttention: 4-6 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-10 [1, 1024, 320] 102,400
+ │ │ │ │ └─Linear: 5-11 [1, 1024, 64] 20,480
+ │ │ │ │ └─Linear: 5-12 [1, 1024, 64] 20,480
+ │ │ │ │ └─RotaryEmbedding: 5-13 [1, 1, 1024, 64] --
+ │ │ │ │ └─Linear: 5-14 [1, 1024, 320] 102,400
+ │ │ │ │ └─Dropout: 5-15 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-7 [1, 1024, 320] 320
+ │ │ │ └─MLP: 4-8 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-16 [1, 1024, 2048] 655,360
+ │ │ │ │ └─Linear: 5-17 [1, 1024, 320] 327,680
+ │ │ │ │ └─Dropout: 5-18 [1, 1024, 320] --
+ │ │ └─Block: 3-3 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-9 [1, 1024, 320] 320
+ │ │ │ └─CausalSelfAttention: 4-10 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-19 [1, 1024, 320] 102,400
+ │ │ │ │ └─Linear: 5-20 [1, 1024, 64] 20,480
+ │ │ │ │ └─Linear: 5-21 [1, 1024, 64] 20,480
+ │ │ │ │ └─RotaryEmbedding: 5-22 [1, 1, 1024, 64] --
+ │ │ │ │ └─Linear: 5-23 [1, 1024, 320] 102,400
+ │ │ │ │ └─Dropout: 5-24 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-11 [1, 1024, 320] 320
+ │ │ │ └─MLP: 4-12 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-25 [1, 1024, 2048] 655,360
+ │ │ │ │ └─Linear: 5-26 [1, 1024, 320] 327,680
+ │ │ │ │ └─Dropout: 5-27 [1, 1024, 320] --
+ │ │ └─Block: 3-4 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-13 [1, 1024, 320] 320
+ │ │ │ └─CausalSelfAttention: 4-14 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-28 [1, 1024, 320] 102,400
+ │ │ │ │ └─Linear: 5-29 [1, 1024, 64] 20,480
+ │ │ │ │ └─Linear: 5-30 [1, 1024, 64] 20,480
+ │ │ │ │ └─RotaryEmbedding: 5-31 [1, 1, 1024, 64] --
+ │ │ │ │ └─Linear: 5-32 [1, 1024, 320] 102,400
+ │ │ │ │ └─Dropout: 5-33 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-15 [1, 1024, 320] 320
+ │ │ │ └─MLP: 4-16 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-34 [1, 1024, 2048] 655,360
+ │ │ │ │ └─Linear: 5-35 [1, 1024, 320] 327,680
+ │ │ │ │ └─Dropout: 5-36 [1, 1024, 320] --
+ │ │ └─Block: 3-5 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-17 [1, 1024, 320] 320
+ │ │ │ └─CausalSelfAttention: 4-18 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-37 [1, 1024, 320] 102,400
+ │ │ │ │ └─Linear: 5-38 [1, 1024, 64] 20,480
+ │ │ │ │ └─Linear: 5-39 [1, 1024, 64] 20,480
+ │ │ │ │ └─RotaryEmbedding: 5-40 [1, 1, 1024, 64] --
+ │ │ │ │ └─Linear: 5-41 [1, 1024, 320] 102,400
+ │ │ │ │ └─Dropout: 5-42 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-19 [1, 1024, 320] 320
+ │ │ │ └─MLP: 4-20 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-43 [1, 1024, 2048] 655,360
+ │ │ │ │ └─Linear: 5-44 [1, 1024, 320] 327,680
+ │ │ │ │ └─Dropout: 5-45 [1, 1024, 320] --
+ │ │ └─Block: 3-6 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-21 [1, 1024, 320] 320
+ │ │ │ └─CausalSelfAttention: 4-22 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-46 [1, 1024, 320] 102,400
+ │ │ │ │ └─Linear: 5-47 [1, 1024, 64] 20,480
+ │ │ │ │ └─Linear: 5-48 [1, 1024, 64] 20,480
+ │ │ │ │ └─RotaryEmbedding: 5-49 [1, 1, 1024, 64] --
+ │ │ │ │ └─Linear: 5-50 [1, 1024, 320] 102,400
+ │ │ │ │ └─Dropout: 5-51 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-23 [1, 1024, 320] 320
+ │ │ │ └─MLP: 4-24 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-52 [1, 1024, 2048] 655,360
+ │ │ │ │ └─Linear: 5-53 [1, 1024, 320] 327,680
+ │ │ │ │ └─Dropout: 5-54 [1, 1024, 320] --
+ │ │ └─Block: 3-7 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-25 [1, 1024, 320] 320
+ │ │ │ └─CausalSelfAttention: 4-26 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-55 [1, 1024, 320] 102,400
+ │ │ │ │ └─Linear: 5-56 [1, 1024, 64] 20,480
+ │ │ │ │ └─Linear: 5-57 [1, 1024, 64] 20,480
+ │ │ │ │ └─RotaryEmbedding: 5-58 [1, 1, 1024, 64] --
+ │ │ │ │ └─Linear: 5-59 [1, 1024, 320] 102,400
+ │ │ │ │ └─Dropout: 5-60 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-27 [1, 1024, 320] 320
+ │ │ │ └─MLP: 4-28 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-61 [1, 1024, 2048] 655,360
+ │ │ │ │ └─Linear: 5-62 [1, 1024, 320] 327,680
+ │ │ │ │ └─Dropout: 5-63 [1, 1024, 320] --
+ │ │ └─Block: 3-8 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-29 [1, 1024, 320] 320
+ │ │ │ └─CausalSelfAttention: 4-30 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-64 [1, 1024, 320] 102,400
+ │ │ │ │ └─Linear: 5-65 [1, 1024, 64] 20,480
+ │ │ │ │ └─Linear: 5-66 [1, 1024, 64] 20,480
+ │ │ │ │ └─RotaryEmbedding: 5-67 [1, 1, 1024, 64] --
+ │ │ │ │ └─Linear: 5-68 [1, 1024, 320] 102,400
+ │ │ │ │ └─Dropout: 5-69 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-31 [1, 1024, 320] 320
+ │ │ │ └─MLP: 4-32 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-70 [1, 1024, 2048] 655,360
+ │ │ │ │ └─Linear: 5-71 [1, 1024, 320] 327,680
+ │ │ │ │ └─Dropout: 5-72 [1, 1024, 320] --
+ │ │ └─Block: 3-9 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-33 [1, 1024, 320] 320
+ │ │ │ └─CausalSelfAttention: 4-34 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-73 [1, 1024, 320] 102,400
+ │ │ │ │ └─Linear: 5-74 [1, 1024, 64] 20,480
+ │ │ │ │ └─Linear: 5-75 [1, 1024, 64] 20,480
+ │ │ │ │ └─RotaryEmbedding: 5-76 [1, 1, 1024, 64] --
+ │ │ │ │ └─Linear: 5-77 [1, 1024, 320] 102,400
+ │ │ │ │ └─Dropout: 5-78 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-35 [1, 1024, 320] 320
+ │ │ │ └─MLP: 4-36 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-79 [1, 1024, 2048] 655,360
+ │ │ │ │ └─Linear: 5-80 [1, 1024, 320] 327,680
+ │ │ │ │ └─Dropout: 5-81 [1, 1024, 320] --
+ │ │ └─Block: 3-10 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-37 [1, 1024, 320] 320
+ │ │ │ └─CausalSelfAttention: 4-38 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-82 [1, 1024, 320] 102,400
+ │ │ │ │ └─Linear: 5-83 [1, 1024, 64] 20,480
+ │ │ │ │ └─Linear: 5-84 [1, 1024, 64] 20,480
+ │ │ │ │ └─RotaryEmbedding: 5-85 [1, 1, 1024, 64] --
+ │ │ │ │ └─Linear: 5-86 [1, 1024, 320] 102,400
+ │ │ │ │ └─Dropout: 5-87 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-39 [1, 1024, 320] 320
+ │ │ │ └─MLP: 4-40 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-88 [1, 1024, 2048] 655,360
+ │ │ │ │ └─Linear: 5-89 [1, 1024, 320] 327,680
+ │ │ │ │ └─Dropout: 5-90 [1, 1024, 320] --
+ │ │ └─Block: 3-11 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-41 [1, 1024, 320] 320
+ │ │ │ └─CausalSelfAttention: 4-42 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-91 [1, 1024, 320] 102,400
+ │ │ │ │ └─Linear: 5-92 [1, 1024, 64] 20,480
+ │ │ │ │ └─Linear: 5-93 [1, 1024, 64] 20,480
+ │ │ │ │ └─RotaryEmbedding: 5-94 [1, 1, 1024, 64] --
+ │ │ │ │ └─Linear: 5-95 [1, 1024, 320] 102,400
+ │ │ │ │ └─Dropout: 5-96 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-43 [1, 1024, 320] 320
+ │ │ │ └─MLP: 4-44 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-97 [1, 1024, 2048] 655,360
+ │ │ │ │ └─Linear: 5-98 [1, 1024, 320] 327,680
+ │ │ │ │ └─Dropout: 5-99 [1, 1024, 320] --
+ │ │ └─Block: 3-12 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-45 [1, 1024, 320] 320
+ │ │ │ └─CausalSelfAttention: 4-46 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-100 [1, 1024, 320] 102,400
+ │ │ │ │ └─Linear: 5-101 [1, 1024, 64] 20,480
+ │ │ │ │ └─Linear: 5-102 [1, 1024, 64] 20,480
+ │ │ │ │ └─RotaryEmbedding: 5-103 [1, 1, 1024, 64] --
+ │ │ │ │ └─Linear: 5-104 [1, 1024, 320] 102,400
+ │ │ │ │ └─Dropout: 5-105 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-47 [1, 1024, 320] 320
+ │ │ │ └─MLP: 4-48 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-106 [1, 1024, 2048] 655,360
+ │ │ │ │ └─Linear: 5-107 [1, 1024, 320] 327,680
+ │ │ │ │ └─Dropout: 5-108 [1, 1024, 320] --
+ │ │ └─Block: 3-13 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-49 [1, 1024, 320] 320
+ │ │ │ └─CausalSelfAttention: 4-50 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-109 [1, 1024, 320] 102,400
+ │ │ │ │ └─Linear: 5-110 [1, 1024, 64] 20,480
+ │ │ │ │ └─Linear: 5-111 [1, 1024, 64] 20,480
+ │ │ │ │ └─RotaryEmbedding: 5-112 [1, 1, 1024, 64] --
+ │ │ │ │ └─Linear: 5-113 [1, 1024, 320] 102,400
+ │ │ │ │ └─Dropout: 5-114 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-51 [1, 1024, 320] 320
+ │ │ │ └─MLP: 4-52 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-115 [1, 1024, 2048] 655,360
+ │ │ │ │ └─Linear: 5-116 [1, 1024, 320] 327,680
+ │ │ │ │ └─Dropout: 5-117 [1, 1024, 320] --
+ │ │ └─Block: 3-14 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-53 [1, 1024, 320] 320
+ │ │ │ └─CausalSelfAttention: 4-54 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-118 [1, 1024, 320] 102,400
+ │ │ │ │ └─Linear: 5-119 [1, 1024, 64] 20,480
+ │ │ │ │ └─Linear: 5-120 [1, 1024, 64] 20,480
+ │ │ │ │ └─RotaryEmbedding: 5-121 [1, 1, 1024, 64] --
+ │ │ │ │ └─Linear: 5-122 [1, 1024, 320] 102,400
+ │ │ │ │ └─Dropout: 5-123 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-55 [1, 1024, 320] 320
+ │ │ │ └─MLP: 4-56 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-124 [1, 1024, 2048] 655,360
+ │ │ │ │ └─Linear: 5-125 [1, 1024, 320] 327,680
+ │ │ │ │ └─Dropout: 5-126 [1, 1024, 320] --
+ │ │ └─Block: 3-15 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-57 [1, 1024, 320] 320
+ │ │ │ └─CausalSelfAttention: 4-58 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-127 [1, 1024, 320] 102,400
+ │ │ │ │ └─Linear: 5-128 [1, 1024, 64] 20,480
+ │ │ │ │ └─Linear: 5-129 [1, 1024, 64] 20,480
+ │ │ │ │ └─RotaryEmbedding: 5-130 [1, 1, 1024, 64] --
+ │ │ │ │ └─Linear: 5-131 [1, 1024, 320] 102,400
+ │ │ │ │ └─Dropout: 5-132 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-59 [1, 1024, 320] 320
+ │ │ │ └─MLP: 4-60 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-133 [1, 1024, 2048] 655,360
+ │ │ │ │ └─Linear: 5-134 [1, 1024, 320] 327,680
+ │ │ │ │ └─Dropout: 5-135 [1, 1024, 320] --
+ │ │ └─Block: 3-16 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-61 [1, 1024, 320] 320
+ │ │ │ └─CausalSelfAttention: 4-62 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-136 [1, 1024, 320] 102,400
+ │ │ │ │ └─Linear: 5-137 [1, 1024, 64] 20,480
+ │ │ │ │ └─Linear: 5-138 [1, 1024, 64] 20,480
+ │ │ │ │ └─RotaryEmbedding: 5-139 [1, 1, 1024, 64] --
+ │ │ │ │ └─Linear: 5-140 [1, 1024, 320] 102,400
+ │ │ │ │ └─Dropout: 5-141 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-63 [1, 1024, 320] 320
+ │ │ │ └─MLP: 4-64 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-142 [1, 1024, 2048] 655,360
+ │ │ │ │ └─Linear: 5-143 [1, 1024, 320] 327,680
+ │ │ │ │ └─Dropout: 5-144 [1, 1024, 320] --
+ │ │ └─Block: 3-17 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-65 [1, 1024, 320] 320
+ │ │ │ └─CausalSelfAttention: 4-66 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-145 [1, 1024, 320] 102,400
+ │ │ │ │ └─Linear: 5-146 [1, 1024, 64] 20,480
+ │ │ │ │ └─Linear: 5-147 [1, 1024, 64] 20,480
+ │ │ │ │ └─RotaryEmbedding: 5-148 [1, 1, 1024, 64] --
+ │ │ │ │ └─Linear: 5-149 [1, 1024, 320] 102,400
+ │ │ │ │ └─Dropout: 5-150 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-67 [1, 1024, 320] 320
+ │ │ │ └─MLP: 4-68 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-151 [1, 1024, 2048] 655,360
+ │ │ │ │ └─Linear: 5-152 [1, 1024, 320] 327,680
+ │ │ │ │ └─Dropout: 5-153 [1, 1024, 320] --
+ │ │ └─Block: 3-18 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-69 [1, 1024, 320] 320
+ │ │ │ └─CausalSelfAttention: 4-70 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-154 [1, 1024, 320] 102,400
+ │ │ │ │ └─Linear: 5-155 [1, 1024, 64] 20,480
+ │ │ │ │ └─Linear: 5-156 [1, 1024, 64] 20,480
+ │ │ │ │ └─RotaryEmbedding: 5-157 [1, 1, 1024, 64] --
+ │ │ │ │ └─Linear: 5-158 [1, 1024, 320] 102,400
+ │ │ │ │ └─Dropout: 5-159 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-71 [1, 1024, 320] 320
+ │ │ │ └─MLP: 4-72 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-160 [1, 1024, 2048] 655,360
+ │ │ │ │ └─Linear: 5-161 [1, 1024, 320] 327,680
+ │ │ │ │ └─Dropout: 5-162 [1, 1024, 320] --
+ │ └─RMSNorm: 2-4 [1, 1024, 320] 320
+ ├─Linear: 1-2 [1, 1, 50304] 16,097,280
+ ====================================================================================================
+
+ === Parameter Counts (unique tensors) ===
+ Total params: 38,227,520
+ Trainable params: 38,227,520
+ Weight tying (wte = lm_head): True
+ Embedding mode: standard tied token embedding
+ Note: module-level torchinfo totals may double-count the tied LM head; use the unique counts above.
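The unique total reported in architecture.txt can be cross-checked by hand from the layer shapes in the summary. The following is a minimal sketch and not part of the committed files. The constant and function names are illustrative, and it assumes (as the listing suggests) that every Linear layer is bias-free and that each RMSNorm holds a single weight vector of size 320.

```python
# Shapes copied from the torchinfo summary in architecture.txt; names are illustrative.
N_EMBD = 320
N_LAYERS = 18
VOCAB = 50304
KV_WIDTH = 64        # out_features of k_proj and v_proj
MLP_FC_OUT = 2048    # out_features of mlp.c_fc
MLP_PROJ_IN = 1024   # in_features of mlp.c_proj


def block_params() -> int:
    """Parameters in one Block, assuming bias-free Linear layers."""
    attn = N_EMBD * N_EMBD          # q_proj
    attn += 2 * N_EMBD * KV_WIDTH   # k_proj and v_proj
    attn += N_EMBD * N_EMBD         # attention c_proj
    mlp = N_EMBD * MLP_FC_OUT + MLP_PROJ_IN * N_EMBD
    norms = 2 * N_EMBD              # ln_1 and ln_2 weight vectors
    return attn + mlp + norms


embedding = VOCAB * N_EMBD
# Tied lm_head adds nothing; the trailing N_EMBD is ln_f.
unique_total = embedding + N_LAYERS * block_params() + N_EMBD
# What a module-level sum would report if lm_head were counted again.
double_counted = unique_total + VOCAB * N_EMBD

print(f"per block:      {block_params():,}")
print(f"unique total:   {unique_total:,}")
print(f"double-counted: {double_counted:,}")
```

Under these assumptions the unique total matches the 38,227,520 reported in the file, and counting the tied lm_head a second time adds another 50304 * 320 weights, which is the double-count the note warns about.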
c2/checkpoints/best_ckpt.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:293260075cdc50302f3a876f11d963b3ce0741ae8bba064e6b997433134f5e84
+ size 458922618
c2/checkpoints/ckpt_step0091000.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5468ead6c2b8a064e4c6c1430a25362a4e110590c47d486e34a963e239a3b009
+ size 458926734
c2/checkpoints/ckpt_step0091500.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:881312a4c5046818747a07ee58370202c34c7833a00fb35784d2f0a931357bf1
+ size 458926734
c2/checkpoints/ckpt_step0092000.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d3898ad58d9ba4b4f5d56275d3a75f41d304ec8440549f56ac0092cc4add24a3
+ size 458926734
c2/checkpoints/ckpt_step0092500.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5b153e397953c84b90da7d7f64e9e375cd63dfadab34a2a7f1c3eb09b5d17395
+ size 458926734
c2/checkpoints/ckpt_step0092685.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:298d449e1817f467e3194879fbd50eef8ee2a64d3ce729420fbbdd03667d10b2
+ size 458926734
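Each checkpoint entry in this commit is a Git LFS pointer (version, oid, size) rather than the weights themselves. As a rough illustration, and not something shipped in this commit, such a pointer can be read with a few lines of Python. The helper name `parse_lfs_pointer` is hypothetical; the sample text is the pointer for ckpt_step0092685.pt as it appears in this diff.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its fields (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", separated by a single space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:298d449e1817f467e3194879fbd50eef8ee2a64d3ce729420fbbdd03667d10b2
size 458926734
"""

info = parse_lfs_pointer(pointer_text)
print(info["hash_algo"], info["digest"][:12], f'{info["size_bytes"] / 2**20:.1f} MiB')
```

The size field is the byte length of the real checkpoint stored in LFS, so it can be compared against a downloaded file before loading it.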
c2/config_snapshot.json ADDED
@@ -0,0 +1,116 @@
+ {
+   "run": {
+     "name": "c2",
+     "artifacts_root": "artifacts",
+     "resume": true,
+     "deterministic": false
+   },
+   "distributed": {
+     "enabled": false,
+     "backend": "nccl"
+   },
+   "preprocessing": {
+     "data_dir": "data",
+     "processed_dir": "data/processed_OWT",
+     "log_dir": "logs/preprocessing",
+     "train_split": 0.9,
+     "dataset_name": "openwebtext",
+     "dataset_config_name": null,
+     "dataset_split": "train",
+     "dataset_text_column": "text",
+     "dataset_repo_id": "huiting123/processedOWT",
+     "num_proc": 4,
+     "tokenization_num_proc": 0,
+     "tokenization_batch_size": 1000,
+     "tokenization_chunk_size": 100000,
+     "shard_write_batch_size": 5000,
+     "seed": 42,
+     "subset_size": 0,
+     "raw_data_path": null,
+     "test_data_path": null,
+     "skip_language_filter": false,
+     "skip_repetition_filter": false,
+     "skip_quality_filter": false,
+     "min_words": 100,
+     "max_words": 10000,
+     "max_non_ascii": 0.3,
+     "min_line_uniqueness": 0.7,
+     "min_sentence_uniqueness": 0.8,
+     "max_train_tokens": 0
+   },
+   "model": {
+     "vocab_size": 50304,
+     "n_layers": 18,
+     "n_heads": 5,
+     "n_kv_heads": 1,
+     "n_embd": 320,
+     "embedding_dim": null,
+     "tie_embeddings": true,
+     "context_len": 1024,
+     "dropout": 0.0,
+     "bias": false,
+     "norm_type": "rmsnorm",
+     "norm_eps": 1e-05,
+     "positional_embedding": "rope",
+     "rope_theta": 10000.0,
+     "rope_fraction": 1.0,
+     "mlp_type": "swiglu",
+     "mlp_hidden_mult": 4.0,
+     "mlp_hidden_dim": 1024,
+     "qk_norm": false,
+     "block_style": "sequential"
+   },
+   "training": {
+     "seed": 0,
+     "learning_rate": 0.0012,
+     "min_lr": 0.00012,
+     "weight_decay": 0.03,
+     "beta1": 0.9,
+     "beta2": 0.95,
+     "grad_clip": 1.0,
+     "max_iters": 92685,
+     "warmup_steps": 116,
+     "lr_schedule": "wsd",
+     "wsd_stable_frac": 0.85,
+     "batch_size": 4,
+     "gradient_accumulation_steps": 16,
+     "dtype": "float16",
+     "device": "cuda",
+     "eval_step_interval": 500,
+     "eval_batches": 20,
+     "log_interval": 10,
+     "max_checkpoints": 5
+   },
+   "inference": {
+     "checkpoint": null,
+     "prompt": "",
+     "max_tokens": 100,
+     "temperature": 1.0,
+     "seed": 0,
+     "device": "auto",
+     "leaderboard": false
+   },
+   "post_training": {
+     "base_checkpoint": null,
+     "learning_rate": 1e-05,
+     "max_iters": 1000,
+     "checkpoint_dir": "checkpoints/post",
+     "log_dir": "logs/post"
+   },
+   "evaluation": {
+     "checkpoint": null,
+     "batch_size": 8,
+     "device": "auto",
+     "log_dir": "logs/evaluation"
+   },
+   "notifications": {
+     "enabled": false,
+     "smtp_host": "smtp.gmail.com",
+     "smtp_port": 587,
+     "smtp_user": "",
+     "to_addresses": [],
+     "cooldown_minutes": 5,
+     "periodic_status_hours": 4.0,
+     "disk_min_gb": 5.0
+   }
+ }
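A few quantities implied by config_snapshot.json can be derived with simple arithmetic. The sketch below is an illustration only and does not come from the repository's training code. It assumes the usual convention that head_dim = n_embd / n_heads, that the SwiGLU c_fc layer fuses the gate and up projections (hence twice mlp_hidden_dim), and that max_iters counts optimizer steps on a single device, each consuming batch_size * gradient_accumulation_steps sequences of context_len tokens. None of these conventions is stated in the snapshot itself.

```python
# Values copied from the "model" and "training" sections of config_snapshot.json.
model = {
    "n_embd": 320,
    "n_heads": 5,
    "n_kv_heads": 1,
    "context_len": 1024,
    "mlp_hidden_dim": 1024,
}
training = {
    "batch_size": 4,
    "gradient_accumulation_steps": 16,
    "max_iters": 92685,
}

# Assumption: each attention head gets an equal slice of the embedding width.
head_dim = model["n_embd"] // model["n_heads"]
# Width of the shared key/value projection with a single KV head.
kv_width = model["n_kv_heads"] * head_dim
# Assumption: c_fc fuses the SwiGLU gate and up projections into one matrix.
fused_fc_width = 2 * model["mlp_hidden_dim"]

# Assumption: one optimizer step = batch_size * accumulation steps sequences.
sequences_per_step = training["batch_size"] * training["gradient_accumulation_steps"]
tokens_per_step = sequences_per_step * model["context_len"]
# Assumption: max_iters counts optimizer steps, not micro-batches.
total_tokens = tokens_per_step * training["max_iters"]

print(f"head_dim={head_dim}, kv_width={kv_width}, fused_fc_width={fused_fc_width}")
print(f"tokens per step: {tokens_per_step:,}")
print(f"tokens over max_iters: {total_tokens:,}")
```

The derived kv_width of 64 and fused width of 2048 agree with the k_proj, v_proj and c_fc shapes in c2/architecture.txt, which lends some support to the first two assumptions. The token budget depends entirely on how the training loop counts iterations.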
c2/eval_metrics.jsonl ADDED
@@ -0,0 +1,162 @@
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 12000, "epoch": 0, "val_loss": 3.9863175749778748, "val_ppl": 53.85620233572836, "is_best": false, "timestamp": "2026-05-04T20:59:59.504631"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 12500, "epoch": 0, "val_loss": 3.9993658542633055, "val_ppl": 54.5635378248214, "is_best": false, "timestamp": "2026-05-04T21:03:40.210750"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 13000, "epoch": 0, "val_loss": 3.8980185985565186, "val_ppl": 49.30465993335737, "is_best": false, "timestamp": "2026-05-04T21:07:20.539646"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 13500, "epoch": 0, "val_loss": 4.0108379364013675, "val_ppl": 55.193099499479665, "is_best": false, "timestamp": "2026-05-04T21:11:00.351397"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 14000, "epoch": 0, "val_loss": 3.996466362476349, "val_ppl": 54.4055604327834, "is_best": false, "timestamp": "2026-05-04T21:14:41.071787"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 14500, "epoch": 0, "val_loss": 3.812811541557312, "val_ppl": 45.277559820706834, "is_best": true, "timestamp": "2026-05-04T21:18:21.800499"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 15000, "epoch": 0, "val_loss": 3.832418644428253, "val_ppl": 46.17408197361166, "is_best": false, "timestamp": "2026-05-04T21:21:59.680798"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 15500, "epoch": 0, "val_loss": 3.8438140153884888, "val_ppl": 46.70326214225701, "is_best": false, "timestamp": "2026-05-04T21:25:35.963292"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 16000, "epoch": 0, "val_loss": 3.849898660182953, "val_ppl": 46.98830120444446, "is_best": false, "timestamp": "2026-05-04T21:29:11.938730"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 16500, "epoch": 0, "val_loss": 3.909795415401459, "val_ppl": 49.88874446053632, "is_best": false, "timestamp": "2026-05-04T21:32:49.678245"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 17000, "epoch": 0, "val_loss": 3.8968958020210267, "val_ppl": 49.24933189887585, "is_best": false, "timestamp": "2026-05-04T21:36:27.607090"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 17500, "epoch": 0, "val_loss": 4.013467276096344, "val_ppl": 55.33841186094534, "is_best": false, "timestamp": "2026-05-04T21:40:03.896723"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 18000, "epoch": 0, "val_loss": 3.816865837574005, "val_ppl": 45.46150107532316, "is_best": false, "timestamp": "2026-05-04T21:43:39.397578"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 18500, "epoch": 0, "val_loss": 3.7777989864349366, "val_ppl": 43.71970808806943, "is_best": true, "timestamp": "2026-05-04T21:47:16.012613"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 19000, "epoch": 0, "val_loss": 3.772597849369049, "val_ppl": 43.49290621891045, "is_best": true, "timestamp": "2026-05-04T21:50:52.340681"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 19500, "epoch": 0, "val_loss": 3.804729461669922, "val_ppl": 44.91309775478298, "is_best": false, "timestamp": "2026-05-04T21:54:29.858721"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 20000, "epoch": 0, "val_loss": 3.7609264254570007, "val_ppl": 42.988232930167364, "is_best": true, "timestamp": "2026-05-04T21:58:05.590810"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 20500, "epoch": 0, "val_loss": 3.844603657722473, "val_ppl": 46.74015557957265, "is_best": false, "timestamp": "2026-05-04T22:01:41.893882"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 21000, "epoch": 0, "val_loss": 3.8787616729736327, "val_ppl": 48.36428716991975, "is_best": false, "timestamp": "2026-05-04T22:05:18.243747"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 21500, "epoch": 0, "val_loss": 3.85902978181839, "val_ppl": 47.41932195485685, "is_best": false, "timestamp": "2026-05-04T22:08:54.100097"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 22000, "epoch": 0, "val_loss": 3.7951119422912596, "val_ppl": 44.48321567991365, "is_best": false, "timestamp": "2026-05-04T22:12:34.826759"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 22500, "epoch": 0, "val_loss": 3.793329083919525, "val_ppl": 44.403979061259854, "is_best": false, "timestamp": "2026-05-04T22:16:16.250257"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 23000, "epoch": 0, "val_loss": 3.802736735343933, "val_ppl": 44.823687357318825, "is_best": false, "timestamp": "2026-05-04T22:19:56.618702"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 23500, "epoch": 0, "val_loss": 3.743816339969635, "val_ppl": 42.258957364917784, "is_best": true, "timestamp": "2026-05-04T22:23:37.079401"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 24000, "epoch": 0, "val_loss": 3.8505793809890747, "val_ppl": 47.02029800792802, "is_best": false, "timestamp": "2026-05-04T22:27:13.809411"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 24500, "epoch": 0, "val_loss": 3.6836650729179383, "val_ppl": 39.79196760363053, "is_best": true, "timestamp": "2026-05-04T22:30:49.533366"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 25000, "epoch": 0, "val_loss": 3.790373718738556, "val_ppl": 44.272942813003034, "is_best": false, "timestamp": "2026-05-04T22:34:25.781566"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 25500, "epoch": 0, "val_loss": 3.772153317928314, "val_ppl": 43.47357655128801, "is_best": false, "timestamp": "2026-05-04T22:38:00.959868"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 26000, "epoch": 0, "val_loss": 3.8301936626434325, "val_ppl": 46.07145969098069, "is_best": false, "timestamp": "2026-05-04T22:41:35.514732"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 26500, "epoch": 0, "val_loss": 3.786704921722412, "val_ppl": 44.11081196695457, "is_best": false, "timestamp": "2026-05-04T22:45:11.828341"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 27000, "epoch": 0, "val_loss": 3.7141265153884886, "val_ppl": 41.02273869918982, "is_best": false, "timestamp": "2026-05-04T22:48:48.977554"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 27500, "epoch": 0, "val_loss": 3.8283850073814394, "val_ppl": 45.98820761283225, "is_best": false, "timestamp": "2026-05-04T22:52:23.338057"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 28000, "epoch": 0, "val_loss": 3.715892422199249, "val_ppl": 41.09524503372762, "is_best": false, "timestamp": "2026-05-04T22:55:57.343896"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 28500, "epoch": 0, "val_loss": 3.7689138054847717, "val_ppl": 43.33297122839589, "is_best": false, "timestamp": "2026-05-04T22:59:35.084593"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 29000, "epoch": 0, "val_loss": 3.803325188159943, "val_ppl": 44.85007174459031, "is_best": false, "timestamp": "2026-05-04T23:03:12.043553"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 29500, "epoch": 0, "val_loss": 3.666334068775177, "val_ppl": 39.10827450516796, "is_best": true, "timestamp": "2026-05-04T23:06:47.310567"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 30000, "epoch": 0, "val_loss": 3.7948907256126403, "val_ppl": 44.473376339039916, "is_best": false, "timestamp": "2026-05-04T23:10:23.410240"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 30500, "epoch": 0, "val_loss": 3.812023639678955, "val_ppl": 45.241899596500296, "is_best": false, "timestamp": "2026-05-04T23:13:58.332153"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 31000, "epoch": 0, "val_loss": 3.675893270969391, "val_ppl": 39.483910940532915, "is_best": false, "timestamp": "2026-05-04T23:17:34.037102"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 31500, "epoch": 0, "val_loss": 3.738007938861847, "val_ppl": 42.01421186824429, "is_best": false, "timestamp": "2026-05-04T23:21:09.271499"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 32000, "epoch": 0, "val_loss": 3.8809256076812746, "val_ppl": 48.469057646889524, "is_best": false, "timestamp": "2026-05-04T23:24:45.122386"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 32500, "epoch": 0, "val_loss": 3.8831029653549196, "val_ppl": 48.57470709807361, "is_best": false, "timestamp": "2026-05-04T23:28:19.588133"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 33000, "epoch": 0, "val_loss": 3.782689297199249, "val_ppl": 43.93403468183204, "is_best": false, "timestamp": "2026-05-04T23:31:54.513127"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 33500, "epoch": 0, "val_loss": 3.7974807739257814, "val_ppl": 44.588713832770054, "is_best": false, "timestamp": "2026-05-04T23:35:31.287826"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 34000, "epoch": 0, "val_loss": 3.755391275882721, "val_ppl": 42.75094395179436, "is_best": false, "timestamp": "2026-05-04T23:39:06.900594"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 34500, "epoch": 0, "val_loss": 3.6980305790901182, "val_ppl": 40.36772498147992, "is_best": false, "timestamp": "2026-05-04T23:42:45.306237"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 35000, "epoch": 0, "val_loss": 3.748930037021637, "val_ppl": 42.47561034735333, "is_best": false, "timestamp": "2026-05-04T23:46:22.382235"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 35500, "epoch": 0, "val_loss": 3.8731701374053955, "val_ppl": 48.094611192011875, "is_best": false, "timestamp": "2026-05-04T23:50:00.736529"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 36000, "epoch": 0, "val_loss": 3.8187678933143614, "val_ppl": 45.54805367224632, "is_best": false, "timestamp": "2026-05-04T23:53:38.469138"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 36500, "epoch": 0, "val_loss": 3.642268753051758, "val_ppl": 38.17835580431779, "is_best": true, "timestamp": "2026-05-04T23:57:14.388749"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 37000, "epoch": 0, "val_loss": 3.6575311541557314, "val_ppl": 38.76551854290175, "is_best": false, "timestamp": "2026-05-05T00:00:51.741991"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 37500, "epoch": 0, "val_loss": 3.639414060115814, "val_ppl": 38.069523736673105, "is_best": true, "timestamp": "2026-05-05T00:04:27.830597"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 38000, "epoch": 0, "val_loss": 3.7348302245140075, "val_ppl": 41.88091460685559, "is_best": false, "timestamp": "2026-05-05T00:08:03.818764"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 38500, "epoch": 0, "val_loss": 3.76340229511261, "val_ppl": 43.09479805787405, "is_best": false, "timestamp": "2026-05-05T00:11:38.899989"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 39000, "epoch": 0, "val_loss": 3.663316512107849, "val_ppl": 38.99044094482259, "is_best": false, "timestamp": "2026-05-05T00:15:13.596636"}
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 39500, "epoch": 0, "val_loss": 3.6712753891944887, "val_ppl": 39.30199925439836, "is_best": false, "timestamp": "2026-05-05T00:18:48.089176"}
57
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 40000, "epoch": 0, "val_loss": 3.722728192806244, "val_ppl": 41.37712503770513, "is_best": false, "timestamp": "2026-05-05T00:22:22.673948"}
58
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 40500, "epoch": 0, "val_loss": 3.770984923839569, "val_ppl": 43.42281194373766, "is_best": false, "timestamp": "2026-05-05T00:25:57.668102"}
59
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 41000, "epoch": 0, "val_loss": 3.552431809902191, "val_ppl": 34.898079879036295, "is_best": true, "timestamp": "2026-05-05T00:29:32.958417"}
60
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 41500, "epoch": 0, "val_loss": 3.747627294063568, "val_ppl": 42.420311572948926, "is_best": false, "timestamp": "2026-05-05T00:33:08.355133"}
61
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 42000, "epoch": 0, "val_loss": 3.6810667157173156, "val_ppl": 39.68870806875473, "is_best": false, "timestamp": "2026-05-05T00:36:42.580968"}
62
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 42500, "epoch": 0, "val_loss": 3.7633370995521545, "val_ppl": 43.09198855994634, "is_best": false, "timestamp": "2026-05-05T00:40:17.011458"}
63
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 43000, "epoch": 0, "val_loss": 3.615912067890167, "val_ppl": 37.185245932545655, "is_best": false, "timestamp": "2026-05-05T00:43:51.844020"}
64
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 43500, "epoch": 0, "val_loss": 3.6781845092773438, "val_ppl": 39.57448170981275, "is_best": false, "timestamp": "2026-05-05T00:47:26.919631"}
65
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 44000, "epoch": 0, "val_loss": 3.6713991284370424, "val_ppl": 39.30686275491367, "is_best": false, "timestamp": "2026-05-05T00:51:04.835836"}
66
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 44500, "epoch": 0, "val_loss": 3.699683153629303, "val_ppl": 40.43449080854727, "is_best": false, "timestamp": "2026-05-05T00:54:40.453262"}
67
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 45000, "epoch": 0, "val_loss": 3.7241574883461, "val_ppl": 41.43630746248623, "is_best": false, "timestamp": "2026-05-05T00:58:17.458978"}
68
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 45500, "epoch": 0, "val_loss": 3.7611931204795837, "val_ppl": 42.99969920685097, "is_best": false, "timestamp": "2026-05-05T01:01:54.445349"}
69
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 46000, "epoch": 0, "val_loss": 3.7802424907684324, "val_ppl": 43.82666800953869, "is_best": false, "timestamp": "2026-05-05T01:05:28.095918"}
70
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 46500, "epoch": 0, "val_loss": 3.654506039619446, "val_ppl": 38.648425608316444, "is_best": false, "timestamp": "2026-05-05T01:09:01.956376"}
71
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 47000, "epoch": 0, "val_loss": 3.791075897216797, "val_ppl": 44.304041237659206, "is_best": false, "timestamp": "2026-05-05T01:12:35.113869"}
72
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 47500, "epoch": 0, "val_loss": 3.765990149974823, "val_ppl": 43.206465567898896, "is_best": false, "timestamp": "2026-05-05T01:16:07.821939"}
73
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 48000, "epoch": 0, "val_loss": 3.6301537990570067, "val_ppl": 37.71861725886971, "is_best": false, "timestamp": "2026-05-05T01:19:41.674618"}
74
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 48500, "epoch": 0, "val_loss": 3.85994348526001, "val_ppl": 47.46266895266112, "is_best": false, "timestamp": "2026-05-05T01:23:14.931355"}
75
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 49000, "epoch": 0, "val_loss": 3.7905235290527344, "val_ppl": 44.279575853311975, "is_best": false, "timestamp": "2026-05-05T01:26:48.890788"}
76
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 49500, "epoch": 0, "val_loss": 3.7553041696548464, "val_ppl": 42.74722024051022, "is_best": false, "timestamp": "2026-05-05T01:30:22.140454"}
77
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 50000, "epoch": 0, "val_loss": 3.718538928031921, "val_ppl": 41.204147881852954, "is_best": false, "timestamp": "2026-05-05T01:33:56.237159"}
78
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 50500, "epoch": 0, "val_loss": 3.7871594190597535, "val_ppl": 44.13086477016917, "is_best": false, "timestamp": "2026-05-05T01:37:29.804904"}
79
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 51000, "epoch": 0, "val_loss": 3.6975579500198363, "val_ppl": 40.34865052907795, "is_best": false, "timestamp": "2026-05-05T01:41:03.626884"}
80
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 51500, "epoch": 0, "val_loss": 3.7123099088668825, "val_ppl": 40.948284172299644, "is_best": false, "timestamp": "2026-05-05T01:44:36.590819"}
81
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 52000, "epoch": 0, "val_loss": 3.6105751514434816, "val_ppl": 36.98732000880116, "is_best": false, "timestamp": "2026-05-05T01:48:10.949076"}
82
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 52500, "epoch": 0, "val_loss": 3.748972177505493, "val_ppl": 42.47740032784051, "is_best": false, "timestamp": "2026-05-05T01:51:44.887522"}
83
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 53000, "epoch": 0, "val_loss": 3.536519932746887, "val_ppl": 34.347180464321134, "is_best": true, "timestamp": "2026-05-05T01:55:19.914524"}
84
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 53500, "epoch": 0, "val_loss": 3.7042893409729003, "val_ppl": 40.62116925624361, "is_best": false, "timestamp": "2026-05-05T01:58:56.094996"}
85
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 54000, "epoch": 0, "val_loss": 3.528304159641266, "val_ppl": 34.06614785367218, "is_best": true, "timestamp": "2026-05-05T02:02:36.543228"}
86
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 54500, "epoch": 0, "val_loss": 3.688965547084808, "val_ppl": 40.00344386707912, "is_best": false, "timestamp": "2026-05-05T02:06:13.936633"}
87
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 55000, "epoch": 0, "val_loss": 3.687112832069397, "val_ppl": 39.92939750054665, "is_best": false, "timestamp": "2026-05-05T02:09:49.921760"}
88
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 55500, "epoch": 0, "val_loss": 3.707573747634888, "val_ppl": 40.75480503215183, "is_best": false, "timestamp": "2026-05-05T02:13:25.278363"}
89
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 56000, "epoch": 0, "val_loss": 3.724093532562256, "val_ppl": 41.4336574557054, "is_best": false, "timestamp": "2026-05-05T02:17:00.358077"}
90
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 56500, "epoch": 0, "val_loss": 3.6532342433929443, "val_ppl": 38.59930392947151, "is_best": false, "timestamp": "2026-05-05T02:20:34.406624"}
91
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 57000, "epoch": 0, "val_loss": 3.6039551854133607, "val_ppl": 36.743273886571934, "is_best": false, "timestamp": "2026-05-05T02:24:08.529184"}
92
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 57500, "epoch": 0, "val_loss": 3.690984618663788, "val_ppl": 40.08429527857909, "is_best": false, "timestamp": "2026-05-05T02:27:42.348235"}
93
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 58000, "epoch": 0, "val_loss": 3.7980252385139464, "val_ppl": 44.61299741866023, "is_best": false, "timestamp": "2026-05-05T02:31:15.494315"}
94
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 58500, "epoch": 0, "val_loss": 3.7596939325332643, "val_ppl": 42.935282874264374, "is_best": false, "timestamp": "2026-05-05T02:34:49.636452"}
95
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 59000, "epoch": 0, "val_loss": 3.7567172765731813, "val_ppl": 42.80766933362843, "is_best": false, "timestamp": "2026-05-05T02:38:23.899072"}
96
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 59500, "epoch": 0, "val_loss": 3.6661568880081177, "val_ppl": 39.101345884920015, "is_best": false, "timestamp": "2026-05-05T02:41:58.314158"}
97
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 60000, "epoch": 0, "val_loss": 3.829219877719879, "val_ppl": 46.02661783483271, "is_best": false, "timestamp": "2026-05-05T02:45:33.015118"}
98
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 60500, "epoch": 0, "val_loss": 3.6592538833618162, "val_ppl": 38.8323585910055, "is_best": false, "timestamp": "2026-05-05T02:49:07.008367"}
99
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 61000, "epoch": 0, "val_loss": 3.6369224905967714, "val_ppl": 37.974788939901174, "is_best": false, "timestamp": "2026-05-05T02:52:41.826436"}
100
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 61500, "epoch": 0, "val_loss": 3.640268337726593, "val_ppl": 38.10205957379428, "is_best": false, "timestamp": "2026-05-05T02:56:17.487129"}
101
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 62000, "epoch": 0, "val_loss": 3.7314751744270325, "val_ppl": 41.740637490620905, "is_best": false, "timestamp": "2026-05-05T02:59:52.254509"}
102
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 62500, "epoch": 0, "val_loss": 3.607824993133545, "val_ppl": 36.88573876958596, "is_best": false, "timestamp": "2026-05-05T03:03:27.488521"}
103
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 63000, "epoch": 0, "val_loss": 3.497276210784912, "val_ppl": 33.02537517856575, "is_best": true, "timestamp": "2026-05-05T03:07:01.428002"}
104
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 63500, "epoch": 0, "val_loss": 3.751301980018616, "val_ppl": 42.57647965469087, "is_best": false, "timestamp": "2026-05-05T03:10:39.967541"}
105
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 64000, "epoch": 0, "val_loss": 3.785206985473633, "val_ppl": 44.04478624625217, "is_best": false, "timestamp": "2026-05-05T03:14:09.552152"}
106
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 64500, "epoch": 0, "val_loss": 3.6670176506042482, "val_ppl": 39.13501735040617, "is_best": false, "timestamp": "2026-05-05T03:17:40.120467"}
107
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 65000, "epoch": 0, "val_loss": 3.6048176407814028, "val_ppl": 36.774976989665426, "is_best": false, "timestamp": "2026-05-05T03:21:09.621984"}
108
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 65500, "epoch": 0, "val_loss": 3.68316490650177, "val_ppl": 39.77206997427974, "is_best": false, "timestamp": "2026-05-05T03:24:39.232220"}
109
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 66000, "epoch": 0, "val_loss": 3.6640037536621093, "val_ppl": 39.017246005779675, "is_best": false, "timestamp": "2026-05-05T03:28:09.790645"}
110
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 66500, "epoch": 0, "val_loss": 3.5959048748016356, "val_ppl": 36.44866655023838, "is_best": false, "timestamp": "2026-05-05T03:31:40.643166"}
111
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 67000, "epoch": 0, "val_loss": 3.7562708497047423, "val_ppl": 42.78856310494689, "is_best": false, "timestamp": "2026-05-05T03:35:11.086634"}
112
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 67500, "epoch": 0, "val_loss": 3.5145472645759583, "val_ppl": 33.600712247137416, "is_best": false, "timestamp": "2026-05-05T03:38:41.101123"}
113
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 68000, "epoch": 0, "val_loss": 3.6910828351974487, "val_ppl": 40.08823241245824, "is_best": false, "timestamp": "2026-05-05T03:42:10.953259"}
114
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 68500, "epoch": 0, "val_loss": 3.8013942003250123, "val_ppl": 44.7635503644065, "is_best": false, "timestamp": "2026-05-05T03:45:40.080018"}
115
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 69000, "epoch": 0, "val_loss": 3.658419907093048, "val_ppl": 38.79998682599847, "is_best": false, "timestamp": "2026-05-05T03:49:08.561564"}
116
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 69500, "epoch": 0, "val_loss": 3.8823310852050783, "val_ppl": 48.53722771253796, "is_best": false, "timestamp": "2026-05-05T03:52:37.799746"}
117
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 70000, "epoch": 0, "val_loss": 3.713327670097351, "val_ppl": 40.989980963473904, "is_best": false, "timestamp": "2026-05-05T03:56:07.108614"}
118
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 70500, "epoch": 0, "val_loss": 3.6645475745201113, "val_ppl": 39.03847016852754, "is_best": false, "timestamp": "2026-05-05T03:59:35.499171"}
119
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 71000, "epoch": 0, "val_loss": 3.6358020663261414, "val_ppl": 37.93226489163729, "is_best": false, "timestamp": "2026-05-05T04:03:04.490368"}
120
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 71500, "epoch": 0, "val_loss": 3.714549744129181, "val_ppl": 41.04010437579673, "is_best": false, "timestamp": "2026-05-05T04:06:33.754519"}
121
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 72000, "epoch": 0, "val_loss": 3.724819552898407, "val_ppl": 41.463750056217506, "is_best": false, "timestamp": "2026-05-05T04:10:02.950489"}
122
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 72500, "epoch": 0, "val_loss": 3.706956958770752, "val_ppl": 40.72967567279914, "is_best": false, "timestamp": "2026-05-05T04:13:32.753044"}
123
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 73000, "epoch": 0, "val_loss": 3.720665693283081, "val_ppl": 41.291872683735996, "is_best": false, "timestamp": "2026-05-05T04:17:02.264757"}
124
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 73500, "epoch": 0, "val_loss": 3.7619191646575927, "val_ppl": 43.03093022429351, "is_best": false, "timestamp": "2026-05-05T04:20:31.644524"}
125
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 74000, "epoch": 0, "val_loss": 3.7080879330635073, "val_ppl": 40.77576594748239, "is_best": false, "timestamp": "2026-05-05T04:24:01.396289"}
126
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 74500, "epoch": 0, "val_loss": 3.668335270881653, "val_ppl": 39.18661642935482, "is_best": false, "timestamp": "2026-05-05T04:27:30.551942"}
127
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 75000, "epoch": 0, "val_loss": 3.646988558769226, "val_ppl": 38.35897613746714, "is_best": false, "timestamp": "2026-05-05T04:30:59.989095"}
128
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 75500, "epoch": 0, "val_loss": 3.7617430686950684, "val_ppl": 43.023353318368194, "is_best": false, "timestamp": "2026-05-05T04:34:29.068254"}
129
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 76000, "epoch": 0, "val_loss": 3.541844880580902, "val_ppl": 34.53056523237709, "is_best": false, "timestamp": "2026-05-05T04:37:59.028500"}
130
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 76500, "epoch": 0, "val_loss": 3.680254328250885, "val_ppl": 39.656478552960735, "is_best": false, "timestamp": "2026-05-05T04:41:28.477721"}
131
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 77000, "epoch": 0, "val_loss": 3.645061802864075, "val_ppl": 38.285138909678764, "is_best": false, "timestamp": "2026-05-05T04:44:57.542724"}
132
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 77500, "epoch": 0, "val_loss": 3.615162217617035, "val_ppl": 37.157373017289224, "is_best": false, "timestamp": "2026-05-05T04:48:26.395231"}
133
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 78000, "epoch": 0, "val_loss": 3.667114770412445, "val_ppl": 39.138818320356776, "is_best": false, "timestamp": "2026-05-05T04:51:55.154634"}
134
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 78500, "epoch": 0, "val_loss": 3.6738196134567263, "val_ppl": 39.40211966483728, "is_best": false, "timestamp": "2026-05-05T04:55:24.168296"}
135
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 79000, "epoch": 0, "val_loss": 3.5766518473625184, "val_ppl": 35.75363160589735, "is_best": false, "timestamp": "2026-05-05T04:58:53.097006"}
136
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 79500, "epoch": 0, "val_loss": 3.6736133217811586, "val_ppl": 39.393992173896386, "is_best": false, "timestamp": "2026-05-05T05:02:22.534742"}
137
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 80000, "epoch": 0, "val_loss": 3.7694395780563354, "val_ppl": 43.355760506575855, "is_best": false, "timestamp": "2026-05-05T05:05:51.139952"}
138
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 80500, "epoch": 0, "val_loss": 3.673647201061249, "val_ppl": 39.3953268365997, "is_best": false, "timestamp": "2026-05-05T05:09:20.334475"}
139
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 81000, "epoch": 0, "val_loss": 3.5661979794502257, "val_ppl": 35.38181471216213, "is_best": false, "timestamp": "2026-05-05T05:12:50.088072"}
140
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 81500, "epoch": 0, "val_loss": 3.662993311882019, "val_ppl": 38.9778412617239, "is_best": false, "timestamp": "2026-05-05T05:16:17.663732"}
141
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 82000, "epoch": 0, "val_loss": 4.2208603620529175, "val_ppl": 68.09204290482313, "is_best": false, "timestamp": "2026-05-05T05:19:45.079006"}
142
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 82500, "epoch": 0, "val_loss": 3.5309290766716, "val_ppl": 34.155686129192006, "is_best": false, "timestamp": "2026-05-05T05:23:12.931143"}
143
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 83000, "epoch": 0, "val_loss": 3.5450966119766236, "val_ppl": 34.64303211239782, "is_best": false, "timestamp": "2026-05-05T05:26:40.835777"}
144
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 83500, "epoch": 0, "val_loss": 3.542018008232117, "val_ppl": 34.53654394555625, "is_best": false, "timestamp": "2026-05-05T05:30:09.353422"}
145
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 84000, "epoch": 0, "val_loss": 3.658814322948456, "val_ppl": 38.81529317432707, "is_best": false, "timestamp": "2026-05-05T05:33:37.801618"}
146
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 84500, "epoch": 0, "val_loss": 3.653587245941162, "val_ppl": 38.6129319873461, "is_best": false, "timestamp": "2026-05-05T05:37:07.033376"}
147
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 85000, "epoch": 0, "val_loss": 3.6783868193626406, "val_ppl": 39.58248883651697, "is_best": false, "timestamp": "2026-05-05T05:40:34.896439"}
148
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 85500, "epoch": 0, "val_loss": 3.6223838329315186, "val_ppl": 37.42668051850112, "is_best": false, "timestamp": "2026-05-05T05:44:04.198255"}
149
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 86000, "epoch": 0, "val_loss": 3.662233901023865, "val_ppl": 38.94825230235286, "is_best": false, "timestamp": "2026-05-05T05:47:32.314224"}
150
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 86500, "epoch": 0, "val_loss": 3.6114082098007203, "val_ppl": 37.01814544275634, "is_best": false, "timestamp": "2026-05-05T05:51:01.125909"}
151
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 87000, "epoch": 0, "val_loss": 3.571852457523346, "val_ppl": 35.582447108813014, "is_best": false, "timestamp": "2026-05-05T05:54:29.829229"}
152
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 87500, "epoch": 0, "val_loss": 3.487281250953674, "val_ppl": 32.69693200268292, "is_best": true, "timestamp": "2026-05-05T05:57:59.390664"}
153
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 88000, "epoch": 0, "val_loss": 3.5826083183288575, "val_ppl": 35.96723259701156, "is_best": false, "timestamp": "2026-05-05T06:01:29.039685"}
154
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 88500, "epoch": 0, "val_loss": 3.3613099455833435, "val_ppl": 28.826927863984967, "is_best": true, "timestamp": "2026-05-05T06:04:58.135293"}
155
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 89000, "epoch": 0, "val_loss": 3.720856249332428, "val_ppl": 41.29974184959951, "is_best": false, "timestamp": "2026-05-05T06:08:27.479708"}
156
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 89500, "epoch": 0, "val_loss": 3.411408472061157, "val_ppl": 30.307902044460747, "is_best": false, "timestamp": "2026-05-05T06:11:56.256241"}
157
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 90000, "epoch": 0, "val_loss": 3.502420389652252, "val_ppl": 33.19570133414478, "is_best": false, "timestamp": "2026-05-05T06:15:23.843460"}
158
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 90500, "epoch": 0, "val_loss": 3.470740056037903, "val_ppl": 32.16053423758203, "is_best": false, "timestamp": "2026-05-05T06:18:52.013161"}
159
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 91000, "epoch": 0, "val_loss": 3.6083788871765137, "val_ppl": 36.90617521985247, "is_best": false, "timestamp": "2026-05-05T06:22:20.694699"}
160
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 91500, "epoch": 0, "val_loss": 3.5298542261123655, "val_ppl": 34.11899359388364, "is_best": false, "timestamp": "2026-05-05T06:25:49.341306"}
161
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 92000, "epoch": 0, "val_loss": 3.536409044265747, "val_ppl": 34.34337196881105, "is_best": false, "timestamp": "2026-05-05T06:29:17.886483"}
162
+ {"run_name": "c2", "stage": "pretraining", "event": "eval", "step": 92500, "epoch": 0, "val_loss": 3.567478024959564, "val_ppl": 35.42713404441251, "is_best": false, "timestamp": "2026-05-05T06:32:46.611475"}
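Review note: each eval record above carries both `val_loss` and `val_ppl`, and the logged values are consistent with perplexity being the exponential of the loss. The sketch below rechecks that relation and recomputes the `is_best` flag for two records copied from the log (steps 63000 and 63500). The helper name `check_eval_records` is hypothetical and is not part of the training code. The starting best loss is the step 54000 value, the last record marked `is_best` before step 63000.

```python
import json
import math

# Two eval records copied from the log above, trimmed to the fields used here.
SAMPLE = """\
{"event": "eval", "step": 63000, "val_loss": 3.497276210784912, "val_ppl": 33.02537517856575, "is_best": true}
{"event": "eval", "step": 63500, "val_loss": 3.751301980018616, "val_ppl": 42.57647965469087, "is_best": false}
"""


def check_eval_records(lines, best_so_far=float("inf")):
    """Hypothetical helper: return (step, ppl_matches, recomputed_is_best) per eval record."""
    results = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        if rec.get("event") != "eval":
            continue
        # Assumption: val_ppl is exp(val_loss); a loose tolerance allows for rounding.
        ppl_matches = math.isclose(math.exp(rec["val_loss"]), rec["val_ppl"], rel_tol=1e-6)
        # Assumption: a record is "best" when its loss is strictly below the running minimum.
        is_best = rec["val_loss"] < best_so_far
        if is_best:
            best_so_far = rec["val_loss"]
        results.append((rec["step"], ppl_matches, is_best))
    return results


# Best loss before step 63000 in this log is the step 54000 value.
results = check_eval_records(SAMPLE.splitlines(), best_so_far=3.528304159641266)
for step, ppl_matches, is_best in results:
    print(step, ppl_matches, is_best)
```

The same function could be pointed at the full log by passing an open file handle instead of `SAMPLE.splitlines()`, since it only iterates over lines.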
c2/events.jsonl ADDED
@@ -0,0 +1,185 @@
1
+ {"run_name": "c2", "stage": "pretraining", "event": "model_summary", "total_params": 38227520, "trainable_params": 38227520, "weight_tied_lm_head": true, "timestamp": "2026-05-04T20:51:34.748590"}
2
+ {"run_name": "c2", "stage": "pretraining", "event": "config", "model": {"vocab_size": 50304, "n_layers": 18, "n_heads": 5, "n_kv_heads": 1, "n_embd": 320, "embedding_dim": null, "tie_embeddings": true, "context_len": 1024, "dropout": 0.0, "bias": false, "norm_type": "rmsnorm", "norm_eps": 1e-05, "positional_embedding": "rope", "rope_theta": 10000.0, "rope_fraction": 1.0, "mlp_type": "swiglu", "mlp_hidden_mult": 4.0, "mlp_hidden_dim": 1024, "qk_norm": false, "block_style": "sequential"}, "training": {"seed": 0, "learning_rate": 0.0012, "min_lr": 0.00012, "weight_decay": 0.03, "beta1": 0.9, "beta2": 0.95, "grad_clip": 1.0, "max_iters": 11586, "warmup_steps": 116, "lr_schedule": "wsd", "wsd_stable_frac": 0.85, "batch_size": 4, "gradient_accumulation_steps": 16, "dtype": "float16", "device": "cuda", "eval_step_interval": 500, "eval_batches": 20, "log_interval": 10, "max_checkpoints": 5}, "distributed": {"enabled": false, "backend": "nccl"}, "timestamp": "2026-05-04T20:51:34.748853"}
3
+ {"run_name": "c2", "stage": "pretraining", "event": "model_summary", "total_params": 38227520, "trainable_params": 38227520, "weight_tied_lm_head": true, "timestamp": "2026-05-04T20:55:51.114602"}
4
+ {"run_name": "c2", "stage": "pretraining", "event": "config", "model": {"vocab_size": 50304, "n_layers": 18, "n_heads": 5, "n_kv_heads": 1, "n_embd": 320, "embedding_dim": null, "tie_embeddings": true, "context_len": 1024, "dropout": 0.0, "bias": false, "norm_type": "rmsnorm", "norm_eps": 1e-05, "positional_embedding": "rope", "rope_theta": 10000.0, "rope_fraction": 1.0, "mlp_type": "swiglu", "mlp_hidden_mult": 4.0, "mlp_hidden_dim": 1024, "qk_norm": false, "block_style": "sequential"}, "training": {"seed": 0, "learning_rate": 0.0012, "min_lr": 0.00012, "weight_decay": 0.03, "beta1": 0.9, "beta2": 0.95, "grad_clip": 1.0, "max_iters": 92685, "warmup_steps": 116, "lr_schedule": "wsd", "wsd_stable_frac": 0.85, "batch_size": 4, "gradient_accumulation_steps": 16, "dtype": "float16", "device": "cuda", "eval_step_interval": 500, "eval_batches": 20, "log_interval": 10, "max_checkpoints": 5}, "distributed": {"enabled": false, "backend": "nccl"}, "timestamp": "2026-05-04T20:55:51.114804"}
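Review note: the `model_summary` events report `total_params` of 38227520 with a tied `lm_head`. The sketch below reproduces that figure from the `config` fields. It rests on several assumptions that the config alone does not state: the head dimension is `n_embd / n_heads`, the K and V projections are sized by `n_kv_heads`, the SwiGLU input projection is fused to `2 * mlp_hidden_dim` (consistent with the `c_fc` shape printed in architecture.txt in this commit), each RMSNorm holds one weight vector of size `n_embd`, and no layer has a bias. The function name `count_params` is hypothetical.

```python
def count_params(vocab_size, n_layers, n_heads, n_kv_heads, n_embd, mlp_hidden_dim):
    """Hypothetical helper: estimate parameter count under the assumptions listed above."""
    head_dim = n_embd // n_heads          # 320 // 5 = 64
    kv_dim = n_kv_heads * head_dim        # 1 * 64 = 64

    # Attention: q_proj, k_proj, v_proj, output projection (no biases).
    attn = n_embd * n_embd + 2 * n_embd * kv_dim + n_embd * n_embd
    # SwiGLU MLP: fused gate/up projection, then down projection.
    mlp = n_embd * (2 * mlp_hidden_dim) + mlp_hidden_dim * n_embd
    # Two RMSNorm weight vectors per block.
    norms = 2 * n_embd

    per_layer = attn + mlp + norms
    embedding = vocab_size * n_embd       # shared with lm_head when weights are tied
    final_norm = n_embd
    return embedding + n_layers * per_layer + final_norm


total = count_params(
    vocab_size=50304,
    n_layers=18,
    n_heads=5,
    n_kv_heads=1,
    n_embd=320,
    mlp_hidden_dim=1024,
)
print(total)  # 38227520
```

Under these assumptions the token embedding accounts for 16,097,280 of the parameters and the 18 blocks for 22,129,920, so roughly 42% of the model sits in the embedding table.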
5
+ {"run_name": "c2", "stage": "pretraining", "event": "resume", "checkpoint": "artifacts/c2/checkpoints/ckpt_step0011586.pt", "step": 11586, "best_val_loss": 3.8527660250663756, "timestamp": "2026-05-04T20:55:51.578050"}
6
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 12000, "path": "artifacts/c2/checkpoints/ckpt_step0012000.pt", "timestamp": "2026-05-04T21:00:00.110473"}
7
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 12500, "path": "artifacts/c2/checkpoints/ckpt_step0012500.pt", "timestamp": "2026-05-04T21:03:40.818712"}
8
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 13000, "path": "artifacts/c2/checkpoints/ckpt_step0013000.pt", "timestamp": "2026-05-04T21:07:21.158528"}
9
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 13500, "path": "artifacts/c2/checkpoints/ckpt_step0013500.pt", "timestamp": "2026-05-04T21:11:00.953139"}
10
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 14000, "path": "artifacts/c2/checkpoints/ckpt_step0014000.pt", "timestamp": "2026-05-04T21:14:41.734550"}
11
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 14500, "path": "artifacts/c2/checkpoints/ckpt_step0014500.pt", "timestamp": "2026-05-04T21:18:22.458649"}
12
+ {"run_name": "c2", "stage": "pretraining", "event": "best_checkpoint_saved", "step": 14500, "path": "artifacts/c2/checkpoints/best_ckpt.pt", "timestamp": "2026-05-04T21:18:23.064434"}
13
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 15000, "path": "artifacts/c2/checkpoints/ckpt_step0015000.pt", "timestamp": "2026-05-04T21:22:00.340468"}
14
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 15500, "path": "artifacts/c2/checkpoints/ckpt_step0015500.pt", "timestamp": "2026-05-04T21:25:36.621906"}
15
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 16000, "path": "artifacts/c2/checkpoints/ckpt_step0016000.pt", "timestamp": "2026-05-04T21:29:12.590050"}
16
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 16500, "path": "artifacts/c2/checkpoints/ckpt_step0016500.pt", "timestamp": "2026-05-04T21:32:50.330606"}
17
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 17000, "path": "artifacts/c2/checkpoints/ckpt_step0017000.pt", "timestamp": "2026-05-04T21:36:28.262577"}
18
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 17500, "path": "artifacts/c2/checkpoints/ckpt_step0017500.pt", "timestamp": "2026-05-04T21:40:04.551108"}
19
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 18000, "path": "artifacts/c2/checkpoints/ckpt_step0018000.pt", "timestamp": "2026-05-04T21:43:40.067927"}
20
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 18500, "path": "artifacts/c2/checkpoints/ckpt_step0018500.pt", "timestamp": "2026-05-04T21:47:16.668622"}
21
+ {"run_name": "c2", "stage": "pretraining", "event": "best_checkpoint_saved", "step": 18500, "path": "artifacts/c2/checkpoints/best_ckpt.pt", "timestamp": "2026-05-04T21:47:17.560815"}
22
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 19000, "path": "artifacts/c2/checkpoints/ckpt_step0019000.pt", "timestamp": "2026-05-04T21:50:52.991139"}
23
+ {"run_name": "c2", "stage": "pretraining", "event": "best_checkpoint_saved", "step": 19000, "path": "artifacts/c2/checkpoints/best_ckpt.pt", "timestamp": "2026-05-04T21:50:53.852840"}
24
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 19500, "path": "artifacts/c2/checkpoints/ckpt_step0019500.pt", "timestamp": "2026-05-04T21:54:30.514043"}
25
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 20000, "path": "artifacts/c2/checkpoints/ckpt_step0020000.pt", "timestamp": "2026-05-04T21:58:06.244586"}
26
+ {"run_name": "c2", "stage": "pretraining", "event": "best_checkpoint_saved", "step": 20000, "path": "artifacts/c2/checkpoints/best_ckpt.pt", "timestamp": "2026-05-04T21:58:07.169029"}
27
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 20500, "path": "artifacts/c2/checkpoints/ckpt_step0020500.pt", "timestamp": "2026-05-04T22:01:42.547664"}
28
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 21000, "path": "artifacts/c2/checkpoints/ckpt_step0021000.pt", "timestamp": "2026-05-04T22:05:18.903445"}
29
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 21500, "path": "artifacts/c2/checkpoints/ckpt_step0021500.pt", "timestamp": "2026-05-04T22:08:54.759245"}
30
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 22000, "path": "artifacts/c2/checkpoints/ckpt_step0022000.pt", "timestamp": "2026-05-04T22:12:35.490777"}
31
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 22500, "path": "artifacts/c2/checkpoints/ckpt_step0022500.pt", "timestamp": "2026-05-04T22:16:16.910153"}
32
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 23000, "path": "artifacts/c2/checkpoints/ckpt_step0023000.pt", "timestamp": "2026-05-04T22:19:57.275761"}
33
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 23500, "path": "artifacts/c2/checkpoints/ckpt_step0023500.pt", "timestamp": "2026-05-04T22:23:37.733214"}
34
+ {"run_name": "c2", "stage": "pretraining", "event": "best_checkpoint_saved", "step": 23500, "path": "artifacts/c2/checkpoints/best_ckpt.pt", "timestamp": "2026-05-04T22:23:38.692307"}
35
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 24000, "path": "artifacts/c2/checkpoints/ckpt_step0024000.pt", "timestamp": "2026-05-04T22:27:14.462120"}
36
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 24500, "path": "artifacts/c2/checkpoints/ckpt_step0024500.pt", "timestamp": "2026-05-04T22:30:50.186765"}
37
+ {"run_name": "c2", "stage": "pretraining", "event": "best_checkpoint_saved", "step": 24500, "path": "artifacts/c2/checkpoints/best_ckpt.pt", "timestamp": "2026-05-04T22:30:51.110454"}
38
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 25000, "path": "artifacts/c2/checkpoints/ckpt_step0025000.pt", "timestamp": "2026-05-04T22:34:26.435459"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 25500, "path": "artifacts/c2/checkpoints/ckpt_step0025500.pt", "timestamp": "2026-05-04T22:38:01.616826"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 26000, "path": "artifacts/c2/checkpoints/ckpt_step0026000.pt", "timestamp": "2026-05-04T22:41:36.171027"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 26500, "path": "artifacts/c2/checkpoints/ckpt_step0026500.pt", "timestamp": "2026-05-04T22:45:12.480833"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 27000, "path": "artifacts/c2/checkpoints/ckpt_step0027000.pt", "timestamp": "2026-05-04T22:48:49.630829"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 27500, "path": "artifacts/c2/checkpoints/ckpt_step0027500.pt", "timestamp": "2026-05-04T22:52:23.987911"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 28000, "path": "artifacts/c2/checkpoints/ckpt_step0028000.pt", "timestamp": "2026-05-04T22:55:57.996561"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 28500, "path": "artifacts/c2/checkpoints/ckpt_step0028500.pt", "timestamp": "2026-05-04T22:59:35.749865"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 29000, "path": "artifacts/c2/checkpoints/ckpt_step0029000.pt", "timestamp": "2026-05-04T23:03:12.706823"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 29500, "path": "artifacts/c2/checkpoints/ckpt_step0029500.pt", "timestamp": "2026-05-04T23:06:47.965913"}
+ {"run_name": "c2", "stage": "pretraining", "event": "best_checkpoint_saved", "step": 29500, "path": "artifacts/c2/checkpoints/best_ckpt.pt", "timestamp": "2026-05-04T23:06:48.838561"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 30000, "path": "artifacts/c2/checkpoints/ckpt_step0030000.pt", "timestamp": "2026-05-04T23:10:24.061996"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 30500, "path": "artifacts/c2/checkpoints/ckpt_step0030500.pt", "timestamp": "2026-05-04T23:13:58.988561"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 31000, "path": "artifacts/c2/checkpoints/ckpt_step0031000.pt", "timestamp": "2026-05-04T23:17:34.697407"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 31500, "path": "artifacts/c2/checkpoints/ckpt_step0031500.pt", "timestamp": "2026-05-04T23:21:09.933172"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 32000, "path": "artifacts/c2/checkpoints/ckpt_step0032000.pt", "timestamp": "2026-05-04T23:24:45.779797"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 32500, "path": "artifacts/c2/checkpoints/ckpt_step0032500.pt", "timestamp": "2026-05-04T23:28:20.241536"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 33000, "path": "artifacts/c2/checkpoints/ckpt_step0033000.pt", "timestamp": "2026-05-04T23:31:55.169163"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 33500, "path": "artifacts/c2/checkpoints/ckpt_step0033500.pt", "timestamp": "2026-05-04T23:35:31.951568"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 34000, "path": "artifacts/c2/checkpoints/ckpt_step0034000.pt", "timestamp": "2026-05-04T23:39:07.554608"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 34500, "path": "artifacts/c2/checkpoints/ckpt_step0034500.pt", "timestamp": "2026-05-04T23:42:45.963040"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 35000, "path": "artifacts/c2/checkpoints/ckpt_step0035000.pt", "timestamp": "2026-05-04T23:46:23.037806"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 35500, "path": "artifacts/c2/checkpoints/ckpt_step0035500.pt", "timestamp": "2026-05-04T23:50:01.391079"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 36000, "path": "artifacts/c2/checkpoints/ckpt_step0036000.pt", "timestamp": "2026-05-04T23:53:39.123535"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 36500, "path": "artifacts/c2/checkpoints/ckpt_step0036500.pt", "timestamp": "2026-05-04T23:57:15.042471"}
+ {"run_name": "c2", "stage": "pretraining", "event": "best_checkpoint_saved", "step": 36500, "path": "artifacts/c2/checkpoints/best_ckpt.pt", "timestamp": "2026-05-04T23:57:15.986028"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 37000, "path": "artifacts/c2/checkpoints/ckpt_step0037000.pt", "timestamp": "2026-05-05T00:00:52.397532"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 37500, "path": "artifacts/c2/checkpoints/ckpt_step0037500.pt", "timestamp": "2026-05-05T00:04:28.488179"}
+ {"run_name": "c2", "stage": "pretraining", "event": "best_checkpoint_saved", "step": 37500, "path": "artifacts/c2/checkpoints/best_ckpt.pt", "timestamp": "2026-05-05T00:04:29.328242"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 38000, "path": "artifacts/c2/checkpoints/ckpt_step0038000.pt", "timestamp": "2026-05-05T00:08:04.476478"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 38500, "path": "artifacts/c2/checkpoints/ckpt_step0038500.pt", "timestamp": "2026-05-05T00:11:39.556054"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 39000, "path": "artifacts/c2/checkpoints/ckpt_step0039000.pt", "timestamp": "2026-05-05T00:15:14.254304"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 39500, "path": "artifacts/c2/checkpoints/ckpt_step0039500.pt", "timestamp": "2026-05-05T00:18:48.741599"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 40000, "path": "artifacts/c2/checkpoints/ckpt_step0040000.pt", "timestamp": "2026-05-05T00:22:23.329858"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 40500, "path": "artifacts/c2/checkpoints/ckpt_step0040500.pt", "timestamp": "2026-05-05T00:25:58.324698"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 41000, "path": "artifacts/c2/checkpoints/ckpt_step0041000.pt", "timestamp": "2026-05-05T00:29:33.613999"}
+ {"run_name": "c2", "stage": "pretraining", "event": "best_checkpoint_saved", "step": 41000, "path": "artifacts/c2/checkpoints/best_ckpt.pt", "timestamp": "2026-05-05T00:29:34.493359"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 41500, "path": "artifacts/c2/checkpoints/ckpt_step0041500.pt", "timestamp": "2026-05-05T00:33:09.011262"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 42000, "path": "artifacts/c2/checkpoints/ckpt_step0042000.pt", "timestamp": "2026-05-05T00:36:43.234279"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 42500, "path": "artifacts/c2/checkpoints/ckpt_step0042500.pt", "timestamp": "2026-05-05T00:40:17.669807"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 43000, "path": "artifacts/c2/checkpoints/ckpt_step0043000.pt", "timestamp": "2026-05-05T00:43:52.501341"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 43500, "path": "artifacts/c2/checkpoints/ckpt_step0043500.pt", "timestamp": "2026-05-05T00:47:27.576226"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 44000, "path": "artifacts/c2/checkpoints/ckpt_step0044000.pt", "timestamp": "2026-05-05T00:51:05.492518"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 44500, "path": "artifacts/c2/checkpoints/ckpt_step0044500.pt", "timestamp": "2026-05-05T00:54:41.107238"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 45000, "path": "artifacts/c2/checkpoints/ckpt_step0045000.pt", "timestamp": "2026-05-05T00:58:18.115880"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 45500, "path": "artifacts/c2/checkpoints/ckpt_step0045500.pt", "timestamp": "2026-05-05T01:01:55.098593"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 46000, "path": "artifacts/c2/checkpoints/ckpt_step0046000.pt", "timestamp": "2026-05-05T01:05:28.750126"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 46500, "path": "artifacts/c2/checkpoints/ckpt_step0046500.pt", "timestamp": "2026-05-05T01:09:02.610840"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 47000, "path": "artifacts/c2/checkpoints/ckpt_step0047000.pt", "timestamp": "2026-05-05T01:12:35.776226"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 47500, "path": "artifacts/c2/checkpoints/ckpt_step0047500.pt", "timestamp": "2026-05-05T01:16:08.472641"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 48000, "path": "artifacts/c2/checkpoints/ckpt_step0048000.pt", "timestamp": "2026-05-05T01:19:42.326766"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 48500, "path": "artifacts/c2/checkpoints/ckpt_step0048500.pt", "timestamp": "2026-05-05T01:23:15.584520"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 49000, "path": "artifacts/c2/checkpoints/ckpt_step0049000.pt", "timestamp": "2026-05-05T01:26:49.541401"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 49500, "path": "artifacts/c2/checkpoints/ckpt_step0049500.pt", "timestamp": "2026-05-05T01:30:22.798284"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 50000, "path": "artifacts/c2/checkpoints/ckpt_step0050000.pt", "timestamp": "2026-05-05T01:33:56.893241"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 50500, "path": "artifacts/c2/checkpoints/ckpt_step0050500.pt", "timestamp": "2026-05-05T01:37:30.457284"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 51000, "path": "artifacts/c2/checkpoints/ckpt_step0051000.pt", "timestamp": "2026-05-05T01:41:04.277657"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 51500, "path": "artifacts/c2/checkpoints/ckpt_step0051500.pt", "timestamp": "2026-05-05T01:44:37.249519"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 52000, "path": "artifacts/c2/checkpoints/ckpt_step0052000.pt", "timestamp": "2026-05-05T01:48:11.602737"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 52500, "path": "artifacts/c2/checkpoints/ckpt_step0052500.pt", "timestamp": "2026-05-05T01:51:45.550140"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 53000, "path": "artifacts/c2/checkpoints/ckpt_step0053000.pt", "timestamp": "2026-05-05T01:55:20.569368"}
+ {"run_name": "c2", "stage": "pretraining", "event": "best_checkpoint_saved", "step": 53000, "path": "artifacts/c2/checkpoints/best_ckpt.pt", "timestamp": "2026-05-05T01:55:21.509022"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 53500, "path": "artifacts/c2/checkpoints/ckpt_step0053500.pt", "timestamp": "2026-05-05T01:58:56.759123"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 54000, "path": "artifacts/c2/checkpoints/ckpt_step0054000.pt", "timestamp": "2026-05-05T02:02:37.198236"}
+ {"run_name": "c2", "stage": "pretraining", "event": "best_checkpoint_saved", "step": 54000, "path": "artifacts/c2/checkpoints/best_ckpt.pt", "timestamp": "2026-05-05T02:02:38.124472"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 54500, "path": "artifacts/c2/checkpoints/ckpt_step0054500.pt", "timestamp": "2026-05-05T02:06:14.593044"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 55000, "path": "artifacts/c2/checkpoints/ckpt_step0055000.pt", "timestamp": "2026-05-05T02:09:50.577235"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 55500, "path": "artifacts/c2/checkpoints/ckpt_step0055500.pt", "timestamp": "2026-05-05T02:13:25.929470"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 56000, "path": "artifacts/c2/checkpoints/ckpt_step0056000.pt", "timestamp": "2026-05-05T02:17:01.005635"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 56500, "path": "artifacts/c2/checkpoints/ckpt_step0056500.pt", "timestamp": "2026-05-05T02:20:35.058409"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 57000, "path": "artifacts/c2/checkpoints/ckpt_step0057000.pt", "timestamp": "2026-05-05T02:24:09.182984"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 57500, "path": "artifacts/c2/checkpoints/ckpt_step0057500.pt", "timestamp": "2026-05-05T02:27:43.001859"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 58000, "path": "artifacts/c2/checkpoints/ckpt_step0058000.pt", "timestamp": "2026-05-05T02:31:16.149978"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 58500, "path": "artifacts/c2/checkpoints/ckpt_step0058500.pt", "timestamp": "2026-05-05T02:34:50.291544"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 59000, "path": "artifacts/c2/checkpoints/ckpt_step0059000.pt", "timestamp": "2026-05-05T02:38:24.555747"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 59500, "path": "artifacts/c2/checkpoints/ckpt_step0059500.pt", "timestamp": "2026-05-05T02:41:58.967242"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 60000, "path": "artifacts/c2/checkpoints/ckpt_step0060000.pt", "timestamp": "2026-05-05T02:45:33.668284"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 60500, "path": "artifacts/c2/checkpoints/ckpt_step0060500.pt", "timestamp": "2026-05-05T02:49:07.661930"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 61000, "path": "artifacts/c2/checkpoints/ckpt_step0061000.pt", "timestamp": "2026-05-05T02:52:42.478664"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 61500, "path": "artifacts/c2/checkpoints/ckpt_step0061500.pt", "timestamp": "2026-05-05T02:56:18.138931"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 62000, "path": "artifacts/c2/checkpoints/ckpt_step0062000.pt", "timestamp": "2026-05-05T02:59:52.909119"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 62500, "path": "artifacts/c2/checkpoints/ckpt_step0062500.pt", "timestamp": "2026-05-05T03:03:28.146235"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 63000, "path": "artifacts/c2/checkpoints/ckpt_step0063000.pt", "timestamp": "2026-05-05T03:07:02.081683"}
+ {"run_name": "c2", "stage": "pretraining", "event": "best_checkpoint_saved", "step": 63000, "path": "artifacts/c2/checkpoints/best_ckpt.pt", "timestamp": "2026-05-05T03:07:02.997468"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 63500, "path": "artifacts/c2/checkpoints/ckpt_step0063500.pt", "timestamp": "2026-05-05T03:10:40.623001"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 64000, "path": "artifacts/c2/checkpoints/ckpt_step0064000.pt", "timestamp": "2026-05-05T03:14:10.206324"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 64500, "path": "artifacts/c2/checkpoints/ckpt_step0064500.pt", "timestamp": "2026-05-05T03:17:40.773151"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 65000, "path": "artifacts/c2/checkpoints/ckpt_step0065000.pt", "timestamp": "2026-05-05T03:21:10.271135"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 65500, "path": "artifacts/c2/checkpoints/ckpt_step0065500.pt", "timestamp": "2026-05-05T03:24:39.893399"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 66000, "path": "artifacts/c2/checkpoints/ckpt_step0066000.pt", "timestamp": "2026-05-05T03:28:10.440559"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 66500, "path": "artifacts/c2/checkpoints/ckpt_step0066500.pt", "timestamp": "2026-05-05T03:31:41.292818"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 67000, "path": "artifacts/c2/checkpoints/ckpt_step0067000.pt", "timestamp": "2026-05-05T03:35:11.743968"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 67500, "path": "artifacts/c2/checkpoints/ckpt_step0067500.pt", "timestamp": "2026-05-05T03:38:41.753218"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 68000, "path": "artifacts/c2/checkpoints/ckpt_step0068000.pt", "timestamp": "2026-05-05T03:42:11.607967"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 68500, "path": "artifacts/c2/checkpoints/ckpt_step0068500.pt", "timestamp": "2026-05-05T03:45:40.732238"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 69000, "path": "artifacts/c2/checkpoints/ckpt_step0069000.pt", "timestamp": "2026-05-05T03:49:09.216500"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 69500, "path": "artifacts/c2/checkpoints/ckpt_step0069500.pt", "timestamp": "2026-05-05T03:52:38.451307"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 70000, "path": "artifacts/c2/checkpoints/ckpt_step0070000.pt", "timestamp": "2026-05-05T03:56:07.767576"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 70500, "path": "artifacts/c2/checkpoints/ckpt_step0070500.pt", "timestamp": "2026-05-05T03:59:36.145726"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 71000, "path": "artifacts/c2/checkpoints/ckpt_step0071000.pt", "timestamp": "2026-05-05T04:03:05.144792"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 71500, "path": "artifacts/c2/checkpoints/ckpt_step0071500.pt", "timestamp": "2026-05-05T04:06:34.400719"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 72000, "path": "artifacts/c2/checkpoints/ckpt_step0072000.pt", "timestamp": "2026-05-05T04:10:03.602582"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 72500, "path": "artifacts/c2/checkpoints/ckpt_step0072500.pt", "timestamp": "2026-05-05T04:13:33.400362"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 73000, "path": "artifacts/c2/checkpoints/ckpt_step0073000.pt", "timestamp": "2026-05-05T04:17:02.914126"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 73500, "path": "artifacts/c2/checkpoints/ckpt_step0073500.pt", "timestamp": "2026-05-05T04:20:32.300875"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 74000, "path": "artifacts/c2/checkpoints/ckpt_step0074000.pt", "timestamp": "2026-05-05T04:24:02.049636"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 74500, "path": "artifacts/c2/checkpoints/ckpt_step0074500.pt", "timestamp": "2026-05-05T04:27:31.200827"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 75000, "path": "artifacts/c2/checkpoints/ckpt_step0075000.pt", "timestamp": "2026-05-05T04:31:00.636999"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 75500, "path": "artifacts/c2/checkpoints/ckpt_step0075500.pt", "timestamp": "2026-05-05T04:34:29.719933"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 76000, "path": "artifacts/c2/checkpoints/ckpt_step0076000.pt", "timestamp": "2026-05-05T04:37:59.679950"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 76500, "path": "artifacts/c2/checkpoints/ckpt_step0076500.pt", "timestamp": "2026-05-05T04:41:29.125304"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 77000, "path": "artifacts/c2/checkpoints/ckpt_step0077000.pt", "timestamp": "2026-05-05T04:44:58.189302"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 77500, "path": "artifacts/c2/checkpoints/ckpt_step0077500.pt", "timestamp": "2026-05-05T04:48:27.045411"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 78000, "path": "artifacts/c2/checkpoints/ckpt_step0078000.pt", "timestamp": "2026-05-05T04:51:55.803709"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 78500, "path": "artifacts/c2/checkpoints/ckpt_step0078500.pt", "timestamp": "2026-05-05T04:55:24.816039"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 79000, "path": "artifacts/c2/checkpoints/ckpt_step0079000.pt", "timestamp": "2026-05-05T04:58:53.749490"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 79500, "path": "artifacts/c2/checkpoints/ckpt_step0079500.pt", "timestamp": "2026-05-05T05:02:23.187466"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 80000, "path": "artifacts/c2/checkpoints/ckpt_step0080000.pt", "timestamp": "2026-05-05T05:05:51.789635"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 80500, "path": "artifacts/c2/checkpoints/ckpt_step0080500.pt", "timestamp": "2026-05-05T05:09:20.988700"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 81000, "path": "artifacts/c2/checkpoints/ckpt_step0081000.pt", "timestamp": "2026-05-05T05:12:50.742366"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 81500, "path": "artifacts/c2/checkpoints/ckpt_step0081500.pt", "timestamp": "2026-05-05T05:16:18.314073"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 82000, "path": "artifacts/c2/checkpoints/ckpt_step0082000.pt", "timestamp": "2026-05-05T05:19:45.731644"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 82500, "path": "artifacts/c2/checkpoints/ckpt_step0082500.pt", "timestamp": "2026-05-05T05:23:13.590769"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 83000, "path": "artifacts/c2/checkpoints/ckpt_step0083000.pt", "timestamp": "2026-05-05T05:26:41.488695"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 83500, "path": "artifacts/c2/checkpoints/ckpt_step0083500.pt", "timestamp": "2026-05-05T05:30:10.015029"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 84000, "path": "artifacts/c2/checkpoints/ckpt_step0084000.pt", "timestamp": "2026-05-05T05:33:38.459867"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 84500, "path": "artifacts/c2/checkpoints/ckpt_step0084500.pt", "timestamp": "2026-05-05T05:37:07.695034"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 85000, "path": "artifacts/c2/checkpoints/ckpt_step0085000.pt", "timestamp": "2026-05-05T05:40:35.566712"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 85500, "path": "artifacts/c2/checkpoints/ckpt_step0085500.pt", "timestamp": "2026-05-05T05:44:04.855637"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 86000, "path": "artifacts/c2/checkpoints/ckpt_step0086000.pt", "timestamp": "2026-05-05T05:47:32.968873"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 86500, "path": "artifacts/c2/checkpoints/ckpt_step0086500.pt", "timestamp": "2026-05-05T05:51:01.778350"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 87000, "path": "artifacts/c2/checkpoints/ckpt_step0087000.pt", "timestamp": "2026-05-05T05:54:30.481515"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 87500, "path": "artifacts/c2/checkpoints/ckpt_step0087500.pt", "timestamp": "2026-05-05T05:58:00.050529"}
+ {"run_name": "c2", "stage": "pretraining", "event": "best_checkpoint_saved", "step": 87500, "path": "artifacts/c2/checkpoints/best_ckpt.pt", "timestamp": "2026-05-05T05:58:00.987741"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 88000, "path": "artifacts/c2/checkpoints/ckpt_step0088000.pt", "timestamp": "2026-05-05T06:01:29.694602"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 88500, "path": "artifacts/c2/checkpoints/ckpt_step0088500.pt", "timestamp": "2026-05-05T06:04:58.783944"}
+ {"run_name": "c2", "stage": "pretraining", "event": "best_checkpoint_saved", "step": 88500, "path": "artifacts/c2/checkpoints/best_ckpt.pt", "timestamp": "2026-05-05T06:04:59.598538"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 89000, "path": "artifacts/c2/checkpoints/ckpt_step0089000.pt", "timestamp": "2026-05-05T06:08:28.132788"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 89500, "path": "artifacts/c2/checkpoints/ckpt_step0089500.pt", "timestamp": "2026-05-05T06:11:56.919763"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 90000, "path": "artifacts/c2/checkpoints/ckpt_step0090000.pt", "timestamp": "2026-05-05T06:15:24.500092"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 90500, "path": "artifacts/c2/checkpoints/ckpt_step0090500.pt", "timestamp": "2026-05-05T06:18:52.669982"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 91000, "path": "artifacts/c2/checkpoints/ckpt_step0091000.pt", "timestamp": "2026-05-05T06:22:21.345356"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 91500, "path": "artifacts/c2/checkpoints/ckpt_step0091500.pt", "timestamp": "2026-05-05T06:25:49.995493"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 92000, "path": "artifacts/c2/checkpoints/ckpt_step0092000.pt", "timestamp": "2026-05-05T06:29:18.544358"}
+ {"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 92500, "path": "artifacts/c2/checkpoints/ckpt_step0092500.pt", "timestamp": "2026-05-05T06:32:47.267711"}
+ {"run_name": "c2", "stage": "pretraining", "event": "final_checkpoint_saved", "step": 92685, "path": "artifacts/c2/checkpoints/ckpt_step0092685.pt", "best_val_loss_so_far": 3.3613099455833435, "timestamp": "2026-05-05T06:34:04.509790"}
+ {"run_name": "c2", "stage": "pretraining", "event": "metrics_plot_saved", "path": "artifacts/c2/metrics.png", "timestamp": "2026-05-05T06:34:06.673898"}
+ {"run_name": "c2", "stage": "pretraining", "event": "results_doc_saved", "path": "artifacts/c2/results.md", "timestamp": "2026-05-05T06:34:06.674076"}
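The event records above are one JSON object per line, so they can be summarized with a few lines of Python. The following is a minimal sketch, not part of the run's tooling: the helper names (`parse_events`, `best_checkpoint_steps`, `seconds_between_checkpoints`) are hypothetical, and the sample embeds five records copied from the log rather than reading a file, since the events file path is not shown here.

```python
import json
from datetime import datetime

# Five records copied verbatim from the event log above.
SAMPLE_EVENTS = """\
{"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 87500, "path": "artifacts/c2/checkpoints/ckpt_step0087500.pt", "timestamp": "2026-05-05T05:58:00.050529"}
{"run_name": "c2", "stage": "pretraining", "event": "best_checkpoint_saved", "step": 87500, "path": "artifacts/c2/checkpoints/best_ckpt.pt", "timestamp": "2026-05-05T05:58:00.987741"}
{"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 88000, "path": "artifacts/c2/checkpoints/ckpt_step0088000.pt", "timestamp": "2026-05-05T06:01:29.694602"}
{"run_name": "c2", "stage": "pretraining", "event": "checkpoint_saved", "step": 88500, "path": "artifacts/c2/checkpoints/ckpt_step0088500.pt", "timestamp": "2026-05-05T06:04:58.783944"}
{"run_name": "c2", "stage": "pretraining", "event": "best_checkpoint_saved", "step": 88500, "path": "artifacts/c2/checkpoints/best_ckpt.pt", "timestamp": "2026-05-05T06:04:59.598538"}
"""


def parse_events(text):
    """Parse JSON-lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def best_checkpoint_steps(events):
    """Steps at which best_ckpt.pt was overwritten, in log order."""
    return [e["step"] for e in events if e["event"] == "best_checkpoint_saved"]


def seconds_between_checkpoints(events):
    """Wall-clock gaps (seconds) between consecutive periodic checkpoints."""
    times = [
        datetime.fromisoformat(e["timestamp"])
        for e in events
        if e["event"] == "checkpoint_saved"
    ]
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


events = parse_events(SAMPLE_EVENTS)
print(best_checkpoint_steps(events))  # [87500, 88500]
print(seconds_between_checkpoints(events))
```

Run over the full log, the same helpers would list every step where the validation loss improved and show how long each 500-step interval took.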
c2/logs/pretraining_20260504_205134.log ADDED
@@ -0,0 +1,305 @@
+ 2026-05-04 20:51:34,651 | INFO | === Raw Model ===
+ GPT(
+   (transformer): ModuleDict(
+     (drop): Dropout(p=0.0, inplace=False)
+     (h): ModuleList(
+       (0-17): 18 x Block(
+         (ln_1): RMSNorm()
+         (attn): CausalSelfAttention(
+           (rotary): RotaryEmbedding()
+           (q_proj): Linear(in_features=320, out_features=320, bias=False)
+           (k_proj): Linear(in_features=320, out_features=64, bias=False)
+           (v_proj): Linear(in_features=320, out_features=64, bias=False)
+           (c_proj): Linear(in_features=320, out_features=320, bias=False)
+           (resid_dropout): Dropout(p=0.0, inplace=False)
+         )
+         (ln_2): RMSNorm()
+         (mlp): MLP(
+           (c_fc): Linear(in_features=320, out_features=2048, bias=False)
+           (c_proj): Linear(in_features=1024, out_features=320, bias=False)
+           (dropout): Dropout(p=0.0, inplace=False)
+         )
+       )
+     )
+     (ln_f): RMSNorm()
+     (wte): Embedding(50304, 320)
+   )
+   (lm_head): Linear(in_features=320, out_features=50304, bias=False)
+ )
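The layer shapes in this printout are enough to reproduce the per-layer parameter counts that torchinfo reports in the summary that follows. The sketch below is only a sanity check on that arithmetic. The constant and helper names are illustrative, and the count of 320 weights per RMSNorm is taken from the torchinfo rows rather than from the model code. Note that `c_fc` emits 2048 features while the MLP's `c_proj` takes 1024, which would be consistent with a gated MLP that splits the `c_fc` output in half, though the log does not show the forward pass.

```python
# Shapes copied from the module printout; every Linear has bias=False.
D_MODEL, KV_DIM, VOCAB, N_LAYERS = 320, 64, 50304, 18
FC_OUT, PROJ_IN = 2048, 1024  # c_fc output width, MLP c_proj input width


def linear_params(n_in, n_out):
    """Weight count of a bias-free Linear layer."""
    return n_in * n_out


attn = (
    linear_params(D_MODEL, D_MODEL)        # q_proj
    + 2 * linear_params(D_MODEL, KV_DIM)   # k_proj and v_proj
    + linear_params(D_MODEL, D_MODEL)      # attention c_proj
)
mlp = linear_params(D_MODEL, FC_OUT) + linear_params(PROJ_IN, D_MODEL)
norms = 2 * D_MODEL  # assumption: one gain per channel in ln_1 and ln_2
per_block = attn + mlp + norms
embedding = VOCAB * D_MODEL

print(per_block)             # 1229440
print(N_LAYERS * per_block)  # 22129920
print(embedding)             # 16097280
```

Each product matches the corresponding `Param #` entry in the summary (102,400 for `q_proj`, 20,480 for `k_proj`, 655,360 for `c_fc`, 327,680 for the MLP `c_proj`, 16,097,280 for the embedding).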
+
+ === Forward Summary (torchinfo, uncompiled model) ===
+ ====================================================================================================
+ Layer (type:depth-idx) Output Shape Param #
+ ====================================================================================================
+ GPT [1, 1, 50304] --
+ ├─ModuleDict: 1-1 -- --
+ │ └─Embedding: 2-1 [1, 1024, 320] 16,097,280
+ │ └─Dropout: 2-2 [1, 1024, 320] --
+ │ └─ModuleList: 2-3 -- --
+ │ │ └─Block: 3-1 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-1 [1, 1024, 320] 320
+ │ │ │ └─CausalSelfAttention: 4-2 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-1 [1, 1024, 320] 102,400
+ │ │ │ │ └─Linear: 5-2 [1, 1024, 64] 20,480
+ │ │ │ │ └─Linear: 5-3 [1, 1024, 64] 20,480
+ │ │ │ │ └─RotaryEmbedding: 5-4 [1, 1, 1024, 64] --
+ │ │ │ │ └─Linear: 5-5 [1, 1024, 320] 102,400
+ │ │ │ │ └─Dropout: 5-6 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-3 [1, 1024, 320] 320
+ │ │ │ └─MLP: 4-4 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-7 [1, 1024, 2048] 655,360
+ │ │ │ │ └─Linear: 5-8 [1, 1024, 320] 327,680
+ │ │ │ │ └─Dropout: 5-9 [1, 1024, 320] --
+ │ │ └─Block: 3-2 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-5 [1, 1024, 320] 320
+ │ │ │ └─CausalSelfAttention: 4-6 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-10 [1, 1024, 320] 102,400
+ │ │ │ │ └─Linear: 5-11 [1, 1024, 64] 20,480
+ │ │ │ │ └─Linear: 5-12 [1, 1024, 64] 20,480
+ │ │ │ │ └─RotaryEmbedding: 5-13 [1, 1, 1024, 64] --
+ │ │ │ │ └─Linear: 5-14 [1, 1024, 320] 102,400
+ │ │ │ │ └─Dropout: 5-15 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-7 [1, 1024, 320] 320
+ │ │ │ └─MLP: 4-8 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-16 [1, 1024, 2048] 655,360
+ │ │ │ │ └─Linear: 5-17 [1, 1024, 320] 327,680
+ │ │ │ │ └─Dropout: 5-18 [1, 1024, 320] --
+ │ │ └─Block: 3-3 [1, 1024, 320] --
+ │ │ │ └─RMSNorm: 4-9 [1, 1024, 320] 320
+ │ │ │ └─CausalSelfAttention: 4-10 [1, 1024, 320] --
+ │ │ │ │ └─Linear: 5-19 [1, 1024, 320] 102,400
+ │ │ │ │ └─Linear: 5-20 [1, 1024, 64] 20,480
+ │ │ │ │ └─Linear: 5-21 [1, 1024, 64] 20,480
+ │ │ │ │ └─RotaryEmbedding: 5-22 [1, 1, 1024, 64] --
+ │ │ │ │ └─Linear: 5-23 [1, 1024, 320] 102,400
75
+ │ │ │ │ └─Dropout: 5-24 [1, 1024, 320] --
76
+ │ │ │ └─RMSNorm: 4-11 [1, 1024, 320] 320
77
+ │ │ │ └─MLP: 4-12 [1, 1024, 320] --
78
+ │ │ │ │ └─Linear: 5-25 [1, 1024, 2048] 655,360
79
+ │ │ │ │ └─Linear: 5-26 [1, 1024, 320] 327,680
80
+ │ │ │ │ └─Dropout: 5-27 [1, 1024, 320] --
81
+ │ │ └─Block: 3-4 [1, 1024, 320] --
82
+ │ │ │ └─RMSNorm: 4-13 [1, 1024, 320] 320
83
+ │ │ │ └─CausalSelfAttention: 4-14 [1, 1024, 320] --
84
+ │ │ │ │ └─Linear: 5-28 [1, 1024, 320] 102,400
85
+ │ │ │ │ └─Linear: 5-29 [1, 1024, 64] 20,480
86
+ │ │ │ │ └─Linear: 5-30 [1, 1024, 64] 20,480
87
+ │ │ │ │ └─RotaryEmbedding: 5-31 [1, 1, 1024, 64] --
88
+ │ │ │ │ └─Linear: 5-32 [1, 1024, 320] 102,400
89
+ │ │ │ │ └─Dropout: 5-33 [1, 1024, 320] --
90
+ │ │ │ └─RMSNorm: 4-15 [1, 1024, 320] 320
91
+ │ │ │ └─MLP: 4-16 [1, 1024, 320] --
92
+ │ │ │ │ └─Linear: 5-34 [1, 1024, 2048] 655,360
93
+ │ │ │ │ └─Linear: 5-35 [1, 1024, 320] 327,680
94
+ │ │ │ │ └─Dropout: 5-36 [1, 1024, 320] --
95
+ │ │ └─Block: 3-5 [1, 1024, 320] --
96
+ │ │ │ └─RMSNorm: 4-17 [1, 1024, 320] 320
97
+ │ │ │ └─CausalSelfAttention: 4-18 [1, 1024, 320] --
98
+ │ │ │ │ └─Linear: 5-37 [1, 1024, 320] 102,400
99
+ │ │ │ │ └─Linear: 5-38 [1, 1024, 64] 20,480
100
+ │ │ │ │ └─Linear: 5-39 [1, 1024, 64] 20,480
101
+ │ │ │ │ └─RotaryEmbedding: 5-40 [1, 1, 1024, 64] --
102
+ │ │ │ │ └─Linear: 5-41 [1, 1024, 320] 102,400
103
+ │ │ │ │ └─Dropout: 5-42 [1, 1024, 320] --
104
+ │ │ │ └─RMSNorm: 4-19 [1, 1024, 320] 320
105
+ │ │ │ └─MLP: 4-20 [1, 1024, 320] --
106
+ │ │ │ │ └─Linear: 5-43 [1, 1024, 2048] 655,360
107
+ │ │ │ │ └─Linear: 5-44 [1, 1024, 320] 327,680
108
+ │ │ │ │ └─Dropout: 5-45 [1, 1024, 320] --
109
+ │ │ └─Block: 3-6 [1, 1024, 320] --
110
+ │ │ │ └─RMSNorm: 4-21 [1, 1024, 320] 320
111
+ │ │ │ └─CausalSelfAttention: 4-22 [1, 1024, 320] --
112
+ │ │ │ │ └─Linear: 5-46 [1, 1024, 320] 102,400
113
+ │ │ │ │ └─Linear: 5-47 [1, 1024, 64] 20,480
114
+ │ │ │ │ └─Linear: 5-48 [1, 1024, 64] 20,480
115
+ │ │ │ │ └─RotaryEmbedding: 5-49 [1, 1, 1024, 64] --
116
+ │ │ │ │ └─Linear: 5-50 [1, 1024, 320] 102,400
117
+ │ │ │ │ └─Dropout: 5-51 [1, 1024, 320] --
118
+ │ │ │ └─RMSNorm: 4-23 [1, 1024, 320] 320
119
+ │ │ │ └─MLP: 4-24 [1, 1024, 320] --
120
+ │ │ │ │ └─Linear: 5-52 [1, 1024, 2048] 655,360
121
+ │ │ │ │ └─Linear: 5-53 [1, 1024, 320] 327,680
122
+ │ │ │ │ └─Dropout: 5-54 [1, 1024, 320] --
123
+ │ │ └─Block: 3-7 [1, 1024, 320] --
124
+ │ │ │ └─RMSNorm: 4-25 [1, 1024, 320] 320
125
+ │ │ │ └─CausalSelfAttention: 4-26 [1, 1024, 320] --
126
+ │ │ │ │ └─Linear: 5-55 [1, 1024, 320] 102,400
127
+ │ │ │ │ └─Linear: 5-56 [1, 1024, 64] 20,480
128
+ │ │ │ │ └─Linear: 5-57 [1, 1024, 64] 20,480
129
+ │ │ │ │ └─RotaryEmbedding: 5-58 [1, 1, 1024, 64] --
130
+ │ │ │ │ └─Linear: 5-59 [1, 1024, 320] 102,400
131
+ │ │ │ │ └─Dropout: 5-60 [1, 1024, 320] --
132
+ │ │ │ └─RMSNorm: 4-27 [1, 1024, 320] 320
133
+ │ │ │ └─MLP: 4-28 [1, 1024, 320] --
134
+ │ │ │ │ └─Linear: 5-61 [1, 1024, 2048] 655,360
135
+ │ │ │ │ └─Linear: 5-62 [1, 1024, 320] 327,680
136
+ │ │ │ │ └─Dropout: 5-63 [1, 1024, 320] --
137
+ │ │ └─Block: 3-8 [1, 1024, 320] --
138
+ │ │ │ └─RMSNorm: 4-29 [1, 1024, 320] 320
139
+ │ │ │ └─CausalSelfAttention: 4-30 [1, 1024, 320] --
140
+ │ │ │ │ └─Linear: 5-64 [1, 1024, 320] 102,400
141
+ │ │ │ │ └─Linear: 5-65 [1, 1024, 64] 20,480
142
+ │ │ │ │ └─Linear: 5-66 [1, 1024, 64] 20,480
143
+ │ │ │ │ └─RotaryEmbedding: 5-67 [1, 1, 1024, 64] --
144
+ │ │ │ │ └─Linear: 5-68 [1, 1024, 320] 102,400
145
+ │ │ │ │ └─Dropout: 5-69 [1, 1024, 320] --
146
+ │ │ │ └─RMSNorm: 4-31 [1, 1024, 320] 320
147
+ │ │ │ └─MLP: 4-32 [1, 1024, 320] --
148
+ │ │ │ │ └─Linear: 5-70 [1, 1024, 2048] 655,360
149
+ │ │ │ │ └─Linear: 5-71 [1, 1024, 320] 327,680
150
+ │ │ │ │ └─Dropout: 5-72 [1, 1024, 320] --
151
+ │ │ └─Block: 3-9 [1, 1024, 320] --
152
+ │ │ │ └─RMSNorm: 4-33 [1, 1024, 320] 320
153
+ │ │ │ └─CausalSelfAttention: 4-34 [1, 1024, 320] --
154
+ │ │ │ │ └─Linear: 5-73 [1, 1024, 320] 102,400
155
+ │ │ │ │ └─Linear: 5-74 [1, 1024, 64] 20,480
156
+ │ │ │ │ └─Linear: 5-75 [1, 1024, 64] 20,480
157
+ │ │ │ │ └─RotaryEmbedding: 5-76 [1, 1, 1024, 64] --
158
+ │ │ │ │ └─Linear: 5-77 [1, 1024, 320] 102,400
159
+ │ │ │ │ └─Dropout: 5-78 [1, 1024, 320] --
160
+ │ │ │ └─RMSNorm: 4-35 [1, 1024, 320] 320
161
+ │ │ │ └─MLP: 4-36 [1, 1024, 320] --
162
+ │ │ │ │ └─Linear: 5-79 [1, 1024, 2048] 655,360
163
+ │ │ │ │ └─Linear: 5-80 [1, 1024, 320] 327,680
164
+ │ │ │ │ └─Dropout: 5-81 [1, 1024, 320] --
165
+ │ │ └─Block: 3-10 [1, 1024, 320] --
166
+ │ │ │ └─RMSNorm: 4-37 [1, 1024, 320] 320
167
+ │ │ │ └─CausalSelfAttention: 4-38 [1, 1024, 320] --
168
+ │ │ │ │ └─Linear: 5-82 [1, 1024, 320] 102,400
169
+ │ │ │ │ └─Linear: 5-83 [1, 1024, 64] 20,480
170
+ │ │ │ │ └─Linear: 5-84 [1, 1024, 64] 20,480
171
+ │ │ │ │ └─RotaryEmbedding: 5-85 [1, 1, 1024, 64] --
172
+ │ │ │ │ └─Linear: 5-86 [1, 1024, 320] 102,400
173
+ │ │ │ │ └─Dropout: 5-87 [1, 1024, 320] --
174
+ │ │ │ └─RMSNorm: 4-39 [1, 1024, 320] 320
175
+ │ │ │ └─MLP: 4-40 [1, 1024, 320] --
176
+ │ │ │ │ └─Linear: 5-88 [1, 1024, 2048] 655,360
177
+ │ │ │ │ └─Linear: 5-89 [1, 1024, 320] 327,680
178
+ │ │ │ │ └─Dropout: 5-90 [1, 1024, 320] --
179
+ │ │ └─Block: 3-11 [1, 1024, 320] --
180
+ │ │ │ └─RMSNorm: 4-41 [1, 1024, 320] 320
181
+ │ │ │ └─CausalSelfAttention: 4-42 [1, 1024, 320] --
182
+ │ │ │ │ └─Linear: 5-91 [1, 1024, 320] 102,400
183
+ │ │ │ │ └─Linear: 5-92 [1, 1024, 64] 20,480
184
+ │ │ │ │ └─Linear: 5-93 [1, 1024, 64] 20,480
185
+ │ │ │ │ └─RotaryEmbedding: 5-94 [1, 1, 1024, 64] --
186
+ │ │ │ │ └─Linear: 5-95 [1, 1024, 320] 102,400
187
+ │ │ │ │ └─Dropout: 5-96 [1, 1024, 320] --
188
+ │ │ │ └─RMSNorm: 4-43 [1, 1024, 320] 320
189
+ │ │ │ └─MLP: 4-44 [1, 1024, 320] --
190
+ │ │ │ │ └─Linear: 5-97 [1, 1024, 2048] 655,360
191
+ │ │ │ │ └─Linear: 5-98 [1, 1024, 320] 327,680
192
+ │ │ │ │ └─Dropout: 5-99 [1, 1024, 320] --
193
+ │ │ └─Block: 3-12 [1, 1024, 320] --
194
+ │ │ │ └─RMSNorm: 4-45 [1, 1024, 320] 320
195
+ │ │ │ └─CausalSelfAttention: 4-46 [1, 1024, 320] --
196
+ │ │ │ │ └─Linear: 5-100 [1, 1024, 320] 102,400
197
+ │ │ │ │ └─Linear: 5-101 [1, 1024, 64] 20,480
198
+ │ │ │ │ └─Linear: 5-102 [1, 1024, 64] 20,480
199
+ │ │ │ │ └─RotaryEmbedding: 5-103 [1, 1, 1024, 64] --
200
+ │ │ │ │ └─Linear: 5-104 [1, 1024, 320] 102,400
201
+ │ │ │ │ └─Dropout: 5-105 [1, 1024, 320] --
202
+ │ │ │ └─RMSNorm: 4-47 [1, 1024, 320] 320
203
+ │ │ │ └─MLP: 4-48 [1, 1024, 320] --
204
+ │ │ │ │ └─Linear: 5-106 [1, 1024, 2048] 655,360
205
+ │ │ │ │ └─Linear: 5-107 [1, 1024, 320] 327,680
206
+ │ │ │ │ └─Dropout: 5-108 [1, 1024, 320] --
207
+ │ │ └─Block: 3-13 [1, 1024, 320] --
208
+ │ │ │ └─RMSNorm: 4-49 [1, 1024, 320] 320
209
+ │ │ │ └─CausalSelfAttention: 4-50 [1, 1024, 320] --
210
+ │ │ │ │ └─Linear: 5-109 [1, 1024, 320] 102,400
211
+ │ │ │ │ └─Linear: 5-110 [1, 1024, 64] 20,480
212
+ │ │ │ │ └─Linear: 5-111 [1, 1024, 64] 20,480
213
+ │ │ │ │ └─RotaryEmbedding: 5-112 [1, 1, 1024, 64] --
214
+ │ │ │ │ └─Linear: 5-113 [1, 1024, 320] 102,400
215
+ │ │ │ │ └─Dropout: 5-114 [1, 1024, 320] --
216
+ │ │ │ └─RMSNorm: 4-51 [1, 1024, 320] 320
217
+ │ │ │ └─MLP: 4-52 [1, 1024, 320] --
218
+ │ │ │ │ └─Linear: 5-115 [1, 1024, 2048] 655,360
219
+ │ │ │ │ └─Linear: 5-116 [1, 1024, 320] 327,680
220
+ │ │ │ │ └─Dropout: 5-117 [1, 1024, 320] --
221
+ │ │ └─Block: 3-14 [1, 1024, 320] --
222
+ │ │ │ └─RMSNorm: 4-53 [1, 1024, 320] 320
223
+ │ │ │ └─CausalSelfAttention: 4-54 [1, 1024, 320] --
224
+ │ │ │ │ └─Linear: 5-118 [1, 1024, 320] 102,400
225
+ │ │ │ │ └─Linear: 5-119 [1, 1024, 64] 20,480
226
+ │ │ │ │ └─Linear: 5-120 [1, 1024, 64] 20,480
227
+ │ │ │ │ └─RotaryEmbedding: 5-121 [1, 1, 1024, 64] --
228
+ │ │ │ │ └─Linear: 5-122 [1, 1024, 320] 102,400
229
+ │ │ │ │ └─Dropout: 5-123 [1, 1024, 320] --
230
+ │ │ │ └─RMSNorm: 4-55 [1, 1024, 320] 320
231
+ │ │ │ └─MLP: 4-56 [1, 1024, 320] --
232
+ │ │ │ │ └─Linear: 5-124 [1, 1024, 2048] 655,360
233
+ │ │ │ │ └─Linear: 5-125 [1, 1024, 320] 327,680
234
+ │ │ │ │ └─Dropout: 5-126 [1, 1024, 320] --
235
+ │ │ └─Block: 3-15 [1, 1024, 320] --
236
+ │ │ │ └─RMSNorm: 4-57 [1, 1024, 320] 320
237
+ │ │ │ └─CausalSelfAttention: 4-58 [1, 1024, 320] --
238
+ │ │ │ │ └─Linear: 5-127 [1, 1024, 320] 102,400
239
+ │ │ │ │ └─Linear: 5-128 [1, 1024, 64] 20,480
240
+ │ │ │ │ └─Linear: 5-129 [1, 1024, 64] 20,480
241
+ │ │ │ │ └─RotaryEmbedding: 5-130 [1, 1, 1024, 64] --
242
+ │ │ │ │ └─Linear: 5-131 [1, 1024, 320] 102,400
243
+ │ │ │ │ └─Dropout: 5-132 [1, 1024, 320] --
244
+ │ │ │ └─RMSNorm: 4-59 [1, 1024, 320] 320
245
+ │ │ │ └─MLP: 4-60 [1, 1024, 320] --
246
+ │ │ │ │ └─Linear: 5-133 [1, 1024, 2048] 655,360
247
+ │ │ │ │ └─Linear: 5-134 [1, 1024, 320] 327,680
248
+ │ │ │ │ └─Dropout: 5-135 [1, 1024, 320] --
249
+ │ │ └─Block: 3-16 [1, 1024, 320] --
250
+ │ │ │ └─RMSNorm: 4-61 [1, 1024, 320] 320
251
+ │ │ │ └─CausalSelfAttention: 4-62 [1, 1024, 320] --
252
+ │ │ │ │ └─Linear: 5-136 [1, 1024, 320] 102,400
253
+ │ │ │ │ └─Linear: 5-137 [1, 1024, 64] 20,480
254
+ │ │ │ │ └─Linear: 5-138 [1, 1024, 64] 20,480
255
+ │ │ │ │ └─RotaryEmbedding: 5-139 [1, 1, 1024, 64] --
256
+ │ │ │ │ └─Linear: 5-140 [1, 1024, 320] 102,400
257
+ │ │ │ │ └─Dropout: 5-141 [1, 1024, 320] --
258
+ │ │ │ └─RMSNorm: 4-63 [1, 1024, 320] 320
259
+ │ │ │ └─MLP: 4-64 [1, 1024, 320] --
260
+ │ │ │ │ └─Linear: 5-142 [1, 1024, 2048] 655,360
261
+ │ │ │ │ └─Linear: 5-143 [1, 1024, 320] 327,680
262
+ │ │ │ │ └─Dropout: 5-144 [1, 1024, 320] --
263
+ │ │ └─Block: 3-17 [1, 1024, 320] --
264
+ │ │ │ └─RMSNorm: 4-65 [1, 1024, 320] 320
265
+ │ │ │ └─CausalSelfAttention: 4-66 [1, 1024, 320] --
266
+ │ │ │ │ └─Linear: 5-145 [1, 1024, 320] 102,400
267
+ │ │ │ │ └─Linear: 5-146 [1, 1024, 64] 20,480
268
+ │ │ │ │ └─Linear: 5-147 [1, 1024, 64] 20,480
269
+ │ │ │ │ └─RotaryEmbedding: 5-148 [1, 1, 1024, 64] --
270
+ │ │ │ │ └─Linear: 5-149 [1, 1024, 320] 102,400
271
+ │ │ │ │ └─Dropout: 5-150 [1, 1024, 320] --
272
+ │ │ │ └─RMSNorm: 4-67 [1, 1024, 320] 320
273
+ │ │ │ └─MLP: 4-68 [1, 1024, 320] --
274
+ │ │ │ │ └─Linear: 5-151 [1, 1024, 2048] 655,360
275
+ │ │ │ │ └─Linear: 5-152 [1, 1024, 320] 327,680
276
+ │ │ │ │ └─Dropout: 5-153 [1, 1024, 320] --
277
+ │ │ └─Block: 3-18 [1, 1024, 320] --
278
+ │ │ │ └─RMSNorm: 4-69 [1, 1024, 320] 320
279
+ │ │ │ └─CausalSelfAttention: 4-70 [1, 1024, 320] --
280
+ │ │ │ │ └─Linear: 5-154 [1, 1024, 320] 102,400
281
+ │ │ │ │ └─Linear: 5-155 [1, 1024, 64] 20,480
282
+ │ │ │ │ └─Linear: 5-156 [1, 1024, 64] 20,480
283
+ │ │ │ │ └─RotaryEmbedding: 5-157 [1, 1, 1024, 64] --
284
+ │ │ │ │ └─Linear: 5-158 [1, 1024, 320] 102,400
285
+ │ │ │ │ └─Dropout: 5-159 [1, 1024, 320] --
286
+ │ │ │ └─RMSNorm: 4-71 [1, 1024, 320] 320
287
+ │ │ │ └─MLP: 4-72 [1, 1024, 320] --
288
+ │ │ │ │ └─Linear: 5-160 [1, 1024, 2048] 655,360
289
+ │ │ │ │ └─Linear: 5-161 [1, 1024, 320] 327,680
290
+ │ │ │ │ └─Dropout: 5-162 [1, 1024, 320] --
291
+ │ └─RMSNorm: 2-4 [1, 1024, 320] 320
292
+ ├─Linear: 1-2 [1, 1, 50304] 16,097,280
293
+ ====================================================================================================
+
+ === Parameter Counts (unique tensors) ===
+ Total params: 38,227,520
+ Trainable params: 38,227,520
+ Weight tying (wte = lm_head): True
+ Embedding mode: standard tied token embedding
+ Note: module-level torchinfo totals may double-count the tied LM head; use the unique counts above.
+ 2026-05-04 20:51:34,748 | INFO | === Pretraining Started ===
+ 2026-05-04 20:51:34,749 | INFO | Device: cuda | dtype: float16 | distributed: False (world_size=1)
+ 2026-05-04 20:51:34,749 | INFO | Model: 18 layers, 5 heads, 320 embd, context_len=1024
+ 2026-05-04 20:51:34,749 | INFO | Training: max_iters=11586, batch_size=4, grad_accum=16, lr=1.20e-03, warmup=116 steps
+ 2026-05-04 20:51:34,749 | INFO | Data: 6074221786 train tokens | tokens/step=65536
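The unique parameter total and the tokens/step figure reported in architecture.txt can be re-derived from the printed layer shapes and the logged training settings. The following is a minimal sketch, not code from this repository: the constant and function names are hypothetical, and the shapes are hard-coded from the summary rather than read from a model object.

```python
# Hypothetical check, not part of the repository: re-derive the totals
# reported in architecture.txt from the shapes printed in the summary.

VOCAB, D_MODEL, N_LAYER, N_HEAD = 50304, 320, 18, 5
KV_DIM = 64          # out_features of k_proj and v_proj
MLP_FC_OUT = 2048    # out_features of mlp.c_fc
MLP_PROJ_IN = 1024   # in_features of mlp.c_proj


def block_params() -> int:
    """Parameters in one Block; every Linear is bias-free."""
    attn = (
        D_MODEL * D_MODEL      # q_proj
        + D_MODEL * KV_DIM     # k_proj
        + D_MODEL * KV_DIM     # v_proj
        + D_MODEL * D_MODEL    # attn c_proj
    )
    mlp = D_MODEL * MLP_FC_OUT + MLP_PROJ_IN * D_MODEL
    norms = 2 * D_MODEL        # ln_1 and ln_2 RMSNorm weights
    return attn + mlp + norms


def unique_params() -> int:
    """Total with the tied wte / lm_head matrix counted once."""
    embedding = VOCAB * D_MODEL
    final_norm = D_MODEL       # ln_f
    return embedding + N_LAYER * block_params() + final_norm


def tokens_per_step(batch_size: int, grad_accum: int, context_len: int) -> int:
    """Tokens consumed per optimizer update on a single device."""
    return batch_size * grad_accum * context_len


if __name__ == "__main__":
    print(block_params())                # 1229440
    print(unique_params())               # 38227520
    print(tokens_per_step(4, 16, 1024))  # 65536
```

Adding the `lm_head` row a second time, as a naive sum of the Param # column would, gives 38,227,520 + 16,097,280 = 54,324,800, which is the double count the note in the file warns about. The per-head width 320 / 5 = 64 equals the key/value projection width, which would be consistent with a single shared key/value head, although the summary does not state this.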
c2/logs/pretraining_20260504_205550.log ADDED
The diff for this file is too large to render. See raw diff
 
c2/metrics.png ADDED

Git LFS Details

  • SHA256: d1117017ca3b0604eb5200d5b3c741c7e2b6f8c915fda44111c5433093af4804
  • Pointer size: 131 Bytes
  • Size of remote file: 376 kB
c2/results.md ADDED
@@ -0,0 +1,24 @@
+ # Results: c2
+
+ Automatically generated after pretraining.
+
+ ## Summary
+ - Model: `18L / 5H / 320d`
+ - Total parameters: `38227520`
+ - Last logged train step: `92680`
+ - Best validation loss: `3.3613`
+ - Best validation perplexity: `28.83`
+ - Last validation step: `92500`
+ - Learning rate: `0.0012`
+ - Effective tokens/update: `65536`
+
+ ## Files
+ - [Config snapshot](config_snapshot.json)
+ - [Train metrics](train_metrics.jsonl)
+ - [Eval metrics](eval_metrics.jsonl)
+ - [Events](events.jsonl)
+ - [Metrics plot](metrics.png)
+
+ ## Metrics Plot
+
+ ![Metrics plot](metrics.png)
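The summary figures in results.md are related by simple arithmetic. The sketch below assumes the validation loss is a mean per-token cross-entropy in nats, so that perplexity is its exponential, and that every update consumed the stated 65536 tokens. The helper names are hypothetical and are not taken from the repository.

```python
import math


def perplexity(mean_nll_nats: float) -> float:
    """Perplexity from a mean per-token negative log-likelihood in nats."""
    return math.exp(mean_nll_nats)


def tokens_processed(steps: int, tokens_per_update: int) -> int:
    """Tokens consumed after `steps` optimizer updates."""
    return steps * tokens_per_update


if __name__ == "__main__":
    # Values copied from the Summary section of results.md.
    print(round(perplexity(3.3613), 2))      # 28.83
    print(tokens_processed(92680, 65536))    # 6073876480
```

Under these assumptions the reported perplexity of 28.83 agrees with the reported loss of 3.3613, and the last logged train step corresponds to roughly 6.07 billion tokens processed.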
c2/train_metrics.jsonl ADDED
The diff for this file is too large to render. See raw diff