WCNegentropy committed on
Commit 094ed24 · verified · 1 Parent(s): 8b6adb7

Remove nested directory: BitTransformerLM/tests/TEST_RESULTS.md

BitTransformerLM/tests/TEST_RESULTS.md DELETED
# Test Results

## Automated Tests
- `pytest -q`: all tests passed.

```
.... [100%]
4 passed in 5.28s
```

## Example Script
- `python example.py` executed successfully:

```
Training loss: 0.8508605360984802
Available telemetry: ['activations', 'attention_maps', 'entropy', 'negentropy', 'lz_complexity', 'symbiosis_score']
```

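The telemetry names suggest bit-level information measures. As a minimal sketch only (not the repo's actual implementation), `entropy`, `negentropy`, and `lz_complexity` for a bit sequence could be defined roughly like this, with negentropy as distance from a uniform bit source and LZ complexity proxied by compressed size:

```python
# Plausible definitions only -- the real telemetry code may differ.
import math
import zlib

def bit_entropy(bits):
    """Shannon entropy (in bits) of the empirical 0/1 distribution."""
    p1 = sum(bits) / len(bits)
    if p1 in (0.0, 1.0):
        return 0.0
    return -(p1 * math.log2(p1) + (1 - p1) * math.log2(1 - p1))

def negentropy(bits):
    """How far the stream sits from maximum-entropy noise."""
    return 1.0 - bit_entropy(bits)

def lz_complexity(bits):
    """Compression-based complexity proxy: compressed size / raw size."""
    raw = bytes(bits)
    return len(zlib.compress(raw)) / len(raw)

bits = [1, 0, 1, 1, 0, 0, 1, 0] * 8
print(negentropy(bits), lz_complexity(bits))
```
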
## Progressive Scale-Up
- `python progressive_scaleup.py` (default steps=2) produced:

```
Step 0 validation loss: 0.7001
Step 1 validation loss: 0.6954
```

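Later runs in this log show the script doubling depth and width between steps. As a rough sketch of that pattern (the actual `progressive_scaleup.py` logic is assumed, and `make_model`, `train_epoch`, and `validate` are stand-ins):

```python
# Sketch of a progressive scale-up loop: train, validate, and grow
# the model whenever validation loss stops improving by at least `eps`.
def progressive_scaleup(make_model, train_epoch, validate, steps=2, eps=0.01):
    layers = 1
    model = make_model(layers)
    prev_loss = float("inf")
    for step in range(steps):
        train_epoch(model)
        loss = validate(model)
        print(f"Step {step} validation loss: {loss:.4f}")
        if prev_loss - loss < eps:        # plateau -> scale up
            layers *= 2
            model = make_model(layers)    # real code would also copy weights
            print(f"Scaled model to {layers} layers")
        prev_loss = loss
```
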
## Text Inference
- Running `infer_text` on a short string returned the input text without errors:

```
hi
```

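That echo is what a bit-level model round-tripping text looks like. A minimal sketch of the text-to-bits conversion presumably underneath `infer_text` (the plain 8-bits-per-byte packing is an assumption; later notes in this log mention parity-protected encodings):

```python
def text_to_bits(text):
    # UTF-8 bytes, most-significant bit first
    return [int(b) for ch in text.encode("utf-8") for b in format(ch, "08b")]

def bits_to_text(bits):
    chars = [int("".join(map(str, bits[i:i + 8])), 2)
             for i in range(0, len(bits), 8)]
    return bytes(chars).decode("utf-8", errors="replace")

assert bits_to_text(text_to_bits("hi")) == "hi"  # identity round trip
```
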
## Extended Scaling Test
Installed torch and ran `python progressive_scaleup.py --steps 4`:

```
Step 0 validation loss: 0.6970
Step 1 validation loss: 0.6915
Step 2 validation loss: 0.7022
Step 3 validation loss: 0.7123
```

## Collapse Test
Running a minimal `collapse_submodel` example produced a 2-layer model without errors:

```
collapsed_layers 2
```

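`collapse_submodel` itself was not inspected here, but the effect reported above (a deeper model reduced to 2 layers) is what layer distillation produces. A self-contained sketch under that assumption, in plain PyTorch:

```python
# Distil a deep stack's behaviour into a shallow one -- a plausible
# mechanism for "collapse"; the repo's actual procedure may differ.
import torch
import torch.nn as nn

teacher = nn.Sequential(*[nn.Linear(16, 16) for _ in range(8)])
student = nn.Sequential(*[nn.Linear(16, 16) for _ in range(2)])
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for _ in range(200):
    x = torch.randn(32, 16)
    with torch.no_grad():
        target = teacher(x)                 # match the teacher's outputs
    loss = nn.functional.mse_loss(student(x), target)
    opt.zero_grad(); loss.backward(); opt.step()

print("collapsed_layers", len(student))     # 2, as in the log above
```
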
## Stress Test 2025
- `pip install -r requirements.txt` succeeded.
- `pytest -q` reported:
```
10 passed, 1 skipped
```

### Large Scale-Up
Ran `python progressive_scaleup.py --steps 8 --eps 0.70`:
```
Step 0 validation loss: 0.7053
Step 1 validation loss: 0.6945
Scaled model to 2 layers and width 32
Step 2 validation loss: 0.6953
Scaled model to 4 layers and width 32
Step 3 validation loss: 0.6820
Scaled model to 8 layers and width 32
Step 4 validation loss: 0.6722
Scaled model to 16 layers and width 32
Step 5 validation loss: 0.6664
Scaled model to 32 layers and width 32
Step 6 validation loss: 0.6663
Scaled model to 64 layers and width 32
Step 7 validation loss: 0.6742
Scaled model to 128 layers and width 32
```

### Collapse Submodel
Using `collapse_submodel` with small clusters produced:
```
collapsed_layers 3
d_model 16
```

## WikiText Benchmark Attempt
- `pip install -r requirements.txt` succeeded after installing torch 2.7.1+cpu.
- Attempted to download WikiText2 via `datasets`, but network access to the S3 bucket was blocked.
- Fell back to random data and ran `python progressive_scaleup.py --steps 12 --width-mult 2.0`:
```
Step 7 validation loss: 0.6980
Scaled model to 1 layers and width 32
Step 8 validation loss: 0.7022
Scaled model to 1 layers and width 32
Step 9 validation loss: 0.7025
Scaled model to 1 layers and width 32
Step 10 validation loss: 0.7055
Scaled model to 1 layers and width 32
Step 11 validation loss: 0.6976
Scaled model to 1 layers and width 32
```
- Collapsing a toy cluster produced:
```
collapsed_layers 1
```

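When the `datasets` download does work (as in the next section), converting a small WikiText subset into training bits plausibly looks like the sketch below. The packing scheme is an assumption; the repo mentions parity-protected encodings elsewhere in this log.

```python
import torch
from datasets import load_dataset

# Load a tiny slice of WikiText-2 and flatten it to a bit tensor.
ds = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:100]")
text = " ".join(row["text"] for row in ds)
bits = [int(b) for byte in text.encode("utf-8") for b in format(byte, "08b")]

# Assumes the slice yields at least 64 * 128 bits.
data = torch.tensor(bits[:64 * 128], dtype=torch.long).view(64, 128)
print(data.shape)  # 64 sequences of 128 bits each
```
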
## WikiText Benchmark (datasets)
Using the HuggingFace `datasets` loader with a small subset:
```
Step 0 validation loss: 0.6237
Scaled model to 2 layers and width 64
Step 1 validation loss: 0.5894
Scaled model to 4 layers and width 128
Step 2 validation loss: 0.5108
Scaled model to 8 layers and width 256
Step 3 validation loss: 0.8422
Collapsed model validation loss: 0.6019973754882812
```

## WikiText Schedule Benchmark
Installed requirements via pip and ran `python wikitext_schedule.py --steps 10 --max-len 16 --dataset-size 10`:
```
Step 0 validation loss: 0.6686
Scaled model to 2 layers and width 32
Step 1 validation loss: 0.6271
Scaled model to 2 layers and width 64
Step 2 validation loss: 0.7467
Scaled model to 4 layers and width 64
Step 3 validation loss: 0.6571
Scaled model to 4 layers and width 128
Step 4 validation loss: 0.7457
Scaled model to 8 layers and width 128
Step 5 validation loss: 0.8038
Scaled model to 8 layers and width 256
Step 6 validation loss: 2.6579
Scaled model to 16 layers and width 256
Step 7 validation loss: 4.0604
Scaled model to 16 layers and width 512
Step 8 validation loss: 8.6210
Scaled model to 32 layers and width 512
Step 9 validation loss: 6.4301
Scaled model to 32 layers and width 1024
Step 10 validation loss: 11.1592
```
Attempting the full 12-step run exceeded memory limits and the process was killed after step 10.

## Recursive Integration Flow Test
Installed requirements manually and ran `python recursive_integration_flow.py`. Output:

```
warnings.warn(
/workspace/Test/recursive_integration_flow.py:87: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
  with torch.cpu.amp.autocast(dtype=torch.bfloat16):
Step 0 validation loss: 1.2578 K=0.105 C=0.328 S=0.329
Step 1 validation loss: 0.7305 K=0.031 C=0.095 S=0.244
⚠️ Step 1 regressed below metric floor. Halting.
Traceback (most recent call last):
  File "/workspace/Test/recursive_integration_flow.py", line 119, in <module>
    recursive_integration_flow()
  File "/workspace/Test/recursive_integration_flow.py", line 93, in recursive_integration_flow
    safe_output = hil_safe_inference(
                  ^^^^^^^^^^^^^^^^^^^
  File "/workspace/Test/bit_transformer/safety.py", line 24, in hil_safe_inference
    raise RuntimeError(
RuntimeError: Safety gate triggered: C=0.603, S=0.248
```

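The gate behaviour implied by that traceback, as a sketch: `hil_safe_inference` refuses to return output when telemetry crosses configured floors. The comparison directions and thresholds below are assumptions, not the actual code in `bit_transformer/safety.py`:

```python
def safety_gate(c, s, c_ceiling=0.5, s_floor=0.25):
    # Assumed semantics: high complexity (C) or low symbiosis (S) is unsafe.
    if c > c_ceiling or s < s_floor:
        raise RuntimeError(f"Safety gate triggered: C={c:.3f}, S={s:.3f}")

safety_gate(0.3, 0.3)          # passes silently
# safety_gate(0.603, 0.248)   # would raise, matching the log above
```
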
New successful run after adjusting metric floors:

```
Step 0 validation loss: 0.7461 K=0.038 C=0.084 S=0.246
Step 1 validation loss: 0.7344 K=0.036 C=0.073 S=0.243
Step 2 validation loss: 0.7266 K=0.029 C=0.074 S=0.242
Step 3 validation loss: 0.7656 K=0.054 C=0.093 S=0.245
Step 4 validation loss: 0.7422 K=0.026 C=0.097 S=0.241
Compilation skipped: Dynamo is not supported on Python 3.12+
Safe output bits: [[1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1]]
```

New run with torch-2.7.1+cpu installed from requirements and compile disabled:
```
Step 0 validation loss: 1.8750 K=0.152 C=0.314 S=0.345
Step 1 validation loss: 1.0625 K=0.305 C=0.101 S=0.302
Step 2 validation loss: 0.7266 K=0.028 C=0.083 S=0.244
Step 3 validation loss: 0.7773 K=0.045 C=0.175 S=0.254
Step 4 validation loss: 0.7539 K=0.031 C=0.122 S=0.245
Safe output bits: [[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0]]
```

Run with pinned dependencies from the updated `requirements.txt`:
```
Step 0 validation loss: 2.4531 K=0.195 C=0.287 S=0.346
Step 1 validation loss: 1.5781 K=0.176 C=0.307 S=0.340
Step 2 validation loss: 0.7383 K=0.037 C=0.112 S=0.245
Step 3 validation loss: 0.7773 K=0.038 C=0.178 S=0.251
Step 4 validation loss: 0.7227 K=0.028 C=0.099 S=0.239
Safe output bits: [[1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1]]
```

## WikiText Schedule with Compression
Ran `python wikitext_schedule.py --steps 2 --dataset-size 64` using the new compression-aware training.

```
Step 0 validation loss: 0.6969
Scaled model to 2 layers and width 32
Step 1 validation loss: 0.6840
Scaled model to 2 layers and width 64
Step 2 validation loss: 0.6746
```

## WikiText Schedule 10-step Run with Compression
```
Step 0 validation loss: 2.1250
Scaled model to 2 layers and width 32
Step 1 validation loss: 2.2188
Scaled model to 2 layers and width 64
Step 2 validation loss: 6.0000
Scaled model to 4 layers and width 64
Step 3 validation loss: 6.3750
Scaled model to 4 layers and width 128
Step 4 validation loss: 4.7812
Scaled model to 8 layers and width 128
Step 5 validation loss: 3.8594
Scaled model to 8 layers and width 256
Step 6 validation loss: 7.2812
Scaled model to 16 layers and width 256
Step 7 validation loss: 9.8125
Scaled model to 16 layers and width 512
Step 8 validation loss: 34.5000
Scaled model to 32 layers and width 512
Step 9 validation loss: 39.7500
Scaled model to 32 layers and width 1024
Step 10 validation loss: 163.0000
```

### 10-step Run with ACT Enabled
Attempted to rerun the 10-step schedule with `use_act=True` and dataset size 128.
Training was interrupted by time limits after step 8. Partial results:
```
Step 0 validation loss: 1.8594
Scaled model to 2 layers and width 32
Step 1 validation loss: 0.7344
Scaled model to 2 layers and width 64
Step 2 validation loss: 0.5469
Scaled model to 4 layers and width 64
Step 3 validation loss: 0.2520
Scaled model to 4 layers and width 128
Step 4 validation loss: 0.1748
Scaled model to 8 layers and width 128
Step 5 validation loss: 0.0284
Scaled model to 8 layers and width 256
Step 6 validation loss: 0.1982
Scaled model to 16 layers and width 256
Step 7 validation loss: 0.1562
Scaled model to 16 layers and width 512
Step 8 validation loss: 0.2168
Scaled model to 32 layers and width 512
```

## WikiText-103 100MB Attempt
Attempted to train on 100MB of WikiText-103 streamed via `datasets` and converted to bits. Converting the dataset (352k lines) took too long, and the process was interrupted before the first training step could complete.

## Offline Full Bits Training Attempt
- Installed requirements successfully.
- Built `full_bits.pt` (100MB of WikiText-103 compressed to bits).
- Ran `python full_bits_train.py`, but the training loop was extremely slow and was manually interrupted before completing a single pass.

## BitSeq Dataset Training
- Built `full_bits.pt` from WikiText2 using `build_full_bits.py`.
- Ran `python full_bits_train.py` with the BitSeq DataLoader (seq=2048, batch=8).
- The script loaded one batch and reported `Batch loss: 2.4375`.

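A BitSeq-style dataset presumably slices one long bit tensor into fixed-length windows. A minimal sketch matching the seq=2048, batch=8 setting above (the real BitSeq class was not inspected):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class BitSeq(Dataset):
    """Fixed-length, non-overlapping windows over a 1-D bit tensor."""

    def __init__(self, bits, seq_len=2048):
        self.bits, self.seq_len = bits, seq_len

    def __len__(self):
        return len(self.bits) // self.seq_len

    def __getitem__(self, i):
        return self.bits[i * self.seq_len:(i + 1) * self.seq_len]

bits = torch.randint(0, 2, (1 << 20,))            # stand-in for full_bits.pt
loader = DataLoader(BitSeq(bits), batch_size=8, shuffle=True)
print(next(iter(loader)).shape)                   # torch.Size([8, 2048])
```
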
## Offline train_full_sequence Scale-Up (8 steps)
- Built dataset with `python build_full_bits.py` (~84MB).
- Trained using `BitTransformerLM.train_full_sequence` over the first 65k bits with ctx_bits=64.
```
Step 0 train loss: 3.7605
Step 1 train loss: 3.7545
Step 2 train loss: 3.7434
Step 3 train loss: 3.7382
Step 4 train loss: 3.7301
Step 5 train loss: 3.7261
Step 6 train loss: 3.7202
Step 7 train loss: 3.7060
```

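A sketch of what limited-context training over one long bitstream plausibly means here: predict each next bit from the preceding 64-bit window. `model` stands in for the real BitTransformerLM, assumed to map `[batch, ctx]` bit tensors to `[batch, ctx, 2]` logits; the actual `train_full_sequence` was not inspected.

```python
import torch
import torch.nn.functional as F

def train_full_sequence(model, bits, ctx_bits=64, steps=8, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for step in range(steps):
        total, n = 0.0, 0
        # Slide over the stream in non-overlapping context windows.
        for i in range(0, len(bits) - ctx_bits - 1, ctx_bits):
            window = bits[i:i + ctx_bits + 1].unsqueeze(0)  # [1, ctx+1]
            logits = model(window[:, :-1])                  # next-bit logits
            loss = F.cross_entropy(logits.reshape(-1, 2),
                                   window[:, 1:].reshape(-1))
            opt.zero_grad(); loss.backward(); opt.step()
            total, n = total + loss.item(), n + 1
        print(f"Step {step} train loss: {total / n:.4f}")
```
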
## Progressive Scale-Up 8-Step Run
```
Step 0 validation loss: 0.7042
Step 1 validation loss: 0.7036
Step 2 validation loss: 0.7061
Step 3 validation loss: 0.6997
Step 4 validation loss: 0.7072
Step 5 validation loss: 0.6892
Step 6 validation loss: 0.7085
Step 7 validation loss: 0.6966
```

## Compression Inference Test
Installed requirements and ran `python wikitext_schedule.py --steps 2 --dataset-size 64`:
```
Step 0 validation loss: 0.9297
Scaled model to 2 layers and width 32
Step 1 validation loss: 0.7773
Scaled model to 2 layers and width 64
Step 2 validation loss: 0.7773
```

Ran a minimal training cycle with compression and generated text from the model:
```
Model output: hllo world
```

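One plausible reading of the `ratio` field in the workflow logs below is compressed length over raw length for the training bitstream. The run-length scheme here is purely illustrative; the repo's actual codec may differ:

```python
def run_length_encode(bits):
    """Collapse a bit list into (value, run_length) pairs."""
    out, run = [], 1
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            run += 1
        else:
            out.append((prev, run)); run = 1
    out.append((bits[-1], run))
    return out

bits = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
rle = run_length_encode(bits)
print(rle)                       # [(1, 3), (0, 2), (1, 1), (0, 4)]
print(len(rle) * 2 / len(bits))  # crude size ratio: 0.8
```
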
## Bigger Batch Smoke Test
Executed `python unified_workflow.py --steps 9 --dataset-size 100` after adding a warm-up optimisation. Final lines:
```
Epoch 1 raw_loss=0.5525 acc=0.692 | compressed_loss=0.5449 acc=0.718 direct_loss=0.0000 ratio=1.07
Step 8 validation loss: 0.4727 K=0.248 C=0.126 S=0.309
Final validation loss: 0.4824 K=0.245 C=0.131 S=0.308
Safety gate triggered Safety gate triggered: C=0.476, S=0.292
Collapsed model validation loss: 0.6928360462188721
```

### Inference Conversation
```
User: hi
Model: hi
User: ok
Model: ok
```

## Bigger Training Smoke Test

Executed `python unified_workflow.py --steps 7 --dataset-size 64` after updating the training loop with extra optimizer steps. Final lines:

```
Step 6 validation loss: 0.4922 K=0.252 C=0.118 S=0.306
Final validation loss: 0.4785 K=0.264 C=0.105 S=0.307
Safety gate triggered Safety gate triggered: C=0.476, S=0.297
Collapsed model validation loss: 0.6666421890258789
Workflow results: [(0, 1.015625, 0.2431640625, 0.126953125, 0.30909082293510437), (1, 0.74609375, 0.04248046875, 0.0306396484375, 0.2524452209472656), (2, 0.66796875, 0.11181640625, 0.06396484375, 0.2690799832344055), (3, 0.734375, 0.095703125, 0.044189453125, 0.2644684910774231), (4, 0.5546875, 0.220703125, 0.08837890625, 0.29613998532295227), (5, 0.73046875, 0.03759765625, 0.0654296875, 0.25516262650489807), (6, 0.4921875, 0.251953125, 0.11767578125, 0.30603474378585815), (7, 0.478515625, 0.263671875, 0.10498046875, 0.3072776794433594)]
```

### Inference Conversation (temperature=0.9, top-p=0.95)

```
User: hi
Model: hi
User: how are you?
Model: how are you?
```

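For reference, temperature plus top-p (nucleus) sampling in the standard form, as presumably used in the conversation above. For a bit model the vocabulary is just {0, 1}, so this is shown over generic logits; the repo's sampler was not inspected:

```python
import torch

def sample_top_p(logits, temperature=0.9, top_p=0.95):
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_probs, order = torch.sort(probs, descending=True)
    # Keep tokens whose cumulative mass *before* them is still < top_p.
    keep = torch.cumsum(sorted_probs, dim=-1) - sorted_probs < top_p
    sorted_probs[~keep] = 0.0
    sorted_probs /= sorted_probs.sum()
    return order[torch.multinomial(sorted_probs, 1)].item()

print(sample_top_p(torch.tensor([2.0, 0.5])))  # usually samples index 0
```
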
## Continuous Training Test
Loaded existing weights when present.
Performed 2 scaling steps and 1 plateau step on a 16-sample dataset.
Final validation loss: 0.7383, with the collapsed model at 0.6924.

## Diffusion LM Smoke Test
Installed requirements and ran `python unified_workflow.py --steps 2 --dataset-size 32 --max-len 32 --diffusion`:
```
Epoch 0 raw_loss=4.7188 acc=0.188 | compressed_loss=0.0000 acc=0.000 direct_loss=0.0000 ratio=0.00
Epoch 1 raw_loss=4.6094 acc=0.185 | compressed_loss=0.0000 acc=0.000 direct_loss=0.0000 ratio=0.00
Step 0 validation loss: 3.9844 K=0.311 C=0.109 S=0.351
Epoch 0 raw_loss=3.6445 acc=0.355 | compressed_loss=0.0000 acc=0.000 direct_loss=0.0000 ratio=0.00
Epoch 1 raw_loss=2.4531 acc=0.544 | compressed_loss=0.0000 acc=0.000 direct_loss=0.0000 ratio=0.00
Step 1 validation loss: 3.2656 K=0.371 C=0.088 S=0.357
Final validation loss: 3.2344 K=0.373 C=0.087 S=0.357
Diffusion sample: [1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
Diffusion inference output bits: [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

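The repo's diffusion schedule was not inspected; as a purely illustrative sketch, discrete diffusion over bits can be sampled by starting from noise and iteratively keeping the model's most confident predictions each round (confidence-based unmasking):

```python
import torch

def diffusion_sample(model, length=32, rounds=8):
    bits = torch.randint(0, 2, (1, length))       # pure-noise start
    for r in range(rounds):
        logits = model(bits)                      # assumed [1, length, 2]
        probs = torch.softmax(logits, dim=-1)
        conf, pred = probs.max(dim=-1)
        k = int(length * (r + 1) / rounds)        # commit k most confident bits
        keep = conf.topk(k, dim=-1).indices
        bits[0, keep[0]] = pred[0, keep[0]]
    return bits[0].tolist()
```
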
## Rigorous Training Regime
Ran `python tests/rigorous_training_regime.py`:

```
### Progressive Scale-Up (causal=True)

Step 0 validation loss: 0.7167
Scaled model to 1 layers and width 32
Step 1 validation loss: 0.6880
Scaled model to 1 layers and width 32
Step 2 validation loss: 0.7019
Scaled model to 1 layers and width 32
Duration: 0.23s

### Progressive Scale-Up (causal=False)

Step 0 validation loss: 0.8581
Scaled model to 1 layers and width 32
Step 1 validation loss: 0.7439
Scaled model to 1 layers and width 32
Step 2 validation loss: 0.7068
Scaled model to 1 layers and width 32
Duration: 0.21s

### Unified Workflow (causal=True)

Loaded model from weights/model.pt.gz
Epoch 0 raw_loss=0.6719 acc=0.581 | compressed_loss=0.6875 acc=0.586 direct_loss=0.0000 ratio=1.09
Step 0 validation loss: 0.6367 K=0.091 C=0.069 S=0.284
Epoch 0 raw_loss=0.6328 acc=0.605 | compressed_loss=0.6328 acc=0.612 direct_loss=0.0000 ratio=1.09
Step 1 validation loss: 0.6914 K=0.202 C=0.049 S=0.305
Epoch 0 raw_loss=0.5312 acc=0.718 | compressed_loss=0.6445 acc=0.628 direct_loss=0.0000 ratio=1.09
Plateau 0 validation loss: 0.5469 K=0.096 C=0.118 S=0.290
Final validation loss: 0.5430 K=0.099 C=0.104 S=0.289
Safety gate triggered Safety gate triggered: C=0.484, S=0.285
Collapsed model validation loss: 0.8396304845809937
Workflow results: [(0, 0.63671875, 0.09130859375, 0.0693359375, 0.28369221091270447), (1, 0.69140625, 0.2021484375, 0.049072265625, 0.3053092062473297), (2, 0.546875, 0.09619140625, 0.1181640625, 0.2900315225124359), (3, 0.54296875, 0.09912109375, 0.10400390625, 0.289362370967865)]
Inference on 'hi': [0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1]

Duration: 8.48s

### Unified Workflow (causal=False / Diffusion)

Loaded model from weights/model.pt.gz
Epoch 0 raw_loss=0.8232 acc=0.391 | compressed_loss=0.0000 acc=0.000 direct_loss=0.0000 ratio=0.00
Step 0 validation loss: 0.9805 K=0.098 C=0.067 S=0.285
Epoch 0 raw_loss=0.7471 acc=0.561 | compressed_loss=0.0000 acc=0.000 direct_loss=0.0000 ratio=0.00
Step 1 validation loss: 1.0547 K=0.134 C=0.091 S=0.294
Epoch 0 raw_loss=0.7520 acc=0.609 | compressed_loss=0.0000 acc=0.000 direct_loss=0.0000 ratio=0.00
Plateau 0 validation loss: 0.2119 K=0.187 C=0.185 S=0.332
Final validation loss: 0.2188 K=0.187 C=0.176 S=0.330
Collapsed model validation loss: 0.6897413730621338
Diffusion sample: [1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1]
Workflow results: [(0, 0.98046875, 0.09765625, 0.06689453125, 0.28478696942329407), (1, 1.0546875, 0.1337890625, 0.0908203125, 0.29406091570854187), (2, 0.2119140625, 0.1865234375, 0.1845703125, 0.33178743720054626), (3, 0.21875, 0.1865234375, 0.17578125, 0.32961323857307434)]
Diffusion inference output bits: [1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1]
Duration: 24.25s
```

## Rigorous Training Regime (2025-08-06)
Ran `python tests/rigorous_training_regime.py`:

```
### Progressive Scale-Up (causal=True)

Step 0 validation loss: 0.6921
Scaled model to 1 layers and width 32
Step 1 validation loss: 0.7171
Scaled model to 1 layers and width 32
Step 2 validation loss: 0.6914
Scaled model to 1 layers and width 32
Duration: 0.27s

### Progressive Scale-Up (causal=False)

Step 0 validation loss: 0.8465
Scaled model to 1 layers and width 32
Step 1 validation loss: 0.7123
Scaled model to 1 layers and width 32
Step 2 validation loss: 0.7009
Scaled model to 1 layers and width 32
Duration: 0.26s

### Unified Workflow (causal=True)

Epoch 0 raw_loss=1.1094 acc=0.593 | compressed_loss=1.1465 acc=0.599 direct_loss=0.0000 ratio=1.09
Step 0 validation loss: 0.8945 K=0.301 C=0.092 S=0.339
Epoch 0 raw_loss=0.9453 acc=0.601 | compressed_loss=0.9707 acc=0.617 direct_loss=0.0000 ratio=1.09
Step 1 validation loss: 0.9180 K=0.301 C=0.088 S=0.338
Epoch 0 raw_loss=0.8984 acc=0.593 | compressed_loss=0.9590 acc=0.599 direct_loss=0.0000 ratio=1.09
Plateau 0 validation loss: 0.7969 K=0.243 C=0.095 S=0.324
Final validation loss: 0.7930 K=0.244 C=0.094 S=0.324
Safety gate triggered Safety gate triggered: C=0.484, S=0.314
Collapsed model validation loss: 0.6552348732948303
Workflow results: [(0, 0.89453125, 0.30078125, 0.09228515625, 0.33890560269355774), (1, 0.91796875, 0.30078125, 0.08837890625, 0.33844876289367676), (2, 0.796875, 0.2431640625, 0.0947265625, 0.32405367493629456), (3, 0.79296875, 0.244140625, 0.09423828125, 0.32419103384017944)]
Inference on 'hi': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Duration: 5.26s

### Unified Workflow (causal=False / Diffusion)

Loaded model from weights/model.pt.gz
Epoch 0 raw_loss=1.2266 acc=0.590 | compressed_loss=0.0000 acc=0.000 direct_loss=0.0000 ratio=0.00
Step 0 validation loss: 0.8359 K=0.165 C=0.032 S=0.296
Epoch 0 raw_loss=0.7617 acc=0.603 | compressed_loss=0.0000 acc=0.000 direct_loss=0.0000 ratio=0.00
Step 1 validation loss: 0.7891 K=0.025 C=0.043 S=0.268
Epoch 0 raw_loss=0.7158 acc=0.553 | compressed_loss=0.0000 acc=0.000 direct_loss=0.0000 ratio=0.00
Plateau 0 validation loss: 0.5391 K=0.113 C=0.056 S=0.287
Final validation loss: 0.5391 K=0.116 C=0.060 S=0.287
Collapsed model validation loss: 0.7268564701080322
Diffusion sample: [1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1]
Workflow results: [(0, 0.8359375, 0.1650390625, 0.0322265625, 0.29598498344421387), (1, 0.7890625, 0.0250244140625, 0.04345703125, 0.26766154170036316), (2, 0.5390625, 0.11328125, 0.05615234375, 0.2867652475833893), (3, 0.5390625, 0.1162109375, 0.06005859375, 0.28735819458961487)]
Diffusion inference output bits: [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0]
Duration: 3.70s
```

## Rigorous Training Regime (2025-08-06 - 10-step alt length/width)
Ran `python tests/rigorous_training_regime.py`:

```
### Progressive Scale-Up (causal=True)

Step 0 validation loss: 0.4615
Step 1 validation loss: 0.4427
Step 2 validation loss: 0.4282
Step 3 validation loss: 0.4202
Step 4 validation loss: 0.4175
Scaled length; seq_len=128 width=32 params=8674
Step 5 validation loss: 0.5383
Scaled width; seq_len=128 width=64 params=33730
Step 6 validation loss: 0.4334
Step 7 validation loss: 0.4304
Scaled length; seq_len=256 width=64 params=33730
Step 8 validation loss: 0.5085
Scaled width; seq_len=256 width=128 params=132994
Step 9 validation loss: 0.4279
Duration: 38.96s

### Progressive Scale-Up (causal=False)

Step 0 validation loss: 0.4292
Step 1 validation loss: 0.4053
Step 2 validation loss: 0.4003
Step 3 validation loss: 0.3997
Scaled length; seq_len=128 width=32 params=8674
Step 4 validation loss: 0.4162
Scaled width; seq_len=128 width=64 params=33730
Step 5 validation loss: 0.4173
Scaled length; seq_len=256 width=64 params=33730
Step 6 validation loss: 0.4160
Scaled width; seq_len=256 width=128 params=132994
Step 7 validation loss: 0.4211
Scaled length; seq_len=512 width=128 params=132994
Step 8 validation loss: 0.4227
Scaled width; seq_len=512 width=256 params=528130
Step 9 validation loss: 0.4146
Duration: 173.71s

### Unified Workflow (causal=True)

Epoch 0 raw_loss=3.1562 acc=0.540 | compressed_loss=3.4531 acc=0.529 direct_loss=0.0000 ratio=1.09
Step 0 validation loss: 2.9688 K=0.559 C=0.220 S=0.475
Epoch 0 raw_loss=2.7188 acc=0.540 | compressed_loss=2.9883 acc=0.529 direct_loss=0.0000 ratio=1.09
Step 1 validation loss: 3.4531 K=0.566 C=0.222 S=0.481
Epoch 0 raw_loss=3.0625 acc=0.540 | compressed_loss=3.4414 acc=0.529 direct_loss=0.0000 ratio=1.09
Plateau 0 validation loss: 3.0781 K=0.559 C=0.219 S=0.474
Final validation loss: 3.0938 K=0.559 C=0.220 S=0.475
Safety gate triggered Safety gate triggered: C=0.484, S=0.466
Collapsed model validation loss: 0.6677278280258179
Workflow results: [(0, 2.96875, 0.55859375, 0.2197265625, 0.4746275246143341), (1, 3.453125, 0.56640625, 0.2216796875, 0.4808752238750458), (2, 3.078125, 0.55859375, 0.21875, 0.47436484694480896), (3, 3.09375, 0.55859375, 0.2197265625, 0.474519282579422)]
Inference on 'hi': [1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1]

Duration: 2.50s

### Unified Workflow (causal=False / Diffusion)

Loaded model from weights/model.pt.gz
Epoch 0 raw_loss=4.3984 acc=0.271 | compressed_loss=0.0000 acc=0.000 direct_loss=0.0000 ratio=0.00
Step 0 validation loss: 4.9688 K=0.512 C=0.208 S=0.449
Epoch 0 raw_loss=3.5859 acc=0.225 | compressed_loss=0.0000 acc=0.000 direct_loss=0.0000 ratio=0.00
Step 1 validation loss: 4.6562 K=0.477 C=0.200 S=0.428
Epoch 0 raw_loss=3.3008 acc=0.225 | compressed_loss=0.0000 acc=0.000 direct_loss=0.0000 ratio=0.00
Plateau 0 validation loss: 3.5469 K=0.439 C=0.158 S=0.396
Final validation loss: 3.5625 K=0.436 C=0.156 S=0.396
Collapsed model validation loss: 0.6747412085533142
Diffusion sample: [1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1]
Workflow results: [(0, 4.96875, 0.51171875, 0.2080078125, 0.44865939021110535), (1, 4.65625, 0.4765625, 0.2001953125, 0.4284386932849884), (2, 3.546875, 0.439453125, 0.158203125, 0.3957676589488983), (3, 3.5625, 0.435546875, 0.15625, 0.39555999636650085)]
Diffusion inference output bits: [1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1]
Duration: 3.42s
```

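The logged parameter counts quadruple with each width doubling, which is what one expects when parameters are dominated by d_model × d_model weight matrices. A quick check of the figures above:

```python
# Ratios of successive "params=" values from the scale-up log above.
for a, b in [(8674, 33730), (33730, 132994), (132994, 528130)]:
    print(f"{b / a:.2f}x")   # 3.89x, 3.94x, 3.97x -- approaching 4x
```
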
## WikiText Training Attempt (2025-09-??)
Attempted minimal training on real WikiText-2 data using `train_loop` with dropout 0.1 and evaluation dropout 0.0. Training failed due to a telemetry shape mismatch:

```
RuntimeError: The size of tensor a (4) must match the size of tensor b (64) at non-singleton dimension 1
```

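This is PyTorch's standard broadcast failure; a minimal repro of the error class (the exact tensors involved in the repo's failure are an assumption):

```python
import torch

# E.g. per-step telemetry (width 4) combined with per-bit telemetry (width 64).
a = torch.zeros(2, 4)
b = torch.zeros(2, 64)
a + b  # RuntimeError: The size of tensor a (4) must match the size of
       # tensor b (64) at non-singleton dimension 1
```
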
As a sanity check, ran `hil_safe_inference` on an untrained model in evaluation mode (dropout=0.0):

```
Inference output bits: [[0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
```

## WikiText Training Debug (2025-09-??)
Ran a minimal `train_loop` on parity-protected WikiText-2 samples with dropout 0.1:

```
Epoch 0 raw_loss=0.6278 acc=0.724 | compressed_loss=0.0000 acc=0.000 direct_loss=0.0000 ratio=0.00
```