| # WYRM v27 FINAL β LOCKED VERSION |
| ## Date: April 8, 2026 (Day 58) |
| ## Status: TRAINING ON KAGGLE β DO NOT MODIFY |
|
|
| This is the exact version running on Kaggle T4 x2 right now. |
|
|
| ### Architecture |
| - Total params: 565,816,475 |
| - Base kernel: 443,744,202 |
| - Synthase depth: 32,834,328 (6.89%) |
| - PUP uncertainty: 6,406 (0.001%) |
| - MultiEmbed: 33,560,576 |
| - Plug: 3,154,944 |
| - AuxHead: 32,770,048 |
| - GaussianHead: 19,745,971 |
|
|
| ### Config |
| - dim=1024, layers=24, heads=32, ff=4096 |
| - Batch: 4 x 4 accum = 16 effective |
| - Sequence: 1024 |
| - Vocab: 16,000 (BPE) |
| - Precision: float32 |
| - Optimizer: AdamW, 10 groups |
| - Curriculum: language + math + cognition at multiple depths |
| - SLAΒ² on L0, Synthase depth profiles, PUP (disabled loss) |
|
|
| ### Files |
| - wyrm_notebook_v27_FINAL.py β the Kaggle notebook |
| - kernel-metadata.json β Kaggle push metadata |
| - dataset/ β exact copy of kaggle_upload/ (kernel + staging + tokenizer) |
|
|
| ### Training Metrics (first 60 steps) |
| - Loss: 11.31 β 4.81 (57% drop) |
| - Task loss: 8.22 β 2.21 |
| - Best: 2.07 |
| - VRAM: 5.56 GB / 14.6 GB |
| - Step time: ~27s |
| - ETA: ~111h (15K steps) |
|
|
| ### Kaggle Details |
| - Slug: amuzetnom02/wyrm-500m-interactive |
| - Dataset: amuzetnom02/wyrm-500m-base |
| - Accelerator: GPU T4 x2 |
| - Account: amuzetnom02 |
|
|