# WikiText-103 Streaming Validation

Date: 2026-05-31

This report combines 5 random seeds (1, 2, 3, 4, 5) from saved streaming runs. The script performs no additional training; it only reads saved `metrics.jsonl` files.

Regime: WikiText-103 cached-corpus streaming setup with the L12_H8_D320 model (17,367,040 parameters), five training prefixes that double from 250k to 4M tokens, and 1,000 optimizer steps per stage.

This is a clean five-seed run covering three dropout-decay schedules and eleven static dropout baselines from 0.00 through 0.30.

## Sources

- `runs/wikitext103_l12_streaming_validation_5seed/locked_stream/20260531-093525/metrics.jsonl`

## Condition Ranking By Final Loss

Conditions are sorted by mean final validation loss (lower is better). The columns are defined as follows:

- Trajectory val averages the validation loss over the five stages.
- Final val is the stage-4 (4,000,000-token) validation loss.
- Gap is validation loss minus training loss.
- The Std columns are computed across the five seeds.

| Condition | Kind | N | Mean trajectory val | Std trajectory val | Mean final val | Std final val | Mean final gap | Dropout path |
|---|---|---:|---:|---:|---:|---:|---:|---|
| `wikitext103_formula_l12` | `anchor_decay` | 5 | 4.5711 | 0.0045 | 4.0808 | 0.0195 | 0.2817 | `0.30 -> 0.26 -> 0.18 -> 0.09 -> 0.02` |
| `wikitext103_probe_blend` | `anchor_decay` | 5 | 4.5635 | 0.0046 | 4.0961 | 0.0145 | 0.3287 | `0.19 -> 0.14 -> 0.09 -> 0.04 -> 0.01` |
| `wikitext103_low_decay` | `anchor_decay` | 5 | 4.5681 | 0.0073 | 4.1020 | 0.0166 | 0.3251 | `0.14 -> 0.14 -> 0.10 -> 0.06 -> 0.02` |
| `static_dropout_0.1` | `static` | 5 | 4.5836 | 0.0062 | 4.1105 | 0.0188 | 0.2687 | `0.10 -> 0.10 -> 0.10 -> 0.10 -> 0.10` |
| `static_dropout_0.08` | `static` | 5 | 4.5967 | 0.0073 | 4.1116 | 0.0186 | 0.2848 | `0.08 -> 0.08 -> 0.08 -> 0.08 -> 0.08` |
| `static_dropout_0.06` | `static` | 5 | 4.6186 | 0.0048 | 4.1197 | 0.0082 | 0.3131 | `0.06 -> 0.06 -> 0.06 -> 0.06 -> 0.06` |
| `static_dropout_0.14` | `static` | 5 | 4.5735 | 0.0077 | 4.1221 | 0.0155 | 0.2548 | `0.14 -> 0.14 -> 0.14 -> 0.14 -> 0.14` |
| `static_dropout_0.18` | `static` | 5 | 4.5756 | 0.0041 | 4.1304 | 0.0130 | 0.2289 | `0.18 -> 0.18 -> 0.18 -> 0.18 -> 0.18` |
| `static_dropout_0.04` | `static` | 5 | 4.6501 | 0.0077 | 4.1331 | 0.0227 | 0.3353 | `0.04 -> 0.04 -> 0.04 -> 0.04 -> 0.04` |
| `static_dropout_0.2` | `static` | 5 | 4.5794 | 0.0050 | 4.1394 | 0.0167 | 0.2239 | `0.20 -> 0.20 -> 0.20 -> 0.20 -> 0.20` |
| `static_dropout_0.02` | `static` | 5 | 4.6954 | 0.0086 | 4.1459 | 0.0165 | 0.3700 | `0.02 -> 0.02 -> 0.02 -> 0.02 -> 0.02` |
| `static_dropout_0.26` | `static` | 5 | 4.6063 | 0.0051 | 4.1784 | 0.0145 | 0.2008 | `0.26 -> 0.26 -> 0.26 -> 0.26 -> 0.26` |
| `static_dropout_0` | `static` | 5 | 4.7762 | 0.0109 | 4.1835 | 0.0165 | 0.4085 | `0.00 -> 0.00 -> 0.00 -> 0.00 -> 0.00` |
| `static_dropout_0.3` | `static` | 5 | 4.6253 | 0.0034 | 4.1946 | 0.0141 | 0.1819 | `0.30 -> 0.30 -> 0.30 -> 0.30 -> 0.30` |

## Paired Final-Loss Deltas

For each seed, the best static baseline is the static-dropout condition with the lowest final validation loss. The delta is the condition's final validation loss minus that baseline's. A negative delta means the condition beat the best static baseline for that seed.

| Seed | Condition | Final val | Best static | Best static final val | Delta vs best static |
|---:|---|---:|---|---:|---:|
| 1 | `wikitext103_formula_l12` | 4.0623 | `static_dropout_0.1` | 4.0807 | -0.0184 |
| 1 | `wikitext103_probe_blend` | 4.0738 | `static_dropout_0.1` | 4.0807 | -0.0069 |
| 1 | `wikitext103_low_decay` | 4.0854 | `static_dropout_0.1` | 4.0807 | +0.0047 |
| 1 | `static_dropout_0.1` | 4.0807 | `static_dropout_0.1` | 4.0807 | +0.0000 |
| 1 | `static_dropout_0.08` | 4.0893 | `static_dropout_0.1` | 4.0807 | +0.0086 |
| 1 | `static_dropout_0.06` | 4.1112 | `static_dropout_0.1` | 4.0807 | +0.0305 |
| 1 | `static_dropout_0.14` | 4.1082 | `static_dropout_0.1` | 4.0807 | +0.0275 |
| 1 | `static_dropout_0.18` | 4.1108 | `static_dropout_0.1` | 4.0807 | +0.0301 |
| 1 | `static_dropout_0.2` | 4.1162 | `static_dropout_0.1` | 4.0807 | +0.0355 |
| 1 | `static_dropout_0.04` | 4.1031 | `static_dropout_0.1` | 4.0807 | +0.0224 |
| 1 | `static_dropout_0.02` | 4.1371 | `static_dropout_0.1` | 4.0807 | +0.0564 |
| 1 | `static_dropout_0` | 4.1600 | `static_dropout_0.1` | 4.0807 | +0.0793 |
| 1 | `static_dropout_0.26` | 4.1557 | `static_dropout_0.1` | 4.0807 | +0.0750 |
| 1 | `static_dropout_0.3` | 4.1802 | `static_dropout_0.1` | 4.0807 | +0.0994 |
| 2 | `wikitext103_formula_l12` | 4.1123 | `static_dropout_0.06` | 4.1304 | -0.0181 |
| 2 | `wikitext103_probe_blend` | 4.1113 | `static_dropout_0.06` | 4.1304 | -0.0191 |
| 2 | `wikitext103_low_decay` | 4.1291 | `static_dropout_0.06` | 4.1304 | -0.0013 |
| 2 | `static_dropout_0.1` | 4.1320 | `static_dropout_0.06` | 4.1304 | +0.0016 |
| 2 | `static_dropout_0.08` | 4.1374 | `static_dropout_0.06` | 4.1304 | +0.0071 |
| 2 | `static_dropout_0.06` | 4.1304 | `static_dropout_0.06` | 4.1304 | +0.0000 |
| 2 | `static_dropout_0.14` | 4.1476 | `static_dropout_0.06` | 4.1304 | +0.0172 |
| 2 | `static_dropout_0.18` | 4.1471 | `static_dropout_0.06` | 4.1304 | +0.0167 |
| 2 | `static_dropout_0.2` | 4.1633 | `static_dropout_0.06` | 4.1304 | +0.0329 |
| 2 | `static_dropout_0.04` | 4.1648 | `static_dropout_0.06` | 4.1304 | +0.0344 |
| 2 | `static_dropout_0.02` | 4.1746 | `static_dropout_0.06` | 4.1304 | +0.0442 |
| 2 | `static_dropout_0` | 4.2030 | `static_dropout_0.06` | 4.1304 | +0.0726 |
| 2 | `static_dropout_0.26` | 4.1961 | `static_dropout_0.06` | 4.1304 | +0.0658 |
| 2 | `static_dropout_0.3` | 4.2155 | `static_dropout_0.06` | 4.1304 | +0.0852 |
| 3 | `wikitext103_formula_l12` | 4.0763 | `static_dropout_0.08` | 4.1036 | -0.0272 |
| 3 | `wikitext103_probe_blend` | 4.0934 | `static_dropout_0.08` | 4.1036 | -0.0102 |
| 3 | `wikitext103_low_decay` | 4.1006 | `static_dropout_0.08` | 4.1036 | -0.0030 |
| 3 | `static_dropout_0.1` | 4.1115 | `static_dropout_0.08` | 4.1036 | +0.0079 |
| 3 | `static_dropout_0.08` | 4.1036 | `static_dropout_0.08` | 4.1036 | +0.0000 |
| 3 | `static_dropout_0.06` | 4.1127 | `static_dropout_0.08` | 4.1036 | +0.0092 |
| 3 | `static_dropout_0.14` | 4.1240 | `static_dropout_0.08` | 4.1036 | +0.0204 |
| 3 | `static_dropout_0.18` | 4.1285 | `static_dropout_0.08` | 4.1036 | +0.0250 |
| 3 | `static_dropout_0.2` | 4.1367 | `static_dropout_0.08` | 4.1036 | +0.0332 |
| 3 | `static_dropout_0.04` | 4.1246 | `static_dropout_0.08` | 4.1036 | +0.0211 |
| 3 | `static_dropout_0.02` | 4.1443 | `static_dropout_0.08` | 4.1036 | +0.0408 |
| 3 | `static_dropout_0` | 4.1758 | `static_dropout_0.08` | 4.1036 | +0.0722 |
| 3 | `static_dropout_0.26` | 4.1796 | `static_dropout_0.08` | 4.1036 | +0.0761 |
| 3 | `static_dropout_0.3` | 4.1926 | `static_dropout_0.08` | 4.1036 | +0.0890 |
| 4 | `wikitext103_formula_l12` | 4.0845 | `static_dropout_0.1` | 4.1096 | -0.0251 |
| 4 | `wikitext103_probe_blend` | 4.0954 | `static_dropout_0.1` | 4.1096 | -0.0141 |
| 4 | `wikitext103_low_decay` | 4.0928 | `static_dropout_0.1` | 4.1096 | -0.0167 |
| 4 | `static_dropout_0.1` | 4.1096 | `static_dropout_0.1` | 4.1096 | +0.0000 |
| 4 | `static_dropout_0.08` | 4.1223 | `static_dropout_0.1` | 4.1096 | +0.0127 |
| 4 | `static_dropout_0.06` | 4.1188 | `static_dropout_0.1` | 4.1096 | +0.0093 |
| 4 | `static_dropout_0.14` | 4.1117 | `static_dropout_0.1` | 4.1096 | +0.0021 |
| 4 | `static_dropout_0.18` | 4.1330 | `static_dropout_0.1` | 4.1096 | +0.0234 |
| 4 | `static_dropout_0.2` | 4.1388 | `static_dropout_0.1` | 4.1096 | +0.0292 |
| 4 | `static_dropout_0.04` | 4.1312 | `static_dropout_0.1` | 4.1096 | +0.0217 |
| 4 | `static_dropout_0.02` | 4.1387 | `static_dropout_0.1` | 4.1096 | +0.0291 |
| 4 | `static_dropout_0` | 4.1853 | `static_dropout_0.1` | 4.1096 | +0.0757 |
| 4 | `static_dropout_0.26` | 4.1782 | `static_dropout_0.1` | 4.1096 | +0.0686 |
| 4 | `static_dropout_0.3` | 4.2007 | `static_dropout_0.1` | 4.1096 | +0.0912 |
| 5 | `wikitext103_formula_l12` | 4.0686 | `static_dropout_0.08` | 4.1056 | -0.0370 |
| 5 | `wikitext103_probe_blend` | 4.1066 | `static_dropout_0.08` | 4.1056 | +0.0009 |
| 5 | `wikitext103_low_decay` | 4.1021 | `static_dropout_0.08` | 4.1056 | -0.0035 |
| 5 | `static_dropout_0.1` | 4.1186 | `static_dropout_0.08` | 4.1056 | +0.0129 |
| 5 | `static_dropout_0.08` | 4.1056 | `static_dropout_0.08` | 4.1056 | +0.0000 |
| 5 | `static_dropout_0.06` | 4.1253 | `static_dropout_0.08` | 4.1056 | +0.0197 |
| 5 | `static_dropout_0.14` | 4.1192 | `static_dropout_0.08` | 4.1056 | +0.0135 |
| 5 | `static_dropout_0.18` | 4.1325 | `static_dropout_0.08` | 4.1056 | +0.0269 |
| 5 | `static_dropout_0.2` | 4.1418 | `static_dropout_0.08` | 4.1056 | +0.0362 |
| 5 | `static_dropout_0.04` | 4.1419 | `static_dropout_0.08` | 4.1056 | +0.0363 |
| 5 | `static_dropout_0.02` | 4.1346 | `static_dropout_0.08` | 4.1056 | +0.0290 |
| 5 | `static_dropout_0` | 4.1934 | `static_dropout_0.08` | 4.1056 | +0.0878 |
| 5 | `static_dropout_0.26` | 4.1821 | `static_dropout_0.08` | 4.1056 | +0.0765 |
| 5 | `static_dropout_0.3` | 4.1841 | `static_dropout_0.08` | 4.1056 | +0.0785 |

## Stage Trajectory

Within each stage, rows are sorted by mean validation loss. Dropout is the rate used during that stage, and gap is validation loss minus training loss.

| Stage | Prefix tokens | Condition | Dropout | N | Mean val | Std val | Mean train | Mean gap |
|---:|---:|---|---:|---:|---:|---:|---:|---:|
| 0 | 250,000 | `static_dropout_0.18` | 0.180 | 5 | 5.1616 | 0.0150 | 3.9964 | 1.1652 |
| 0 | 250,000 | `wikitext103_low_decay` | 0.140 | 5 | 5.1635 | 0.0220 | 3.9051 | 1.2585 |
| 0 | 250,000 | `static_dropout_0.14` | 0.140 | 5 | 5.1635 | 0.0220 | 3.9051 | 1.2585 |
| 0 | 250,000 | `wikitext103_probe_blend` | 0.190 | 5 | 5.1659 | 0.0171 | 4.0201 | 1.1458 |
| 0 | 250,000 | `static_dropout_0.1` | 0.100 | 5 | 5.1699 | 0.0237 | 3.8219 | 1.3480 |
| 0 | 250,000 | `static_dropout_0.2` | 0.200 | 5 | 5.1701 | 0.0141 | 4.0363 | 1.1338 |
| 0 | 250,000 | `static_dropout_0.08` | 0.080 | 5 | 5.1894 | 0.0161 | 3.7619 | 1.4274 |
| 0 | 250,000 | `static_dropout_0.26` | 0.260 | 5 | 5.1940 | 0.0161 | 4.1496 | 1.0444 |
| 0 | 250,000 | `wikitext103_formula_l12` | 0.300 | 5 | 5.2148 | 0.0181 | 4.2131 | 1.0017 |
| 0 | 250,000 | `static_dropout_0.3` | 0.300 | 5 | 5.2148 | 0.0181 | 4.2131 | 1.0017 |
| 0 | 250,000 | `static_dropout_0.06` | 0.060 | 5 | 5.2154 | 0.0173 | 3.7128 | 1.5026 |
| 0 | 250,000 | `static_dropout_0.04` | 0.040 | 5 | 5.2378 | 0.0186 | 3.6441 | 1.5938 |
| 0 | 250,000 | `static_dropout_0.02` | 0.020 | 5 | 5.2750 | 0.0255 | 3.5725 | 1.7025 |
| 0 | 250,000 | `static_dropout_0` | 0.000 | 5 | 5.3403 | 0.0270 | 3.5230 | 1.8172 |
| 1 | 500,000 | `wikitext103_probe_blend` | 0.140 | 5 | 4.7872 | 0.0269 | 3.6846 | 1.1027 |
| 1 | 500,000 | `static_dropout_0.2` | 0.200 | 5 | 4.7873 | 0.0236 | 3.7914 | 0.9959 |
| 1 | 500,000 | `static_dropout_0.18` | 0.180 | 5 | 4.7946 | 0.0206 | 3.7572 | 1.0375 |
| 1 | 500,000 | `static_dropout_0.14` | 0.140 | 5 | 4.8001 | 0.0198 | 3.6650 | 1.1351 |
| 1 | 500,000 | `wikitext103_low_decay` | 0.140 | 5 | 4.8001 | 0.0198 | 3.6650 | 1.1351 |
| 1 | 500,000 | `wikitext103_formula_l12` | 0.260 | 5 | 4.8053 | 0.0278 | 3.9182 | 0.8871 |
| 1 | 500,000 | `static_dropout_0.26` | 0.260 | 5 | 4.8081 | 0.0216 | 3.9053 | 0.9028 |
| 1 | 500,000 | `static_dropout_0.3` | 0.300 | 5 | 4.8242 | 0.0296 | 3.9765 | 0.8476 |
| 1 | 500,000 | `static_dropout_0.1` | 0.100 | 5 | 4.8332 | 0.0258 | 3.5637 | 1.2695 |
| 1 | 500,000 | `static_dropout_0.08` | 0.080 | 5 | 4.8576 | 0.0239 | 3.5036 | 1.3540 |
| 1 | 500,000 | `static_dropout_0.06` | 0.060 | 5 | 4.8947 | 0.0213 | 3.4394 | 1.4552 |
| 1 | 500,000 | `static_dropout_0.04` | 0.040 | 5 | 4.9573 | 0.0250 | 3.3515 | 1.6058 |
| 1 | 500,000 | `static_dropout_0.02` | 0.020 | 5 | 5.0451 | 0.0169 | 3.2612 | 1.7839 |
| 1 | 500,000 | `static_dropout_0` | 0.000 | 5 | 5.1741 | 0.0252 | 3.1506 | 2.0235 |
| 2 | 1,000,000 | `wikitext103_formula_l12` | 0.180 | 5 | 4.4938 | 0.0147 | 3.8283 | 0.6655 |
| 2 | 1,000,000 | `wikitext103_probe_blend` | 0.090 | 5 | 4.4940 | 0.0159 | 3.6495 | 0.8445 |
| 2 | 1,000,000 | `static_dropout_0.14` | 0.140 | 5 | 4.5001 | 0.0148 | 3.7163 | 0.7838 |
| 2 | 1,000,000 | `wikitext103_low_decay` | 0.100 | 5 | 4.5013 | 0.0158 | 3.6607 | 0.8406 |
| 2 | 1,000,000 | `static_dropout_0.18` | 0.180 | 5 | 4.5023 | 0.0185 | 3.7866 | 0.7157 |
| 2 | 1,000,000 | `static_dropout_0.2` | 0.200 | 5 | 4.5060 | 0.0204 | 3.8148 | 0.6913 |
| 2 | 1,000,000 | `static_dropout_0.1` | 0.100 | 5 | 4.5186 | 0.0135 | 3.6524 | 0.8662 |
| 2 | 1,000,000 | `static_dropout_0.26` | 0.260 | 5 | 4.5262 | 0.0101 | 3.9071 | 0.6191 |
| 2 | 1,000,000 | `static_dropout_0.08` | 0.080 | 5 | 4.5326 | 0.0117 | 3.6000 | 0.9326 |
| 2 | 1,000,000 | `static_dropout_0.3` | 0.300 | 5 | 4.5462 | 0.0127 | 3.9708 | 0.5754 |
| 2 | 1,000,000 | `static_dropout_0.06` | 0.060 | 5 | 4.5574 | 0.0126 | 3.5554 | 1.0020 |
| 2 | 1,000,000 | `static_dropout_0.04` | 0.040 | 5 | 4.5959 | 0.0146 | 3.5030 | 1.0929 |
| 2 | 1,000,000 | `static_dropout_0.02` | 0.020 | 5 | 4.6558 | 0.0159 | 3.4324 | 1.2234 |
| 2 | 1,000,000 | `static_dropout_0` | 0.000 | 5 | 4.7661 | 0.0324 | 3.3658 | 1.4003 |
| 3 | 2,000,000 | `wikitext103_formula_l12` | 0.090 | 5 | 4.2607 | 0.0181 | 3.8089 | 0.4518 |
| 3 | 2,000,000 | `wikitext103_low_decay` | 0.060 | 5 | 4.2736 | 0.0228 | 3.7474 | 0.5261 |
| 3 | 2,000,000 | `wikitext103_probe_blend` | 0.040 | 5 | 4.2743 | 0.0174 | 3.7200 | 0.5543 |
| 3 | 2,000,000 | `static_dropout_0.14` | 0.140 | 5 | 4.2816 | 0.0214 | 3.8287 | 0.4529 |
| 3 | 2,000,000 | `static_dropout_0.1` | 0.100 | 5 | 4.2857 | 0.0190 | 3.7809 | 0.5048 |
| 3 | 2,000,000 | `static_dropout_0.18` | 0.180 | 5 | 4.2889 | 0.0195 | 3.8768 | 0.4121 |
| 3 | 2,000,000 | `static_dropout_0.08` | 0.080 | 5 | 4.2925 | 0.0166 | 3.7519 | 0.5406 |
| 3 | 2,000,000 | `static_dropout_0.2` | 0.200 | 5 | 4.2942 | 0.0160 | 3.8982 | 0.3960 |
| 3 | 2,000,000 | `static_dropout_0.06` | 0.060 | 5 | 4.3059 | 0.0182 | 3.7318 | 0.5741 |
| 3 | 2,000,000 | `static_dropout_0.26` | 0.260 | 5 | 4.3248 | 0.0179 | 3.9655 | 0.3593 |
| 3 | 2,000,000 | `static_dropout_0.04` | 0.040 | 5 | 4.3262 | 0.0165 | 3.7038 | 0.6225 |
| 3 | 2,000,000 | `static_dropout_0.3` | 0.300 | 5 | 4.3467 | 0.0159 | 4.0097 | 0.3370 |
| 3 | 2,000,000 | `static_dropout_0.02` | 0.020 | 5 | 4.3551 | 0.0265 | 3.6673 | 0.6878 |
| 3 | 2,000,000 | `static_dropout_0` | 0.000 | 5 | 4.4174 | 0.0188 | 3.6409 | 0.7765 |
| 4 | 4,000,000 | `wikitext103_formula_l12` | 0.020 | 5 | 4.0808 | 0.0195 | 3.7991 | 0.2817 |
| 4 | 4,000,000 | `wikitext103_probe_blend` | 0.010 | 5 | 4.0961 | 0.0145 | 3.7674 | 0.3287 |
| 4 | 4,000,000 | `wikitext103_low_decay` | 0.020 | 5 | 4.1020 | 0.0166 | 3.7769 | 0.3251 |
| 4 | 4,000,000 | `static_dropout_0.1` | 0.100 | 5 | 4.1105 | 0.0188 | 3.8417 | 0.2687 |
| 4 | 4,000,000 | `static_dropout_0.08` | 0.080 | 5 | 4.1116 | 0.0186 | 3.8268 | 0.2848 |
| 4 | 4,000,000 | `static_dropout_0.06` | 0.060 | 5 | 4.1197 | 0.0082 | 3.8066 | 0.3131 |
| 4 | 4,000,000 | `static_dropout_0.14` | 0.140 | 5 | 4.1221 | 0.0155 | 3.8674 | 0.2548 |
| 4 | 4,000,000 | `static_dropout_0.18` | 0.180 | 5 | 4.1304 | 0.0130 | 3.9015 | 0.2289 |
| 4 | 4,000,000 | `static_dropout_0.04` | 0.040 | 5 | 4.1331 | 0.0227 | 3.7978 | 0.3353 |
| 4 | 4,000,000 | `static_dropout_0.2` | 0.200 | 5 | 4.1394 | 0.0167 | 3.9155 | 0.2239 |
| 4 | 4,000,000 | `static_dropout_0.02` | 0.020 | 5 | 4.1459 | 0.0165 | 3.7759 | 0.3700 |
| 4 | 4,000,000 | `static_dropout_0.26` | 0.260 | 5 | 4.1784 | 0.0145 | 3.9775 | 0.2008 |
| 4 | 4,000,000 | `static_dropout_0` | 0.000 | 5 | 4.1835 | 0.0165 | 3.7750 | 0.4085 |
| 4 | 4,000,000 | `static_dropout_0.3` | 0.300 | 5 | 4.1946 | 0.0141 | 4.0127 | 0.1819 |

## Interpretation

- `wikitext103_formula_l12` has the best 5-seed mean final validation loss: 4.0808 +/- 0.0195.
- The second-best final condition is `wikitext103_probe_blend` at 4.0961 +/- 0.0145. It also has the lowest mean trajectory validation loss (4.5635).
- The best static baseline by mean final loss is `static_dropout_0.1` at 4.1105 +/- 0.0188.
- `wikitext103_formula_l12` beats the per-seed best static baseline in 5/5 seeds; its worst paired delta is -0.0181.
- `wikitext103_probe_blend` beats the per-seed best static baseline in 4/5 seeds; its worst paired delta is +0.0009.
- `wikitext103_low_decay` beats the per-seed best static baseline in 4/5 seeds; its worst paired delta is +0.0047.
- The best stage-0 condition (250,000-token prefix) is `static_dropout_0.18`, with a mean validation loss of 5.1616. At that stage, `wikitext103_formula_l12` (dropout 0.30) scores 5.2148, exactly matching `static_dropout_0.3`. It ranks first only from stage 2 (1,000,000 tokens) onward, so the schedule is not better at every stage.
- This is a saved-run streaming validation artifact. Treat it as strong evidence only when the tested conditions, seeds, static baselines, and stream protocol match the claim being made.
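The summary statistics and paired comparisons above can be recomputed from the per-seed final losses. The following minimal Python sketch does this under several assumptions:

- The per-seed values are copied, already rounded to four decimals, from the Paired Final-Loss Deltas table. They are not read from `metrics.jsonl`, because this report does not show that file's schema.
- `FINAL_VAL`, `summarize`, `best_static`, and `paired_deltas` are hypothetical names, not the report script's API.
- The spread is the sample standard deviation (n - 1 denominator). This reproduces the reported final-loss spreads, for example 0.0195 for `wikitext103_formula_l12`.

```python
"""Sketch: recompute 5-seed means, spreads, and paired deltas vs the best static run.

Assumption: values are copied (rounded) from the report's paired-delta table;
the real script reads metrics.jsonl, whose schema is not shown in the report.
"""
from statistics import mean, stdev

SEEDS = (1, 2, 3, 4, 5)

# Final (stage 4, 4M-token prefix) validation loss per condition, in SEEDS order.
FINAL_VAL = {
    "wikitext103_formula_l12": (4.0623, 4.1123, 4.0763, 4.0845, 4.0686),
    "wikitext103_probe_blend": (4.0738, 4.1113, 4.0934, 4.0954, 4.1066),
    "wikitext103_low_decay": (4.0854, 4.1291, 4.1006, 4.0928, 4.1021),
    "static_dropout_0": (4.1600, 4.2030, 4.1758, 4.1853, 4.1934),
    "static_dropout_0.02": (4.1371, 4.1746, 4.1443, 4.1387, 4.1346),
    "static_dropout_0.04": (4.1031, 4.1648, 4.1246, 4.1312, 4.1419),
    "static_dropout_0.06": (4.1112, 4.1304, 4.1127, 4.1188, 4.1253),
    "static_dropout_0.08": (4.0893, 4.1374, 4.1036, 4.1223, 4.1056),
    "static_dropout_0.1": (4.0807, 4.1320, 4.1115, 4.1096, 4.1186),
    "static_dropout_0.14": (4.1082, 4.1476, 4.1240, 4.1117, 4.1192),
    "static_dropout_0.18": (4.1108, 4.1471, 4.1285, 4.1330, 4.1325),
    "static_dropout_0.2": (4.1162, 4.1633, 4.1367, 4.1388, 4.1418),
    "static_dropout_0.26": (4.1557, 4.1961, 4.1796, 4.1782, 4.1821),
    "static_dropout_0.3": (4.1802, 4.2155, 4.1926, 4.2007, 4.1841),
}

# Static baselines are identified by their (hypothetical here) name prefix.
STATICS = [name for name in FINAL_VAL if name.startswith("static_")]


def summarize(values):
    """Mean and sample standard deviation (n - 1 denominator) across seeds."""
    return mean(values), stdev(values)


def best_static(seed_index):
    """Static condition with the lowest final validation loss for one seed."""
    return min(STATICS, key=lambda name: FINAL_VAL[name][seed_index])


def paired_deltas(condition):
    """Per-seed final loss of `condition` minus that seed's best static loss."""
    return [
        FINAL_VAL[condition][i] - FINAL_VAL[best_static(i)][i]
        for i in range(len(SEEDS))
    ]


if __name__ == "__main__":
    for name in ("wikitext103_formula_l12", "wikitext103_probe_blend", "wikitext103_low_decay"):
        m, s = summarize(FINAL_VAL[name])
        deltas = paired_deltas(name)
        wins = sum(d < 0 for d in deltas)  # negative delta = beat best static
        print(f"{name}: {m:.4f} +/- {s:.4f}, wins {wins}/{len(SEEDS)}, "
              f"worst delta {max(deltas):+.4f}")
```

Because the inputs are rounded to four decimals, a recomputed delta can differ from the table in the last digit. For example, seed 5 `wikitext103_probe_blend` comes out as +0.0010 rather than +0.0009. The per-seed best static baselines and the win counts are unaffected.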