WikiText-103 Streaming Validation
Date: 2026-05-31
This report combines 5 random seeds (1, 2, 3, 4, 5) from saved streaming runs.
No additional training is performed by this script; it reads saved
metrics.jsonl files.
Regime: WikiText-103 cached-corpus streaming setup with L12_H8_D320, 17,367,040 parameters, five prefixes from 250k to 4M tokens, and 1,000 optimizer steps per stage. This is a clean five-seed run including three dropout decay schedules and broad static dropout baselines from 0.00 through 0.30.
Sources
runs/wikitext103_l12_streaming_validation_5seed/locked_stream/20260531-093525/metrics.jsonl
Condition Ranking By Final Loss
| Condition | Kind | N | Mean trajectory val | Std trajectory val | Mean final val | Std final val | Mean final gap | Dropout path |
|---|---|---|---|---|---|---|---|---|
| wikitext103_formula_l12 | anchor_decay | 5 | 4.5711 | 0.0045 | 4.0808 | 0.0195 | 0.2817 | 0.30 -> 0.26 -> 0.18 -> 0.09 -> 0.02 |
| wikitext103_probe_blend | anchor_decay | 5 | 4.5635 | 0.0046 | 4.0961 | 0.0145 | 0.3287 | 0.19 -> 0.14 -> 0.09 -> 0.04 -> 0.01 |
| wikitext103_low_decay | anchor_decay | 5 | 4.5681 | 0.0073 | 4.1020 | 0.0166 | 0.3251 | 0.14 -> 0.14 -> 0.10 -> 0.06 -> 0.02 |
| static_dropout_0.1 | static | 5 | 4.5836 | 0.0062 | 4.1105 | 0.0188 | 0.2687 | 0.10 -> 0.10 -> 0.10 -> 0.10 -> 0.10 |
| static_dropout_0.08 | static | 5 | 4.5967 | 0.0073 | 4.1116 | 0.0186 | 0.2848 | 0.08 -> 0.08 -> 0.08 -> 0.08 -> 0.08 |
| static_dropout_0.06 | static | 5 | 4.6186 | 0.0048 | 4.1197 | 0.0082 | 0.3131 | 0.06 -> 0.06 -> 0.06 -> 0.06 -> 0.06 |
| static_dropout_0.14 | static | 5 | 4.5735 | 0.0077 | 4.1221 | 0.0155 | 0.2548 | 0.14 -> 0.14 -> 0.14 -> 0.14 -> 0.14 |
| static_dropout_0.18 | static | 5 | 4.5756 | 0.0041 | 4.1304 | 0.0130 | 0.2289 | 0.18 -> 0.18 -> 0.18 -> 0.18 -> 0.18 |
| static_dropout_0.04 | static | 5 | 4.6501 | 0.0077 | 4.1331 | 0.0227 | 0.3353 | 0.04 -> 0.04 -> 0.04 -> 0.04 -> 0.04 |
| static_dropout_0.2 | static | 5 | 4.5794 | 0.0050 | 4.1394 | 0.0167 | 0.2239 | 0.20 -> 0.20 -> 0.20 -> 0.20 -> 0.20 |
| static_dropout_0.02 | static | 5 | 4.6954 | 0.0086 | 4.1459 | 0.0165 | 0.3700 | 0.02 -> 0.02 -> 0.02 -> 0.02 -> 0.02 |
| static_dropout_0.26 | static | 5 | 4.6063 | 0.0051 | 4.1784 | 0.0145 | 0.2008 | 0.26 -> 0.26 -> 0.26 -> 0.26 -> 0.26 |
| static_dropout_0 | static | 5 | 4.7762 | 0.0109 | 4.1835 | 0.0165 | 0.4085 | 0.00 -> 0.00 -> 0.00 -> 0.00 -> 0.00 |
| static_dropout_0.3 | static | 5 | 4.6253 | 0.0034 | 4.1946 | 0.0141 | 0.1819 | 0.30 -> 0.30 -> 0.30 -> 0.30 -> 0.30 |
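
The Mean final val and Std final val columns can be cross-checked against the per-seed values listed in the Paired Final-Loss Deltas table. The following is a minimal sketch for wikitext103_formula_l12 only. It assumes the reported spread is the sample standard deviation (n - 1 denominator); the report does not state this, although that choice reproduces the 0.0195 shown here. The variable names are illustrative and are not taken from the report script.

```python
from statistics import mean, stdev

# Final validation losses for wikitext103_formula_l12, seeds 1-5,
# copied from the Paired Final-Loss Deltas table.
formula_final_val = [4.0623, 4.1123, 4.0763, 4.0845, 4.0686]

mean_final = mean(formula_final_val)
# stdev() uses the n - 1 denominator (sample standard deviation).
# That this matches the report's convention is an assumption.
std_final = stdev(formula_final_val)

print(f"{mean_final:.4f} +/- {std_final:.4f}")
```

Using the population standard deviation instead would give a noticeably smaller value for five seeds, so the convention matters when comparing against other reports.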
Paired Final-Loss Deltas
Negative delta_vs_best_static means the condition beat the best static
baseline for that seed.
| Seed | Condition | Final val | Best static | Best static final val | Delta vs best static |
|---|---|---|---|---|---|
| 1 | wikitext103_formula_l12 | 4.0623 | static_dropout_0.1 | 4.0807 | -0.0184 |
| 1 | wikitext103_probe_blend | 4.0738 | static_dropout_0.1 | 4.0807 | -0.0069 |
| 1 | wikitext103_low_decay | 4.0854 | static_dropout_0.1 | 4.0807 | +0.0047 |
| 1 | static_dropout_0.1 | 4.0807 | static_dropout_0.1 | 4.0807 | +0.0000 |
| 1 | static_dropout_0.08 | 4.0893 | static_dropout_0.1 | 4.0807 | +0.0086 |
| 1 | static_dropout_0.06 | 4.1112 | static_dropout_0.1 | 4.0807 | +0.0305 |
| 1 | static_dropout_0.14 | 4.1082 | static_dropout_0.1 | 4.0807 | +0.0275 |
| 1 | static_dropout_0.18 | 4.1108 | static_dropout_0.1 | 4.0807 | +0.0301 |
| 1 | static_dropout_0.2 | 4.1162 | static_dropout_0.1 | 4.0807 | +0.0355 |
| 1 | static_dropout_0.04 | 4.1031 | static_dropout_0.1 | 4.0807 | +0.0224 |
| 1 | static_dropout_0.02 | 4.1371 | static_dropout_0.1 | 4.0807 | +0.0564 |
| 1 | static_dropout_0 | 4.1600 | static_dropout_0.1 | 4.0807 | +0.0793 |
| 1 | static_dropout_0.26 | 4.1557 | static_dropout_0.1 | 4.0807 | +0.0750 |
| 1 | static_dropout_0.3 | 4.1802 | static_dropout_0.1 | 4.0807 | +0.0994 |
| 2 | wikitext103_formula_l12 | 4.1123 | static_dropout_0.06 | 4.1304 | -0.0181 |
| 2 | wikitext103_probe_blend | 4.1113 | static_dropout_0.06 | 4.1304 | -0.0191 |
| 2 | wikitext103_low_decay | 4.1291 | static_dropout_0.06 | 4.1304 | -0.0013 |
| 2 | static_dropout_0.1 | 4.1320 | static_dropout_0.06 | 4.1304 | +0.0016 |
| 2 | static_dropout_0.08 | 4.1374 | static_dropout_0.06 | 4.1304 | +0.0071 |
| 2 | static_dropout_0.06 | 4.1304 | static_dropout_0.06 | 4.1304 | +0.0000 |
| 2 | static_dropout_0.14 | 4.1476 | static_dropout_0.06 | 4.1304 | +0.0172 |
| 2 | static_dropout_0.18 | 4.1471 | static_dropout_0.06 | 4.1304 | +0.0167 |
| 2 | static_dropout_0.2 | 4.1633 | static_dropout_0.06 | 4.1304 | +0.0329 |
| 2 | static_dropout_0.04 | 4.1648 | static_dropout_0.06 | 4.1304 | +0.0344 |
| 2 | static_dropout_0.02 | 4.1746 | static_dropout_0.06 | 4.1304 | +0.0442 |
| 2 | static_dropout_0 | 4.2030 | static_dropout_0.06 | 4.1304 | +0.0726 |
| 2 | static_dropout_0.26 | 4.1961 | static_dropout_0.06 | 4.1304 | +0.0658 |
| 2 | static_dropout_0.3 | 4.2155 | static_dropout_0.06 | 4.1304 | +0.0852 |
| 3 | wikitext103_formula_l12 | 4.0763 | static_dropout_0.08 | 4.1036 | -0.0272 |
| 3 | wikitext103_probe_blend | 4.0934 | static_dropout_0.08 | 4.1036 | -0.0102 |
| 3 | wikitext103_low_decay | 4.1006 | static_dropout_0.08 | 4.1036 | -0.0030 |
| 3 | static_dropout_0.1 | 4.1115 | static_dropout_0.08 | 4.1036 | +0.0079 |
| 3 | static_dropout_0.08 | 4.1036 | static_dropout_0.08 | 4.1036 | +0.0000 |
| 3 | static_dropout_0.06 | 4.1127 | static_dropout_0.08 | 4.1036 | +0.0092 |
| 3 | static_dropout_0.14 | 4.1240 | static_dropout_0.08 | 4.1036 | +0.0204 |
| 3 | static_dropout_0.18 | 4.1285 | static_dropout_0.08 | 4.1036 | +0.0250 |
| 3 | static_dropout_0.2 | 4.1367 | static_dropout_0.08 | 4.1036 | +0.0332 |
| 3 | static_dropout_0.04 | 4.1246 | static_dropout_0.08 | 4.1036 | +0.0211 |
| 3 | static_dropout_0.02 | 4.1443 | static_dropout_0.08 | 4.1036 | +0.0408 |
| 3 | static_dropout_0 | 4.1758 | static_dropout_0.08 | 4.1036 | +0.0722 |
| 3 | static_dropout_0.26 | 4.1796 | static_dropout_0.08 | 4.1036 | +0.0761 |
| 3 | static_dropout_0.3 | 4.1926 | static_dropout_0.08 | 4.1036 | +0.0890 |
| 4 | wikitext103_formula_l12 | 4.0845 | static_dropout_0.1 | 4.1096 | -0.0251 |
| 4 | wikitext103_probe_blend | 4.0954 | static_dropout_0.1 | 4.1096 | -0.0141 |
| 4 | wikitext103_low_decay | 4.0928 | static_dropout_0.1 | 4.1096 | -0.0167 |
| 4 | static_dropout_0.1 | 4.1096 | static_dropout_0.1 | 4.1096 | +0.0000 |
| 4 | static_dropout_0.08 | 4.1223 | static_dropout_0.1 | 4.1096 | +0.0127 |
| 4 | static_dropout_0.06 | 4.1188 | static_dropout_0.1 | 4.1096 | +0.0093 |
| 4 | static_dropout_0.14 | 4.1117 | static_dropout_0.1 | 4.1096 | +0.0021 |
| 4 | static_dropout_0.18 | 4.1330 | static_dropout_0.1 | 4.1096 | +0.0234 |
| 4 | static_dropout_0.2 | 4.1388 | static_dropout_0.1 | 4.1096 | +0.0292 |
| 4 | static_dropout_0.04 | 4.1312 | static_dropout_0.1 | 4.1096 | +0.0217 |
| 4 | static_dropout_0.02 | 4.1387 | static_dropout_0.1 | 4.1096 | +0.0291 |
| 4 | static_dropout_0 | 4.1853 | static_dropout_0.1 | 4.1096 | +0.0757 |
| 4 | static_dropout_0.26 | 4.1782 | static_dropout_0.1 | 4.1096 | +0.0686 |
| 4 | static_dropout_0.3 | 4.2007 | static_dropout_0.1 | 4.1096 | +0.0912 |
| 5 | wikitext103_formula_l12 | 4.0686 | static_dropout_0.08 | 4.1056 | -0.0370 |
| 5 | wikitext103_probe_blend | 4.1066 | static_dropout_0.08 | 4.1056 | +0.0009 |
| 5 | wikitext103_low_decay | 4.1021 | static_dropout_0.08 | 4.1056 | -0.0035 |
| 5 | static_dropout_0.1 | 4.1186 | static_dropout_0.08 | 4.1056 | +0.0129 |
| 5 | static_dropout_0.08 | 4.1056 | static_dropout_0.08 | 4.1056 | +0.0000 |
| 5 | static_dropout_0.06 | 4.1253 | static_dropout_0.08 | 4.1056 | +0.0197 |
| 5 | static_dropout_0.14 | 4.1192 | static_dropout_0.08 | 4.1056 | +0.0135 |
| 5 | static_dropout_0.18 | 4.1325 | static_dropout_0.08 | 4.1056 | +0.0269 |
| 5 | static_dropout_0.2 | 4.1418 | static_dropout_0.08 | 4.1056 | +0.0362 |
| 5 | static_dropout_0.04 | 4.1419 | static_dropout_0.08 | 4.1056 | +0.0363 |
| 5 | static_dropout_0.02 | 4.1346 | static_dropout_0.08 | 4.1056 | +0.0290 |
| 5 | static_dropout_0 | 4.1934 | static_dropout_0.08 | 4.1056 | +0.0878 |
| 5 | static_dropout_0.26 | 4.1821 | static_dropout_0.08 | 4.1056 | +0.0765 |
| 5 | static_dropout_0.3 | 4.1841 | static_dropout_0.08 | 4.1056 | +0.0785 |
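
The paired comparison in the table above can be sketched as follows for seed 1: pick the static condition with the lowest final validation loss for that seed, then subtract its loss from every condition's loss. This is an illustrative reconstruction, not the report script. The function name `paired_deltas` is hypothetical, and identifying static baselines by the `static_` name prefix is an assumption (the report also carries a Kind column that could be used instead).

```python
# Seed 1 final validation losses, copied from the table above.
seed1_final_val = {
    "wikitext103_formula_l12": 4.0623,
    "wikitext103_probe_blend": 4.0738,
    "wikitext103_low_decay": 4.0854,
    "static_dropout_0": 4.1600,
    "static_dropout_0.02": 4.1371,
    "static_dropout_0.04": 4.1031,
    "static_dropout_0.06": 4.1112,
    "static_dropout_0.08": 4.0893,
    "static_dropout_0.1": 4.0807,
    "static_dropout_0.14": 4.1082,
    "static_dropout_0.18": 4.1108,
    "static_dropout_0.2": 4.1162,
    "static_dropout_0.26": 4.1557,
    "static_dropout_0.3": 4.1802,
}


def paired_deltas(final_val):
    """Return the best static baseline name and each condition's delta against it."""
    # Assumption: static baselines are recognizable by their name prefix.
    statics = {name: v for name, v in final_val.items() if name.startswith("static_")}
    best_name = min(statics, key=statics.get)
    best_val = statics[best_name]
    deltas = {name: v - best_val for name, v in final_val.items()}
    return best_name, deltas


best_static, seed1_deltas = paired_deltas(seed1_final_val)
print(best_static)
for name in ("wikitext103_formula_l12", "wikitext103_probe_blend", "wikitext103_low_decay"):
    print(f"{name}: {seed1_deltas[name]:+.4f}")
```

Because the baseline is chosen per seed after seeing the results, each schedule is compared against the strongest static setting for that seed rather than against a single fixed dropout rate. Deltas recomputed from the rounded table values can differ from the reported column in the last digit.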
Stage Trajectory
| Stage | Prefix tokens | Condition | Dropout | N | Mean val | Std val | Mean train | Mean gap |
|---|---|---|---|---|---|---|---|---|
| 0 | 250,000 | static_dropout_0.18 | 0.180 | 5 | 5.1616 | 0.0150 | 3.9964 | 1.1652 |
| 0 | 250,000 | wikitext103_low_decay | 0.140 | 5 | 5.1635 | 0.0220 | 3.9051 | 1.2585 |
| 0 | 250,000 | static_dropout_0.14 | 0.140 | 5 | 5.1635 | 0.0220 | 3.9051 | 1.2585 |
| 0 | 250,000 | wikitext103_probe_blend | 0.190 | 5 | 5.1659 | 0.0171 | 4.0201 | 1.1458 |
| 0 | 250,000 | static_dropout_0.1 | 0.100 | 5 | 5.1699 | 0.0237 | 3.8219 | 1.3480 |
| 0 | 250,000 | static_dropout_0.2 | 0.200 | 5 | 5.1701 | 0.0141 | 4.0363 | 1.1338 |
| 0 | 250,000 | static_dropout_0.08 | 0.080 | 5 | 5.1894 | 0.0161 | 3.7619 | 1.4274 |
| 0 | 250,000 | static_dropout_0.26 | 0.260 | 5 | 5.1940 | 0.0161 | 4.1496 | 1.0444 |
| 0 | 250,000 | wikitext103_formula_l12 | 0.300 | 5 | 5.2148 | 0.0181 | 4.2131 | 1.0017 |
| 0 | 250,000 | static_dropout_0.3 | 0.300 | 5 | 5.2148 | 0.0181 | 4.2131 | 1.0017 |
| 0 | 250,000 | static_dropout_0.06 | 0.060 | 5 | 5.2154 | 0.0173 | 3.7128 | 1.5026 |
| 0 | 250,000 | static_dropout_0.04 | 0.040 | 5 | 5.2378 | 0.0186 | 3.6441 | 1.5938 |
| 0 | 250,000 | static_dropout_0.02 | 0.020 | 5 | 5.2750 | 0.0255 | 3.5725 | 1.7025 |
| 0 | 250,000 | static_dropout_0 | 0.000 | 5 | 5.3403 | 0.0270 | 3.5230 | 1.8172 |
| 1 | 500,000 | wikitext103_probe_blend | 0.140 | 5 | 4.7872 | 0.0269 | 3.6846 | 1.1027 |
| 1 | 500,000 | static_dropout_0.2 | 0.200 | 5 | 4.7873 | 0.0236 | 3.7914 | 0.9959 |
| 1 | 500,000 | static_dropout_0.18 | 0.180 | 5 | 4.7946 | 0.0206 | 3.7572 | 1.0375 |
| 1 | 500,000 | static_dropout_0.14 | 0.140 | 5 | 4.8001 | 0.0198 | 3.6650 | 1.1351 |
| 1 | 500,000 | wikitext103_low_decay | 0.140 | 5 | 4.8001 | 0.0198 | 3.6650 | 1.1351 |
| 1 | 500,000 | wikitext103_formula_l12 | 0.260 | 5 | 4.8053 | 0.0278 | 3.9182 | 0.8871 |
| 1 | 500,000 | static_dropout_0.26 | 0.260 | 5 | 4.8081 | 0.0216 | 3.9053 | 0.9028 |
| 1 | 500,000 | static_dropout_0.3 | 0.300 | 5 | 4.8242 | 0.0296 | 3.9765 | 0.8476 |
| 1 | 500,000 | static_dropout_0.1 | 0.100 | 5 | 4.8332 | 0.0258 | 3.5637 | 1.2695 |
| 1 | 500,000 | static_dropout_0.08 | 0.080 | 5 | 4.8576 | 0.0239 | 3.5036 | 1.3540 |
| 1 | 500,000 | static_dropout_0.06 | 0.060 | 5 | 4.8947 | 0.0213 | 3.4394 | 1.4552 |
| 1 | 500,000 | static_dropout_0.04 | 0.040 | 5 | 4.9573 | 0.0250 | 3.3515 | 1.6058 |
| 1 | 500,000 | static_dropout_0.02 | 0.020 | 5 | 5.0451 | 0.0169 | 3.2612 | 1.7839 |
| 1 | 500,000 | static_dropout_0 | 0.000 | 5 | 5.1741 | 0.0252 | 3.1506 | 2.0235 |
| 2 | 1,000,000 | wikitext103_formula_l12 | 0.180 | 5 | 4.4938 | 0.0147 | 3.8283 | 0.6655 |
| 2 | 1,000,000 | wikitext103_probe_blend | 0.090 | 5 | 4.4940 | 0.0159 | 3.6495 | 0.8445 |
| 2 | 1,000,000 | static_dropout_0.14 | 0.140 | 5 | 4.5001 | 0.0148 | 3.7163 | 0.7838 |
| 2 | 1,000,000 | wikitext103_low_decay | 0.100 | 5 | 4.5013 | 0.0158 | 3.6607 | 0.8406 |
| 2 | 1,000,000 | static_dropout_0.18 | 0.180 | 5 | 4.5023 | 0.0185 | 3.7866 | 0.7157 |
| 2 | 1,000,000 | static_dropout_0.2 | 0.200 | 5 | 4.5060 | 0.0204 | 3.8148 | 0.6913 |
| 2 | 1,000,000 | static_dropout_0.1 | 0.100 | 5 | 4.5186 | 0.0135 | 3.6524 | 0.8662 |
| 2 | 1,000,000 | static_dropout_0.26 | 0.260 | 5 | 4.5262 | 0.0101 | 3.9071 | 0.6191 |
| 2 | 1,000,000 | static_dropout_0.08 | 0.080 | 5 | 4.5326 | 0.0117 | 3.6000 | 0.9326 |
| 2 | 1,000,000 | static_dropout_0.3 | 0.300 | 5 | 4.5462 | 0.0127 | 3.9708 | 0.5754 |
| 2 | 1,000,000 | static_dropout_0.06 | 0.060 | 5 | 4.5574 | 0.0126 | 3.5554 | 1.0020 |
| 2 | 1,000,000 | static_dropout_0.04 | 0.040 | 5 | 4.5959 | 0.0146 | 3.5030 | 1.0929 |
| 2 | 1,000,000 | static_dropout_0.02 | 0.020 | 5 | 4.6558 | 0.0159 | 3.4324 | 1.2234 |
| 2 | 1,000,000 | static_dropout_0 | 0.000 | 5 | 4.7661 | 0.0324 | 3.3658 | 1.4003 |
| 3 | 2,000,000 | wikitext103_formula_l12 | 0.090 | 5 | 4.2607 | 0.0181 | 3.8089 | 0.4518 |
| 3 | 2,000,000 | wikitext103_low_decay | 0.060 | 5 | 4.2736 | 0.0228 | 3.7474 | 0.5261 |
| 3 | 2,000,000 | wikitext103_probe_blend | 0.040 | 5 | 4.2743 | 0.0174 | 3.7200 | 0.5543 |
| 3 | 2,000,000 | static_dropout_0.14 | 0.140 | 5 | 4.2816 | 0.0214 | 3.8287 | 0.4529 |
| 3 | 2,000,000 | static_dropout_0.1 | 0.100 | 5 | 4.2857 | 0.0190 | 3.7809 | 0.5048 |
| 3 | 2,000,000 | static_dropout_0.18 | 0.180 | 5 | 4.2889 | 0.0195 | 3.8768 | 0.4121 |
| 3 | 2,000,000 | static_dropout_0.08 | 0.080 | 5 | 4.2925 | 0.0166 | 3.7519 | 0.5406 |
| 3 | 2,000,000 | static_dropout_0.2 | 0.200 | 5 | 4.2942 | 0.0160 | 3.8982 | 0.3960 |
| 3 | 2,000,000 | static_dropout_0.06 | 0.060 | 5 | 4.3059 | 0.0182 | 3.7318 | 0.5741 |
| 3 | 2,000,000 | static_dropout_0.26 | 0.260 | 5 | 4.3248 | 0.0179 | 3.9655 | 0.3593 |
| 3 | 2,000,000 | static_dropout_0.04 | 0.040 | 5 | 4.3262 | 0.0165 | 3.7038 | 0.6225 |
| 3 | 2,000,000 | static_dropout_0.3 | 0.300 | 5 | 4.3467 | 0.0159 | 4.0097 | 0.3370 |
| 3 | 2,000,000 | static_dropout_0.02 | 0.020 | 5 | 4.3551 | 0.0265 | 3.6673 | 0.6878 |
| 3 | 2,000,000 | static_dropout_0 | 0.000 | 5 | 4.4174 | 0.0188 | 3.6409 | 0.7765 |
| 4 | 4,000,000 | wikitext103_formula_l12 | 0.020 | 5 | 4.0808 | 0.0195 | 3.7991 | 0.2817 |
| 4 | 4,000,000 | wikitext103_probe_blend | 0.010 | 5 | 4.0961 | 0.0145 | 3.7674 | 0.3287 |
| 4 | 4,000,000 | wikitext103_low_decay | 0.020 | 5 | 4.1020 | 0.0166 | 3.7769 | 0.3251 |
| 4 | 4,000,000 | static_dropout_0.1 | 0.100 | 5 | 4.1105 | 0.0188 | 3.8417 | 0.2687 |
| 4 | 4,000,000 | static_dropout_0.08 | 0.080 | 5 | 4.1116 | 0.0186 | 3.8268 | 0.2848 |
| 4 | 4,000,000 | static_dropout_0.06 | 0.060 | 5 | 4.1197 | 0.0082 | 3.8066 | 0.3131 |
| 4 | 4,000,000 | static_dropout_0.14 | 0.140 | 5 | 4.1221 | 0.0155 | 3.8674 | 0.2548 |
| 4 | 4,000,000 | static_dropout_0.18 | 0.180 | 5 | 4.1304 | 0.0130 | 3.9015 | 0.2289 |
| 4 | 4,000,000 | static_dropout_0.04 | 0.040 | 5 | 4.1331 | 0.0227 | 3.7978 | 0.3353 |
| 4 | 4,000,000 | static_dropout_0.2 | 0.200 | 5 | 4.1394 | 0.0167 | 3.9155 | 0.2239 |
| 4 | 4,000,000 | static_dropout_0.02 | 0.020 | 5 | 4.1459 | 0.0165 | 3.7759 | 0.3700 |
| 4 | 4,000,000 | static_dropout_0.26 | 0.260 | 5 | 4.1784 | 0.0145 | 3.9775 | 0.2008 |
| 4 | 4,000,000 | static_dropout_0 | 0.000 | 5 | 4.1835 | 0.0165 | 3.7750 | 0.4085 |
| 4 | 4,000,000 | static_dropout_0.3 | 0.300 | 5 | 4.1946 | 0.0141 | 4.0127 | 0.1819 |
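
Two summary quantities appear to follow directly from this table: Mean gap matches Mean val minus Mean train, and the Mean trajectory val in the ranking table matches the average of Mean val over the five stages. Both relationships are inferred from the numbers rather than stated by the report, so the sketch below is a consistency check under that reading, shown for wikitext103_formula_l12 with illustrative variable names.

```python
# (mean val, mean train) per stage for wikitext103_formula_l12,
# copied from the Stage Trajectory table, stages 0 through 4.
formula_stages = [
    (5.2148, 4.2131),
    (4.8053, 3.9182),
    (4.4938, 3.8283),
    (4.2607, 3.8089),
    (4.0808, 3.7991),
]

# Assumed definition: generalization gap = validation loss - training loss.
gaps = [val - train for val, train in formula_stages]

# Assumed definition: trajectory val = unweighted mean of the per-stage validation losses.
trajectory_val = sum(val for val, _ in formula_stages) / len(formula_stages)

print([round(g, 4) for g in gaps])
print(round(trajectory_val, 4))
```

Under this reading the trajectory metric weights every stage equally, so a schedule that starts behind at 250,000 tokens can still rank well on trajectory if it recovers in later stages.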
Interpretation
- wikitext103_formula_l12 has the best 5-seed mean final validation loss: 4.0808 +/- 0.0195.
- The second-best final condition is wikitext103_probe_blend at 4.0961 +/- 0.0145.
- The best static baseline by mean final loss is static_dropout_0.1 at 4.1105 +/- 0.0188.
- wikitext103_formula_l12 beats the per-seed best static baseline in 5/5 seeds; worst paired delta is -0.0181.
- wikitext103_probe_blend beats the per-seed best static baseline in 4/5 seeds; worst paired delta is +0.0009.
- wikitext103_low_decay beats the per-seed best static baseline in 4/5 seeds; worst paired delta is +0.0047.
- The best first-stage condition is static_dropout_0.18 at prefix 250,000 with mean validation loss 5.1616; compare this with the final ranking before claiming a schedule is uniformly better.
- This is a saved-run streaming validation artifact. Treat it as strong evidence only when the tested conditions, seeds, static baselines, and stream protocol match the claim being made.
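
The win counts and worst paired deltas quoted in this section can be recomputed from the Delta vs best static column. A minimal sketch, assuming a win means a strictly negative delta and that "worst" means the largest (least favorable) delta across seeds; the helper name `summarize` is hypothetical.

```python
# Delta vs best static for seeds 1-5, copied from the Paired Final-Loss Deltas table.
deltas = {
    "wikitext103_formula_l12": [-0.0184, -0.0181, -0.0272, -0.0251, -0.0370],
    "wikitext103_probe_blend": [-0.0069, -0.0191, -0.0102, -0.0141, 0.0009],
    "wikitext103_low_decay": [0.0047, -0.0013, -0.0030, -0.0167, -0.0035],
}


def summarize(seed_deltas):
    """Count seeds with a strictly negative delta and return the largest delta."""
    wins = sum(d < 0 for d in seed_deltas)
    worst = max(seed_deltas)
    return wins, worst


summary = {name: summarize(d) for name, d in deltas.items()}
for name, (wins, worst) in summary.items():
    print(f"{name}: {wins}/{len(deltas[name])} seeds, worst delta {worst:+.4f}")
# wikitext103_formula_l12: 5/5 seeds, worst delta -0.0181
# wikitext103_probe_blend: 4/5 seeds, worst delta +0.0009
# wikitext103_low_decay: 4/5 seeds, worst delta +0.0047
```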