Mandeep Sidhu

WikiText-103 Streaming Validation

Date: 2026-05-31

This report aggregates five random seeds (1, 2, 3, 4, 5) from saved streaming runs. The report script performs no additional training; it only reads the saved metrics.jsonl files.
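As a rough illustration of the read-only step, the sketch below loads a JSON Lines file into a list of records. It assumes only that each non-empty line of metrics.jsonl holds one JSON object; the helper name `load_metrics` and the field names in the usage note are hypothetical and are not taken from the report script.

```python
import json
from pathlib import Path


def load_metrics(path):
    """Return one dict per non-empty line of a JSON Lines file.

    Hypothetical helper: the real report script may parse its input differently.
    """
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                records.append(json.loads(line))
    return records
```

A call such as `load_metrics("metrics.jsonl")` would then return records that can be grouped by whatever seed, condition, and stage keys the file actually contains.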

Regime: WikiText-103 cached-corpus streaming setup with the L12_H8_D320 model (17,367,040 parameters), five prefixes doubling from 250k to 4M tokens, and 1,000 optimizer steps per stage. This is a clean five-seed run covering three dropout decay schedules and a broad sweep of eleven static dropout baselines from 0.00 through 0.30.
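The stage layout can be sketched as a small schedule builder. This is a minimal sketch, not the training code: the `Stage` dataclass and `build_stages` function are hypothetical names, and the doubling rule is inferred from the prefix sizes listed in the stage table (250,000 through 4,000,000). The dropout path used here is the one reported for wikitext103_formula_l12.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    index: int
    prefix_tokens: int
    dropout: float
    steps: int


def build_stages(dropout_path, first_prefix=250_000, steps_per_stage=1_000):
    """Pair each dropout value with a prefix that doubles at every stage."""
    return [
        Stage(i, first_prefix * 2**i, p, steps_per_stage)
        for i, p in enumerate(dropout_path)
    ]


# Dropout path reported for wikitext103_formula_l12.
formula_path = [0.30, 0.26, 0.18, 0.09, 0.02]
stages = build_stages(formula_path)

for stage in stages:
    print(stage.index, f"{stage.prefix_tokens:,}", f"{stage.dropout:.2f}", stage.steps)
```

A static baseline fits the same shape by passing a constant path, for example `build_stages([0.10] * 5)`.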

Sources

  • runs/wikitext103_l12_streaming_validation_5seed/locked_stream/20260531-093525/metrics.jsonl

Condition Ranking By Final Loss

| Condition | Kind | N | Mean trajectory val | Std trajectory val | Mean final val | Std final val | Mean final gap | Dropout path |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| wikitext103_formula_l12 | anchor_decay | 5 | 4.5711 | 0.0045 | 4.0808 | 0.0195 | 0.2817 | 0.30 -> 0.26 -> 0.18 -> 0.09 -> 0.02 |
| wikitext103_probe_blend | anchor_decay | 5 | 4.5635 | 0.0046 | 4.0961 | 0.0145 | 0.3287 | 0.19 -> 0.14 -> 0.09 -> 0.04 -> 0.01 |
| wikitext103_low_decay | anchor_decay | 5 | 4.5681 | 0.0073 | 4.1020 | 0.0166 | 0.3251 | 0.14 -> 0.14 -> 0.10 -> 0.06 -> 0.02 |
| static_dropout_0.1 | static | 5 | 4.5836 | 0.0062 | 4.1105 | 0.0188 | 0.2687 | 0.10 -> 0.10 -> 0.10 -> 0.10 -> 0.10 |
| static_dropout_0.08 | static | 5 | 4.5967 | 0.0073 | 4.1116 | 0.0186 | 0.2848 | 0.08 -> 0.08 -> 0.08 -> 0.08 -> 0.08 |
| static_dropout_0.06 | static | 5 | 4.6186 | 0.0048 | 4.1197 | 0.0082 | 0.3131 | 0.06 -> 0.06 -> 0.06 -> 0.06 -> 0.06 |
| static_dropout_0.14 | static | 5 | 4.5735 | 0.0077 | 4.1221 | 0.0155 | 0.2548 | 0.14 -> 0.14 -> 0.14 -> 0.14 -> 0.14 |
| static_dropout_0.18 | static | 5 | 4.5756 | 0.0041 | 4.1304 | 0.0130 | 0.2289 | 0.18 -> 0.18 -> 0.18 -> 0.18 -> 0.18 |
| static_dropout_0.04 | static | 5 | 4.6501 | 0.0077 | 4.1331 | 0.0227 | 0.3353 | 0.04 -> 0.04 -> 0.04 -> 0.04 -> 0.04 |
| static_dropout_0.2 | static | 5 | 4.5794 | 0.0050 | 4.1394 | 0.0167 | 0.2239 | 0.20 -> 0.20 -> 0.20 -> 0.20 -> 0.20 |
| static_dropout_0.02 | static | 5 | 4.6954 | 0.0086 | 4.1459 | 0.0165 | 0.3700 | 0.02 -> 0.02 -> 0.02 -> 0.02 -> 0.02 |
| static_dropout_0.26 | static | 5 | 4.6063 | 0.0051 | 4.1784 | 0.0145 | 0.2008 | 0.26 -> 0.26 -> 0.26 -> 0.26 -> 0.26 |
| static_dropout_0 | static | 5 | 4.7762 | 0.0109 | 4.1835 | 0.0165 | 0.4085 | 0.00 -> 0.00 -> 0.00 -> 0.00 -> 0.00 |
| static_dropout_0.3 | static | 5 | 4.6253 | 0.0034 | 4.1946 | 0.0141 | 0.1819 | 0.30 -> 0.30 -> 0.30 -> 0.30 -> 0.30 |
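The mean and standard deviation columns can be cross-checked against the per-seed final losses listed in the paired table. The sketch below does this for two conditions. It assumes the reported spread is the sample standard deviation (n - 1 in the denominator), which is what reproduces the published 0.0195 and 0.0188; the `summarize` helper is a hypothetical name, and the inputs are the four-decimal rounded values from this report, so the last digit could in principle differ from a computation on unrounded metrics.

```python
from statistics import mean, stdev

# Per-seed final validation losses (seeds 1..5), copied from the paired table.
final_val = {
    "wikitext103_formula_l12": [4.0623, 4.1123, 4.0763, 4.0845, 4.0686],
    "static_dropout_0.1": [4.0807, 4.1320, 4.1115, 4.1096, 4.1186],
}


def summarize(values):
    """Return (mean, sample standard deviation), rounded to four decimals."""
    return round(mean(values), 4), round(stdev(values), 4)


# Rank conditions by mean final loss, lowest first.
ranking = sorted(final_val, key=lambda name: mean(final_val[name]))
for name in ranking:
    m, s = summarize(final_val[name])
    print(f"{name}: {m:.4f} +/- {s:.4f}")
```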

Paired Final-Loss Deltas

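A negative delta_vs_best_static means the condition beat the best static baseline for that seed. The best static baseline is chosen separately for each seed as the static condition with the lowest final validation loss, so it differs across seeds (static_dropout_0.1, static_dropout_0.06, or static_dropout_0.08 here).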

| Seed | Condition | Final val | Best static | Best static final val | Delta vs best static |
| --- | --- | --- | --- | --- | --- |
| 1 | wikitext103_formula_l12 | 4.0623 | static_dropout_0.1 | 4.0807 | -0.0184 |
| 1 | wikitext103_probe_blend | 4.0738 | static_dropout_0.1 | 4.0807 | -0.0069 |
| 1 | wikitext103_low_decay | 4.0854 | static_dropout_0.1 | 4.0807 | +0.0047 |
| 1 | static_dropout_0.1 | 4.0807 | static_dropout_0.1 | 4.0807 | +0.0000 |
| 1 | static_dropout_0.08 | 4.0893 | static_dropout_0.1 | 4.0807 | +0.0086 |
| 1 | static_dropout_0.06 | 4.1112 | static_dropout_0.1 | 4.0807 | +0.0305 |
| 1 | static_dropout_0.14 | 4.1082 | static_dropout_0.1 | 4.0807 | +0.0275 |
| 1 | static_dropout_0.18 | 4.1108 | static_dropout_0.1 | 4.0807 | +0.0301 |
| 1 | static_dropout_0.2 | 4.1162 | static_dropout_0.1 | 4.0807 | +0.0355 |
| 1 | static_dropout_0.04 | 4.1031 | static_dropout_0.1 | 4.0807 | +0.0224 |
| 1 | static_dropout_0.02 | 4.1371 | static_dropout_0.1 | 4.0807 | +0.0564 |
| 1 | static_dropout_0 | 4.1600 | static_dropout_0.1 | 4.0807 | +0.0793 |
| 1 | static_dropout_0.26 | 4.1557 | static_dropout_0.1 | 4.0807 | +0.0750 |
| 1 | static_dropout_0.3 | 4.1802 | static_dropout_0.1 | 4.0807 | +0.0994 |
| 2 | wikitext103_formula_l12 | 4.1123 | static_dropout_0.06 | 4.1304 | -0.0181 |
| 2 | wikitext103_probe_blend | 4.1113 | static_dropout_0.06 | 4.1304 | -0.0191 |
| 2 | wikitext103_low_decay | 4.1291 | static_dropout_0.06 | 4.1304 | -0.0013 |
| 2 | static_dropout_0.1 | 4.1320 | static_dropout_0.06 | 4.1304 | +0.0016 |
| 2 | static_dropout_0.08 | 4.1374 | static_dropout_0.06 | 4.1304 | +0.0071 |
| 2 | static_dropout_0.06 | 4.1304 | static_dropout_0.06 | 4.1304 | +0.0000 |
| 2 | static_dropout_0.14 | 4.1476 | static_dropout_0.06 | 4.1304 | +0.0172 |
| 2 | static_dropout_0.18 | 4.1471 | static_dropout_0.06 | 4.1304 | +0.0167 |
| 2 | static_dropout_0.2 | 4.1633 | static_dropout_0.06 | 4.1304 | +0.0329 |
| 2 | static_dropout_0.04 | 4.1648 | static_dropout_0.06 | 4.1304 | +0.0344 |
| 2 | static_dropout_0.02 | 4.1746 | static_dropout_0.06 | 4.1304 | +0.0442 |
| 2 | static_dropout_0 | 4.2030 | static_dropout_0.06 | 4.1304 | +0.0726 |
| 2 | static_dropout_0.26 | 4.1961 | static_dropout_0.06 | 4.1304 | +0.0658 |
| 2 | static_dropout_0.3 | 4.2155 | static_dropout_0.06 | 4.1304 | +0.0852 |
| 3 | wikitext103_formula_l12 | 4.0763 | static_dropout_0.08 | 4.1036 | -0.0272 |
| 3 | wikitext103_probe_blend | 4.0934 | static_dropout_0.08 | 4.1036 | -0.0102 |
| 3 | wikitext103_low_decay | 4.1006 | static_dropout_0.08 | 4.1036 | -0.0030 |
| 3 | static_dropout_0.1 | 4.1115 | static_dropout_0.08 | 4.1036 | +0.0079 |
| 3 | static_dropout_0.08 | 4.1036 | static_dropout_0.08 | 4.1036 | +0.0000 |
| 3 | static_dropout_0.06 | 4.1127 | static_dropout_0.08 | 4.1036 | +0.0092 |
| 3 | static_dropout_0.14 | 4.1240 | static_dropout_0.08 | 4.1036 | +0.0204 |
| 3 | static_dropout_0.18 | 4.1285 | static_dropout_0.08 | 4.1036 | +0.0250 |
| 3 | static_dropout_0.2 | 4.1367 | static_dropout_0.08 | 4.1036 | +0.0332 |
| 3 | static_dropout_0.04 | 4.1246 | static_dropout_0.08 | 4.1036 | +0.0211 |
| 3 | static_dropout_0.02 | 4.1443 | static_dropout_0.08 | 4.1036 | +0.0408 |
| 3 | static_dropout_0 | 4.1758 | static_dropout_0.08 | 4.1036 | +0.0722 |
| 3 | static_dropout_0.26 | 4.1796 | static_dropout_0.08 | 4.1036 | +0.0761 |
| 3 | static_dropout_0.3 | 4.1926 | static_dropout_0.08 | 4.1036 | +0.0890 |
| 4 | wikitext103_formula_l12 | 4.0845 | static_dropout_0.1 | 4.1096 | -0.0251 |
| 4 | wikitext103_probe_blend | 4.0954 | static_dropout_0.1 | 4.1096 | -0.0141 |
| 4 | wikitext103_low_decay | 4.0928 | static_dropout_0.1 | 4.1096 | -0.0167 |
| 4 | static_dropout_0.1 | 4.1096 | static_dropout_0.1 | 4.1096 | +0.0000 |
| 4 | static_dropout_0.08 | 4.1223 | static_dropout_0.1 | 4.1096 | +0.0127 |
| 4 | static_dropout_0.06 | 4.1188 | static_dropout_0.1 | 4.1096 | +0.0093 |
| 4 | static_dropout_0.14 | 4.1117 | static_dropout_0.1 | 4.1096 | +0.0021 |
| 4 | static_dropout_0.18 | 4.1330 | static_dropout_0.1 | 4.1096 | +0.0234 |
| 4 | static_dropout_0.2 | 4.1388 | static_dropout_0.1 | 4.1096 | +0.0292 |
| 4 | static_dropout_0.04 | 4.1312 | static_dropout_0.1 | 4.1096 | +0.0217 |
| 4 | static_dropout_0.02 | 4.1387 | static_dropout_0.1 | 4.1096 | +0.0291 |
| 4 | static_dropout_0 | 4.1853 | static_dropout_0.1 | 4.1096 | +0.0757 |
| 4 | static_dropout_0.26 | 4.1782 | static_dropout_0.1 | 4.1096 | +0.0686 |
| 4 | static_dropout_0.3 | 4.2007 | static_dropout_0.1 | 4.1096 | +0.0912 |
| 5 | wikitext103_formula_l12 | 4.0686 | static_dropout_0.08 | 4.1056 | -0.0370 |
| 5 | wikitext103_probe_blend | 4.1066 | static_dropout_0.08 | 4.1056 | +0.0009 |
| 5 | wikitext103_low_decay | 4.1021 | static_dropout_0.08 | 4.1056 | -0.0035 |
| 5 | static_dropout_0.1 | 4.1186 | static_dropout_0.08 | 4.1056 | +0.0129 |
| 5 | static_dropout_0.08 | 4.1056 | static_dropout_0.08 | 4.1056 | +0.0000 |
| 5 | static_dropout_0.06 | 4.1253 | static_dropout_0.08 | 4.1056 | +0.0197 |
| 5 | static_dropout_0.14 | 4.1192 | static_dropout_0.08 | 4.1056 | +0.0135 |
| 5 | static_dropout_0.18 | 4.1325 | static_dropout_0.08 | 4.1056 | +0.0269 |
| 5 | static_dropout_0.2 | 4.1418 | static_dropout_0.08 | 4.1056 | +0.0362 |
| 5 | static_dropout_0.04 | 4.1419 | static_dropout_0.08 | 4.1056 | +0.0363 |
| 5 | static_dropout_0.02 | 4.1346 | static_dropout_0.08 | 4.1056 | +0.0290 |
| 5 | static_dropout_0 | 4.1934 | static_dropout_0.08 | 4.1056 | +0.0878 |
| 5 | static_dropout_0.26 | 4.1821 | static_dropout_0.08 | 4.1056 | +0.0765 |
| 5 | static_dropout_0.3 | 4.1841 | static_dropout_0.08 | 4.1056 | +0.0785 |
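The paired comparison for a single seed can be sketched as follows, using a subset of the seed 1 rows. This is an illustrative reconstruction rather than the report script: `paired_deltas` is a hypothetical helper, and it assumes static baselines can be recognised by the `static_` name prefix. Because the inputs are the rounded table values, recomputed deltas for other seeds may differ from the published ones by one unit in the last digit.

```python
# Seed 1 final validation losses (three schedules, three of the static baselines).
seed1 = {
    "wikitext103_formula_l12": 4.0623,
    "wikitext103_probe_blend": 4.0738,
    "wikitext103_low_decay": 4.0854,
    "static_dropout_0.1": 4.0807,
    "static_dropout_0.08": 4.0893,
    "static_dropout_0.06": 4.1112,
}


def paired_deltas(final_val):
    """Return the best static condition and each condition's delta against it."""
    statics = {k: v for k, v in final_val.items() if k.startswith("static_")}
    best_name = min(statics, key=statics.get)  # lowest final loss wins
    best_loss = statics[best_name]
    deltas = {name: round(loss - best_loss, 4) for name, loss in final_val.items()}
    return best_name, deltas


best_name, deltas = paired_deltas(seed1)
print("best static:", best_name)
for name, delta in deltas.items():
    print(f"{name}: {delta:+.4f}")
```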

Stage Trajectory

| Stage | Prefix tokens | Condition | Dropout | N | Mean val | Std val | Mean train | Mean gap |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 250,000 | static_dropout_0.18 | 0.180 | 5 | 5.1616 | 0.0150 | 3.9964 | 1.1652 |
| 0 | 250,000 | wikitext103_low_decay | 0.140 | 5 | 5.1635 | 0.0220 | 3.9051 | 1.2585 |
| 0 | 250,000 | static_dropout_0.14 | 0.140 | 5 | 5.1635 | 0.0220 | 3.9051 | 1.2585 |
| 0 | 250,000 | wikitext103_probe_blend | 0.190 | 5 | 5.1659 | 0.0171 | 4.0201 | 1.1458 |
| 0 | 250,000 | static_dropout_0.1 | 0.100 | 5 | 5.1699 | 0.0237 | 3.8219 | 1.3480 |
| 0 | 250,000 | static_dropout_0.2 | 0.200 | 5 | 5.1701 | 0.0141 | 4.0363 | 1.1338 |
| 0 | 250,000 | static_dropout_0.08 | 0.080 | 5 | 5.1894 | 0.0161 | 3.7619 | 1.4274 |
| 0 | 250,000 | static_dropout_0.26 | 0.260 | 5 | 5.1940 | 0.0161 | 4.1496 | 1.0444 |
| 0 | 250,000 | wikitext103_formula_l12 | 0.300 | 5 | 5.2148 | 0.0181 | 4.2131 | 1.0017 |
| 0 | 250,000 | static_dropout_0.3 | 0.300 | 5 | 5.2148 | 0.0181 | 4.2131 | 1.0017 |
| 0 | 250,000 | static_dropout_0.06 | 0.060 | 5 | 5.2154 | 0.0173 | 3.7128 | 1.5026 |
| 0 | 250,000 | static_dropout_0.04 | 0.040 | 5 | 5.2378 | 0.0186 | 3.6441 | 1.5938 |
| 0 | 250,000 | static_dropout_0.02 | 0.020 | 5 | 5.2750 | 0.0255 | 3.5725 | 1.7025 |
| 0 | 250,000 | static_dropout_0 | 0.000 | 5 | 5.3403 | 0.0270 | 3.5230 | 1.8172 |
| 1 | 500,000 | wikitext103_probe_blend | 0.140 | 5 | 4.7872 | 0.0269 | 3.6846 | 1.1027 |
| 1 | 500,000 | static_dropout_0.2 | 0.200 | 5 | 4.7873 | 0.0236 | 3.7914 | 0.9959 |
| 1 | 500,000 | static_dropout_0.18 | 0.180 | 5 | 4.7946 | 0.0206 | 3.7572 | 1.0375 |
| 1 | 500,000 | static_dropout_0.14 | 0.140 | 5 | 4.8001 | 0.0198 | 3.6650 | 1.1351 |
| 1 | 500,000 | wikitext103_low_decay | 0.140 | 5 | 4.8001 | 0.0198 | 3.6650 | 1.1351 |
| 1 | 500,000 | wikitext103_formula_l12 | 0.260 | 5 | 4.8053 | 0.0278 | 3.9182 | 0.8871 |
| 1 | 500,000 | static_dropout_0.26 | 0.260 | 5 | 4.8081 | 0.0216 | 3.9053 | 0.9028 |
| 1 | 500,000 | static_dropout_0.3 | 0.300 | 5 | 4.8242 | 0.0296 | 3.9765 | 0.8476 |
| 1 | 500,000 | static_dropout_0.1 | 0.100 | 5 | 4.8332 | 0.0258 | 3.5637 | 1.2695 |
| 1 | 500,000 | static_dropout_0.08 | 0.080 | 5 | 4.8576 | 0.0239 | 3.5036 | 1.3540 |
| 1 | 500,000 | static_dropout_0.06 | 0.060 | 5 | 4.8947 | 0.0213 | 3.4394 | 1.4552 |
| 1 | 500,000 | static_dropout_0.04 | 0.040 | 5 | 4.9573 | 0.0250 | 3.3515 | 1.6058 |
| 1 | 500,000 | static_dropout_0.02 | 0.020 | 5 | 5.0451 | 0.0169 | 3.2612 | 1.7839 |
| 1 | 500,000 | static_dropout_0 | 0.000 | 5 | 5.1741 | 0.0252 | 3.1506 | 2.0235 |
| 2 | 1,000,000 | wikitext103_formula_l12 | 0.180 | 5 | 4.4938 | 0.0147 | 3.8283 | 0.6655 |
| 2 | 1,000,000 | wikitext103_probe_blend | 0.090 | 5 | 4.4940 | 0.0159 | 3.6495 | 0.8445 |
| 2 | 1,000,000 | static_dropout_0.14 | 0.140 | 5 | 4.5001 | 0.0148 | 3.7163 | 0.7838 |
| 2 | 1,000,000 | wikitext103_low_decay | 0.100 | 5 | 4.5013 | 0.0158 | 3.6607 | 0.8406 |
| 2 | 1,000,000 | static_dropout_0.18 | 0.180 | 5 | 4.5023 | 0.0185 | 3.7866 | 0.7157 |
| 2 | 1,000,000 | static_dropout_0.2 | 0.200 | 5 | 4.5060 | 0.0204 | 3.8148 | 0.6913 |
| 2 | 1,000,000 | static_dropout_0.1 | 0.100 | 5 | 4.5186 | 0.0135 | 3.6524 | 0.8662 |
| 2 | 1,000,000 | static_dropout_0.26 | 0.260 | 5 | 4.5262 | 0.0101 | 3.9071 | 0.6191 |
| 2 | 1,000,000 | static_dropout_0.08 | 0.080 | 5 | 4.5326 | 0.0117 | 3.6000 | 0.9326 |
| 2 | 1,000,000 | static_dropout_0.3 | 0.300 | 5 | 4.5462 | 0.0127 | 3.9708 | 0.5754 |
| 2 | 1,000,000 | static_dropout_0.06 | 0.060 | 5 | 4.5574 | 0.0126 | 3.5554 | 1.0020 |
| 2 | 1,000,000 | static_dropout_0.04 | 0.040 | 5 | 4.5959 | 0.0146 | 3.5030 | 1.0929 |
| 2 | 1,000,000 | static_dropout_0.02 | 0.020 | 5 | 4.6558 | 0.0159 | 3.4324 | 1.2234 |
| 2 | 1,000,000 | static_dropout_0 | 0.000 | 5 | 4.7661 | 0.0324 | 3.3658 | 1.4003 |
| 3 | 2,000,000 | wikitext103_formula_l12 | 0.090 | 5 | 4.2607 | 0.0181 | 3.8089 | 0.4518 |
| 3 | 2,000,000 | wikitext103_low_decay | 0.060 | 5 | 4.2736 | 0.0228 | 3.7474 | 0.5261 |
| 3 | 2,000,000 | wikitext103_probe_blend | 0.040 | 5 | 4.2743 | 0.0174 | 3.7200 | 0.5543 |
| 3 | 2,000,000 | static_dropout_0.14 | 0.140 | 5 | 4.2816 | 0.0214 | 3.8287 | 0.4529 |
| 3 | 2,000,000 | static_dropout_0.1 | 0.100 | 5 | 4.2857 | 0.0190 | 3.7809 | 0.5048 |
| 3 | 2,000,000 | static_dropout_0.18 | 0.180 | 5 | 4.2889 | 0.0195 | 3.8768 | 0.4121 |
| 3 | 2,000,000 | static_dropout_0.08 | 0.080 | 5 | 4.2925 | 0.0166 | 3.7519 | 0.5406 |
| 3 | 2,000,000 | static_dropout_0.2 | 0.200 | 5 | 4.2942 | 0.0160 | 3.8982 | 0.3960 |
| 3 | 2,000,000 | static_dropout_0.06 | 0.060 | 5 | 4.3059 | 0.0182 | 3.7318 | 0.5741 |
| 3 | 2,000,000 | static_dropout_0.26 | 0.260 | 5 | 4.3248 | 0.0179 | 3.9655 | 0.3593 |
| 3 | 2,000,000 | static_dropout_0.04 | 0.040 | 5 | 4.3262 | 0.0165 | 3.7038 | 0.6225 |
| 3 | 2,000,000 | static_dropout_0.3 | 0.300 | 5 | 4.3467 | 0.0159 | 4.0097 | 0.3370 |
| 3 | 2,000,000 | static_dropout_0.02 | 0.020 | 5 | 4.3551 | 0.0265 | 3.6673 | 0.6878 |
| 3 | 2,000,000 | static_dropout_0 | 0.000 | 5 | 4.4174 | 0.0188 | 3.6409 | 0.7765 |
| 4 | 4,000,000 | wikitext103_formula_l12 | 0.020 | 5 | 4.0808 | 0.0195 | 3.7991 | 0.2817 |
| 4 | 4,000,000 | wikitext103_probe_blend | 0.010 | 5 | 4.0961 | 0.0145 | 3.7674 | 0.3287 |
| 4 | 4,000,000 | wikitext103_low_decay | 0.020 | 5 | 4.1020 | 0.0166 | 3.7769 | 0.3251 |
| 4 | 4,000,000 | static_dropout_0.1 | 0.100 | 5 | 4.1105 | 0.0188 | 3.8417 | 0.2687 |
| 4 | 4,000,000 | static_dropout_0.08 | 0.080 | 5 | 4.1116 | 0.0186 | 3.8268 | 0.2848 |
| 4 | 4,000,000 | static_dropout_0.06 | 0.060 | 5 | 4.1197 | 0.0082 | 3.8066 | 0.3131 |
| 4 | 4,000,000 | static_dropout_0.14 | 0.140 | 5 | 4.1221 | 0.0155 | 3.8674 | 0.2548 |
| 4 | 4,000,000 | static_dropout_0.18 | 0.180 | 5 | 4.1304 | 0.0130 | 3.9015 | 0.2289 |
| 4 | 4,000,000 | static_dropout_0.04 | 0.040 | 5 | 4.1331 | 0.0227 | 3.7978 | 0.3353 |
| 4 | 4,000,000 | static_dropout_0.2 | 0.200 | 5 | 4.1394 | 0.0167 | 3.9155 | 0.2239 |
| 4 | 4,000,000 | static_dropout_0.02 | 0.020 | 5 | 4.1459 | 0.0165 | 3.7759 | 0.3700 |
| 4 | 4,000,000 | static_dropout_0.26 | 0.260 | 5 | 4.1784 | 0.0145 | 3.9775 | 0.2008 |
| 4 | 4,000,000 | static_dropout_0 | 0.000 | 5 | 4.1835 | 0.0165 | 3.7750 | 0.4085 |
| 4 | 4,000,000 | static_dropout_0.3 | 0.300 | 5 | 4.1946 | 0.0141 | 4.0127 | 0.1819 |
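Two derived columns can be reproduced from the stage rows of one condition. The sketch below assumes that "Mean gap" is mean validation loss minus mean training loss, and that "Mean trajectory val" in the ranking table is the unweighted mean of the five per-stage validation means. Both readings are inferred from the numbers rather than stated by the report, and both match the published values for wikitext103_formula_l12.

```python
# (prefix_tokens, mean_val, mean_train) for wikitext103_formula_l12, stages 0..4.
formula_stages = [
    (250_000, 5.2148, 4.2131),
    (500_000, 4.8053, 3.9182),
    (1_000_000, 4.4938, 3.8283),
    (2_000_000, 4.2607, 3.8089),
    (4_000_000, 4.0808, 3.7991),
]

# Generalization gap per stage: validation loss minus training loss.
gaps = [round(val - train, 4) for _, val, train in formula_stages]

# Trajectory score: unweighted mean of validation loss over all stages.
trajectory_val = round(
    sum(val for _, val, _ in formula_stages) / len(formula_stages), 4
)

for (prefix, val, _), gap in zip(formula_stages, gaps):
    print(f"{prefix:>9,} tokens  val={val:.4f}  gap={gap:.4f}")
print(f"mean trajectory val = {trajectory_val:.4f}")
```

The gap shrinks at every stage for this schedule, which is consistent with the dropout rate falling as the prefix grows.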

Interpretation

  • wikitext103_formula_l12 has the best 5-seed mean final validation loss: 4.0808 +/- 0.0195 (mean +/- standard deviation across seeds).
  • The second-best final condition is wikitext103_probe_blend at 4.0961 +/- 0.0145.
  • The best static baseline by mean final loss is static_dropout_0.1 at 4.1105 +/- 0.0188, which is 0.0297 above wikitext103_formula_l12.
  • wikitext103_formula_l12 beats the per-seed best static baseline in 5/5 seeds; its least favorable paired delta is -0.0181 (seed 2).
  • wikitext103_probe_blend beats the per-seed best static baseline in 4/5 seeds; its least favorable paired delta is +0.0009 (seed 5).
  • wikitext103_low_decay beats the per-seed best static baseline in 4/5 seeds; its least favorable paired delta is +0.0047 (seed 1).
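  • The best first-stage condition is static_dropout_0.18 at prefix 250,000 with mean validation loss 5.1616, while wikitext103_formula_l12 is only ninth at that stage (5.2148, identical to static_dropout_0.3). Compare the early-stage and final rankings before claiming a schedule is uniformly better.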
  • This is a saved-run streaming validation artifact. Treat it as strong evidence only when the tested conditions, seeds, static baselines, and stream protocol match the claim being made.
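The win counts and least favorable deltas quoted above can be recomputed from the paired table. This is a minimal sketch, with `summarize_wins` as a hypothetical helper name; it treats a strictly negative delta as a win and takes the largest delta as the least favorable one.

```python
# Delta vs best static for seeds 1..5, copied from the paired table.
deltas = {
    "wikitext103_formula_l12": [-0.0184, -0.0181, -0.0272, -0.0251, -0.0370],
    "wikitext103_probe_blend": [-0.0069, -0.0191, -0.0102, -0.0141, 0.0009],
    "wikitext103_low_decay": [0.0047, -0.0013, -0.0030, -0.0167, -0.0035],
}


def summarize_wins(values):
    """Return (wins, seeds, least favorable delta) for one condition."""
    wins = sum(d < 0 for d in values)  # negative delta beats the best static
    return wins, len(values), max(values)


for name, values in deltas.items():
    wins, total, worst = summarize_wins(values)
    print(f"{name}: {wins}/{total} seeds, least favorable delta {worst:+.4f}")
```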