
OpenWebText10K Streaming Validation

Date: 2026-05-30

This report combines 5 random seeds (1, 2, 3, 4, 5) from saved streaming runs. No additional training is performed by this script; it reads saved metrics.jsonl files.

Regime: OpenWebText10K cached-corpus streaming setup with L16_H8_D384, 31,457,280 parameters, five prefixes from 250k to 4M tokens, and 1,000 optimizer steps per stage. This is a clean five-seed run including the OpenWebText10K interaction schedule, empirical decay schedules, and static baselines.

Sources

  • runs/openwebtext10k_l16_updated_formula_clean_5seed/locked_stream/20260530-174525/metrics.jsonl
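
Because the report only reads saved metrics, the aggregation step can be sketched as a small JSONL reader. The record fields used here (condition, stage, val_loss) are assumptions made for illustration; the real schema of metrics.jsonl is not documented in this report, so adjust the keys to match the file.

```python
import json
from collections import defaultdict
from pathlib import Path
from statistics import mean


def final_val_by_condition(path, final_stage=4):
    """Average final-stage validation loss per condition from a JSONL file.

    The keys "condition", "stage" and "val_loss" are hypothetical field
    names, not confirmed against the real metrics.jsonl.
    """
    per_condition = defaultdict(list)
    with Path(path).open() as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            if record["stage"] == final_stage:
                per_condition[record["condition"]].append(record["val_loss"])
    # One mean per condition, taken over however many seeds were logged.
    return {name: mean(values) for name, values in per_condition.items()}
```

The default final_stage=4 reflects the five stages (0 to 4) used in this regime.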

Condition Provenance

The anchor_decay label means the dropout value is chosen from explicit prefix-token anchors. It does not by itself imply that the schedule came from the coefficient formula.

| Condition | Provenance | Dropout path | Interpretation |
| --- | --- | --- | --- |
| openwebtext10k_interaction | coefficient-derived schedule | 0.39 -> 0.32 -> 0.23 -> 0.14 -> 0.07 | Main OpenWebText10K formula-derived schedule. This is the condition that tests the regime-specific interaction coefficient hypothesis. |
| hold_30_then_decay | heuristic schedule-search ablation | 0.30 -> 0.30 -> 0.20 -> 0.10 -> 0.02 | Manually specified after exploratory single-seed OpenWebText10K schedule search. It caps the initial dropout at 0.30, holds it for the two smallest stream prefixes, then releases capacity aggressively. |
| mild_30_to_08 | heuristic schedule-search ablation | 0.30 -> 0.24 -> 0.18 -> 0.12 -> 0.08 | Manually specified after exploratory single-seed OpenWebText10K schedule search. It tests whether a smoother decay from 0.30 to a moderate final dropout is competitive. |
| fitted_l16_static_law | older fitted/static-law schedule | 0.60 -> 0.40 -> 0.30 -> 0.14 -> 0.02 | Retained as a comparison to the earlier overly aggressive fitted schedule; it is not the current interaction formula schedule. |
| static_dropout_* | static baseline | constant | Fixed dropout used at every stream prefix. |

The two heuristic schedules should be treated as ablations, not as independent evidence that the coefficient formula generated their exact paths. Their role is to show that the shape of the decay matters and that reasonable hand-designed decays can also beat weak static choices. The main formula claim for this regime should be based on openwebtext10k_interaction.
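
The anchor_decay idea described above, where dropout is looked up from explicit prefix-token anchors, can be sketched as a table lookup. The anchor sizes and dropout paths are copied from this report (rounded to two decimals as in the provenance table); the function name and the step-wise hold between anchors are assumptions, since the report only evaluates the schedules at the anchors themselves.

```python
from bisect import bisect_right

# Stream prefix sizes used as anchors in this regime.
PREFIX_TOKENS = [250_000, 500_000, 1_000_000, 2_000_000, 4_000_000]

# Dropout value at each anchor, per anchor_decay condition.
SCHEDULES = {
    "openwebtext10k_interaction": [0.39, 0.32, 0.23, 0.14, 0.07],
    "hold_30_then_decay": [0.30, 0.30, 0.20, 0.10, 0.02],
    "mild_30_to_08": [0.30, 0.24, 0.18, 0.12, 0.08],
    "fitted_l16_static_law": [0.60, 0.40, 0.30, 0.14, 0.02],
}


def dropout_at(condition, prefix_tokens):
    """Return the dropout of the largest anchor <= prefix_tokens.

    Holding the value between anchors is an assumption of this sketch.
    """
    index = bisect_right(PREFIX_TOKENS, prefix_tokens) - 1
    if index < 0:
        raise ValueError("prefix_tokens is below the smallest anchor")
    return SCHEDULES[condition][index]


print(dropout_at("hold_30_then_decay", 500_000))
print(dropout_at("openwebtext10k_interaction", 4_000_000))
```

Nothing in this lookup depends on how the anchor values were produced, which is why the label alone says nothing about formula provenance.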

Condition Ranking By Final Loss

| Condition | Kind | N | Mean trajectory val | Std trajectory val | Mean final val | Std final val | Mean final gap | Dropout path |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| openwebtext10k_interaction | anchor_decay | 5 | 4.8609 | 0.0046 | 4.3981 | 0.0095 | 0.3177 | 0.39 -> 0.32 -> 0.23 -> 0.14 -> 0.07 |
| hold_30_then_decay | anchor_decay | 5 | 4.8512 | 0.0017 | 4.4052 | 0.0112 | 0.3565 | 0.30 -> 0.30 -> 0.20 -> 0.10 -> 0.02 |
| mild_30_to_08 | anchor_decay | 5 | 4.8509 | 0.0015 | 4.4073 | 0.0085 | 0.3337 | 0.30 -> 0.24 -> 0.18 -> 0.12 -> 0.08 |
| fitted_l16_static_law | anchor_decay | 5 | 4.9521 | 0.0039 | 4.4124 | 0.0084 | 0.3137 | 0.60 -> 0.40 -> 0.30 -> 0.14 -> 0.02 |
| static_dropout_0.14 | static | 5 | 4.9051 | 0.0088 | 4.4455 | 0.0120 | 0.3289 | 0.14 -> 0.14 -> 0.14 -> 0.14 -> 0.14 |
| static_dropout_0.3 | static | 5 | 4.8767 | 0.0019 | 4.4668 | 0.0141 | 0.2349 | 0.30 -> 0.30 -> 0.30 -> 0.30 -> 0.30 |
| static_dropout_0.02 | static | 5 | 5.1571 | 0.0097 | 4.5358 | 0.0091 | 0.4829 | 0.02 -> 0.02 -> 0.02 -> 0.02 -> 0.02 |
| static_dropout_0 | static | 5 | 5.2511 | 0.0160 | 4.5943 | 0.0216 | 0.5529 | 0.00 -> 0.00 -> 0.00 -> 0.00 -> 0.00 |
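
As a check on how the Mean final val and Std final val columns relate to the per-seed numbers, the sketch below recomputes them for openwebtext10k_interaction from the five per-seed final losses listed in this report. Using the sample standard deviation (n - 1 denominator) is an assumption; it is adopted here because it reproduces the reported 0.0095, whereas the population form would not.

```python
from statistics import mean, stdev

# Per-seed final validation losses for openwebtext10k_interaction (seeds 1-5),
# copied from the per-seed table in this report.
final_val = [4.4023, 4.4020, 4.4029, 4.3811, 4.4024]

mean_final = mean(final_val)
# stdev() uses the n - 1 denominator; this choice is an assumption that
# happens to match the reported spread.
std_final = stdev(final_val)

print(f"{mean_final:.4f} +/- {std_final:.4f}")  # 4.3981 +/- 0.0095
```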

Paired Final-Loss Deltas

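delta_vs_best_static (the Delta vs best static column) is a condition's final validation loss minus the final validation loss of the best static baseline for the same seed. A negative value means the condition beat the best static baseline for that seed.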

| Seed | Condition | Final val | Best static | Best static final val | Delta vs best static |
| --- | --- | --- | --- | --- | --- |
| 1 | openwebtext10k_interaction | 4.4023 | static_dropout_0.14 | 4.4418 | -0.0394 |
| 1 | hold_30_then_decay | 4.3939 | static_dropout_0.14 | 4.4418 | -0.0479 |
| 1 | mild_30_to_08 | 4.3995 | static_dropout_0.14 | 4.4418 | -0.0423 |
| 1 | fitted_l16_static_law | 4.4207 | static_dropout_0.14 | 4.4418 | -0.0211 |
| 1 | static_dropout_0.14 | 4.4418 | static_dropout_0.14 | 4.4418 | +0.0000 |
| 1 | static_dropout_0.3 | 4.4602 | static_dropout_0.14 | 4.4418 | +0.0184 |
| 1 | static_dropout_0.02 | 4.5402 | static_dropout_0.14 | 4.4418 | +0.0984 |
| 1 | static_dropout_0 | 4.5704 | static_dropout_0.14 | 4.4418 | +0.1286 |
| 2 | openwebtext10k_interaction | 4.4020 | static_dropout_0.14 | 4.4602 | -0.0583 |
| 2 | hold_30_then_decay | 4.4068 | static_dropout_0.14 | 4.4602 | -0.0534 |
| 2 | mild_30_to_08 | 4.4080 | static_dropout_0.14 | 4.4602 | -0.0522 |
| 2 | fitted_l16_static_law | 4.4136 | static_dropout_0.14 | 4.4602 | -0.0466 |
| 2 | static_dropout_0.14 | 4.4602 | static_dropout_0.14 | 4.4602 | +0.0000 |
| 2 | static_dropout_0.3 | 4.4719 | static_dropout_0.14 | 4.4602 | +0.0117 |
| 2 | static_dropout_0.02 | 4.5466 | static_dropout_0.14 | 4.4602 | +0.0864 |
| 2 | static_dropout_0 | 4.6094 | static_dropout_0.14 | 4.4602 | +0.1492 |
| 3 | openwebtext10k_interaction | 4.4029 | static_dropout_0.14 | 4.4356 | -0.0328 |
| 3 | hold_30_then_decay | 4.4174 | static_dropout_0.14 | 4.4356 | -0.0183 |
| 3 | mild_30_to_08 | 4.4151 | static_dropout_0.14 | 4.4356 | -0.0206 |
| 3 | fitted_l16_static_law | 4.4134 | static_dropout_0.14 | 4.4356 | -0.0223 |
| 3 | static_dropout_0.14 | 4.4356 | static_dropout_0.14 | 4.4356 | +0.0000 |
| 3 | static_dropout_0.3 | 4.4758 | static_dropout_0.14 | 4.4356 | +0.0401 |
| 3 | static_dropout_0.02 | 4.5345 | static_dropout_0.14 | 4.4356 | +0.0988 |
| 3 | static_dropout_0 | 4.5928 | static_dropout_0.14 | 4.4356 | +0.1571 |
| 4 | openwebtext10k_interaction | 4.3811 | static_dropout_0.14 | 4.4337 | -0.0526 |
| 4 | hold_30_then_decay | 4.3936 | static_dropout_0.14 | 4.4337 | -0.0400 |
| 4 | mild_30_to_08 | 4.3978 | static_dropout_0.14 | 4.4337 | -0.0359 |
| 4 | fitted_l16_static_law | 4.3983 | static_dropout_0.14 | 4.4337 | -0.0354 |
| 4 | static_dropout_0.14 | 4.4337 | static_dropout_0.14 | 4.4337 | +0.0000 |
| 4 | static_dropout_0.3 | 4.4455 | static_dropout_0.14 | 4.4337 | +0.0118 |
| 4 | static_dropout_0.02 | 4.5220 | static_dropout_0.14 | 4.4337 | +0.0883 |
| 4 | static_dropout_0 | 4.5768 | static_dropout_0.14 | 4.4337 | +0.1432 |
| 5 | openwebtext10k_interaction | 4.4024 | static_dropout_0.14 | 4.4560 | -0.0536 |
| 5 | hold_30_then_decay | 4.4145 | static_dropout_0.14 | 4.4560 | -0.0415 |
| 5 | mild_30_to_08 | 4.4161 | static_dropout_0.14 | 4.4560 | -0.0399 |
| 5 | fitted_l16_static_law | 4.4161 | static_dropout_0.14 | 4.4560 | -0.0399 |
| 5 | static_dropout_0.14 | 4.4560 | static_dropout_0.14 | 4.4560 | +0.0000 |
| 5 | static_dropout_0.3 | 4.4805 | static_dropout_0.14 | 4.4560 | +0.0245 |
| 5 | static_dropout_0.02 | 4.5355 | static_dropout_0.14 | 4.4560 | +0.0796 |
| 5 | static_dropout_0 | 4.6219 | static_dropout_0.14 | 4.4560 | +0.1660 |
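
The paired comparison in the table above can be sketched for a single seed as follows. The final losses are the seed 3 values from this report; the function name and the rule of identifying static baselines by the static_dropout_ prefix are assumptions of the sketch. Because the inputs here are already rounded to four decimals, a recomputed delta can differ from the reported one in the last digit (for example -0.0327 here against the reported -0.0328).

```python
# Seed 3 final validation losses, copied from this report.
seed3_final = {
    "openwebtext10k_interaction": 4.4029,
    "hold_30_then_decay": 4.4174,
    "mild_30_to_08": 4.4151,
    "fitted_l16_static_law": 4.4134,
    "static_dropout_0.14": 4.4356,
    "static_dropout_0.3": 4.4758,
    "static_dropout_0.02": 4.5345,
    "static_dropout_0": 4.5928,
}


def deltas_vs_best_static(final_by_condition):
    """Return (best static name, {condition: final - best static final})."""
    statics = {
        name: loss
        for name, loss in final_by_condition.items()
        if name.startswith("static_dropout_")  # assumed naming convention
    }
    best_name = min(statics, key=statics.get)
    best_loss = statics[best_name]
    deltas = {name: loss - best_loss for name, loss in final_by_condition.items()}
    return best_name, deltas


best_static, deltas = deltas_vs_best_static(seed3_final)
print(best_static)
for name, delta in deltas.items():
    print(f"{name}: {delta:+.4f}")
```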

Stage Trajectory

| Stage | Prefix tokens | Condition | Dropout | N | Mean val | Std val | Mean train | Mean gap |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 250,000 | mild_30_to_08 | 0.300 | 5 | 5.4483 | 0.0138 | 4.4429 | 1.0054 |
| 0 | 250,000 | hold_30_then_decay | 0.300 | 5 | 5.4483 | 0.0138 | 4.4429 | 1.0054 |
| 0 | 250,000 | static_dropout_0.3 | 0.300 | 5 | 5.4483 | 0.0138 | 4.4429 | 1.0054 |
| 0 | 250,000 | static_dropout_0.14 | 0.140 | 5 | 5.4773 | 0.0224 | 4.0298 | 1.4475 |
| 0 | 250,000 | openwebtext10k_interaction | 0.385 | 5 | 5.4947 | 0.0109 | 4.6016 | 0.8930 |
| 0 | 250,000 | static_dropout_0.02 | 0.020 | 5 | 5.7426 | 0.0242 | 3.5371 | 2.2055 |
| 0 | 250,000 | fitted_l16_static_law | 0.600 | 5 | 5.7842 | 0.0096 | 5.1640 | 0.6202 |
| 0 | 250,000 | static_dropout_0 | 0.000 | 5 | 5.8330 | 0.0198 | 3.4443 | 2.3887 |
| 1 | 500,000 | mild_30_to_08 | 0.240 | 5 | 5.0582 | 0.0159 | 4.0349 | 1.0233 |
| 1 | 500,000 | static_dropout_0.3 | 0.300 | 5 | 5.0667 | 0.0173 | 4.1383 | 0.9284 |
| 1 | 500,000 | hold_30_then_decay | 0.300 | 5 | 5.0667 | 0.0173 | 4.1383 | 0.9284 |
| 1 | 500,000 | openwebtext10k_interaction | 0.319 | 5 | 5.0715 | 0.0118 | 4.2065 | 0.8650 |
| 1 | 500,000 | static_dropout_0.14 | 0.140 | 5 | 5.1492 | 0.0070 | 3.7143 | 1.4349 |
| 1 | 500,000 | fitted_l16_static_law | 0.400 | 5 | 5.1507 | 0.0102 | 4.4632 | 0.6875 |
| 1 | 500,000 | static_dropout_0.02 | 0.020 | 5 | 5.5754 | 0.0248 | 3.1246 | 2.4508 |
| 1 | 500,000 | static_dropout_0 | 0.000 | 5 | 5.7175 | 0.0502 | 2.9583 | 2.7592 |
| 2 | 1,000,000 | hold_30_then_decay | 0.200 | 5 | 4.7757 | 0.0144 | 4.0378 | 0.7379 |
| 2 | 1,000,000 | mild_30_to_08 | 0.180 | 5 | 4.7774 | 0.0138 | 3.9886 | 0.7888 |
| 2 | 1,000,000 | openwebtext10k_interaction | 0.227 | 5 | 4.7811 | 0.0084 | 4.0826 | 0.6984 |
| 2 | 1,000,000 | static_dropout_0.3 | 0.300 | 5 | 4.7983 | 0.0144 | 4.1501 | 0.6481 |
| 2 | 1,000,000 | fitted_l16_static_law | 0.300 | 5 | 4.8326 | 0.0102 | 4.2632 | 0.5694 |
| 2 | 1,000,000 | static_dropout_0.14 | 0.140 | 5 | 4.8490 | 0.0202 | 3.8712 | 0.9779 |
| 2 | 1,000,000 | static_dropout_0.02 | 0.020 | 5 | 5.1470 | 0.0222 | 3.4615 | 1.6854 |
| 2 | 1,000,000 | static_dropout_0 | 0.000 | 5 | 5.2637 | 0.0274 | 3.3260 | 1.9377 |
| 3 | 2,000,000 | openwebtext10k_interaction | 0.139 | 5 | 4.5590 | 0.0142 | 4.0802 | 0.4788 |
| 3 | 2,000,000 | hold_30_then_decay | 0.100 | 5 | 4.5599 | 0.0161 | 4.0445 | 0.5154 |
| 3 | 2,000,000 | mild_30_to_08 | 0.120 | 5 | 4.5631 | 0.0155 | 4.0441 | 0.5190 |
| 3 | 2,000,000 | fitted_l16_static_law | 0.140 | 5 | 4.5806 | 0.0153 | 4.1471 | 0.4334 |
| 3 | 2,000,000 | static_dropout_0.3 | 0.300 | 5 | 4.6035 | 0.0141 | 4.2150 | 0.3885 |
| 3 | 2,000,000 | static_dropout_0.14 | 0.140 | 5 | 4.6048 | 0.0136 | 4.0399 | 0.5648 |
| 3 | 2,000,000 | static_dropout_0.02 | 0.020 | 5 | 4.7847 | 0.0196 | 3.8405 | 0.9442 |
| 3 | 2,000,000 | static_dropout_0 | 0.000 | 5 | 4.8472 | 0.0171 | 3.7786 | 1.0687 |
| 4 | 4,000,000 | openwebtext10k_interaction | 0.066 | 5 | 4.3981 | 0.0095 | 4.0805 | 0.3177 |
| 4 | 4,000,000 | hold_30_then_decay | 0.020 | 5 | 4.4052 | 0.0112 | 4.0488 | 0.3565 |
| 4 | 4,000,000 | mild_30_to_08 | 0.080 | 5 | 4.4073 | 0.0085 | 4.0736 | 0.3337 |
| 4 | 4,000,000 | fitted_l16_static_law | 0.020 | 5 | 4.4124 | 0.0084 | 4.0987 | 0.3137 |
| 4 | 4,000,000 | static_dropout_0.14 | 0.140 | 5 | 4.4455 | 0.0120 | 4.1165 | 0.3289 |
| 4 | 4,000,000 | static_dropout_0.3 | 0.300 | 5 | 4.4668 | 0.0141 | 4.2319 | 0.2349 |
| 4 | 4,000,000 | static_dropout_0.02 | 0.020 | 5 | 4.5358 | 0.0091 | 4.0529 | 0.4829 |
| 4 | 4,000,000 | static_dropout_0 | 0.000 | 5 | 4.5943 | 0.0216 | 4.0414 | 0.5529 |
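
The stage rows also show how two derived columns appear to be built. In the sketch below, the gap is taken as validation loss minus training loss, and the trajectory value as the plain mean of the five per-stage validation losses. Both definitions are inferred from the numbers in this report rather than stated by it; they reproduce the reported figures for openwebtext10k_interaction up to last-digit rounding.

```python
from statistics import mean

# Per-stage means for openwebtext10k_interaction (stages 0-4), from this report.
mean_val = [5.4947, 5.0715, 4.7811, 4.5590, 4.3981]
mean_train = [4.6016, 4.2065, 4.0826, 4.0802, 4.0805]

# Assumed definition: generalization gap = validation loss - training loss.
gaps = [val - train for val, train in zip(mean_val, mean_train)]

# Assumed definition: trajectory val = unweighted mean over the five stages.
trajectory_val = mean(mean_val)

for stage, gap in enumerate(gaps):
    print(f"stage {stage}: gap {gap:.4f}")
print(f"trajectory val {trajectory_val:.4f}")
```

The final-stage gap computed this way is 0.3176, against the reported 0.3177; the difference comes from working with inputs already rounded to four decimals.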

Interpretation

  • openwebtext10k_interaction has the best 5-seed mean final validation loss: 4.3981 +/- 0.0095.
  • The second-best final condition is hold_30_then_decay at 4.4052 +/- 0.0112.
  • The best static baseline by mean final loss is static_dropout_0.14 at 4.4455 +/- 0.0120.
  • openwebtext10k_interaction beats the per-seed best static baseline in 5/5 seeds; worst paired delta is -0.0328.
  • hold_30_then_decay beats the per-seed best static baseline in 5/5 seeds; worst paired delta is -0.0183.
  • mild_30_to_08 beats the per-seed best static baseline in 5/5 seeds; worst paired delta is -0.0206.
  • fitted_l16_static_law beats the per-seed best static baseline in 5/5 seeds; worst paired delta is -0.0211.
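  • The best first-stage mean validation loss, 5.4483 at prefix 250,000, is shared by mild_30_to_08, hold_30_then_decay, and static_dropout_0.3, which all use dropout 0.30 at that stage and report identical numbers. openwebtext10k_interaction ranks fifth at that stage (5.4947); compare first-stage and final rankings before claiming a schedule is uniformly better.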
  • This is a saved-run streaming validation artifact. Treat it as strong evidence only when the tested conditions, seeds, static baselines, and stream protocol match the claim being made.
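
The win counts and worst paired deltas quoted in the bullets above can be recomputed from the reported per-seed deltas. This is a small sketch over values copied from the paired table; a win is counted as a strictly negative delta, and the worst delta is the largest (least negative) one.

```python
# Reported deltas vs the per-seed best static baseline, seeds 1-5.
reported_deltas = {
    "openwebtext10k_interaction": [-0.0394, -0.0583, -0.0328, -0.0526, -0.0536],
    "hold_30_then_decay": [-0.0479, -0.0534, -0.0183, -0.0400, -0.0415],
    "mild_30_to_08": [-0.0423, -0.0522, -0.0206, -0.0359, -0.0399],
    "fitted_l16_static_law": [-0.0211, -0.0466, -0.0223, -0.0354, -0.0399],
}

summary = {}
for condition, deltas in reported_deltas.items():
    wins = sum(1 for delta in deltas if delta < 0)  # seeds that beat best static
    worst = max(deltas)  # least favourable seed for this condition
    summary[condition] = (wins, len(deltas), worst)
    print(f"{condition}: {wins}/{len(deltas)} seeds, worst delta {worst:+.4f}")
```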