OpenWebText10K Streaming Validation
Date: 2026-05-30
This report combines five random seeds (1, 2, 3, 4, 5) from saved streaming runs.
The script that generates this report performs no additional training; it only
reads saved metrics.jsonl files.
Regime: OpenWebText10K cached-corpus streaming setup with L16_H8_D384 (31,457,280 parameters), five stream prefixes that double from 250k to 4M tokens, and 1,000 optimizer steps per stage. This is a clean five-seed run covering the coefficient-derived OpenWebText10K interaction schedule, heuristic and earlier fitted decay schedules, and static baselines.
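The stage ladder in the Regime line can be reproduced directly: each prefix doubles the previous one, and 1,000 steps per stage gives 5,000 optimizer steps in total. The parameter breakdown below is only one decomposition that happens to match 31,457,280; it assumes 12·d² weights per block, no counted biases or LayerNorm parameters, and a tied embedding with a hypothetical 8,192-token vocabulary. Under that assumption the 8 attention heads do not change the count.

```python
# Stage ladder from the Regime line: five stream prefixes doubling from 250k to 4M
# tokens, with 1,000 optimizer steps per stage.
STEPS_PER_STAGE = 1_000
prefixes = [250_000 * 2**stage for stage in range(5)]
total_steps = STEPS_PER_STAGE * len(prefixes)

# ASSUMPTION: one parameter decomposition consistent with the reported count.
# Per block: 4*d^2 attention + 8*d^2 MLP (4x expansion), biases/LayerNorm ignored.
# Embedding: a single tied matrix with a hypothetical 8,192-token vocabulary.
N_LAYERS, D_MODEL = 16, 384
VOCAB_SIZE = 8_192  # hypothetical; the report does not state the vocabulary size
block_params = 12 * D_MODEL**2 * N_LAYERS
embedding_params = VOCAB_SIZE * D_MODEL
total_params = block_params + embedding_params

print(prefixes)
print(total_steps)
print(f"{total_params:,}")
```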
Sources
runs/openwebtext10k_l16_updated_formula_clean_5seed/locked_stream/20260530-174525/metrics.jsonl
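A minimal sketch of how final-stage losses could be read from a metrics.jsonl file and grouped by condition and seed. The record keys (`condition`, `seed`, `stage`, `val_loss`) and the function name `load_final_losses` are hypothetical; the real schema of the saved file may differ.

```python
import json
from collections import defaultdict


def load_final_losses(lines, final_stage=4):
    """Group final-stage validation losses by condition, then by seed.

    ASSUMPTION: each JSONL line is one record with hypothetical keys
    "condition", "seed", "stage", and "val_loss".
    """
    finals = defaultdict(dict)
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        rec = json.loads(line)
        if rec["stage"] == final_stage:  # keep only the 4M-token prefix
            finals[rec["condition"]][rec["seed"]] = rec["val_loss"]
    return {cond: dict(by_seed) for cond, by_seed in finals.items()}
```

Against the file listed above, usage would be `with open(path) as fh: finals = load_final_losses(fh)`, still under the hypothetical schema. Taking an iterable of lines rather than a path keeps the function easy to test without touching disk.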
Condition Provenance
The anchor_decay label means the dropout value is chosen from explicit
prefix-token anchors. It does not by itself imply that the schedule came from
the coefficient formula.
| Condition | Provenance | Dropout path | Interpretation |
|---|---|---|---|
| openwebtext10k_interaction | coefficient-derived schedule | 0.39 -> 0.32 -> 0.23 -> 0.14 -> 0.07 | Main OpenWebText10K formula-derived schedule. This is the condition that tests the regime-specific interaction-coefficient hypothesis. |
| hold_30_then_decay | heuristic schedule-search ablation | 0.30 -> 0.30 -> 0.20 -> 0.10 -> 0.02 | Manually specified after an exploratory single-seed OpenWebText10K schedule search. It caps the initial dropout at 0.30, holds it for the two smallest stream prefixes, then releases capacity aggressively. |
| mild_30_to_08 | heuristic schedule-search ablation | 0.30 -> 0.24 -> 0.18 -> 0.12 -> 0.08 | Manually specified after an exploratory single-seed OpenWebText10K schedule search. It tests whether a smoother decay from 0.30 to a moderate final dropout is competitive. |
| fitted_l16_static_law | older fitted/static-law schedule | 0.60 -> 0.40 -> 0.30 -> 0.14 -> 0.02 | The earlier, overly aggressive fitted schedule, retained for comparison; it is not the current interaction-formula schedule. |
| static_dropout_* | static baseline | constant | Fixed dropout used at every stream prefix. |
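The anchor_decay idea can be sketched as a step-function lookup: each schedule stores one dropout value per prefix-token anchor, and a prefix uses the value of the largest anchor that does not exceed it. The function name `anchor_dropout` and the step-function rule are assumptions for illustration; the actual pipeline may key or interpolate anchors differently. Values are the two-decimal paths from the table above (the Stage Trajectory table shows the unrounded interaction values, e.g. 0.385).

```python
from bisect import bisect_right

# Prefix-token anchors shared by every decay condition in this report.
ANCHORS = [250_000, 500_000, 1_000_000, 2_000_000, 4_000_000]

# Dropout value per anchor, copied from the Dropout path column (two decimals).
SCHEDULES = {
    "openwebtext10k_interaction": [0.39, 0.32, 0.23, 0.14, 0.07],
    "hold_30_then_decay": [0.30, 0.30, 0.20, 0.10, 0.02],
    "mild_30_to_08": [0.30, 0.24, 0.18, 0.12, 0.08],
    "fitted_l16_static_law": [0.60, 0.40, 0.30, 0.14, 0.02],
}


def anchor_dropout(condition, prefix_tokens):
    """Return the dropout of the largest anchor <= prefix_tokens (hypothetical rule)."""
    idx = bisect_right(ANCHORS, prefix_tokens) - 1
    if idx < 0:
        raise ValueError(f"prefix {prefix_tokens} is below the first anchor")
    return SCHEDULES[condition][idx]
```

For example, `anchor_dropout("hold_30_then_decay", 500_000)` returns the held 0.30, while a prefix between two anchors falls back to the lower anchor's value.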
The two heuristic schedules should be treated as ablations, not as independent
evidence that the coefficient formula generated their exact paths. Their role is
to show that the shape of the decay matters and that reasonable hand-designed
decays can also beat the static baselines tested here, including the strongest
one, static_dropout_0.14. The main formula claim for this regime should rest on
openwebtext10k_interaction.
Condition Ranking By Final Loss
| Condition | Kind | N | Mean trajectory val (avg over 5 stages) | Std trajectory val | Mean final val | Std final val | Mean final gap (val - train) | Dropout path |
|---|---|---|---|---|---|---|---|---|
| openwebtext10k_interaction | anchor_decay | 5 | 4.8609 | 0.0046 | 4.3981 | 0.0095 | 0.3177 | 0.39 -> 0.32 -> 0.23 -> 0.14 -> 0.07 |
| hold_30_then_decay | anchor_decay | 5 | 4.8512 | 0.0017 | 4.4052 | 0.0112 | 0.3565 | 0.30 -> 0.30 -> 0.20 -> 0.10 -> 0.02 |
| mild_30_to_08 | anchor_decay | 5 | 4.8509 | 0.0015 | 4.4073 | 0.0085 | 0.3337 | 0.30 -> 0.24 -> 0.18 -> 0.12 -> 0.08 |
| fitted_l16_static_law | anchor_decay | 5 | 4.9521 | 0.0039 | 4.4124 | 0.0084 | 0.3137 | 0.60 -> 0.40 -> 0.30 -> 0.14 -> 0.02 |
| static_dropout_0.14 | static | 5 | 4.9051 | 0.0088 | 4.4455 | 0.0120 | 0.3289 | 0.14 -> 0.14 -> 0.14 -> 0.14 -> 0.14 |
| static_dropout_0.3 | static | 5 | 4.8767 | 0.0019 | 4.4668 | 0.0141 | 0.2349 | 0.30 -> 0.30 -> 0.30 -> 0.30 -> 0.30 |
| static_dropout_0.02 | static | 5 | 5.1571 | 0.0097 | 4.5358 | 0.0091 | 0.4829 | 0.02 -> 0.02 -> 0.02 -> 0.02 -> 0.02 |
| static_dropout_0 | static | 5 | 5.2511 | 0.0160 | 4.5943 | 0.0216 | 0.5529 | 0.00 -> 0.00 -> 0.00 -> 0.00 -> 0.00 |
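Two of the summary columns can be cross-checked from other tables in this report. Averaging the five per-stage means from the Stage Trajectory table reproduces Mean trajectory val, and the sample standard deviation (n - 1 denominator) of the five per-seed finals from the Paired Final-Loss Deltas table reproduces Std final val. Std trajectory val needs per-seed, per-stage values that this report does not list, so it is not recomputed here. The sketch uses only two conditions for brevity.

```python
from statistics import mean, stdev

# Per-seed final validation losses (seeds 1-5), from the Paired Final-Loss Deltas table.
final_val = {
    "openwebtext10k_interaction": [4.4023, 4.4020, 4.4029, 4.3811, 4.4024],
    "hold_30_then_decay": [4.3939, 4.4068, 4.4174, 4.3936, 4.4145],
}
# Per-stage mean validation losses (stages 0-4), from the Stage Trajectory table.
stage_mean_val = {
    "openwebtext10k_interaction": [5.4947, 5.0715, 4.7811, 4.5590, 4.3981],
    "hold_30_then_decay": [5.4483, 5.0667, 4.7757, 4.5599, 4.4052],
}

for cond in final_val:
    traj = mean(stage_mean_val[cond])  # average over the five stages
    fin = mean(final_val[cond])        # mean final loss across seeds
    sd = stdev(final_val[cond])        # sample std across seeds (n - 1)
    print(f"{cond}: trajectory {traj:.4f}, final {fin:.4f} +/- {sd:.4f}")
```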
Paired Final-Loss Deltas
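Delta vs best static is the condition's final validation loss minus the final
validation loss of the best static baseline for the same seed, so a negative value
means the condition beat that baseline. In all five seeds the best static baseline
is static_dropout_0.14.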
| Seed | Condition | Final val | Best static | Best static final val | Delta vs best static |
|---|---|---|---|---|---|
| 1 | openwebtext10k_interaction | 4.4023 | static_dropout_0.14 | 4.4418 | -0.0394 |
| 1 | hold_30_then_decay | 4.3939 | static_dropout_0.14 | 4.4418 | -0.0479 |
| 1 | mild_30_to_08 | 4.3995 | static_dropout_0.14 | 4.4418 | -0.0423 |
| 1 | fitted_l16_static_law | 4.4207 | static_dropout_0.14 | 4.4418 | -0.0211 |
| 1 | static_dropout_0.14 | 4.4418 | static_dropout_0.14 | 4.4418 | +0.0000 |
| 1 | static_dropout_0.3 | 4.4602 | static_dropout_0.14 | 4.4418 | +0.0184 |
| 1 | static_dropout_0.02 | 4.5402 | static_dropout_0.14 | 4.4418 | +0.0984 |
| 1 | static_dropout_0 | 4.5704 | static_dropout_0.14 | 4.4418 | +0.1286 |
| 2 | openwebtext10k_interaction | 4.4020 | static_dropout_0.14 | 4.4602 | -0.0583 |
| 2 | hold_30_then_decay | 4.4068 | static_dropout_0.14 | 4.4602 | -0.0534 |
| 2 | mild_30_to_08 | 4.4080 | static_dropout_0.14 | 4.4602 | -0.0522 |
| 2 | fitted_l16_static_law | 4.4136 | static_dropout_0.14 | 4.4602 | -0.0466 |
| 2 | static_dropout_0.14 | 4.4602 | static_dropout_0.14 | 4.4602 | +0.0000 |
| 2 | static_dropout_0.3 | 4.4719 | static_dropout_0.14 | 4.4602 | +0.0117 |
| 2 | static_dropout_0.02 | 4.5466 | static_dropout_0.14 | 4.4602 | +0.0864 |
| 2 | static_dropout_0 | 4.6094 | static_dropout_0.14 | 4.4602 | +0.1492 |
| 3 | openwebtext10k_interaction | 4.4029 | static_dropout_0.14 | 4.4356 | -0.0328 |
| 3 | hold_30_then_decay | 4.4174 | static_dropout_0.14 | 4.4356 | -0.0183 |
| 3 | mild_30_to_08 | 4.4151 | static_dropout_0.14 | 4.4356 | -0.0206 |
| 3 | fitted_l16_static_law | 4.4134 | static_dropout_0.14 | 4.4356 | -0.0223 |
| 3 | static_dropout_0.14 | 4.4356 | static_dropout_0.14 | 4.4356 | +0.0000 |
| 3 | static_dropout_0.3 | 4.4758 | static_dropout_0.14 | 4.4356 | +0.0401 |
| 3 | static_dropout_0.02 | 4.5345 | static_dropout_0.14 | 4.4356 | +0.0988 |
| 3 | static_dropout_0 | 4.5928 | static_dropout_0.14 | 4.4356 | +0.1571 |
| 4 | openwebtext10k_interaction | 4.3811 | static_dropout_0.14 | 4.4337 | -0.0526 |
| 4 | hold_30_then_decay | 4.3936 | static_dropout_0.14 | 4.4337 | -0.0400 |
| 4 | mild_30_to_08 | 4.3978 | static_dropout_0.14 | 4.4337 | -0.0359 |
| 4 | fitted_l16_static_law | 4.3983 | static_dropout_0.14 | 4.4337 | -0.0354 |
| 4 | static_dropout_0.14 | 4.4337 | static_dropout_0.14 | 4.4337 | +0.0000 |
| 4 | static_dropout_0.3 | 4.4455 | static_dropout_0.14 | 4.4337 | +0.0118 |
| 4 | static_dropout_0.02 | 4.5220 | static_dropout_0.14 | 4.4337 | +0.0883 |
| 4 | static_dropout_0 | 4.5768 | static_dropout_0.14 | 4.4337 | +0.1432 |
| 5 | openwebtext10k_interaction | 4.4024 | static_dropout_0.14 | 4.4560 | -0.0536 |
| 5 | hold_30_then_decay | 4.4145 | static_dropout_0.14 | 4.4560 | -0.0415 |
| 5 | mild_30_to_08 | 4.4161 | static_dropout_0.14 | 4.4560 | -0.0399 |
| 5 | fitted_l16_static_law | 4.4161 | static_dropout_0.14 | 4.4560 | -0.0399 |
| 5 | static_dropout_0.14 | 4.4560 | static_dropout_0.14 | 4.4560 | +0.0000 |
| 5 | static_dropout_0.3 | 4.4805 | static_dropout_0.14 | 4.4560 | +0.0245 |
| 5 | static_dropout_0.02 | 4.5355 | static_dropout_0.14 | 4.4560 | +0.0796 |
| 5 | static_dropout_0 | 4.6219 | static_dropout_0.14 | 4.4560 | +0.1660 |
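The paired deltas can be recomputed from the per-seed finals: for each seed, take the minimum final loss over the four static baselines and subtract it from the condition's final loss. The same per-seed values also support a direct head-to-head count between two schedules. Because the table stores rounded losses, recomputed deltas can differ from the reported ones in the last digit (for example, -0.0327 instead of -0.0328 for openwebtext10k_interaction in seed 3). The helper names are illustrative only.

```python
# Final validation loss per seed (index 0 = seed 1), from the table above.
FINAL = {
    "openwebtext10k_interaction": [4.4023, 4.4020, 4.4029, 4.3811, 4.4024],
    "hold_30_then_decay": [4.3939, 4.4068, 4.4174, 4.3936, 4.4145],
    "mild_30_to_08": [4.3995, 4.4080, 4.4151, 4.3978, 4.4161],
    "static_dropout_0.14": [4.4418, 4.4602, 4.4356, 4.4337, 4.4560],
    "static_dropout_0.3": [4.4602, 4.4719, 4.4758, 4.4455, 4.4805],
    "static_dropout_0.02": [4.5402, 4.5466, 4.5345, 4.5220, 4.5355],
    "static_dropout_0": [4.5704, 4.6094, 4.5928, 4.5768, 4.6219],
}
STATIC = [c for c in FINAL if c.startswith("static_dropout_")]


def deltas_vs_best_static(condition):
    """Per-seed final loss minus the lowest static-baseline final loss in that seed."""
    out = []
    for i, loss in enumerate(FINAL[condition]):
        best_static = min(FINAL[s][i] for s in STATIC)
        out.append(loss - best_static)
    return out


def head_to_head_wins(a, b):
    """Number of seeds in which condition a has a strictly lower final loss than b."""
    return sum(x < y for x, y in zip(FINAL[a], FINAL[b]))


for cond in ["openwebtext10k_interaction", "hold_30_then_decay", "mild_30_to_08"]:
    d = deltas_vs_best_static(cond)
    wins = sum(x < 0 for x in d)
    # The "worst" paired delta is the least negative one, i.e. the maximum.
    print(f"{cond}: {wins}/5 seeds beat best static, worst delta {max(d):+.4f}")
```

Applied to the two heuristic schedules, `head_to_head_wins("openwebtext10k_interaction", ...)` counts how often the formula-derived schedule finishes lower seed by seed, which is a finer-grained comparison than the difference of means.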
Stage Trajectory
| Stage | Prefix tokens | Condition | Dropout | N | Mean val | Std val | Mean train | Mean gap (val - train) |
|---|---|---|---|---|---|---|---|---|
| 0 | 250,000 | mild_30_to_08 | 0.300 | 5 | 5.4483 | 0.0138 | 4.4429 | 1.0054 |
| 0 | 250,000 | hold_30_then_decay | 0.300 | 5 | 5.4483 | 0.0138 | 4.4429 | 1.0054 |
| 0 | 250,000 | static_dropout_0.3 | 0.300 | 5 | 5.4483 | 0.0138 | 4.4429 | 1.0054 |
| 0 | 250,000 | static_dropout_0.14 | 0.140 | 5 | 5.4773 | 0.0224 | 4.0298 | 1.4475 |
| 0 | 250,000 | openwebtext10k_interaction | 0.385 | 5 | 5.4947 | 0.0109 | 4.6016 | 0.8930 |
| 0 | 250,000 | static_dropout_0.02 | 0.020 | 5 | 5.7426 | 0.0242 | 3.5371 | 2.2055 |
| 0 | 250,000 | fitted_l16_static_law | 0.600 | 5 | 5.7842 | 0.0096 | 5.1640 | 0.6202 |
| 0 | 250,000 | static_dropout_0 | 0.000 | 5 | 5.8330 | 0.0198 | 3.4443 | 2.3887 |
| 1 | 500,000 | mild_30_to_08 | 0.240 | 5 | 5.0582 | 0.0159 | 4.0349 | 1.0233 |
| 1 | 500,000 | static_dropout_0.3 | 0.300 | 5 | 5.0667 | 0.0173 | 4.1383 | 0.9284 |
| 1 | 500,000 | hold_30_then_decay | 0.300 | 5 | 5.0667 | 0.0173 | 4.1383 | 0.9284 |
| 1 | 500,000 | openwebtext10k_interaction | 0.319 | 5 | 5.0715 | 0.0118 | 4.2065 | 0.8650 |
| 1 | 500,000 | static_dropout_0.14 | 0.140 | 5 | 5.1492 | 0.0070 | 3.7143 | 1.4349 |
| 1 | 500,000 | fitted_l16_static_law | 0.400 | 5 | 5.1507 | 0.0102 | 4.4632 | 0.6875 |
| 1 | 500,000 | static_dropout_0.02 | 0.020 | 5 | 5.5754 | 0.0248 | 3.1246 | 2.4508 |
| 1 | 500,000 | static_dropout_0 | 0.000 | 5 | 5.7175 | 0.0502 | 2.9583 | 2.7592 |
| 2 | 1,000,000 | hold_30_then_decay | 0.200 | 5 | 4.7757 | 0.0144 | 4.0378 | 0.7379 |
| 2 | 1,000,000 | mild_30_to_08 | 0.180 | 5 | 4.7774 | 0.0138 | 3.9886 | 0.7888 |
| 2 | 1,000,000 | openwebtext10k_interaction | 0.227 | 5 | 4.7811 | 0.0084 | 4.0826 | 0.6984 |
| 2 | 1,000,000 | static_dropout_0.3 | 0.300 | 5 | 4.7983 | 0.0144 | 4.1501 | 0.6481 |
| 2 | 1,000,000 | fitted_l16_static_law | 0.300 | 5 | 4.8326 | 0.0102 | 4.2632 | 0.5694 |
| 2 | 1,000,000 | static_dropout_0.14 | 0.140 | 5 | 4.8490 | 0.0202 | 3.8712 | 0.9779 |
| 2 | 1,000,000 | static_dropout_0.02 | 0.020 | 5 | 5.1470 | 0.0222 | 3.4615 | 1.6854 |
| 2 | 1,000,000 | static_dropout_0 | 0.000 | 5 | 5.2637 | 0.0274 | 3.3260 | 1.9377 |
| 3 | 2,000,000 | openwebtext10k_interaction | 0.139 | 5 | 4.5590 | 0.0142 | 4.0802 | 0.4788 |
| 3 | 2,000,000 | hold_30_then_decay | 0.100 | 5 | 4.5599 | 0.0161 | 4.0445 | 0.5154 |
| 3 | 2,000,000 | mild_30_to_08 | 0.120 | 5 | 4.5631 | 0.0155 | 4.0441 | 0.5190 |
| 3 | 2,000,000 | fitted_l16_static_law | 0.140 | 5 | 4.5806 | 0.0153 | 4.1471 | 0.4334 |
| 3 | 2,000,000 | static_dropout_0.3 | 0.300 | 5 | 4.6035 | 0.0141 | 4.2150 | 0.3885 |
| 3 | 2,000,000 | static_dropout_0.14 | 0.140 | 5 | 4.6048 | 0.0136 | 4.0399 | 0.5648 |
| 3 | 2,000,000 | static_dropout_0.02 | 0.020 | 5 | 4.7847 | 0.0196 | 3.8405 | 0.9442 |
| 3 | 2,000,000 | static_dropout_0 | 0.000 | 5 | 4.8472 | 0.0171 | 3.7786 | 1.0687 |
| 4 | 4,000,000 | openwebtext10k_interaction | 0.066 | 5 | 4.3981 | 0.0095 | 4.0805 | 0.3177 |
| 4 | 4,000,000 | hold_30_then_decay | 0.020 | 5 | 4.4052 | 0.0112 | 4.0488 | 0.3565 |
| 4 | 4,000,000 | mild_30_to_08 | 0.080 | 5 | 4.4073 | 0.0085 | 4.0736 | 0.3337 |
| 4 | 4,000,000 | fitted_l16_static_law | 0.020 | 5 | 4.4124 | 0.0084 | 4.0987 | 0.3137 |
| 4 | 4,000,000 | static_dropout_0.14 | 0.140 | 5 | 4.4455 | 0.0120 | 4.1165 | 0.3289 |
| 4 | 4,000,000 | static_dropout_0.3 | 0.300 | 5 | 4.4668 | 0.0141 | 4.2319 | 0.2349 |
| 4 | 4,000,000 | static_dropout_0.02 | 0.020 | 5 | 4.5358 | 0.0091 | 4.0529 | 0.4829 |
| 4 | 4,000,000 | static_dropout_0 | 0.000 | 5 | 4.5943 | 0.0216 | 4.0414 | 0.5529 |
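To see how the ordering changes along the stream, each condition's per-stage rank can be computed from the Mean val column. The sketch uses competition ranking (1 = lowest mean validation loss; tied values share a rank), which is an assumption about how ties should be treated, and the helper name `stage_ranks` is illustrative.

```python
# Mean validation loss per stage (0-4) for each condition, from the table above.
STAGE_VAL = {
    "openwebtext10k_interaction": [5.4947, 5.0715, 4.7811, 4.5590, 4.3981],
    "hold_30_then_decay": [5.4483, 5.0667, 4.7757, 4.5599, 4.4052],
    "mild_30_to_08": [5.4483, 5.0582, 4.7774, 4.5631, 4.4073],
    "fitted_l16_static_law": [5.7842, 5.1507, 4.8326, 4.5806, 4.4124],
    "static_dropout_0.14": [5.4773, 5.1492, 4.8490, 4.6048, 4.4455],
    "static_dropout_0.3": [5.4483, 5.0667, 4.7983, 4.6035, 4.4668],
    "static_dropout_0.02": [5.7426, 5.5754, 5.1470, 4.7847, 4.5358],
    "static_dropout_0": [5.8330, 5.7175, 5.2637, 4.8472, 4.5943],
}


def stage_ranks(condition):
    """Competition rank of `condition` at each stage (1 = lowest mean val; ties share a rank)."""
    ranks = []
    for stage, own in enumerate(STAGE_VAL[condition]):
        # Count conditions with a strictly lower mean validation loss at this stage.
        better = sum(vals[stage] < own for vals in STAGE_VAL.values())
        ranks.append(1 + better)
    return ranks


print(stage_ranks("openwebtext10k_interaction"))
print(stage_ranks("hold_30_then_decay"))
```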
Interpretation
- openwebtext10k_interaction has the best 5-seed mean final validation loss: 4.3981 +/- 0.0095.
- The second-best final condition is hold_30_then_decay at 4.4052 +/- 0.0112. Its 0.0071 gap to openwebtext10k_interaction is smaller than either condition's across-seed standard deviation.
- The best static baseline by mean final loss is static_dropout_0.14 at 4.4455 +/- 0.0120.
- openwebtext10k_interaction beats the per-seed best static baseline in 5/5 seeds; its worst paired delta is -0.0328.
- hold_30_then_decay beats the per-seed best static baseline in 5/5 seeds; its worst paired delta is -0.0183.
- mild_30_to_08 beats the per-seed best static baseline in 5/5 seeds; its worst paired delta is -0.0206.
- fitted_l16_static_law beats the per-seed best static baseline in 5/5 seeds; its worst paired delta is -0.0211.
- The best first-stage mean validation loss, 5.4483 at the 250,000-token prefix, is shared by mild_30_to_08, hold_30_then_decay, and static_dropout_0.3, which all use dropout 0.30 at that stage. openwebtext10k_interaction ranks fifth at that prefix and is first only at the 2,000,000- and 4,000,000-token prefixes, so compare the per-stage and final rankings before claiming that any schedule is uniformly better.
- This is a saved-run streaming validation artifact. Treat it as strong evidence only when the tested conditions, seeds, static baselines, and stream protocol match the claim being made.
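One way to put the 5/5 seed counts above on a rough statistical footing is an exact one-sided sign test, treating each seed as an independent paired comparison. This is a suggested check, not part of the report's pipeline. With five seeds, 5/5 is the most extreme possible outcome and gives p = 1/32, while a 4/5 split is far from significant.

```python
from math import comb


def sign_test_p(wins, n):
    """One-sided exact sign test: P(X >= wins) for X ~ Binomial(n, 0.5).

    Assumes independent paired comparisons and no ties.
    """
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2**n


print(sign_test_p(5, 5))  # 0.03125
print(sign_test_p(4, 5))  # 0.1875
```

Because the per-seed baseline is the minimum over four static runs, it is chosen after seeing the results, which makes the comparison harder for the schedules rather than easier.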