# OpenWebText10K Streaming Validation

Date: 2026-05-30

This report combines 5 random seeds (1, 2, 3, 4, 5) from saved streaming runs. No additional training is performed by this script; it reads saved `metrics.jsonl` files.

Regime: OpenWebText10K cached-corpus streaming setup with L16_H8_D384, 31,457,280 parameters, five prefixes from 250k to 4M tokens, and 1,000 optimizer steps per stage.

This is a clean five-seed run including the OpenWebText10K interaction schedule, empirical decay schedules, and static baselines.

## Sources

- `runs/openwebtext10k_l16_updated_formula_clean_5seed/locked_stream/20260530-174525/metrics.jsonl`

## Condition Provenance

The `anchor_decay` label means the dropout value is chosen from explicit prefix-token anchors. It does not by itself imply that the schedule came from the coefficient formula.

| Condition | Provenance | Dropout path | Interpretation |
|---|---|---|---|
| `openwebtext10k_interaction` | coefficient-derived schedule | `0.39 -> 0.32 -> 0.23 -> 0.14 -> 0.07` | Main OpenWebText10K formula-derived schedule. This is the condition that tests the regime-specific interaction coefficient hypothesis. |
| `hold_30_then_decay` | heuristic schedule-search ablation | `0.30 -> 0.30 -> 0.20 -> 0.10 -> 0.02` | Manually specified after exploratory single-seed OpenWebText10K schedule search. It caps the initial dropout at `0.30`, holds it for the two smallest stream prefixes, then releases capacity aggressively. |
| `mild_30_to_08` | heuristic schedule-search ablation | `0.30 -> 0.24 -> 0.18 -> 0.12 -> 0.08` | Manually specified after exploratory single-seed OpenWebText10K schedule search. It tests whether a smoother decay from `0.30` to a moderate final dropout is competitive. |
| `fitted_l16_static_law` | older fitted/static-law schedule | `0.60 -> 0.40 -> 0.30 -> 0.14 -> 0.02` | Retained as a comparison to the earlier overly aggressive fitted schedule; it is not the current interaction formula schedule. |
| `static_dropout_*` | static baseline | constant | Fixed dropout used at every stream prefix. |

The two heuristic schedules should be treated as ablations, not as independent evidence that the coefficient formula generated their exact paths. Their role is to show that the shape of the decay matters and that reasonable hand-designed decays can also beat weak static choices. The main formula claim for this regime should be based on `openwebtext10k_interaction`.

## Condition Ranking By Final Loss

| Condition | Kind | N | Mean trajectory val | Std trajectory val | Mean final val | Std final val | Mean final gap | Dropout path |
|---|---|---:|---:|---:|---:|---:|---:|---|
| `openwebtext10k_interaction` | `anchor_decay` | 5 | 4.8609 | 0.0046 | 4.3981 | 0.0095 | 0.3177 | `0.39 -> 0.32 -> 0.23 -> 0.14 -> 0.07` |
| `hold_30_then_decay` | `anchor_decay` | 5 | 4.8512 | 0.0017 | 4.4052 | 0.0112 | 0.3565 | `0.30 -> 0.30 -> 0.20 -> 0.10 -> 0.02` |
| `mild_30_to_08` | `anchor_decay` | 5 | 4.8509 | 0.0015 | 4.4073 | 0.0085 | 0.3337 | `0.30 -> 0.24 -> 0.18 -> 0.12 -> 0.08` |
| `fitted_l16_static_law` | `anchor_decay` | 5 | 4.9521 | 0.0039 | 4.4124 | 0.0084 | 0.3137 | `0.60 -> 0.40 -> 0.30 -> 0.14 -> 0.02` |
| `static_dropout_0.14` | `static` | 5 | 4.9051 | 0.0088 | 4.4455 | 0.0120 | 0.3289 | `0.14 -> 0.14 -> 0.14 -> 0.14 -> 0.14` |
| `static_dropout_0.3` | `static` | 5 | 4.8767 | 0.0019 | 4.4668 | 0.0141 | 0.2349 | `0.30 -> 0.30 -> 0.30 -> 0.30 -> 0.30` |
| `static_dropout_0.02` | `static` | 5 | 5.1571 | 0.0097 | 4.5358 | 0.0091 | 0.4829 | `0.02 -> 0.02 -> 0.02 -> 0.02 -> 0.02` |
| `static_dropout_0` | `static` | 5 | 5.2511 | 0.0160 | 4.5943 | 0.0216 | 0.5529 | `0.00 -> 0.00 -> 0.00 -> 0.00 -> 0.00` |

## Paired Final-Loss Deltas

Negative `delta_vs_best_static` means the condition beat the best static baseline for that seed.
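
As a minimal sketch of the delta defined above, the snippet below picks the best static baseline for one seed and subtracts its final validation loss from every condition. The function name `deltas_vs_best_static` and the dictionary layout are illustrative assumptions, not the report script's actual code; the input values are the rounded seed-4 final losses from this report, so results can differ from the reported deltas by about 0.0001.

```python
from __future__ import annotations


def deltas_vs_best_static(final_val: dict[str, float]) -> tuple[str, dict[str, float]]:
    """Return the best static condition and each condition's delta against it.

    Hypothetical helper: static baselines are identified by the
    `static_dropout_` name prefix used in this report.
    """
    static = {
        name: loss
        for name, loss in final_val.items()
        if name.startswith("static_dropout_")
    }
    # Lower validation loss is better, so the best static baseline is the minimum.
    best_name = min(static, key=static.get)
    best_loss = static[best_name]
    deltas = {name: loss - best_loss for name, loss in final_val.items()}
    return best_name, deltas


# Rounded seed-4 final validation losses, copied from the report's table.
seed4_final_val = {
    "openwebtext10k_interaction": 4.3811,
    "hold_30_then_decay": 4.3936,
    "mild_30_to_08": 4.3978,
    "fitted_l16_static_law": 4.3983,
    "static_dropout_0.14": 4.4337,
    "static_dropout_0.3": 4.4455,
    "static_dropout_0.02": 4.5220,
    "static_dropout_0": 4.5768,
}

best_name, deltas = deltas_vs_best_static(seed4_final_val)
for name, delta in deltas.items():
    print(f"{name}: {delta:+.4f} vs {best_name}")
```

Because the baseline is chosen per seed, each condition is compared against the strongest static setting on the same seed, not against a static setting fixed in advance.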

| Seed | Condition | Final val | Best static | Best static final val | Delta vs best static |
|---:|---|---:|---|---:|---:|
| 1 | `openwebtext10k_interaction` | 4.4023 | `static_dropout_0.14` | 4.4418 | -0.0394 |
| 1 | `hold_30_then_decay` | 4.3939 | `static_dropout_0.14` | 4.4418 | -0.0479 |
| 1 | `mild_30_to_08` | 4.3995 | `static_dropout_0.14` | 4.4418 | -0.0423 |
| 1 | `fitted_l16_static_law` | 4.4207 | `static_dropout_0.14` | 4.4418 | -0.0211 |
| 1 | `static_dropout_0.14` | 4.4418 | `static_dropout_0.14` | 4.4418 | +0.0000 |
| 1 | `static_dropout_0.3` | 4.4602 | `static_dropout_0.14` | 4.4418 | +0.0184 |
| 1 | `static_dropout_0.02` | 4.5402 | `static_dropout_0.14` | 4.4418 | +0.0984 |
| 1 | `static_dropout_0` | 4.5704 | `static_dropout_0.14` | 4.4418 | +0.1286 |
| 2 | `openwebtext10k_interaction` | 4.4020 | `static_dropout_0.14` | 4.4602 | -0.0583 |
| 2 | `hold_30_then_decay` | 4.4068 | `static_dropout_0.14` | 4.4602 | -0.0534 |
| 2 | `mild_30_to_08` | 4.4080 | `static_dropout_0.14` | 4.4602 | -0.0522 |
| 2 | `fitted_l16_static_law` | 4.4136 | `static_dropout_0.14` | 4.4602 | -0.0466 |
| 2 | `static_dropout_0.14` | 4.4602 | `static_dropout_0.14` | 4.4602 | +0.0000 |
| 2 | `static_dropout_0.3` | 4.4719 | `static_dropout_0.14` | 4.4602 | +0.0117 |
| 2 | `static_dropout_0.02` | 4.5466 | `static_dropout_0.14` | 4.4602 | +0.0864 |
| 2 | `static_dropout_0` | 4.6094 | `static_dropout_0.14` | 4.4602 | +0.1492 |
| 3 | `openwebtext10k_interaction` | 4.4029 | `static_dropout_0.14` | 4.4356 | -0.0328 |
| 3 | `hold_30_then_decay` | 4.4174 | `static_dropout_0.14` | 4.4356 | -0.0183 |
| 3 | `mild_30_to_08` | 4.4151 | `static_dropout_0.14` | 4.4356 | -0.0206 |
| 3 | `fitted_l16_static_law` | 4.4134 | `static_dropout_0.14` | 4.4356 | -0.0223 |
| 3 | `static_dropout_0.14` | 4.4356 | `static_dropout_0.14` | 4.4356 | +0.0000 |
| 3 | `static_dropout_0.3` | 4.4758 | `static_dropout_0.14` | 4.4356 | +0.0401 |
| 3 | `static_dropout_0.02` | 4.5345 | `static_dropout_0.14` | 4.4356 | +0.0988 |
| 3 | `static_dropout_0` | 4.5928 | `static_dropout_0.14` | 4.4356 | +0.1571 |
| 4 | `openwebtext10k_interaction` | 4.3811 | `static_dropout_0.14` | 4.4337 | -0.0526 |
| 4 | `hold_30_then_decay` | 4.3936 | `static_dropout_0.14` | 4.4337 | -0.0400 |
| 4 | `mild_30_to_08` | 4.3978 | `static_dropout_0.14` | 4.4337 | -0.0359 |
| 4 | `fitted_l16_static_law` | 4.3983 | `static_dropout_0.14` | 4.4337 | -0.0354 |
| 4 | `static_dropout_0.14` | 4.4337 | `static_dropout_0.14` | 4.4337 | +0.0000 |
| 4 | `static_dropout_0.3` | 4.4455 | `static_dropout_0.14` | 4.4337 | +0.0118 |
| 4 | `static_dropout_0.02` | 4.5220 | `static_dropout_0.14` | 4.4337 | +0.0883 |
| 4 | `static_dropout_0` | 4.5768 | `static_dropout_0.14` | 4.4337 | +0.1432 |
| 5 | `openwebtext10k_interaction` | 4.4024 | `static_dropout_0.14` | 4.4560 | -0.0536 |
| 5 | `hold_30_then_decay` | 4.4145 | `static_dropout_0.14` | 4.4560 | -0.0415 |
| 5 | `mild_30_to_08` | 4.4161 | `static_dropout_0.14` | 4.4560 | -0.0399 |
| 5 | `fitted_l16_static_law` | 4.4161 | `static_dropout_0.14` | 4.4560 | -0.0399 |
| 5 | `static_dropout_0.14` | 4.4560 | `static_dropout_0.14` | 4.4560 | +0.0000 |
| 5 | `static_dropout_0.3` | 4.4805 | `static_dropout_0.14` | 4.4560 | +0.0245 |
| 5 | `static_dropout_0.02` | 4.5355 | `static_dropout_0.14` | 4.4560 | +0.0796 |
| 5 | `static_dropout_0` | 4.6219 | `static_dropout_0.14` | 4.4560 | +0.1660 |

## Stage Trajectory

| Stage | Prefix tokens | Condition | Dropout | N | Mean val | Std val | Mean train | Mean gap |
|---:|---:|---|---:|---:|---:|---:|---:|---:|
| 0 | 250,000 | `mild_30_to_08` | 0.300 | 5 | 5.4483 | 0.0138 | 4.4429 | 1.0054 |
| 0 | 250,000 | `hold_30_then_decay` | 0.300 | 5 | 5.4483 | 0.0138 | 4.4429 | 1.0054 |
| 0 | 250,000 | `static_dropout_0.3` | 0.300 | 5 | 5.4483 | 0.0138 | 4.4429 | 1.0054 |
| 0 | 250,000 | `static_dropout_0.14` | 0.140 | 5 | 5.4773 | 0.0224 | 4.0298 | 1.4475 |
| 0 | 250,000 | `openwebtext10k_interaction` | 0.385 | 5 | 5.4947 | 0.0109 | 4.6016 | 0.8930 |
| 0 | 250,000 | `static_dropout_0.02` | 0.020 | 5 | 5.7426 | 0.0242 | 3.5371 | 2.2055 |
| 0 | 250,000 | `fitted_l16_static_law` | 0.600 | 5 | 5.7842 | 0.0096 | 5.1640 | 0.6202 |
| 0 | 250,000 | `static_dropout_0` | 0.000 | 5 | 5.8330 | 0.0198 | 3.4443 | 2.3887 |
| 1 | 500,000 | `mild_30_to_08` | 0.240 | 5 | 5.0582 | 0.0159 | 4.0349 | 1.0233 |
| 1 | 500,000 | `static_dropout_0.3` | 0.300 | 5 | 5.0667 | 0.0173 | 4.1383 | 0.9284 |
| 1 | 500,000 | `hold_30_then_decay` | 0.300 | 5 | 5.0667 | 0.0173 | 4.1383 | 0.9284 |
| 1 | 500,000 | `openwebtext10k_interaction` | 0.319 | 5 | 5.0715 | 0.0118 | 4.2065 | 0.8650 |
| 1 | 500,000 | `static_dropout_0.14` | 0.140 | 5 | 5.1492 | 0.0070 | 3.7143 | 1.4349 |
| 1 | 500,000 | `fitted_l16_static_law` | 0.400 | 5 | 5.1507 | 0.0102 | 4.4632 | 0.6875 |
| 1 | 500,000 | `static_dropout_0.02` | 0.020 | 5 | 5.5754 | 0.0248 | 3.1246 | 2.4508 |
| 1 | 500,000 | `static_dropout_0` | 0.000 | 5 | 5.7175 | 0.0502 | 2.9583 | 2.7592 |
| 2 | 1,000,000 | `hold_30_then_decay` | 0.200 | 5 | 4.7757 | 0.0144 | 4.0378 | 0.7379 |
| 2 | 1,000,000 | `mild_30_to_08` | 0.180 | 5 | 4.7774 | 0.0138 | 3.9886 | 0.7888 |
| 2 | 1,000,000 | `openwebtext10k_interaction` | 0.227 | 5 | 4.7811 | 0.0084 | 4.0826 | 0.6984 |
| 2 | 1,000,000 | `static_dropout_0.3` | 0.300 | 5 | 4.7983 | 0.0144 | 4.1501 | 0.6481 |
| 2 | 1,000,000 | `fitted_l16_static_law` | 0.300 | 5 | 4.8326 | 0.0102 | 4.2632 | 0.5694 |
| 2 | 1,000,000 | `static_dropout_0.14` | 0.140 | 5 | 4.8490 | 0.0202 | 3.8712 | 0.9779 |
| 2 | 1,000,000 | `static_dropout_0.02` | 0.020 | 5 | 5.1470 | 0.0222 | 3.4615 | 1.6854 |
| 2 | 1,000,000 | `static_dropout_0` | 0.000 | 5 | 5.2637 | 0.0274 | 3.3260 | 1.9377 |
| 3 | 2,000,000 | `openwebtext10k_interaction` | 0.139 | 5 | 4.5590 | 0.0142 | 4.0802 | 0.4788 |
| 3 | 2,000,000 | `hold_30_then_decay` | 0.100 | 5 | 4.5599 | 0.0161 | 4.0445 | 0.5154 |
| 3 | 2,000,000 | `mild_30_to_08` | 0.120 | 5 | 4.5631 | 0.0155 | 4.0441 | 0.5190 |
| 3 | 2,000,000 | `fitted_l16_static_law` | 0.140 | 5 | 4.5806 | 0.0153 | 4.1471 | 0.4334 |
| 3 | 2,000,000 | `static_dropout_0.3` | 0.300 | 5 | 4.6035 | 0.0141 | 4.2150 | 0.3885 |
| 3 | 2,000,000 | `static_dropout_0.14` | 0.140 | 5 | 4.6048 | 0.0136 | 4.0399 | 0.5648 |
| 3 | 2,000,000 | `static_dropout_0.02` | 0.020 | 5 | 4.7847 | 0.0196 | 3.8405 | 0.9442 |
| 3 | 2,000,000 | `static_dropout_0` | 0.000 | 5 | 4.8472 | 0.0171 | 3.7786 | 1.0687 |
| 4 | 4,000,000 | `openwebtext10k_interaction` | 0.066 | 5 | 4.3981 | 0.0095 | 4.0805 | 0.3177 |
| 4 | 4,000,000 | `hold_30_then_decay` | 0.020 | 5 | 4.4052 | 0.0112 | 4.0488 | 0.3565 |
| 4 | 4,000,000 | `mild_30_to_08` | 0.080 | 5 | 4.4073 | 0.0085 | 4.0736 | 0.3337 |
| 4 | 4,000,000 | `fitted_l16_static_law` | 0.020 | 5 | 4.4124 | 0.0084 | 4.0987 | 0.3137 |
| 4 | 4,000,000 | `static_dropout_0.14` | 0.140 | 5 | 4.4455 | 0.0120 | 4.1165 | 0.3289 |
| 4 | 4,000,000 | `static_dropout_0.3` | 0.300 | 5 | 4.4668 | 0.0141 | 4.2319 | 0.2349 |
| 4 | 4,000,000 | `static_dropout_0.02` | 0.020 | 5 | 4.5358 | 0.0091 | 4.0529 | 0.4829 |
| 4 | 4,000,000 | `static_dropout_0` | 0.000 | 5 | 4.5943 | 0.0216 | 4.0414 | 0.5529 |

## Interpretation

- `openwebtext10k_interaction` has the best 5-seed mean final validation loss: 4.3981 +/- 0.0095.
- The second-best final condition is `hold_30_then_decay` at 4.4052 +/- 0.0112.
- The best static baseline by mean final loss is `static_dropout_0.14` at 4.4455 +/- 0.0120.
- `openwebtext10k_interaction` beats the per-seed best static baseline in 5/5 seeds; worst paired delta is -0.0328.
- `hold_30_then_decay` beats the per-seed best static baseline in 5/5 seeds; worst paired delta is -0.0183.
- `mild_30_to_08` beats the per-seed best static baseline in 5/5 seeds; worst paired delta is -0.0206.
- `fitted_l16_static_law` beats the per-seed best static baseline in 5/5 seeds; worst paired delta is -0.0211.
- The best first-stage mean validation loss is 5.4483 at prefix 250,000. It is listed under `mild_30_to_08` but is shared by `hold_30_then_decay` and `static_dropout_0.3`, since all three use dropout 0.30 at that stage. Compare this with the final ranking before claiming a schedule is uniformly better.
- This is a saved-run streaming validation artifact. Treat it as strong evidence only when the tested conditions, seeds, static baselines, and stream protocol match the claim being made.
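
The summary figures quoted for `openwebtext10k_interaction` can be cross-checked from the per-seed rows of this report. The sketch below is not the report script; it recomputes the mean, the standard deviation, the win count, and the worst paired delta from the rounded table values. It assumes the reported "Std" is the sample standard deviation (n - 1 denominator), an assumption that the rounded per-seed values reproduce but that the report does not state. The variable names are illustrative.

```python
import statistics

# Per-seed final validation losses for `openwebtext10k_interaction`
# (seeds 1 to 5), copied from the paired-delta table.
interaction_final_val = [4.4023, 4.4020, 4.4029, 4.3811, 4.4024]

# Per-seed deltas versus the best static baseline, copied from the same table.
interaction_deltas = [-0.0394, -0.0583, -0.0328, -0.0526, -0.0536]

mean_final = statistics.mean(interaction_final_val)
# statistics.stdev uses the sample (n - 1) formula; this is an assumption
# about how the report's "Std" column was computed.
std_final = statistics.stdev(interaction_final_val)

# A seed counts as a win when the delta is negative (lower loss than best static).
wins = sum(delta < 0 for delta in interaction_deltas)
# The "worst" paired delta is the least favorable one, i.e. the largest value.
worst_delta = max(interaction_deltas)

print(f"mean final val: {mean_final:.4f}")
print(f"std final val:  {std_final:.4f}")
print(f"wins: {wins}/{len(interaction_deltas)}, worst paired delta: {worst_delta:+.4f}")
```

With only five seeds, the standard deviation is itself a noisy estimate, which is one reason the report leans on the paired per-seed deltas and not only on the means.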