| # TinyStories Multi-Seed Streaming Validation |
|
|
| Date: 2026-05-30 |
|
|
| This report combines 5 random seeds (1, 2, 3, 4, 5) from saved streaming runs. |
| The script that generates it performs no additional training; it only reads |
| the saved `metrics.jsonl` files listed under Sources. |
|
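Reading the saved files needs no training code. The following is a minimal sketch, assuming each line of `metrics.jsonl` is one standalone JSON object. The record schema is not shown in this report, and `load_metrics` is a hypothetical helper name, not necessarily what the report script uses.

```python
import json
from pathlib import Path


def load_metrics(paths):
    """Read every non-empty line of each JSONL file into one list of dicts.

    Hypothetical helper: it makes no assumption about which keys a record has.
    """
    records = []
    for path in paths:
        with Path(path).open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:  # tolerate blank lines between records
                    records.append(json.loads(line))
    return records
```

Passing the paths listed under Sources would yield one flat list of records, which could then be grouped by seed and condition if the records carry such fields.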
|
| Regime: TinyStories BPE streaming validation with L12_H8_D320 (17,367,040 parameters), four prefixes (500k, 1M, 2M, and 4M tokens), and 2,000 optimizer steps per stage. |
|
|
| ## Sources |
|
|
| - `runs/streaming_tinystories_interaction_schedule_l12/locked_stream/20260530-053831/metrics.jsonl` |
| - `runs/streaming_tinystories_multiseed_validation_l12/locked_stream/20260530-111523/metrics.jsonl` |
| - `runs/streaming_tinystories_multiseed_validation_l12/locked_stream/20260530-141335/metrics.jsonl` |
|
|
| ## Condition Ranking By Final Loss |
|
| Rows are sorted by mean final validation loss. `Mean trajectory val` is the validation loss averaged over the four stages, and `Mean final gap` is validation minus training loss at the last stage. The Std columns appear to be taken across the 5 seeds. |
|
| | Condition | Kind | N | Mean trajectory val | Std trajectory val | Mean final val | Std final val | Mean final gap | Dropout path | |
| |---|---|---:|---:|---:|---:|---:|---:|---| |
| | `interaction` | `anchor_decay` | 5 | 2.8309 | 0.0068 | 2.5311 | 0.0213 | 0.2626 | `0.18 -> 0.14 -> 0.08 -> 0.04` | |
| | `smooth_low` | `decay` | 5 | 2.8307 | 0.0069 | 2.5321 | 0.0203 | 0.2607 | `0.16 -> 0.11 -> 0.07 -> 0.05` | |
| | `baseabc` | `anchor_decay` | 5 | 2.8474 | 0.0028 | 2.5357 | 0.0175 | 0.2655 | `0.25 -> 0.19 -> 0.10 -> 0.02` | |
| | `static_dropout_0.08` | `static` | 5 | 2.8434 | 0.0072 | 2.5444 | 0.0211 | 0.2593 | `0.08 -> 0.08 -> 0.08 -> 0.08` | |
| | `static_dropout_0.12` | `static` | 5 | 2.8357 | 0.0061 | 2.5477 | 0.0178 | 0.2269 | `0.12 -> 0.12 -> 0.12 -> 0.12` | |
| | `static_dropout_0.18` | `static` | 5 | 2.8461 | 0.0047 | 2.5644 | 0.0182 | 0.2035 | `0.18 -> 0.18 -> 0.18 -> 0.18` | |
|
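The mean and standard deviation of final validation loss can be recomputed from the per-seed values in the Paired Final-Loss Deltas section. This sketch assumes the reported Std is the sample standard deviation (dividing by N - 1). That assumption is inferred from the numbers, not stated by the report: the sample formula lands within rounding of the reported 0.0213, while the population formula gives roughly 0.019.

```python
from statistics import mean, stdev

# Final validation losses for `interaction`, seeds 1-5, copied from the
# paired final-loss table in this report (already rounded to 4 decimals).
interaction_final_val = [2.5414, 2.5377, 2.5385, 2.4932, 2.5447]

mean_final = mean(interaction_final_val)
# statistics.stdev is the sample standard deviation (divides by N - 1).
std_final = stdev(interaction_final_val)

print(f"mean={mean_final:.4f} std={std_final:.4f}")
```

Because the inputs are themselves rounded, the recomputed Std can differ from the table in the last digit.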
|
| ## Paired Final-Loss Deltas |
|
|
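| `delta_vs_best_static` is the condition's final validation loss minus that of |
| the best static baseline for the same seed, so a negative value means the |
| condition beat that baseline. Deltas appear to be computed before rounding, so |
| they can differ by 0.0001 from the difference of the displayed values (seed 1, |
| `smooth_low`: 2.5423 - 2.5419 is shown as +0.0003). |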
|
|
| | Seed | Condition | Final val | Best static | Best static final val | Delta vs best static | |
| |---:|---|---:|---|---:|---:| |
| | 1 | `interaction` | 2.5414 | `static_dropout_0.08` | 2.5419 | -0.0005 | |
| | 1 | `baseabc` | 2.5397 | `static_dropout_0.08` | 2.5419 | -0.0022 | |
| | 1 | `smooth_low` | 2.5423 | `static_dropout_0.08` | 2.5419 | +0.0003 | |
| | 1 | `static_dropout_0.08` | 2.5419 | `static_dropout_0.08` | 2.5419 | +0.0000 | |
| | 1 | `static_dropout_0.12` | 2.5526 | `static_dropout_0.08` | 2.5419 | +0.0106 | |
| | 1 | `static_dropout_0.18` | 2.5636 | `static_dropout_0.08` | 2.5419 | +0.0217 | |
| | 2 | `interaction` | 2.5377 | `static_dropout_0.12` | 2.5588 | -0.0211 | |
| | 2 | `baseabc` | 2.5432 | `static_dropout_0.12` | 2.5588 | -0.0156 | |
| | 2 | `smooth_low` | 2.5386 | `static_dropout_0.12` | 2.5588 | -0.0202 | |
| | 2 | `static_dropout_0.08` | 2.5636 | `static_dropout_0.12` | 2.5588 | +0.0048 | |
| | 2 | `static_dropout_0.12` | 2.5588 | `static_dropout_0.12` | 2.5588 | +0.0000 | |
| | 2 | `static_dropout_0.18` | 2.5768 | `static_dropout_0.12` | 2.5588 | +0.0180 | |
| | 3 | `interaction` | 2.5385 | `static_dropout_0.08` | 2.5478 | -0.0092 | |
| | 3 | `baseabc` | 2.5425 | `static_dropout_0.08` | 2.5478 | -0.0052 | |
| | 3 | `smooth_low` | 2.5407 | `static_dropout_0.08` | 2.5478 | -0.0071 | |
| | 3 | `static_dropout_0.08` | 2.5478 | `static_dropout_0.08` | 2.5478 | +0.0000 | |
| | 3 | `static_dropout_0.12` | 2.5510 | `static_dropout_0.08` | 2.5478 | +0.0033 | |
| | 3 | `static_dropout_0.18` | 2.5667 | `static_dropout_0.08` | 2.5478 | +0.0189 | |
| | 4 | `interaction` | 2.4932 | `static_dropout_0.08` | 2.5098 | -0.0166 | |
| | 4 | `baseabc` | 2.5049 | `static_dropout_0.08` | 2.5098 | -0.0049 | |
| | 4 | `smooth_low` | 2.4959 | `static_dropout_0.08` | 2.5098 | -0.0139 | |
| | 4 | `static_dropout_0.08` | 2.5098 | `static_dropout_0.08` | 2.5098 | +0.0000 | |
| | 4 | `static_dropout_0.12` | 2.5166 | `static_dropout_0.08` | 2.5098 | +0.0068 | |
| | 4 | `static_dropout_0.18` | 2.5343 | `static_dropout_0.08` | 2.5098 | +0.0244 | |
| | 5 | `interaction` | 2.5447 | `static_dropout_0.08` | 2.5588 | -0.0141 | |
| | 5 | `baseabc` | 2.5481 | `static_dropout_0.08` | 2.5588 | -0.0107 | |
| | 5 | `smooth_low` | 2.5428 | `static_dropout_0.08` | 2.5588 | -0.0159 | |
| | 5 | `static_dropout_0.08` | 2.5588 | `static_dropout_0.08` | 2.5588 | +0.0000 | |
| | 5 | `static_dropout_0.12` | 2.5595 | `static_dropout_0.08` | 2.5588 | +0.0008 | |
| | 5 | `static_dropout_0.18` | 2.5806 | `static_dropout_0.08` | 2.5588 | +0.0218 | |
|
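The paired delta for one seed can be sketched as follows, using the seed 1 values from the table. Identifying static baselines by the `static_` name prefix is an assumption made for this example, and `deltas_vs_best_static` is a hypothetical helper name.

```python
# Final validation losses for seed 1, copied from the table above.
seed1_final_val = {
    "interaction": 2.5414,
    "baseabc": 2.5397,
    "smooth_low": 2.5423,
    "static_dropout_0.08": 2.5419,
    "static_dropout_0.12": 2.5526,
    "static_dropout_0.18": 2.5636,
}


def deltas_vs_best_static(final_val, static_prefix="static_"):
    """Return (best static name, {condition: final - best static final})."""
    # Assumption: static baselines are recognizable by their name prefix.
    statics = {k: v for k, v in final_val.items() if k.startswith(static_prefix)}
    best_name = min(statics, key=statics.get)  # lowest loss is best
    best_val = statics[best_name]
    return best_name, {k: v - best_val for k, v in final_val.items()}


best_static, deltas = deltas_vs_best_static(seed1_final_val)
for name, delta in deltas.items():
    print(f"{name:22s} {delta:+.4f}")
```

The best static baseline is chosen separately for each seed, so every schedule is compared against the strongest static setting for that seed.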
|
| ## Stage Trajectory |
|
|
| | Stage | Prefix tokens | Condition | Dropout | N | Mean val | Std val | Mean train | Mean gap | |
| |---:|---:|---|---:|---:|---:|---:|---:|---:| |
| | 0 | 500,000 | `static_dropout_0.12` | 0.120 | 5 | 3.2226 | 0.0143 | 2.6968 | 0.5257 | |
| | 0 | 500,000 | `smooth_low` | 0.162 | 5 | 3.2287 | 0.0122 | 2.7909 | 0.4377 | |
| | 0 | 500,000 | `static_dropout_0.08` | 0.080 | 5 | 3.2304 | 0.0102 | 2.6173 | 0.6131 | |
| | 0 | 500,000 | `interaction` | 0.184 | 5 | 3.2326 | 0.0123 | 2.8108 | 0.4218 | |
| | 0 | 500,000 | `static_dropout_0.18` | 0.180 | 5 | 3.2349 | 0.0151 | 2.8056 | 0.4293 | |
| | 0 | 500,000 | `baseabc` | 0.251 | 5 | 3.2728 | 0.0102 | 2.9139 | 0.3588 | |
| | 1 | 1,000,000 | `interaction` | 0.141 | 5 | 2.8908 | 0.0027 | 2.4842 | 0.4065 | |
| | 1 | 1,000,000 | `smooth_low` | 0.115 | 5 | 2.8912 | 0.0018 | 2.4678 | 0.4234 | |
| | 1 | 1,000,000 | `static_dropout_0.12` | 0.120 | 5 | 2.8930 | 0.0121 | 2.4335 | 0.4595 | |
| | 1 | 1,000,000 | `static_dropout_0.18` | 0.180 | 5 | 2.8990 | 0.0106 | 2.5397 | 0.3593 | |
| | 1 | 1,000,000 | `baseabc` | 0.186 | 5 | 2.9041 | 0.0037 | 2.5659 | 0.3382 | |
| | 1 | 1,000,000 | `static_dropout_0.08` | 0.080 | 5 | 2.9132 | 0.0068 | 2.3531 | 0.5601 | |
| | 2 | 2,000,000 | `interaction` | 0.084 | 5 | 2.6690 | 0.0207 | 2.3392 | 0.3298 | |
| | 2 | 2,000,000 | `smooth_low` | 0.067 | 5 | 2.6708 | 0.0218 | 2.3360 | 0.3347 | |
| | 2 | 2,000,000 | `baseabc` | 0.105 | 5 | 2.6770 | 0.0186 | 2.3938 | 0.2833 | |
| | 2 | 2,000,000 | `static_dropout_0.12` | 0.120 | 5 | 2.6795 | 0.0163 | 2.3697 | 0.3098 | |
| | 2 | 2,000,000 | `static_dropout_0.08` | 0.080 | 5 | 2.6856 | 0.0161 | 2.3109 | 0.3747 | |
| | 2 | 2,000,000 | `static_dropout_0.18` | 0.180 | 5 | 2.6860 | 0.0159 | 2.4347 | 0.2513 | |
| | 3 | 4,000,000 | `interaction` | 0.045 | 5 | 2.5311 | 0.0213 | 2.2685 | 0.2626 | |
| | 3 | 4,000,000 | `smooth_low` | 0.045 | 5 | 2.5321 | 0.0203 | 2.2713 | 0.2607 | |
| | 3 | 4,000,000 | `baseabc` | 0.020 | 5 | 2.5357 | 0.0175 | 2.2702 | 0.2655 | |
| | 3 | 4,000,000 | `static_dropout_0.08` | 0.080 | 5 | 2.5444 | 0.0211 | 2.2851 | 0.2593 | |
| | 3 | 4,000,000 | `static_dropout_0.12` | 0.120 | 5 | 2.5477 | 0.0178 | 2.3208 | 0.2269 | |
| | 3 | 4,000,000 | `static_dropout_0.18` | 0.180 | 5 | 2.5644 | 0.0182 | 2.3609 | 0.2035 | |
|
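The `Mean gap` column is consistent with mean validation loss minus mean training loss. A short sketch using the stage 3 rows shows that the ranking by gap and the ranking by validation loss are different orderings.

```python
# Stage 3 (4,000,000-token prefix) means copied from the table above:
# condition -> (mean validation loss, mean training loss).
stage3 = {
    "interaction": (2.5311, 2.2685),
    "smooth_low": (2.5321, 2.2713),
    "baseabc": (2.5357, 2.2702),
    "static_dropout_0.08": (2.5444, 2.2851),
    "static_dropout_0.12": (2.5477, 2.3208),
    "static_dropout_0.18": (2.5644, 2.3609),
}

# Gap = validation loss minus training loss.
gaps = {name: val - train for name, (val, train) in stage3.items()}

# Rank by validation loss, as the table does within each stage.
by_val = sorted(stage3, key=lambda name: stage3[name][0])
# Rank by gap, smallest first.
by_gap = sorted(gaps, key=gaps.get)

print("lowest val loss:", by_val[0])
print("smallest gap:   ", by_gap[0])
```

At this stage the smallest gap belongs to `static_dropout_0.18`, which also has the highest validation loss, so a small gap alone does not indicate the best condition.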
|
| ## Interpretation |
|
|
| - `interaction` has the best 5-seed mean final validation loss: 2.5311 +/- 0.0213. |
| - The second-best final condition is `smooth_low` at 2.5321 +/- 0.0203, 0.0010 above `interaction`. |
| - The best static baseline by mean final loss is `static_dropout_0.08` at 2.5444 +/- 0.0211. |
| - `interaction` beats the per-seed best static baseline in 5/5 seeds; its least favorable paired delta is -0.0005 (seed 1). |
| - `smooth_low` beats the per-seed best static baseline in 4/5 seeds; its least favorable paired delta is +0.0003 (seed 1). |
| - `baseabc` beats the per-seed best static baseline in 5/5 seeds; its least favorable paired delta is -0.0022 (seed 1). |
| - The best first-stage condition is `static_dropout_0.12` (prefix 500,000 tokens, mean validation loss 3.2226); at that stage `interaction` ranks fourth (3.2326) and `baseabc` last (3.2728). Compare this with the final ranking before claiming a schedule is uniformly better. |
| - This is a saved-run streaming validation artifact. Treat it as strong |
| evidence only when the tested conditions, seeds, static baselines, and |
| stream protocol match the claim being made. |
|
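The win counts and least favorable deltas quoted above can be checked directly from the `Delta vs best static` column. This is a minimal sketch; `summarize` is a hypothetical helper, and the mean paired delta it prints is an extra summary that the report itself does not list.

```python
# Paired deltas vs the per-seed best static baseline, seeds 1-5,
# copied from the "Delta vs best static" column of this report.
paired_deltas = {
    "interaction": [-0.0005, -0.0211, -0.0092, -0.0166, -0.0141],
    "smooth_low": [+0.0003, -0.0202, -0.0071, -0.0139, -0.0159],
    "baseabc": [-0.0022, -0.0156, -0.0052, -0.0049, -0.0107],
}


def summarize(deltas):
    """Count seeds with a negative delta and find the least favorable one."""
    wins = sum(1 for d in deltas if d < 0)  # negative delta = beat best static
    return {
        "wins": wins,
        "n": len(deltas),
        "worst": max(deltas),  # largest delta is the least favorable
        "mean": sum(deltas) / len(deltas),
    }


summary = {name: summarize(d) for name, d in paired_deltas.items()}
for name, s in summary.items():
    print(f"{name}: {s['wins']}/{s['n']} wins, "
          f"worst {s['worst']:+.4f}, mean {s['mean']:+.4f}")
```

With only five seeds, the win count is a coarse statistic; the least favorable delta shows how thin the margin is in the closest seed.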
|