# TinyStories Multi-Seed Streaming Validation

Date: 2026-05-30

This report combines 5 random seeds (1, 2, 3, 4, 5) from saved streaming runs. The script that generated it performs no additional training. It only reads the saved `metrics.jsonl` files.

Regime: TinyStories BPE streaming validation with L12_H8_D320 (17,367,040 parameters), four data prefixes from 500k to 4M tokens, and 2,000 optimizer steps per stage.

## Sources

- `runs/streaming_tinystories_interaction_schedule_l12/locked_stream/20260530-053831/metrics.jsonl`
- `runs/streaming_tinystories_multiseed_validation_l12/locked_stream/20260530-111523/metrics.jsonl`
- `runs/streaming_tinystories_multiseed_validation_l12/locked_stream/20260530-141335/metrics.jsonl`

## Condition Ranking By Final Loss

"Mean trajectory val" averages the validation loss over the four stages. The "final" columns refer to the last stage (4M-token prefix). Std columns are sample standard deviations across the 5 seeds. The final gap is final validation loss minus final training loss. "Dropout path" lists the per-stage dropout rate, rounded to two decimals.

| Condition | Kind | N | Mean trajectory val | Std trajectory val | Mean final val | Std final val | Mean final gap | Dropout path |
|---|---|---:|---:|---:|---:|---:|---:|---|
| `interaction` | `anchor_decay` | 5 | 2.8309 | 0.0068 | 2.5311 | 0.0213 | 0.2626 | `0.18 -> 0.14 -> 0.08 -> 0.04` |
| `smooth_low` | `decay` | 5 | 2.8307 | 0.0069 | 2.5321 | 0.0203 | 0.2607 | `0.16 -> 0.11 -> 0.07 -> 0.05` |
| `baseabc` | `anchor_decay` | 5 | 2.8474 | 0.0028 | 2.5357 | 0.0175 | 0.2655 | `0.25 -> 0.19 -> 0.10 -> 0.02` |
| `static_dropout_0.08` | `static` | 5 | 2.8434 | 0.0072 | 2.5444 | 0.0211 | 0.2593 | `0.08 -> 0.08 -> 0.08 -> 0.08` |
| `static_dropout_0.12` | `static` | 5 | 2.8357 | 0.0061 | 2.5477 | 0.0178 | 0.2269 | `0.12 -> 0.12 -> 0.12 -> 0.12` |
| `static_dropout_0.18` | `static` | 5 | 2.8461 | 0.0047 | 2.5644 | 0.0182 | 0.2035 | `0.18 -> 0.18 -> 0.18 -> 0.18` |

## Paired Final-Loss Deltas

For each seed, `delta_vs_best_static` is the condition's final validation loss minus the final loss of the lowest-loss static-dropout run for that same seed. A negative value means the condition beat the best static baseline for that seed.
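The paired comparison for a single seed can be sketched as follows. This is a minimal illustration under stated assumptions, not the report script. It assumes the final validation losses for one seed have already been extracted. The hypothetical `paired_deltas` helper is used, and the seed-1 values are hard-coded from the paired-delta table rather than loaded from `metrics.jsonl`.

```python
# Hypothetical helper: given one seed's final validation losses, pick the
# lowest-loss static-dropout run and compute every condition's delta against it.
STATIC = ("static_dropout_0.08", "static_dropout_0.12", "static_dropout_0.18")


def paired_deltas(final_val: dict[str, float]) -> tuple[str, dict[str, float]]:
    best_static = min(STATIC, key=lambda name: final_val[name])
    baseline = final_val[best_static]
    # Negative delta = lower final loss than the best static baseline.
    deltas = {name: value - baseline for name, value in final_val.items()}
    return best_static, deltas


# Seed-1 final validation losses, copied from the table (rounded to 4 decimals).
seed1 = {
    "interaction": 2.5414,
    "baseabc": 2.5397,
    "smooth_low": 2.5423,
    "static_dropout_0.08": 2.5419,
    "static_dropout_0.12": 2.5526,
    "static_dropout_0.18": 2.5636,
}
best, deltas = paired_deltas(seed1)
print("best static:", best)
for name, delta in deltas.items():
    print(f"{name:22s} {delta:+.4f}")
```

With these rounded inputs, the sketch selects `static_dropout_0.08` as the seed-1 baseline. It reproduces the table's deltas to within one unit in the fourth decimal. For example, `smooth_low` comes out at +0.0004 rather than the reported +0.0003.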
| Seed | Condition | Final val | Best static | Best static final val | Delta vs best static |
|---:|---|---:|---|---:|---:|
| 1 | `interaction` | 2.5414 | `static_dropout_0.08` | 2.5419 | -0.0005 |
| 1 | `baseabc` | 2.5397 | `static_dropout_0.08` | 2.5419 | -0.0022 |
| 1 | `smooth_low` | 2.5423 | `static_dropout_0.08` | 2.5419 | +0.0003 |
| 1 | `static_dropout_0.08` | 2.5419 | `static_dropout_0.08` | 2.5419 | +0.0000 |
| 1 | `static_dropout_0.12` | 2.5526 | `static_dropout_0.08` | 2.5419 | +0.0106 |
| 1 | `static_dropout_0.18` | 2.5636 | `static_dropout_0.08` | 2.5419 | +0.0217 |
| 2 | `interaction` | 2.5377 | `static_dropout_0.12` | 2.5588 | -0.0211 |
| 2 | `baseabc` | 2.5432 | `static_dropout_0.12` | 2.5588 | -0.0156 |
| 2 | `smooth_low` | 2.5386 | `static_dropout_0.12` | 2.5588 | -0.0202 |
| 2 | `static_dropout_0.08` | 2.5636 | `static_dropout_0.12` | 2.5588 | +0.0048 |
| 2 | `static_dropout_0.12` | 2.5588 | `static_dropout_0.12` | 2.5588 | +0.0000 |
| 2 | `static_dropout_0.18` | 2.5768 | `static_dropout_0.12` | 2.5588 | +0.0180 |
| 3 | `interaction` | 2.5385 | `static_dropout_0.08` | 2.5478 | -0.0092 |
| 3 | `baseabc` | 2.5425 | `static_dropout_0.08` | 2.5478 | -0.0052 |
| 3 | `smooth_low` | 2.5407 | `static_dropout_0.08` | 2.5478 | -0.0071 |
| 3 | `static_dropout_0.08` | 2.5478 | `static_dropout_0.08` | 2.5478 | +0.0000 |
| 3 | `static_dropout_0.12` | 2.5510 | `static_dropout_0.08` | 2.5478 | +0.0033 |
| 3 | `static_dropout_0.18` | 2.5667 | `static_dropout_0.08` | 2.5478 | +0.0189 |
| 4 | `interaction` | 2.4932 | `static_dropout_0.08` | 2.5098 | -0.0166 |
| 4 | `baseabc` | 2.5049 | `static_dropout_0.08` | 2.5098 | -0.0049 |
| 4 | `smooth_low` | 2.4959 | `static_dropout_0.08` | 2.5098 | -0.0139 |
| 4 | `static_dropout_0.08` | 2.5098 | `static_dropout_0.08` | 2.5098 | +0.0000 |
| 4 | `static_dropout_0.12` | 2.5166 | `static_dropout_0.08` | 2.5098 | +0.0068 |
| 4 | `static_dropout_0.18` | 2.5343 | `static_dropout_0.08` | 2.5098 | +0.0244 |
| 5 | `interaction` | 2.5447 | `static_dropout_0.08` | 2.5588 | -0.0141 |
| 5 | `baseabc` | 2.5481 | `static_dropout_0.08` | 2.5588 | -0.0107 |
| 5 | `smooth_low` | 2.5428 | `static_dropout_0.08` | 2.5588 | -0.0159 |
| 5 | `static_dropout_0.08` | 2.5588 | `static_dropout_0.08` | 2.5588 | +0.0000 |
| 5 | `static_dropout_0.12` | 2.5595 | `static_dropout_0.08` | 2.5588 | +0.0008 |
| 5 | `static_dropout_0.18` | 2.5806 | `static_dropout_0.08` | 2.5588 | +0.0218 |

## Stage Trajectory

| Stage | Prefix tokens | Condition | Dropout | N | Mean val | Std val | Mean train | Mean gap |
|---:|---:|---|---:|---:|---:|---:|---:|---:|
| 0 | 500,000 | `static_dropout_0.12` | 0.120 | 5 | 3.2226 | 0.0143 | 2.6968 | 0.5257 |
| 0 | 500,000 | `smooth_low` | 0.162 | 5 | 3.2287 | 0.0122 | 2.7909 | 0.4377 |
| 0 | 500,000 | `static_dropout_0.08` | 0.080 | 5 | 3.2304 | 0.0102 | 2.6173 | 0.6131 |
| 0 | 500,000 | `interaction` | 0.184 | 5 | 3.2326 | 0.0123 | 2.8108 | 0.4218 |
| 0 | 500,000 | `static_dropout_0.18` | 0.180 | 5 | 3.2349 | 0.0151 | 2.8056 | 0.4293 |
| 0 | 500,000 | `baseabc` | 0.251 | 5 | 3.2728 | 0.0102 | 2.9139 | 0.3588 |
| 1 | 1,000,000 | `interaction` | 0.141 | 5 | 2.8908 | 0.0027 | 2.4842 | 0.4065 |
| 1 | 1,000,000 | `smooth_low` | 0.115 | 5 | 2.8912 | 0.0018 | 2.4678 | 0.4234 |
| 1 | 1,000,000 | `static_dropout_0.12` | 0.120 | 5 | 2.8930 | 0.0121 | 2.4335 | 0.4595 |
| 1 | 1,000,000 | `static_dropout_0.18` | 0.180 | 5 | 2.8990 | 0.0106 | 2.5397 | 0.3593 |
| 1 | 1,000,000 | `baseabc` | 0.186 | 5 | 2.9041 | 0.0037 | 2.5659 | 0.3382 |
| 1 | 1,000,000 | `static_dropout_0.08` | 0.080 | 5 | 2.9132 | 0.0068 | 2.3531 | 0.5601 |
| 2 | 2,000,000 | `interaction` | 0.084 | 5 | 2.6690 | 0.0207 | 2.3392 | 0.3298 |
| 2 | 2,000,000 | `smooth_low` | 0.067 | 5 | 2.6708 | 0.0218 | 2.3360 | 0.3347 |
| 2 | 2,000,000 | `baseabc` | 0.105 | 5 | 2.6770 | 0.0186 | 2.3938 | 0.2833 |
| 2 | 2,000,000 | `static_dropout_0.12` | 0.120 | 5 | 2.6795 | 0.0163 | 2.3697 | 0.3098 |
| 2 | 2,000,000 | `static_dropout_0.08` | 0.080 | 5 | 2.6856 | 0.0161 | 2.3109 | 0.3747 |
| 2 | 2,000,000 | `static_dropout_0.18` | 0.180 | 5 | 2.6860 | 0.0159 | 2.4347 | 0.2513 |
| 3 | 4,000,000 | `interaction` | 0.045 | 5 | 2.5311 | 0.0213 | 2.2685 | 0.2626 |
| 3 | 4,000,000 | `smooth_low` | 0.045 | 5 | 2.5321 | 0.0203 | 2.2713 | 0.2607 |
| 3 | 4,000,000 | `baseabc` | 0.020 | 5 | 2.5357 | 0.0175 | 2.2702 | 0.2655 |
| 3 | 4,000,000 | `static_dropout_0.08` | 0.080 | 5 | 2.5444 | 0.0211 | 2.2851 | 0.2593 |
| 3 | 4,000,000 | `static_dropout_0.12` | 0.120 | 5 | 2.5477 | 0.0178 | 2.3208 | 0.2269 |
| 3 | 4,000,000 | `static_dropout_0.18` | 0.180 | 5 | 2.5644 | 0.0182 | 2.3609 | 0.2035 |

## Interpretation

- `interaction` has the best 5-seed mean final validation loss: 2.5311 +/- 0.0213 (mean +/- sample standard deviation across seeds).
- The second-best final condition is `smooth_low` at 2.5321 +/- 0.0203. This is only 0.0010 behind `interaction`, a margin far smaller than the seed-to-seed standard deviation, so these two are not separated by this data.
- The best static baseline by mean final loss is `static_dropout_0.08` at 2.5444 +/- 0.0211.
- `interaction` beats the per-seed best static baseline in 5/5 seeds. Its worst (largest) paired delta is -0.0005 (seed 1).
- `smooth_low` beats the per-seed best static baseline in 4/5 seeds. Its worst paired delta is +0.0003 (seed 1).
- `baseabc` beats the per-seed best static baseline in 5/5 seeds. Its worst paired delta is -0.0022 (seed 1).
- The best first-stage condition is `static_dropout_0.12` at the 500,000-token prefix, with mean validation loss 3.2226. `interaction` and `smooth_low` take the lead only from the 1,000,000-token stage onward. Compare the first-stage and final rankings before claiming that a schedule is uniformly better.
- This is a saved-run streaming validation artifact. Treat it as strong evidence only when the tested conditions, seeds, static baselines, and stream protocol match the claim being made.
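The summary statistics in the Interpretation bullets can be recomputed from the per-seed final losses in the paired-delta table. The following is a minimal sketch that assumes those rounded four-decimal values as input. The report script itself reads `metrics.jsonl`, whose schema is not shown here, so the `FINAL_VAL` dictionary and both helper names are hypothetical. The sketch uses the sample standard deviation (`statistics.stdev`), which matches the reported spreads. It counts a seed as a win only when the condition is strictly below that seed's best static run.

```python
import statistics

# Per-seed final validation losses for seeds 1-5, copied (rounded to four
# decimals) from the paired-delta table; not loaded from metrics.jsonl.
FINAL_VAL = {
    "interaction": [2.5414, 2.5377, 2.5385, 2.4932, 2.5447],
    "smooth_low": [2.5423, 2.5386, 2.5407, 2.4959, 2.5428],
    "baseabc": [2.5397, 2.5432, 2.5425, 2.5049, 2.5481],
    "static_dropout_0.08": [2.5419, 2.5636, 2.5478, 2.5098, 2.5588],
    "static_dropout_0.12": [2.5526, 2.5588, 2.5510, 2.5166, 2.5595],
    "static_dropout_0.18": [2.5636, 2.5768, 2.5667, 2.5343, 2.5806],
}
STATIC = [name for name in FINAL_VAL if name.startswith("static_")]


def summarize(name: str) -> tuple[float, float]:
    """Mean and sample (n - 1) standard deviation across seeds."""
    values = FINAL_VAL[name]
    return statistics.mean(values), statistics.stdev(values)


def wins_vs_best_static(name: str) -> tuple[int, float]:
    """Seeds where `name` is strictly below that seed's best static run, plus the worst delta."""
    deltas = []
    for seed_idx, value in enumerate(FINAL_VAL[name]):
        best_static = min(FINAL_VAL[s][seed_idx] for s in STATIC)
        deltas.append(value - best_static)
    return sum(d < 0 for d in deltas), max(deltas)


for name in ("interaction", "smooth_low", "baseabc"):
    mean, std = summarize(name)
    wins, worst = wins_vs_best_static(name)
    print(f"{name}: {mean:.4f} +/- {std:.4f}, beats best static in {wins}/5 seeds, worst delta {worst:+.4f}")
```

Because the inputs are rounded, the recomputed means and standard deviations can differ from the tables in the last digit. For example, the `interaction` standard deviation comes out near 0.0214 rather than 0.0213. The win counts (5/5, 4/5, 5/5) are unaffected.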