TinyStories Multi-Seed Streaming Validation
Date: 2026-05-30
This report combines 5 random seeds (1, 2, 3, 4, 5) from saved streaming runs.
No additional training is performed by this script; it reads saved
metrics.jsonl files.
Regime: TinyStories BPE streaming validation with L12_H8_D320, 17,367,040 parameters, four prefixes from 500k to 4M tokens, and 2,000 optimizer steps per stage.
Sources
- runs/streaming_tinystories_interaction_schedule_l12/locked_stream/20260530-053831/metrics.jsonl
- runs/streaming_tinystories_multiseed_validation_l12/locked_stream/20260530-111523/metrics.jsonl
- runs/streaming_tinystories_multiseed_validation_l12/locked_stream/20260530-141335/metrics.jsonl
Condition Ranking By Final Loss
| Condition | Kind | N | Mean trajectory val | Std trajectory val | Mean final val | Std final val | Mean final gap | Dropout path |
|---|---|---|---|---|---|---|---|---|
| interaction | anchor_decay | 5 | 2.8309 | 0.0068 | 2.5311 | 0.0213 | 0.2626 | 0.18 -> 0.14 -> 0.08 -> 0.04 |
| smooth_low | decay | 5 | 2.8307 | 0.0069 | 2.5321 | 0.0203 | 0.2607 | 0.16 -> 0.11 -> 0.07 -> 0.05 |
| baseabc | anchor_decay | 5 | 2.8474 | 0.0028 | 2.5357 | 0.0175 | 0.2655 | 0.25 -> 0.19 -> 0.10 -> 0.02 |
| static_dropout_0.08 | static | 5 | 2.8434 | 0.0072 | 2.5444 | 0.0211 | 0.2593 | 0.08 -> 0.08 -> 0.08 -> 0.08 |
| static_dropout_0.12 | static | 5 | 2.8357 | 0.0061 | 2.5477 | 0.0178 | 0.2269 | 0.12 -> 0.12 -> 0.12 -> 0.12 |
| static_dropout_0.18 | static | 5 | 2.8461 | 0.0047 | 2.5644 | 0.0182 | 0.2035 | 0.18 -> 0.18 -> 0.18 -> 0.18 |
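The mean and standard deviation columns can be checked by hand from the per-seed values. The sketch below recomputes the `interaction` final-loss statistics from the five per-seed final validation losses listed in this report. It assumes the report uses the sample standard deviation (n - 1 denominator), which is not stated in the report. The sample figure lands within rounding of the reported 0.0213; the population figure does not. The variable names are illustrative and not taken from the report script.

```python
import statistics

# Final validation losses for `interaction`, seeds 1-5, copied from the
# per-seed paired final-loss table in this report (4-decimal rounding).
interaction_final_val = [2.5414, 2.5377, 2.5385, 2.4932, 2.5447]

# Mean across seeds; compare with the "Mean final val" column.
mean_final = statistics.mean(interaction_final_val)

# Sample standard deviation (n - 1 denominator). ASSUMPTION: the report
# script uses this convention; it is not stated in the report itself.
std_final = statistics.stdev(interaction_final_val)

# Population standard deviation (n denominator), shown for contrast.
pop_std = statistics.pstdev(interaction_final_val)

print(f"mean={mean_final:.4f} sample_std={std_final:.4f} population_std={pop_std:.4f}")
```

Because the inputs are already rounded to four decimals, a difference in the last digit relative to the table is expected.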
Paired Final-Loss Deltas
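A negative delta_vs_best_static (the "Delta vs best static" column) means the
condition beat the best static baseline for that seed. The best static
baseline is selected separately for each seed.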
| Seed | Condition | Final val | Best static | Best static final val | Delta vs best static |
|---|---|---|---|---|---|
| 1 | interaction | 2.5414 | static_dropout_0.08 | 2.5419 | -0.0005 |
| 1 | baseabc | 2.5397 | static_dropout_0.08 | 2.5419 | -0.0022 |
| 1 | smooth_low | 2.5423 | static_dropout_0.08 | 2.5419 | +0.0003 |
| 1 | static_dropout_0.08 | 2.5419 | static_dropout_0.08 | 2.5419 | +0.0000 |
| 1 | static_dropout_0.12 | 2.5526 | static_dropout_0.08 | 2.5419 | +0.0106 |
| 1 | static_dropout_0.18 | 2.5636 | static_dropout_0.08 | 2.5419 | +0.0217 |
| 2 | interaction | 2.5377 | static_dropout_0.12 | 2.5588 | -0.0211 |
| 2 | baseabc | 2.5432 | static_dropout_0.12 | 2.5588 | -0.0156 |
| 2 | smooth_low | 2.5386 | static_dropout_0.12 | 2.5588 | -0.0202 |
| 2 | static_dropout_0.08 | 2.5636 | static_dropout_0.12 | 2.5588 | +0.0048 |
| 2 | static_dropout_0.12 | 2.5588 | static_dropout_0.12 | 2.5588 | +0.0000 |
| 2 | static_dropout_0.18 | 2.5768 | static_dropout_0.12 | 2.5588 | +0.0180 |
| 3 | interaction | 2.5385 | static_dropout_0.08 | 2.5478 | -0.0092 |
| 3 | baseabc | 2.5425 | static_dropout_0.08 | 2.5478 | -0.0052 |
| 3 | smooth_low | 2.5407 | static_dropout_0.08 | 2.5478 | -0.0071 |
| 3 | static_dropout_0.08 | 2.5478 | static_dropout_0.08 | 2.5478 | +0.0000 |
| 3 | static_dropout_0.12 | 2.5510 | static_dropout_0.08 | 2.5478 | +0.0033 |
| 3 | static_dropout_0.18 | 2.5667 | static_dropout_0.08 | 2.5478 | +0.0189 |
| 4 | interaction | 2.4932 | static_dropout_0.08 | 2.5098 | -0.0166 |
| 4 | baseabc | 2.5049 | static_dropout_0.08 | 2.5098 | -0.0049 |
| 4 | smooth_low | 2.4959 | static_dropout_0.08 | 2.5098 | -0.0139 |
| 4 | static_dropout_0.08 | 2.5098 | static_dropout_0.08 | 2.5098 | +0.0000 |
| 4 | static_dropout_0.12 | 2.5166 | static_dropout_0.08 | 2.5098 | +0.0068 |
| 4 | static_dropout_0.18 | 2.5343 | static_dropout_0.08 | 2.5098 | +0.0244 |
| 5 | interaction | 2.5447 | static_dropout_0.08 | 2.5588 | -0.0141 |
| 5 | baseabc | 2.5481 | static_dropout_0.08 | 2.5588 | -0.0107 |
| 5 | smooth_low | 2.5428 | static_dropout_0.08 | 2.5588 | -0.0159 |
| 5 | static_dropout_0.08 | 2.5588 | static_dropout_0.08 | 2.5588 | +0.0000 |
| 5 | static_dropout_0.12 | 2.5595 | static_dropout_0.08 | 2.5588 | +0.0008 |
| 5 | static_dropout_0.18 | 2.5806 | static_dropout_0.08 | 2.5588 | +0.0218 |
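The paired delta in the table above can be reproduced for a single seed as follows. This is a minimal sketch rather than the report script: the function name `paired_deltas` is hypothetical, and static baselines are identified by the `static_` name prefix, which is an assumption (the report may instead use the Kind column). The input values are the seed 2 final validation losses from the table.

```python
# Final validation loss per condition for seed 2, copied from the table above.
seed2_final_val = {
    "interaction": 2.5377,
    "baseabc": 2.5432,
    "smooth_low": 2.5386,
    "static_dropout_0.08": 2.5636,
    "static_dropout_0.12": 2.5588,
    "static_dropout_0.18": 2.5768,
}


def paired_deltas(final_val):
    """Return (best_static_name, {condition: final_val - best_static_val}).

    ASSUMPTION: static baselines are the conditions whose names start
    with "static_"; this naming rule is illustrative.
    """
    statics = {k: v for k, v in final_val.items() if k.startswith("static_")}
    # The best static baseline for this seed is the one with the lowest loss.
    best_static = min(statics, key=statics.get)
    best_val = statics[best_static]
    return best_static, {k: v - best_val for k, v in final_val.items()}


best_static, deltas = paired_deltas(seed2_final_val)
print("best static:", best_static)
for name, delta in deltas.items():
    # A negative delta means the condition beat the best static baseline.
    print(f"{name:22s} {delta:+.4f}")
```

Deltas recomputed from the rounded table values can differ from the reported column in the last digit, since the report presumably subtracts unrounded losses.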
Stage Trajectory
| Stage | Prefix tokens | Condition | Dropout | N | Mean val | Std val | Mean train | Mean gap |
|---|---|---|---|---|---|---|---|---|
| 0 | 500,000 | static_dropout_0.12 | 0.120 | 5 | 3.2226 | 0.0143 | 2.6968 | 0.5257 |
| 0 | 500,000 | smooth_low | 0.162 | 5 | 3.2287 | 0.0122 | 2.7909 | 0.4377 |
| 0 | 500,000 | static_dropout_0.08 | 0.080 | 5 | 3.2304 | 0.0102 | 2.6173 | 0.6131 |
| 0 | 500,000 | interaction | 0.184 | 5 | 3.2326 | 0.0123 | 2.8108 | 0.4218 |
| 0 | 500,000 | static_dropout_0.18 | 0.180 | 5 | 3.2349 | 0.0151 | 2.8056 | 0.4293 |
| 0 | 500,000 | baseabc | 0.251 | 5 | 3.2728 | 0.0102 | 2.9139 | 0.3588 |
| 1 | 1,000,000 | interaction | 0.141 | 5 | 2.8908 | 0.0027 | 2.4842 | 0.4065 |
| 1 | 1,000,000 | smooth_low | 0.115 | 5 | 2.8912 | 0.0018 | 2.4678 | 0.4234 |
| 1 | 1,000,000 | static_dropout_0.12 | 0.120 | 5 | 2.8930 | 0.0121 | 2.4335 | 0.4595 |
| 1 | 1,000,000 | static_dropout_0.18 | 0.180 | 5 | 2.8990 | 0.0106 | 2.5397 | 0.3593 |
| 1 | 1,000,000 | baseabc | 0.186 | 5 | 2.9041 | 0.0037 | 2.5659 | 0.3382 |
| 1 | 1,000,000 | static_dropout_0.08 | 0.080 | 5 | 2.9132 | 0.0068 | 2.3531 | 0.5601 |
| 2 | 2,000,000 | interaction | 0.084 | 5 | 2.6690 | 0.0207 | 2.3392 | 0.3298 |
| 2 | 2,000,000 | smooth_low | 0.067 | 5 | 2.6708 | 0.0218 | 2.3360 | 0.3347 |
| 2 | 2,000,000 | baseabc | 0.105 | 5 | 2.6770 | 0.0186 | 2.3938 | 0.2833 |
| 2 | 2,000,000 | static_dropout_0.12 | 0.120 | 5 | 2.6795 | 0.0163 | 2.3697 | 0.3098 |
| 2 | 2,000,000 | static_dropout_0.08 | 0.080 | 5 | 2.6856 | 0.0161 | 2.3109 | 0.3747 |
| 2 | 2,000,000 | static_dropout_0.18 | 0.180 | 5 | 2.6860 | 0.0159 | 2.4347 | 0.2513 |
| 3 | 4,000,000 | interaction | 0.045 | 5 | 2.5311 | 0.0213 | 2.2685 | 0.2626 |
| 3 | 4,000,000 | smooth_low | 0.045 | 5 | 2.5321 | 0.0203 | 2.2713 | 0.2607 |
| 3 | 4,000,000 | baseabc | 0.020 | 5 | 2.5357 | 0.0175 | 2.2702 | 0.2655 |
| 3 | 4,000,000 | static_dropout_0.08 | 0.080 | 5 | 2.5444 | 0.0211 | 2.2851 | 0.2593 |
| 3 | 4,000,000 | static_dropout_0.12 | 0.120 | 5 | 2.5477 | 0.0178 | 2.3208 | 0.2269 |
| 3 | 4,000,000 | static_dropout_0.18 | 0.180 | 5 | 2.5644 | 0.0182 | 2.3609 | 0.2035 |
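Two derived columns can be cross-checked against the stage rows above. The numbers are consistent with "Mean gap" being validation loss minus train loss, and with "Mean trajectory val" in the ranking table being the unweighted average of the four stage validation losses. Both readings are inferred from the values rather than stated in the report, so treat the sketch below as a consistency check under those assumptions. It uses the `interaction` rows.

```python
# Stage-wise 5-seed means for `interaction`, copied from the stage
# trajectory table: (prefix_tokens, mean_val, mean_train).
interaction_stages = [
    (500_000, 3.2326, 2.8108),
    (1_000_000, 2.8908, 2.4842),
    (2_000_000, 2.6690, 2.3392),
    (4_000_000, 2.5311, 2.2685),
]

# ASSUMPTION: gap = validation loss - train loss at each stage.
gaps = [val - train for _, val, train in interaction_stages]

# ASSUMPTION: trajectory val = unweighted mean of the stage validation losses.
trajectory_val = sum(val for _, val, _ in interaction_stages) / len(interaction_stages)

for (prefix, val, train), gap in zip(interaction_stages, gaps):
    print(f"prefix={prefix:>9,d} val={val:.4f} train={train:.4f} gap={gap:.4f}")
print(f"trajectory val={trajectory_val:.4f}")
```

The recomputed trajectory value can be compared with the 2.8309 reported for `interaction` in the ranking table. Small last-digit differences in the gaps come from subtracting already-rounded means.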
Interpretation
- interaction has the best 5-seed mean final validation loss: 2.5311 +/- 0.0213.
- The second-best final condition is smooth_low at 2.5321 +/- 0.0203.
- The best static baseline by mean final loss is static_dropout_0.08 at 2.5444 +/- 0.0211.
- interaction beats the per-seed best static baseline in 5/5 seeds; worst paired delta is -0.0005.
- smooth_low beats the per-seed best static baseline in 4/5 seeds; worst paired delta is +0.0003.
- baseabc beats the per-seed best static baseline in 5/5 seeds; worst paired delta is -0.0022.
- The best first-stage condition is static_dropout_0.12 at prefix 500,000 with mean validation loss 3.2226; compare this with the final ranking before claiming a schedule is uniformly better.
- This is a saved-run streaming validation artifact. Treat it as strong evidence only when the tested conditions, seeds, static baselines, and stream protocol match the claim being made.
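The win counts and worst paired deltas quoted in the interpretation can be recomputed from the paired-delta column. In this sketch a "win" is a strictly negative delta and the "worst" delta is the largest (least favorable) value across seeds. Both definitions are inferred from the quoted numbers, and the helper name `summarize` is hypothetical.

```python
# Paired deltas vs the per-seed best static baseline for seeds 1-5,
# copied from the paired final-loss table in this report.
paired_delta = {
    "interaction": [-0.0005, -0.0211, -0.0092, -0.0166, -0.0141],
    "smooth_low": [+0.0003, -0.0202, -0.0071, -0.0139, -0.0159],
    "baseabc": [-0.0022, -0.0156, -0.0052, -0.0049, -0.0107],
}


def summarize(deltas):
    """Return (number of seeds with a negative delta, worst delta).

    ASSUMPTION: lower loss is better, so the worst delta is the maximum.
    """
    wins = sum(d < 0 for d in deltas)
    worst = max(deltas)
    return wins, worst


summary = {name: summarize(d) for name, d in paired_delta.items()}
for name, (wins, worst) in summary.items():
    print(f"{name}: {wins}/{len(paired_delta[name])} seeds, worst delta {worst:+.4f}")
```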