# TinyStories Multi-Seed Streaming Validation

Date: 2026-05-30

This report aggregates five random seeds (1 through 5) from saved streaming runs.
The script that generates it performs no additional training; it only reads the
saved `metrics.jsonl` files listed under Sources.
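
A minimal sketch of reading such a file without retraining, assuming each line of `metrics.jsonl` holds one JSON object. The helper name `load_metrics` is hypothetical, and the record schema is not shown in this report, so the sketch returns the raw dictionaries untouched.

```python
import json
from pathlib import Path


def load_metrics(path):
    """Return one dict per non-empty line of a JSON Lines file."""
    records = []
    with Path(path).open("r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                records.append(json.loads(line))
    return records
```

Calling `load_metrics` once per path under Sources and concatenating the results would give a single list of records to aggregate.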

Regime: TinyStories BPE streaming validation with model `L12_H8_D320` (17,367,040 parameters), four stream prefixes (500k, 1M, 2M, and 4M tokens), and 2,000 optimizer steps per stage.
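
The report does not break down the parameter count. As a check on the arithmetic only, the sketch below reads `L12_H8_D320` as 12 layers, 8 heads, and model width 320 (an assumed reading of the name) and counts weight matrices alone. The 8,192-token vocabulary, tied embeddings, 4x MLP expansion, and the absence of counted bias, norm, or positional parameters are all assumptions. They happen to reproduce 17,367,040 exactly, but nothing in the saved runs shown here confirms them.

```python
def approx_transformer_params(n_layers, d_model, vocab_size, mlp_ratio=4):
    """Count weight-matrix parameters only (no biases, norms, or positions)."""
    attention = 4 * d_model * d_model        # Q, K, V, and output projections
    mlp = 2 * mlp_ratio * d_model * d_model  # up- and down-projection
    embedding = vocab_size * d_model         # assumed tied with the output head
    return n_layers * (attention + mlp) + embedding


# vocab_size=8192 is an assumption chosen because it reproduces the reported total.
total = approx_transformer_params(n_layers=12, d_model=320, vocab_size=8192)
print(f"{total:,}")  # 17,367,040
```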

## Sources

- `runs/streaming_tinystories_interaction_schedule_l12/locked_stream/20260530-053831/metrics.jsonl`
- `runs/streaming_tinystories_multiseed_validation_l12/locked_stream/20260530-111523/metrics.jsonl`
- `runs/streaming_tinystories_multiseed_validation_l12/locked_stream/20260530-141335/metrics.jsonl`

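## Condition Ranking By Final Loss

Rows are sorted by mean final validation loss, lowest first. "Trajectory val" is the validation loss averaged over the four stages, "final" refers to the last stage (4,000,000-token prefix), and "gap" is validation loss minus training loss. Standard deviations are taken across the five seeds.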

| Condition | Kind | N | Mean trajectory val | Std trajectory val | Mean final val | Std final val | Mean final gap | Dropout path |
|---|---|---:|---:|---:|---:|---:|---:|---|
| `interaction` | `anchor_decay` | 5 | 2.8309 | 0.0068 | 2.5311 | 0.0213 | 0.2626 | `0.18 -> 0.14 -> 0.08 -> 0.04` |
| `smooth_low` | `decay` | 5 | 2.8307 | 0.0069 | 2.5321 | 0.0203 | 0.2607 | `0.16 -> 0.11 -> 0.07 -> 0.05` |
| `baseabc` | `anchor_decay` | 5 | 2.8474 | 0.0028 | 2.5357 | 0.0175 | 0.2655 | `0.25 -> 0.19 -> 0.10 -> 0.02` |
| `static_dropout_0.08` | `static` | 5 | 2.8434 | 0.0072 | 2.5444 | 0.0211 | 0.2593 | `0.08 -> 0.08 -> 0.08 -> 0.08` |
| `static_dropout_0.12` | `static` | 5 | 2.8357 | 0.0061 | 2.5477 | 0.0178 | 0.2269 | `0.12 -> 0.12 -> 0.12 -> 0.12` |
| `static_dropout_0.18` | `static` | 5 | 2.8461 | 0.0047 | 2.5644 | 0.0182 | 0.2035 | `0.18 -> 0.18 -> 0.18 -> 0.18` |
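
The mean and spread in a row can be approximately reproduced from the per-seed final losses listed under Paired Final-Loss Deltas. The sketch below does this for `interaction`. With these rounded inputs the sample standard deviation (ddof=1) lands within about 1e-4 of the reported 0.0213, while the population value is near 0.0191; this suggests, but does not confirm, that the table uses the sample estimator.

```python
from statistics import mean, pstdev, stdev

# Per-seed final validation losses for `interaction`, copied from the
# paired-delta table (already rounded to four decimals).
interaction_final = [2.5414, 2.5377, 2.5385, 2.4932, 2.5447]

final_mean = mean(interaction_final)
sample_std = stdev(interaction_final)       # divides by N - 1
population_std = pstdev(interaction_final)  # divides by N

print(f"mean           = {final_mean:.4f}")
print(f"sample std     = {sample_std:.4f}")
print(f"population std = {population_std:.4f}")
```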

## Paired Final-Loss Deltas

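Negative `delta_vs_best_static` (the last column) means the condition beat the
best static baseline for that seed. The best static baseline is chosen per seed
as the static condition with the lowest final validation loss. Deltas appear to
be computed from unrounded losses, so they can differ from the difference of
the displayed values by 0.0001.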

| Seed | Condition | Final val | Best static | Best static final val | Delta vs best static |
|---:|---|---:|---|---:|---:|
| 1 | `interaction` | 2.5414 | `static_dropout_0.08` | 2.5419 | -0.0005 |
| 1 | `baseabc` | 2.5397 | `static_dropout_0.08` | 2.5419 | -0.0022 |
| 1 | `smooth_low` | 2.5423 | `static_dropout_0.08` | 2.5419 | +0.0003 |
| 1 | `static_dropout_0.08` | 2.5419 | `static_dropout_0.08` | 2.5419 | +0.0000 |
| 1 | `static_dropout_0.12` | 2.5526 | `static_dropout_0.08` | 2.5419 | +0.0106 |
| 1 | `static_dropout_0.18` | 2.5636 | `static_dropout_0.08` | 2.5419 | +0.0217 |
| 2 | `interaction` | 2.5377 | `static_dropout_0.12` | 2.5588 | -0.0211 |
| 2 | `baseabc` | 2.5432 | `static_dropout_0.12` | 2.5588 | -0.0156 |
| 2 | `smooth_low` | 2.5386 | `static_dropout_0.12` | 2.5588 | -0.0202 |
| 2 | `static_dropout_0.08` | 2.5636 | `static_dropout_0.12` | 2.5588 | +0.0048 |
| 2 | `static_dropout_0.12` | 2.5588 | `static_dropout_0.12` | 2.5588 | +0.0000 |
| 2 | `static_dropout_0.18` | 2.5768 | `static_dropout_0.12` | 2.5588 | +0.0180 |
| 3 | `interaction` | 2.5385 | `static_dropout_0.08` | 2.5478 | -0.0092 |
| 3 | `baseabc` | 2.5425 | `static_dropout_0.08` | 2.5478 | -0.0052 |
| 3 | `smooth_low` | 2.5407 | `static_dropout_0.08` | 2.5478 | -0.0071 |
| 3 | `static_dropout_0.08` | 2.5478 | `static_dropout_0.08` | 2.5478 | +0.0000 |
| 3 | `static_dropout_0.12` | 2.5510 | `static_dropout_0.08` | 2.5478 | +0.0033 |
| 3 | `static_dropout_0.18` | 2.5667 | `static_dropout_0.08` | 2.5478 | +0.0189 |
| 4 | `interaction` | 2.4932 | `static_dropout_0.08` | 2.5098 | -0.0166 |
| 4 | `baseabc` | 2.5049 | `static_dropout_0.08` | 2.5098 | -0.0049 |
| 4 | `smooth_low` | 2.4959 | `static_dropout_0.08` | 2.5098 | -0.0139 |
| 4 | `static_dropout_0.08` | 2.5098 | `static_dropout_0.08` | 2.5098 | +0.0000 |
| 4 | `static_dropout_0.12` | 2.5166 | `static_dropout_0.08` | 2.5098 | +0.0068 |
| 4 | `static_dropout_0.18` | 2.5343 | `static_dropout_0.08` | 2.5098 | +0.0244 |
| 5 | `interaction` | 2.5447 | `static_dropout_0.08` | 2.5588 | -0.0141 |
| 5 | `baseabc` | 2.5481 | `static_dropout_0.08` | 2.5588 | -0.0107 |
| 5 | `smooth_low` | 2.5428 | `static_dropout_0.08` | 2.5588 | -0.0159 |
| 5 | `static_dropout_0.08` | 2.5588 | `static_dropout_0.08` | 2.5588 | +0.0000 |
| 5 | `static_dropout_0.12` | 2.5595 | `static_dropout_0.08` | 2.5588 | +0.0008 |
| 5 | `static_dropout_0.18` | 2.5806 | `static_dropout_0.08` | 2.5588 | +0.0218 |
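
A minimal sketch of the per-seed pairing, using the seed 2 values from the table. The function name `paired_deltas` is hypothetical, and identifying static baselines by the `static_` name prefix is an assumption made for brevity; the ranking table's Kind column carries the same information.

```python
def paired_deltas(final_val):
    """Return (best_static_name, {condition: delta}) for one seed."""
    static = {name: loss for name, loss in final_val.items()
              if name.startswith("static_")}
    best_name = min(static, key=static.get)  # lowest final loss among statics
    best_loss = static[best_name]
    return best_name, {name: loss - best_loss for name, loss in final_val.items()}


seed2 = {
    "interaction": 2.5377, "baseabc": 2.5432, "smooth_low": 2.5386,
    "static_dropout_0.08": 2.5636, "static_dropout_0.12": 2.5588,
    "static_dropout_0.18": 2.5768,
}
best, deltas = paired_deltas(seed2)
print(best)
for name, delta in deltas.items():
    print(f"{name:22s} {delta:+.4f}")
```

For seed 2 the best static baseline is `static_dropout_0.12` rather than `static_dropout_0.08`, which is why the baseline is selected per seed instead of being fixed in advance.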

## Stage Trajectory

| Stage | Prefix tokens | Condition | Dropout | N | Mean val | Std val | Mean train | Mean gap |
|---:|---:|---|---:|---:|---:|---:|---:|---:|
| 0 | 500,000 | `static_dropout_0.12` | 0.120 | 5 | 3.2226 | 0.0143 | 2.6968 | 0.5257 |
| 0 | 500,000 | `smooth_low` | 0.162 | 5 | 3.2287 | 0.0122 | 2.7909 | 0.4377 |
| 0 | 500,000 | `static_dropout_0.08` | 0.080 | 5 | 3.2304 | 0.0102 | 2.6173 | 0.6131 |
| 0 | 500,000 | `interaction` | 0.184 | 5 | 3.2326 | 0.0123 | 2.8108 | 0.4218 |
| 0 | 500,000 | `static_dropout_0.18` | 0.180 | 5 | 3.2349 | 0.0151 | 2.8056 | 0.4293 |
| 0 | 500,000 | `baseabc` | 0.251 | 5 | 3.2728 | 0.0102 | 2.9139 | 0.3588 |
| 1 | 1,000,000 | `interaction` | 0.141 | 5 | 2.8908 | 0.0027 | 2.4842 | 0.4065 |
| 1 | 1,000,000 | `smooth_low` | 0.115 | 5 | 2.8912 | 0.0018 | 2.4678 | 0.4234 |
| 1 | 1,000,000 | `static_dropout_0.12` | 0.120 | 5 | 2.8930 | 0.0121 | 2.4335 | 0.4595 |
| 1 | 1,000,000 | `static_dropout_0.18` | 0.180 | 5 | 2.8990 | 0.0106 | 2.5397 | 0.3593 |
| 1 | 1,000,000 | `baseabc` | 0.186 | 5 | 2.9041 | 0.0037 | 2.5659 | 0.3382 |
| 1 | 1,000,000 | `static_dropout_0.08` | 0.080 | 5 | 2.9132 | 0.0068 | 2.3531 | 0.5601 |
| 2 | 2,000,000 | `interaction` | 0.084 | 5 | 2.6690 | 0.0207 | 2.3392 | 0.3298 |
| 2 | 2,000,000 | `smooth_low` | 0.067 | 5 | 2.6708 | 0.0218 | 2.3360 | 0.3347 |
| 2 | 2,000,000 | `baseabc` | 0.105 | 5 | 2.6770 | 0.0186 | 2.3938 | 0.2833 |
| 2 | 2,000,000 | `static_dropout_0.12` | 0.120 | 5 | 2.6795 | 0.0163 | 2.3697 | 0.3098 |
| 2 | 2,000,000 | `static_dropout_0.08` | 0.080 | 5 | 2.6856 | 0.0161 | 2.3109 | 0.3747 |
| 2 | 2,000,000 | `static_dropout_0.18` | 0.180 | 5 | 2.6860 | 0.0159 | 2.4347 | 0.2513 |
| 3 | 4,000,000 | `interaction` | 0.045 | 5 | 2.5311 | 0.0213 | 2.2685 | 0.2626 |
| 3 | 4,000,000 | `smooth_low` | 0.045 | 5 | 2.5321 | 0.0203 | 2.2713 | 0.2607 |
| 3 | 4,000,000 | `baseabc` | 0.020 | 5 | 2.5357 | 0.0175 | 2.2702 | 0.2655 |
| 3 | 4,000,000 | `static_dropout_0.08` | 0.080 | 5 | 2.5444 | 0.0211 | 2.2851 | 0.2593 |
| 3 | 4,000,000 | `static_dropout_0.12` | 0.120 | 5 | 2.5477 | 0.0178 | 2.3208 | 0.2269 |
| 3 | 4,000,000 | `static_dropout_0.18` | 0.180 | 5 | 2.5644 | 0.0182 | 2.3609 | 0.2035 |
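
The gap and trajectory columns follow from the stage means. The sketch below recomputes them for `interaction` from the rounded values in this table, so results agree with the reported figures only to within rounding (about 1e-4).

```python
# (mean val, mean train) per stage for `interaction`, copied from the table.
stages = [
    (3.2326, 2.8108),  # stage 0, 500,000 tokens
    (2.8908, 2.4842),  # stage 1, 1,000,000 tokens
    (2.6690, 2.3392),  # stage 2, 2,000,000 tokens
    (2.5311, 2.2685),  # stage 3, 4,000,000 tokens
]

gaps = [val - train for val, train in stages]            # generalization gap
trajectory_val = sum(val for val, _ in stages) / len(stages)

for stage, gap in enumerate(gaps):
    print(f"stage {stage}: gap = {gap:.4f}")
print(f"trajectory val = {trajectory_val:.4f}")
```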

## Interpretation

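- `interaction` has the best 5-seed mean final validation loss: 2.5311 +/- 0.0213 (mean +/- standard deviation across seeds).
- The second-best final condition is `smooth_low` at 2.5321 +/- 0.0203, only 0.0010 behind `interaction`. That margin is far smaller than the seed-to-seed standard deviation, so mean final loss alone does not clearly separate the two schedules.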
- The best static baseline by mean final loss is `static_dropout_0.08` at 2.5444 +/- 0.0211.
- `interaction` beats the per-seed best static baseline in 5/5 seeds; its narrowest margin is -0.0005 (seed 1).
- `smooth_low` beats the per-seed best static baseline in 4/5 seeds; it misses on seed 1 with a delta of +0.0003.
- `baseabc` beats the per-seed best static baseline in 5/5 seeds; its narrowest margin is -0.0022 (seed 1).
- The best first-stage condition is `static_dropout_0.12`, with mean validation loss 3.2226 at the 500,000-token prefix, yet it ranks fifth by final loss. No schedule leads at every stage, so compare the stage trajectory with the final ranking before claiming a schedule is uniformly better.
- This is a saved-run streaming validation artifact. Treat it as strong
  evidence only when the tested conditions, seeds, static baselines, and
  stream protocol match the claim being made.
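
The win counts and worst paired deltas quoted above can be rechecked directly from the Paired Final-Loss Deltas table. A minimal sketch, with the hypothetical helper `summarize` and the deltas copied as displayed:

```python
def summarize(values):
    """Return (wins, worst_delta); a win is a strictly negative delta."""
    wins = sum(1 for d in values if d < 0)
    return wins, max(values)  # the largest delta is the least favorable one


# Deltas vs the per-seed best static baseline, seeds 1 through 5.
deltas = {
    "interaction": [-0.0005, -0.0211, -0.0092, -0.0166, -0.0141],
    "smooth_low":  [+0.0003, -0.0202, -0.0071, -0.0139, -0.0159],
    "baseabc":     [-0.0022, -0.0156, -0.0052, -0.0049, -0.0107],
}

for name, values in deltas.items():
    wins, worst = summarize(values)
    print(f"{name}: {wins}/{len(values)} wins, worst delta {worst:+.4f}")
# interaction: 5/5 wins, worst delta -0.0005
# smooth_low: 4/5 wins, worst delta +0.0003
# baseabc: 5/5 wins, worst delta -0.0022
```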