Mandeep Sidhu

TinyStories Multi-Seed Streaming Validation

Date: 2026-05-30

This report combines five random seeds (1, 2, 3, 4, 5) from saved streaming runs. The script that generates it performs no additional training; it only reads the saved metrics.jsonl files.

Regime: TinyStories BPE streaming validation with L12_H8_D320, 17,367,040 parameters, four prefixes from 500k to 4M tokens, and 2,000 optimizer steps per stage.

Sources

  • runs/streaming_tinystories_interaction_schedule_l12/locked_stream/20260530-053831/metrics.jsonl
  • runs/streaming_tinystories_multiseed_validation_l12/locked_stream/20260530-111523/metrics.jsonl
  • runs/streaming_tinystories_multiseed_validation_l12/locked_stream/20260530-141335/metrics.jsonl
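Merging records from several saved metrics.jsonl files could look roughly like the sketch below. It assumes only that each non-empty line is one JSON object; the report does not document the record schema, so the helper name `load_records` is hypothetical and no field names are assumed.

```python
import json
from pathlib import Path


def load_records(paths):
    """Read every non-empty line of each JSONL file as one JSON object.

    Hypothetical helper: the real script's loader is not shown in the report.
    """
    records = []
    for path in paths:
        with Path(path).open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:  # skip blank lines between records
                    records.append(json.loads(line))
    return records
```

Passing the three source paths listed above to `load_records` would return one flat list of dictionaries, which can then be grouped by whatever seed and condition keys the files actually contain.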

Condition Ranking By Final Loss

| Condition | Kind | N | Mean trajectory val | Std trajectory val | Mean final val | Std final val | Mean final gap | Dropout path |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| interaction | anchor_decay | 5 | 2.8309 | 0.0068 | 2.5311 | 0.0213 | 0.2626 | 0.18 -> 0.14 -> 0.08 -> 0.04 |
| smooth_low | decay | 5 | 2.8307 | 0.0069 | 2.5321 | 0.0203 | 0.2607 | 0.16 -> 0.11 -> 0.07 -> 0.05 |
| baseabc | anchor_decay | 5 | 2.8474 | 0.0028 | 2.5357 | 0.0175 | 0.2655 | 0.25 -> 0.19 -> 0.10 -> 0.02 |
| static_dropout_0.08 | static | 5 | 2.8434 | 0.0072 | 2.5444 | 0.0211 | 0.2593 | 0.08 -> 0.08 -> 0.08 -> 0.08 |
| static_dropout_0.12 | static | 5 | 2.8357 | 0.0061 | 2.5477 | 0.0178 | 0.2269 | 0.12 -> 0.12 -> 0.12 -> 0.12 |
| static_dropout_0.18 | static | 5 | 2.8461 | 0.0047 | 2.5644 | 0.0182 | 0.2035 | 0.18 -> 0.18 -> 0.18 -> 0.18 |
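As a sanity check on the Mean final val and Std final val columns, the sketch below recomputes both for interaction from the rounded per-seed values listed under Paired Final-Loss Deltas. It assumes the Std columns are sample standard deviations (n - 1 denominator); the report does not say so. That assumption reproduces the reported 0.0213 to within rounding of the inputs (about 0.0214), whereas the population form gives about 0.0191.

```python
from statistics import mean, pstdev, stdev

# Per-seed final validation losses for `interaction` (seeds 1..5),
# copied from the Paired Final-Loss Deltas table; values are rounded to 4 decimals.
interaction_final_val = [2.5414, 2.5377, 2.5385, 2.4932, 2.5447]

summary_mean = mean(interaction_final_val)
sample_std = stdev(interaction_final_val)       # n - 1 denominator (assumed convention)
population_std = pstdev(interaction_final_val)  # n denominator, shown for comparison

print(f"mean={summary_mean:.4f} sample_std={sample_std:.4f} population_std={population_std:.4f}")
```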

Paired Final-Loss Deltas

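A negative value in the Delta vs best static column means the condition reached a lower final validation loss than the best static baseline for that seed. The deltas appear to be computed from unrounded losses, so the last digit can differ by 0.0001 from the difference of the displayed values.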

| Seed | Condition | Final val | Best static | Best static final val | Delta vs best static |
| --- | --- | --- | --- | --- | --- |
| 1 | interaction | 2.5414 | static_dropout_0.08 | 2.5419 | -0.0005 |
| 1 | baseabc | 2.5397 | static_dropout_0.08 | 2.5419 | -0.0022 |
| 1 | smooth_low | 2.5423 | static_dropout_0.08 | 2.5419 | +0.0003 |
| 1 | static_dropout_0.08 | 2.5419 | static_dropout_0.08 | 2.5419 | +0.0000 |
| 1 | static_dropout_0.12 | 2.5526 | static_dropout_0.08 | 2.5419 | +0.0106 |
| 1 | static_dropout_0.18 | 2.5636 | static_dropout_0.08 | 2.5419 | +0.0217 |
| 2 | interaction | 2.5377 | static_dropout_0.12 | 2.5588 | -0.0211 |
| 2 | baseabc | 2.5432 | static_dropout_0.12 | 2.5588 | -0.0156 |
| 2 | smooth_low | 2.5386 | static_dropout_0.12 | 2.5588 | -0.0202 |
| 2 | static_dropout_0.08 | 2.5636 | static_dropout_0.12 | 2.5588 | +0.0048 |
| 2 | static_dropout_0.12 | 2.5588 | static_dropout_0.12 | 2.5588 | +0.0000 |
| 2 | static_dropout_0.18 | 2.5768 | static_dropout_0.12 | 2.5588 | +0.0180 |
| 3 | interaction | 2.5385 | static_dropout_0.08 | 2.5478 | -0.0092 |
| 3 | baseabc | 2.5425 | static_dropout_0.08 | 2.5478 | -0.0052 |
| 3 | smooth_low | 2.5407 | static_dropout_0.08 | 2.5478 | -0.0071 |
| 3 | static_dropout_0.08 | 2.5478 | static_dropout_0.08 | 2.5478 | +0.0000 |
| 3 | static_dropout_0.12 | 2.5510 | static_dropout_0.08 | 2.5478 | +0.0033 |
| 3 | static_dropout_0.18 | 2.5667 | static_dropout_0.08 | 2.5478 | +0.0189 |
| 4 | interaction | 2.4932 | static_dropout_0.08 | 2.5098 | -0.0166 |
| 4 | baseabc | 2.5049 | static_dropout_0.08 | 2.5098 | -0.0049 |
| 4 | smooth_low | 2.4959 | static_dropout_0.08 | 2.5098 | -0.0139 |
| 4 | static_dropout_0.08 | 2.5098 | static_dropout_0.08 | 2.5098 | +0.0000 |
| 4 | static_dropout_0.12 | 2.5166 | static_dropout_0.08 | 2.5098 | +0.0068 |
| 4 | static_dropout_0.18 | 2.5343 | static_dropout_0.08 | 2.5098 | +0.0244 |
| 5 | interaction | 2.5447 | static_dropout_0.08 | 2.5588 | -0.0141 |
| 5 | baseabc | 2.5481 | static_dropout_0.08 | 2.5588 | -0.0107 |
| 5 | smooth_low | 2.5428 | static_dropout_0.08 | 2.5588 | -0.0159 |
| 5 | static_dropout_0.08 | 2.5588 | static_dropout_0.08 | 2.5588 | +0.0000 |
| 5 | static_dropout_0.12 | 2.5595 | static_dropout_0.08 | 2.5588 | +0.0008 |
| 5 | static_dropout_0.18 | 2.5806 | static_dropout_0.08 | 2.5588 | +0.0218 |
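The paired comparison in this table can be sketched as follows: for each seed, pick the static condition with the lowest final loss, then subtract it from the scheduled condition's final loss. This is an illustrative reconstruction, not the report script itself; the names `FINAL_VAL`, `paired_deltas`, and `count_wins` are hypothetical, and identifying static baselines by the `static_` name prefix is an assumption. Inputs are the rounded table values, so recomputed deltas can differ from the table in the last digit.

```python
# Final validation losses per seed, copied from the table above (two schedules, three statics).
FINAL_VAL = {
    1: {"interaction": 2.5414, "smooth_low": 2.5423,
        "static_dropout_0.08": 2.5419, "static_dropout_0.12": 2.5526, "static_dropout_0.18": 2.5636},
    2: {"interaction": 2.5377, "smooth_low": 2.5386,
        "static_dropout_0.08": 2.5636, "static_dropout_0.12": 2.5588, "static_dropout_0.18": 2.5768},
    3: {"interaction": 2.5385, "smooth_low": 2.5407,
        "static_dropout_0.08": 2.5478, "static_dropout_0.12": 2.5510, "static_dropout_0.18": 2.5667},
    4: {"interaction": 2.4932, "smooth_low": 2.4959,
        "static_dropout_0.08": 2.5098, "static_dropout_0.12": 2.5166, "static_dropout_0.18": 2.5343},
    5: {"interaction": 2.5447, "smooth_low": 2.5428,
        "static_dropout_0.08": 2.5588, "static_dropout_0.12": 2.5595, "static_dropout_0.18": 2.5806},
}


def paired_deltas(per_seed, condition):
    """Map seed -> (best static name, condition loss minus best static loss)."""
    result = {}
    for seed, losses in per_seed.items():
        # Assumption: static baselines are the conditions whose name starts with "static_".
        statics = {name: loss for name, loss in losses.items() if name.startswith("static_")}
        best_name = min(statics, key=statics.get)
        result[seed] = (best_name, losses[condition] - statics[best_name])
    return result


def count_wins(deltas):
    """Count seeds where the condition beat the best static baseline (negative delta)."""
    return sum(1 for _, delta in deltas.values() if delta < 0)


for condition in ("interaction", "smooth_low"):
    deltas = paired_deltas(FINAL_VAL, condition)
    print(condition, f"{count_wins(deltas)}/{len(deltas)} seeds beat the best static")
```

Selecting the best static per seed, rather than one static baseline fixed across seeds, makes the comparison stricter for the scheduled conditions: in seed 2 the reference is static_dropout_0.12, not static_dropout_0.08.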

Stage Trajectory

| Stage | Prefix tokens | Condition | Dropout | N | Mean val | Std val | Mean train | Mean gap |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 500,000 | static_dropout_0.12 | 0.120 | 5 | 3.2226 | 0.0143 | 2.6968 | 0.5257 |
| 0 | 500,000 | smooth_low | 0.162 | 5 | 3.2287 | 0.0122 | 2.7909 | 0.4377 |
| 0 | 500,000 | static_dropout_0.08 | 0.080 | 5 | 3.2304 | 0.0102 | 2.6173 | 0.6131 |
| 0 | 500,000 | interaction | 0.184 | 5 | 3.2326 | 0.0123 | 2.8108 | 0.4218 |
| 0 | 500,000 | static_dropout_0.18 | 0.180 | 5 | 3.2349 | 0.0151 | 2.8056 | 0.4293 |
| 0 | 500,000 | baseabc | 0.251 | 5 | 3.2728 | 0.0102 | 2.9139 | 0.3588 |
| 1 | 1,000,000 | interaction | 0.141 | 5 | 2.8908 | 0.0027 | 2.4842 | 0.4065 |
| 1 | 1,000,000 | smooth_low | 0.115 | 5 | 2.8912 | 0.0018 | 2.4678 | 0.4234 |
| 1 | 1,000,000 | static_dropout_0.12 | 0.120 | 5 | 2.8930 | 0.0121 | 2.4335 | 0.4595 |
| 1 | 1,000,000 | static_dropout_0.18 | 0.180 | 5 | 2.8990 | 0.0106 | 2.5397 | 0.3593 |
| 1 | 1,000,000 | baseabc | 0.186 | 5 | 2.9041 | 0.0037 | 2.5659 | 0.3382 |
| 1 | 1,000,000 | static_dropout_0.08 | 0.080 | 5 | 2.9132 | 0.0068 | 2.3531 | 0.5601 |
| 2 | 2,000,000 | interaction | 0.084 | 5 | 2.6690 | 0.0207 | 2.3392 | 0.3298 |
| 2 | 2,000,000 | smooth_low | 0.067 | 5 | 2.6708 | 0.0218 | 2.3360 | 0.3347 |
| 2 | 2,000,000 | baseabc | 0.105 | 5 | 2.6770 | 0.0186 | 2.3938 | 0.2833 |
| 2 | 2,000,000 | static_dropout_0.12 | 0.120 | 5 | 2.6795 | 0.0163 | 2.3697 | 0.3098 |
| 2 | 2,000,000 | static_dropout_0.08 | 0.080 | 5 | 2.6856 | 0.0161 | 2.3109 | 0.3747 |
| 2 | 2,000,000 | static_dropout_0.18 | 0.180 | 5 | 2.6860 | 0.0159 | 2.4347 | 0.2513 |
| 3 | 4,000,000 | interaction | 0.045 | 5 | 2.5311 | 0.0213 | 2.2685 | 0.2626 |
| 3 | 4,000,000 | smooth_low | 0.045 | 5 | 2.5321 | 0.0203 | 2.2713 | 0.2607 |
| 3 | 4,000,000 | baseabc | 0.020 | 5 | 2.5357 | 0.0175 | 2.2702 | 0.2655 |
| 3 | 4,000,000 | static_dropout_0.08 | 0.080 | 5 | 2.5444 | 0.0211 | 2.2851 | 0.2593 |
| 3 | 4,000,000 | static_dropout_0.12 | 0.120 | 5 | 2.5477 | 0.0178 | 2.3208 | 0.2269 |
| 3 | 4,000,000 | static_dropout_0.18 | 0.180 | 5 | 2.5644 | 0.0182 | 2.3609 | 0.2035 |
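The Mean gap column is numerically consistent with mean validation loss minus mean training loss. That definition is inferred from the values, not stated in the report, so the short check below treats it as an assumption and compares recomputed gaps against three stage 3 rows.

```python
# Stage 3 rows copied from the table above: (mean val, mean train, reported mean gap).
rows = {
    "interaction": (2.5311, 2.2685, 0.2626),
    "smooth_low": (2.5321, 2.2713, 0.2607),
    "static_dropout_0.18": (2.5644, 2.3609, 0.2035),
}

# Assumed definition: gap = validation loss - training loss.
for name, (val, train, reported) in rows.items():
    print(f"{name}: recomputed {val - train:.4f}, reported {reported:.4f}")

# Largest disagreement between recomputed and reported gaps across the rows.
max_abs_diff = max(abs((val - train) - reported) for val, train, reported in rows.values())
```

Any remaining disagreement is at the level of the last displayed digit, which is what rounding the inputs to four decimals would produce.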

Interpretation

  • interaction has the best 5-seed mean final validation loss: 2.5311 +/- 0.0213 (mean +/- standard deviation across seeds).
  • The second-best final condition is smooth_low at 2.5321 +/- 0.0203.
  • The best static baseline by mean final loss is static_dropout_0.08 at 2.5444 +/- 0.0211.
  • interaction beats the per-seed best static baseline in 5/5 seeds; worst paired delta is -0.0005.
  • smooth_low beats the per-seed best static baseline in 4/5 seeds; worst paired delta is +0.0003.
  • baseabc beats the per-seed best static baseline in 5/5 seeds; worst paired delta is -0.0022.
  • The best first-stage condition is static_dropout_0.12 at prefix 500,000 with mean validation loss 3.2226; compare this with the final ranking before claiming a schedule is uniformly better.
  • This is a saved-run streaming validation artifact. Treat it as strong evidence only when the tested conditions, seeds, static baselines, and stream protocol match the claim being made.