Mandeep Sidhu committed on
Commit · dcae82e
1 Parent(s): bf705c0
Use absolute regime names for streaming reports
- docs/formula_coefficient_methodology.md +5 -8
- docs/{previous_regime_streaming_report.md → openwebtext10k_streaming_report.md} +20 -16
- docs/plan.md +21 -21
- docs/{streaming_multiseed_validation_report.md → tinystories_streaming_report.md} +0 -0
- paper/dropout_decay_pressure_law.tex +2 -2
- runs/coefficient_calibration/cross_regime_backtest/cross_regime_transfer.csv +4 -4
- runs/coefficient_calibration/cross_regime_backtest/{previous_local_main_base → openwebtext10k_main_base}/calibration_cells.csv +0 -0
- runs/coefficient_calibration/cross_regime_backtest/{previous_local_main_base → openwebtext10k_main_base}/coefficients.json +0 -0
- runs/coefficient_calibration/cross_regime_backtest/{previous_local_main_base → openwebtext10k_main_base}/fit_diagnostics.md +0 -0
- runs/coefficient_calibration/cross_regime_backtest/{previous_local_main_base → openwebtext10k_main_base}/next_dropout_suggestions.csv +0 -0
- runs/coefficient_calibration/cross_regime_backtest/{previous_local_main_interaction → openwebtext10k_main_interaction}/calibration_cells.csv +0 -0
- runs/coefficient_calibration/cross_regime_backtest/{previous_local_main_interaction → openwebtext10k_main_interaction}/coefficients.json +0 -0
- runs/coefficient_calibration/cross_regime_backtest/{previous_local_main_interaction → openwebtext10k_main_interaction}/fit_diagnostics.md +0 -0
- runs/coefficient_calibration/cross_regime_backtest/{previous_local_main_interaction → openwebtext10k_main_interaction}/next_dropout_suggestions.csv +0 -0
- runs/coefficient_calibration/cross_regime_backtest/{previous_local_plus_5m_base → openwebtext10k_plus_5m_base}/calibration_cells.csv +0 -0
- runs/coefficient_calibration/cross_regime_backtest/{previous_local_plus_5m_base → openwebtext10k_plus_5m_base}/coefficients.json +0 -0
- runs/coefficient_calibration/cross_regime_backtest/{previous_local_plus_5m_base → openwebtext10k_plus_5m_base}/fit_diagnostics.md +0 -0
- runs/coefficient_calibration/cross_regime_backtest/{previous_local_plus_5m_base → openwebtext10k_plus_5m_base}/next_dropout_suggestions.csv +0 -0
- runs/coefficient_calibration/cross_regime_backtest/{previous_local_plus_5m_interaction → openwebtext10k_plus_5m_interaction}/calibration_cells.csv +0 -0
- runs/coefficient_calibration/cross_regime_backtest/{previous_local_plus_5m_interaction → openwebtext10k_plus_5m_interaction}/coefficients.json +0 -0
- runs/coefficient_calibration/cross_regime_backtest/{previous_local_plus_5m_interaction → openwebtext10k_plus_5m_interaction}/fit_diagnostics.md +0 -0
- runs/coefficient_calibration/cross_regime_backtest/{previous_local_plus_5m_interaction → openwebtext10k_plus_5m_interaction}/next_dropout_suggestions.csv +0 -0
- runs/{previous_local_updated_formula_clean_l16 → openwebtext10k_l16_updated_formula_clean_5seed}/locked_stream/20260530-174525/RESULT_SUMMARY.md +7 -7
- runs/{previous_local_updated_formula_clean_l16 → openwebtext10k_l16_updated_formula_clean_5seed}/locked_stream/20260530-174525/config.json +2 -2
- runs/{previous_local_updated_formula_clean_l16 → openwebtext10k_l16_updated_formula_clean_5seed}/locked_stream/20260530-174525/metrics.jsonl +25 -25
- runs/{previous_local_updated_formula_clean_l16 → openwebtext10k_l16_updated_formula_clean_5seed}/locked_stream/20260530-174525/summary.csv +5 -5
- runs/{previous_local_updated_formula_clean_l16 → openwebtext10k_l16_updated_formula_clean_5seed}/locked_stream/20260530-174525/summary.json +5 -5
- runs/{previous_local_updated_formula_clean_l16 → openwebtext10k_l16_updated_formula_clean_5seed}/locked_stream/20260530-174525/trace.jsonl +100 -100
- runs/{previous_local_streaming_report → openwebtext10k_streaming_report}/l16_multiseed_confirm/condition_summary.csv +0 -0
- runs/{previous_local_streaming_report → openwebtext10k_streaming_report}/l16_multiseed_confirm/paired_final_deltas.csv +0 -0
- runs/{previous_local_streaming_report → openwebtext10k_streaming_report}/l16_multiseed_confirm/stage_summary.csv +0 -0
- runs/{previous_local_streaming_report → openwebtext10k_streaming_report}/l16_updated_formula_clean_5seed/condition_summary.csv +1 -1
- runs/{previous_local_streaming_report → openwebtext10k_streaming_report}/l16_updated_formula_clean_5seed/paired_final_deltas.csv +5 -5
- runs/{previous_local_streaming_report → openwebtext10k_streaming_report}/l16_updated_formula_clean_5seed/stage_summary.csv +5 -5
- scripts/summarize_cross_regime_backtest.py +16 -16
docs/formula_coefficient_methodology.md
CHANGED
@@ -706,16 +706,13 @@ static-to-streaming correction offline and backtest it on saved results.
 The active proof artifact is:

 ```text
-docs/
-runs/streaming_tinystories_multiseed_validation_l12/
+docs/tinystories_streaming_report.md
+runs/streaming_tinystories_multiseed_validation_l12/combined_5seed_summary/
 ```

-
-
-
-2. keep the same model, stream prefixes, and six conditions;
-3. regenerate the combined streaming report at `n=5`;
-4. use that report as the main TinyStories evidence.
+TinyStories has already been regenerated at `n=5`. The next paper-grade
+streaming validation target is WikiText-103, after reconciling the TinyStories
+and OpenWebText10K reports.

 For any later regime, repeat the same pattern: first use static backtests to
 choose coefficients, then create a streaming multi-seed validation report as the
docs/{previous_regime_streaming_report.md → openwebtext10k_streaming_report.md}
RENAMED
|
@@ -1,4 +1,4 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
Date: 2026-05-30
|
| 4 |
|
|
@@ -6,17 +6,21 @@ This report combines 5 random seeds (1, 2, 3, 4, 5) from saved streaming runs.
|
|
| 6 |
No additional training is performed by this script; it reads saved
|
| 7 |
`metrics.jsonl` files.
|
| 8 |
|
| 9 |
-
Regime:
|
|
|
|
|
|
|
|
|
|
|
|
|
| 10 |
|
| 11 |
## Sources
|
| 12 |
|
| 13 |
-
- `runs/
|
| 14 |
|
| 15 |
## Condition Ranking By Final Loss
|
| 16 |
|
| 17 |
| Condition | Kind | N | Mean trajectory val | Std trajectory val | Mean final val | Std final val | Mean final gap | Dropout path |
|
| 18 |
|---|---|---:|---:|---:|---:|---:|---:|---|
|
| 19 |
-
| `
|
| 20 |
| `hold_30_then_decay` | `anchor_decay` | 5 | 4.8512 | 0.0017 | 4.4052 | 0.0112 | 0.3565 | `0.30 -> 0.30 -> 0.20 -> 0.10 -> 0.02` |
|
| 21 |
| `mild_30_to_08` | `anchor_decay` | 5 | 4.8509 | 0.0015 | 4.4073 | 0.0085 | 0.3337 | `0.30 -> 0.24 -> 0.18 -> 0.12 -> 0.08` |
|
| 22 |
| `fitted_l16_static_law` | `anchor_decay` | 5 | 4.9521 | 0.0039 | 4.4124 | 0.0084 | 0.3137 | `0.60 -> 0.40 -> 0.30 -> 0.14 -> 0.02` |
|
|
@@ -32,7 +36,7 @@ baseline for that seed.
|
|
| 32 |
|
| 33 |
| Seed | Condition | Final val | Best static | Best static final val | Delta vs best static |
|
| 34 |
|---:|---|---:|---|---:|---:|
|
| 35 |
-
| 1 | `
|
| 36 |
| 1 | `hold_30_then_decay` | 4.3939 | `static_dropout_0.14` | 4.4418 | -0.0479 |
|
| 37 |
| 1 | `mild_30_to_08` | 4.3995 | `static_dropout_0.14` | 4.4418 | -0.0423 |
|
| 38 |
| 1 | `fitted_l16_static_law` | 4.4207 | `static_dropout_0.14` | 4.4418 | -0.0211 |
|
|
@@ -40,7 +44,7 @@ baseline for that seed.
|
|
| 40 |
| 1 | `static_dropout_0.3` | 4.4602 | `static_dropout_0.14` | 4.4418 | +0.0184 |
|
| 41 |
| 1 | `static_dropout_0.02` | 4.5402 | `static_dropout_0.14` | 4.4418 | +0.0984 |
|
| 42 |
| 1 | `static_dropout_0` | 4.5704 | `static_dropout_0.14` | 4.4418 | +0.1286 |
|
| 43 |
-
| 2 | `
|
| 44 |
| 2 | `hold_30_then_decay` | 4.4068 | `static_dropout_0.14` | 4.4602 | -0.0534 |
|
| 45 |
| 2 | `mild_30_to_08` | 4.4080 | `static_dropout_0.14` | 4.4602 | -0.0522 |
|
| 46 |
| 2 | `fitted_l16_static_law` | 4.4136 | `static_dropout_0.14` | 4.4602 | -0.0466 |
|
|
@@ -48,7 +52,7 @@ baseline for that seed.
|
|
| 48 |
| 2 | `static_dropout_0.3` | 4.4719 | `static_dropout_0.14` | 4.4602 | +0.0117 |
|
| 49 |
| 2 | `static_dropout_0.02` | 4.5466 | `static_dropout_0.14` | 4.4602 | +0.0864 |
|
| 50 |
| 2 | `static_dropout_0` | 4.6094 | `static_dropout_0.14` | 4.4602 | +0.1492 |
|
| 51 |
-
| 3 | `
|
| 52 |
| 3 | `hold_30_then_decay` | 4.4174 | `static_dropout_0.14` | 4.4356 | -0.0183 |
|
| 53 |
| 3 | `mild_30_to_08` | 4.4151 | `static_dropout_0.14` | 4.4356 | -0.0206 |
|
| 54 |
| 3 | `fitted_l16_static_law` | 4.4134 | `static_dropout_0.14` | 4.4356 | -0.0223 |
|
|
@@ -56,7 +60,7 @@ baseline for that seed.
|
|
| 56 |
| 3 | `static_dropout_0.3` | 4.4758 | `static_dropout_0.14` | 4.4356 | +0.0401 |
|
| 57 |
| 3 | `static_dropout_0.02` | 4.5345 | `static_dropout_0.14` | 4.4356 | +0.0988 |
|
| 58 |
| 3 | `static_dropout_0` | 4.5928 | `static_dropout_0.14` | 4.4356 | +0.1571 |
|
| 59 |
-
| 4 | `
|
| 60 |
| 4 | `hold_30_then_decay` | 4.3936 | `static_dropout_0.14` | 4.4337 | -0.0400 |
|
| 61 |
| 4 | `mild_30_to_08` | 4.3978 | `static_dropout_0.14` | 4.4337 | -0.0359 |
|
| 62 |
| 4 | `fitted_l16_static_law` | 4.3983 | `static_dropout_0.14` | 4.4337 | -0.0354 |
|
|
@@ -64,7 +68,7 @@ baseline for that seed.
|
|
| 64 |
| 4 | `static_dropout_0.3` | 4.4455 | `static_dropout_0.14` | 4.4337 | +0.0118 |
|
| 65 |
| 4 | `static_dropout_0.02` | 4.5220 | `static_dropout_0.14` | 4.4337 | +0.0883 |
|
| 66 |
| 4 | `static_dropout_0` | 4.5768 | `static_dropout_0.14` | 4.4337 | +0.1432 |
|
| 67 |
-
| 5 | `
|
| 68 |
| 5 | `hold_30_then_decay` | 4.4145 | `static_dropout_0.14` | 4.4560 | -0.0415 |
|
| 69 |
| 5 | `mild_30_to_08` | 4.4161 | `static_dropout_0.14` | 4.4560 | -0.0399 |
|
| 70 |
| 5 | `fitted_l16_static_law` | 4.4161 | `static_dropout_0.14` | 4.4560 | -0.0399 |
|
|
@@ -81,27 +85,27 @@ baseline for that seed.
|
|
| 81 |
| 0 | 250,000 | `hold_30_then_decay` | 0.300 | 5 | 5.4483 | 0.0138 | 4.4429 | 1.0054 |
|
| 82 |
| 0 | 250,000 | `static_dropout_0.3` | 0.300 | 5 | 5.4483 | 0.0138 | 4.4429 | 1.0054 |
|
| 83 |
| 0 | 250,000 | `static_dropout_0.14` | 0.140 | 5 | 5.4773 | 0.0224 | 4.0298 | 1.4475 |
|
| 84 |
-
| 0 | 250,000 | `
|
| 85 |
| 0 | 250,000 | `static_dropout_0.02` | 0.020 | 5 | 5.7426 | 0.0242 | 3.5371 | 2.2055 |
|
| 86 |
| 0 | 250,000 | `fitted_l16_static_law` | 0.600 | 5 | 5.7842 | 0.0096 | 5.1640 | 0.6202 |
|
| 87 |
| 0 | 250,000 | `static_dropout_0` | 0.000 | 5 | 5.8330 | 0.0198 | 3.4443 | 2.3887 |
|
| 88 |
| 1 | 500,000 | `mild_30_to_08` | 0.240 | 5 | 5.0582 | 0.0159 | 4.0349 | 1.0233 |
|
| 89 |
| 1 | 500,000 | `static_dropout_0.3` | 0.300 | 5 | 5.0667 | 0.0173 | 4.1383 | 0.9284 |
|
| 90 |
| 1 | 500,000 | `hold_30_then_decay` | 0.300 | 5 | 5.0667 | 0.0173 | 4.1383 | 0.9284 |
|
| 91 |
-
| 1 | 500,000 | `
|
| 92 |
| 1 | 500,000 | `static_dropout_0.14` | 0.140 | 5 | 5.1492 | 0.0070 | 3.7143 | 1.4349 |
|
| 93 |
| 1 | 500,000 | `fitted_l16_static_law` | 0.400 | 5 | 5.1507 | 0.0102 | 4.4632 | 0.6875 |
|
| 94 |
| 1 | 500,000 | `static_dropout_0.02` | 0.020 | 5 | 5.5754 | 0.0248 | 3.1246 | 2.4508 |
|
| 95 |
| 1 | 500,000 | `static_dropout_0` | 0.000 | 5 | 5.7175 | 0.0502 | 2.9583 | 2.7592 |
|
| 96 |
| 2 | 1,000,000 | `hold_30_then_decay` | 0.200 | 5 | 4.7757 | 0.0144 | 4.0378 | 0.7379 |
|
| 97 |
| 2 | 1,000,000 | `mild_30_to_08` | 0.180 | 5 | 4.7774 | 0.0138 | 3.9886 | 0.7888 |
|
| 98 |
-
| 2 | 1,000,000 | `
|
| 99 |
| 2 | 1,000,000 | `static_dropout_0.3` | 0.300 | 5 | 4.7983 | 0.0144 | 4.1501 | 0.6481 |
|
| 100 |
| 2 | 1,000,000 | `fitted_l16_static_law` | 0.300 | 5 | 4.8326 | 0.0102 | 4.2632 | 0.5694 |
|
| 101 |
| 2 | 1,000,000 | `static_dropout_0.14` | 0.140 | 5 | 4.8490 | 0.0202 | 3.8712 | 0.9779 |
|
| 102 |
| 2 | 1,000,000 | `static_dropout_0.02` | 0.020 | 5 | 5.1470 | 0.0222 | 3.4615 | 1.6854 |
|
| 103 |
| 2 | 1,000,000 | `static_dropout_0` | 0.000 | 5 | 5.2637 | 0.0274 | 3.3260 | 1.9377 |
|
| 104 |
-
| 3 | 2,000,000 | `
|
| 105 |
| 3 | 2,000,000 | `hold_30_then_decay` | 0.100 | 5 | 4.5599 | 0.0161 | 4.0445 | 0.5154 |
|
| 106 |
| 3 | 2,000,000 | `mild_30_to_08` | 0.120 | 5 | 4.5631 | 0.0155 | 4.0441 | 0.5190 |
|
| 107 |
| 3 | 2,000,000 | `fitted_l16_static_law` | 0.140 | 5 | 4.5806 | 0.0153 | 4.1471 | 0.4334 |
|
|
@@ -109,7 +113,7 @@ baseline for that seed.
|
|
| 109 |
| 3 | 2,000,000 | `static_dropout_0.14` | 0.140 | 5 | 4.6048 | 0.0136 | 4.0399 | 0.5648 |
|
| 110 |
| 3 | 2,000,000 | `static_dropout_0.02` | 0.020 | 5 | 4.7847 | 0.0196 | 3.8405 | 0.9442 |
|
| 111 |
| 3 | 2,000,000 | `static_dropout_0` | 0.000 | 5 | 4.8472 | 0.0171 | 3.7786 | 1.0687 |
|
| 112 |
-
| 4 | 4,000,000 | `
|
| 113 |
| 4 | 4,000,000 | `hold_30_then_decay` | 0.020 | 5 | 4.4052 | 0.0112 | 4.0488 | 0.3565 |
|
| 114 |
| 4 | 4,000,000 | `mild_30_to_08` | 0.080 | 5 | 4.4073 | 0.0085 | 4.0736 | 0.3337 |
|
| 115 |
| 4 | 4,000,000 | `fitted_l16_static_law` | 0.020 | 5 | 4.4124 | 0.0084 | 4.0987 | 0.3137 |
|
|
@@ -120,10 +124,10 @@ baseline for that seed.
|
|
| 120 |
|
| 121 |
## Interpretation
|
| 122 |
|
| 123 |
-
- `
|
| 124 |
- The second-best final condition is `hold_30_then_decay` at 4.4052 +/- 0.0112.
|
| 125 |
- The best static baseline by mean final loss is `static_dropout_0.14` at 4.4455 +/- 0.0120.
|
| 126 |
-
- `
|
| 127 |
- `hold_30_then_decay` beats the per-seed best static baseline in 5/5 seeds; worst paired delta is -0.0183.
|
| 128 |
- `mild_30_to_08` beats the per-seed best static baseline in 5/5 seeds; worst paired delta is -0.0206.
|
| 129 |
- `fitted_l16_static_law` beats the per-seed best static baseline in 5/5 seeds; worst paired delta is -0.0211.
|
|
|
|
| 1 |
+
# OpenWebText10K Streaming Validation
|
| 2 |
|
| 3 |
Date: 2026-05-30
|
| 4 |
|
|
|
|
| 6 |
No additional training is performed by this script; it reads saved
|
| 7 |
`metrics.jsonl` files.
|
| 8 |
|
| 9 |
+
Regime: OpenWebText10K cached-corpus streaming setup with L16_H8_D384,
|
| 10 |
+
31,457,280 parameters, five prefixes from 250k to 4M tokens, and 1,000
|
| 11 |
+
optimizer steps per stage. This is a clean five-seed run including the
|
| 12 |
+
OpenWebText10K interaction schedule, empirical decay schedules, and static
|
| 13 |
+
baselines.
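As a rough illustration of the staged setup described in this regime paragraph, the sketch below pairs the five stream prefixes with a per-stage dropout path. The prefix sizes and dropout paths are copied from the tables in this report. The names `PREFIX_TOKENS`, `DROPOUT_PATHS`, `dropout_for_stage`, and `schedule` are hypothetical and are not taken from the repository's training code.

```python
# Hypothetical sketch: stage-wise dropout lookup for a streaming run.
# Values are copied from the report tables; all names are illustrative only.

# Stream prefix sizes in tokens, one per stage (stage 0 to stage 4).
PREFIX_TOKENS = [250_000, 500_000, 1_000_000, 2_000_000, 4_000_000]

# Dropout value used at each stage, per condition.
DROPOUT_PATHS = {
    "hold_30_then_decay": [0.30, 0.30, 0.20, 0.10, 0.02],
    "mild_30_to_08": [0.30, 0.24, 0.18, 0.12, 0.08],
    "fitted_l16_static_law": [0.60, 0.40, 0.30, 0.14, 0.02],
    "static_dropout_0.14": [0.14] * 5,
}


def dropout_for_stage(condition: str, stage: int) -> float:
    """Return the dropout rate a condition uses at a given stage index."""
    path = DROPOUT_PATHS[condition]
    if not 0 <= stage < len(path):
        raise IndexError(f"stage {stage} is outside 0..{len(path) - 1}")
    return path[stage]


def schedule(condition: str) -> list[tuple[int, float]]:
    """Pair each prefix size with the dropout rate used at that stage."""
    return list(zip(PREFIX_TOKENS, DROPOUT_PATHS[condition]))


if __name__ == "__main__":
    for tokens, p in schedule("hold_30_then_decay"):
        print(f"{tokens:>9,} tokens -> dropout {p:.2f}")
```

A static baseline is just a constant path under this representation, which keeps decay schedules and static baselines comparable stage by stage.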
|
| 14 |
|
| 15 |
## Sources
|
| 16 |
|
| 17 |
+
- `runs/openwebtext10k_l16_updated_formula_clean_5seed/locked_stream/20260530-174525/metrics.jsonl`
|
| 18 |
|
| 19 |
## Condition Ranking By Final Loss
|
| 20 |
|
| 21 |
| Condition | Kind | N | Mean trajectory val | Std trajectory val | Mean final val | Std final val | Mean final gap | Dropout path |
|
| 22 |
|---|---|---:|---:|---:|---:|---:|---:|---|
|
| 23 |
+
| `openwebtext10k_interaction` | `anchor_decay` | 5 | 4.8609 | 0.0046 | 4.3981 | 0.0095 | 0.3177 | `0.39 -> 0.32 -> 0.23 -> 0.14 -> 0.07` |
|
| 24 |
| `hold_30_then_decay` | `anchor_decay` | 5 | 4.8512 | 0.0017 | 4.4052 | 0.0112 | 0.3565 | `0.30 -> 0.30 -> 0.20 -> 0.10 -> 0.02` |
|
| 25 |
| `mild_30_to_08` | `anchor_decay` | 5 | 4.8509 | 0.0015 | 4.4073 | 0.0085 | 0.3337 | `0.30 -> 0.24 -> 0.18 -> 0.12 -> 0.08` |
|
| 26 |
| `fitted_l16_static_law` | `anchor_decay` | 5 | 4.9521 | 0.0039 | 4.4124 | 0.0084 | 0.3137 | `0.60 -> 0.40 -> 0.30 -> 0.14 -> 0.02` |
|
|
|
|
| 36 |
|
| 37 |
| Seed | Condition | Final val | Best static | Best static final val | Delta vs best static |
|
| 38 |
|---:|---|---:|---|---:|---:|
|
| 39 |
+
| 1 | `openwebtext10k_interaction` | 4.4023 | `static_dropout_0.14` | 4.4418 | -0.0394 |
|
| 40 |
| 1 | `hold_30_then_decay` | 4.3939 | `static_dropout_0.14` | 4.4418 | -0.0479 |
|
| 41 |
| 1 | `mild_30_to_08` | 4.3995 | `static_dropout_0.14` | 4.4418 | -0.0423 |
|
| 42 |
| 1 | `fitted_l16_static_law` | 4.4207 | `static_dropout_0.14` | 4.4418 | -0.0211 |
|
|
|
|
| 44 |
| 1 | `static_dropout_0.3` | 4.4602 | `static_dropout_0.14` | 4.4418 | +0.0184 |
|
| 45 |
| 1 | `static_dropout_0.02` | 4.5402 | `static_dropout_0.14` | 4.4418 | +0.0984 |
|
| 46 |
| 1 | `static_dropout_0` | 4.5704 | `static_dropout_0.14` | 4.4418 | +0.1286 |
|
| 47 |
+
| 2 | `openwebtext10k_interaction` | 4.4020 | `static_dropout_0.14` | 4.4602 | -0.0583 |
|
| 48 |
| 2 | `hold_30_then_decay` | 4.4068 | `static_dropout_0.14` | 4.4602 | -0.0534 |
|
| 49 |
| 2 | `mild_30_to_08` | 4.4080 | `static_dropout_0.14` | 4.4602 | -0.0522 |
|
| 50 |
| 2 | `fitted_l16_static_law` | 4.4136 | `static_dropout_0.14` | 4.4602 | -0.0466 |
|
|
|
|
| 52 |
| 2 | `static_dropout_0.3` | 4.4719 | `static_dropout_0.14` | 4.4602 | +0.0117 |
|
| 53 |
| 2 | `static_dropout_0.02` | 4.5466 | `static_dropout_0.14` | 4.4602 | +0.0864 |
|
| 54 |
| 2 | `static_dropout_0` | 4.6094 | `static_dropout_0.14` | 4.4602 | +0.1492 |
|
| 55 |
+
| 3 | `openwebtext10k_interaction` | 4.4029 | `static_dropout_0.14` | 4.4356 | -0.0328 |
|
| 56 |
| 3 | `hold_30_then_decay` | 4.4174 | `static_dropout_0.14` | 4.4356 | -0.0183 |
|
| 57 |
| 3 | `mild_30_to_08` | 4.4151 | `static_dropout_0.14` | 4.4356 | -0.0206 |
|
| 58 |
| 3 | `fitted_l16_static_law` | 4.4134 | `static_dropout_0.14` | 4.4356 | -0.0223 |
|
|
|
|
| 60 |
| 3 | `static_dropout_0.3` | 4.4758 | `static_dropout_0.14` | 4.4356 | +0.0401 |
|
| 61 |
| 3 | `static_dropout_0.02` | 4.5345 | `static_dropout_0.14` | 4.4356 | +0.0988 |
|
| 62 |
| 3 | `static_dropout_0` | 4.5928 | `static_dropout_0.14` | 4.4356 | +0.1571 |
|
| 63 |
+
| 4 | `openwebtext10k_interaction` | 4.3811 | `static_dropout_0.14` | 4.4337 | -0.0526 |
|
| 64 |
| 4 | `hold_30_then_decay` | 4.3936 | `static_dropout_0.14` | 4.4337 | -0.0400 |
|
| 65 |
| 4 | `mild_30_to_08` | 4.3978 | `static_dropout_0.14` | 4.4337 | -0.0359 |
|
| 66 |
| 4 | `fitted_l16_static_law` | 4.3983 | `static_dropout_0.14` | 4.4337 | -0.0354 |
|
|
|
|
| 68 |
| 4 | `static_dropout_0.3` | 4.4455 | `static_dropout_0.14` | 4.4337 | +0.0118 |
|
| 69 |
| 4 | `static_dropout_0.02` | 4.5220 | `static_dropout_0.14` | 4.4337 | +0.0883 |
|
| 70 |
| 4 | `static_dropout_0` | 4.5768 | `static_dropout_0.14` | 4.4337 | +0.1432 |
|
| 71 |
+
| 5 | `openwebtext10k_interaction` | 4.4024 | `static_dropout_0.14` | 4.4560 | -0.0536 |
|
| 72 |
| 5 | `hold_30_then_decay` | 4.4145 | `static_dropout_0.14` | 4.4560 | -0.0415 |
|
| 73 |
| 5 | `mild_30_to_08` | 4.4161 | `static_dropout_0.14` | 4.4560 | -0.0399 |
|
| 74 |
| 5 | `fitted_l16_static_law` | 4.4161 | `static_dropout_0.14` | 4.4560 | -0.0399 |
|
|
|
|
| 85 |
| 0 | 250,000 | `hold_30_then_decay` | 0.300 | 5 | 5.4483 | 0.0138 | 4.4429 | 1.0054 |
|
| 86 |
| 0 | 250,000 | `static_dropout_0.3` | 0.300 | 5 | 5.4483 | 0.0138 | 4.4429 | 1.0054 |
|
| 87 |
| 0 | 250,000 | `static_dropout_0.14` | 0.140 | 5 | 5.4773 | 0.0224 | 4.0298 | 1.4475 |
|
| 88 |
+
| 0 | 250,000 | `openwebtext10k_interaction` | 0.385 | 5 | 5.4947 | 0.0109 | 4.6016 | 0.8930 |
|
| 89 |
| 0 | 250,000 | `static_dropout_0.02` | 0.020 | 5 | 5.7426 | 0.0242 | 3.5371 | 2.2055 |
|
| 90 |
| 0 | 250,000 | `fitted_l16_static_law` | 0.600 | 5 | 5.7842 | 0.0096 | 5.1640 | 0.6202 |
|
| 91 |
| 0 | 250,000 | `static_dropout_0` | 0.000 | 5 | 5.8330 | 0.0198 | 3.4443 | 2.3887 |
|
| 92 |
| 1 | 500,000 | `mild_30_to_08` | 0.240 | 5 | 5.0582 | 0.0159 | 4.0349 | 1.0233 |
|
| 93 |
| 1 | 500,000 | `static_dropout_0.3` | 0.300 | 5 | 5.0667 | 0.0173 | 4.1383 | 0.9284 |
|
| 94 |
| 1 | 500,000 | `hold_30_then_decay` | 0.300 | 5 | 5.0667 | 0.0173 | 4.1383 | 0.9284 |
|
| 95 |
+
| 1 | 500,000 | `openwebtext10k_interaction` | 0.319 | 5 | 5.0715 | 0.0118 | 4.2065 | 0.8650 |
|
| 96 |
| 1 | 500,000 | `static_dropout_0.14` | 0.140 | 5 | 5.1492 | 0.0070 | 3.7143 | 1.4349 |
|
| 97 |
| 1 | 500,000 | `fitted_l16_static_law` | 0.400 | 5 | 5.1507 | 0.0102 | 4.4632 | 0.6875 |
|
| 98 |
| 1 | 500,000 | `static_dropout_0.02` | 0.020 | 5 | 5.5754 | 0.0248 | 3.1246 | 2.4508 |
|
| 99 |
| 1 | 500,000 | `static_dropout_0` | 0.000 | 5 | 5.7175 | 0.0502 | 2.9583 | 2.7592 |
|
| 100 |
| 2 | 1,000,000 | `hold_30_then_decay` | 0.200 | 5 | 4.7757 | 0.0144 | 4.0378 | 0.7379 |
|
| 101 |
| 2 | 1,000,000 | `mild_30_to_08` | 0.180 | 5 | 4.7774 | 0.0138 | 3.9886 | 0.7888 |
|
| 102 |
+
| 2 | 1,000,000 | `openwebtext10k_interaction` | 0.227 | 5 | 4.7811 | 0.0084 | 4.0826 | 0.6984 |
|
| 103 |
| 2 | 1,000,000 | `static_dropout_0.3` | 0.300 | 5 | 4.7983 | 0.0144 | 4.1501 | 0.6481 |
|
| 104 |
| 2 | 1,000,000 | `fitted_l16_static_law` | 0.300 | 5 | 4.8326 | 0.0102 | 4.2632 | 0.5694 |
|
| 105 |
| 2 | 1,000,000 | `static_dropout_0.14` | 0.140 | 5 | 4.8490 | 0.0202 | 3.8712 | 0.9779 |
|
| 106 |
| 2 | 1,000,000 | `static_dropout_0.02` | 0.020 | 5 | 5.1470 | 0.0222 | 3.4615 | 1.6854 |
|
| 107 |
| 2 | 1,000,000 | `static_dropout_0` | 0.000 | 5 | 5.2637 | 0.0274 | 3.3260 | 1.9377 |
|
| 108 |
+
| 3 | 2,000,000 | `openwebtext10k_interaction` | 0.139 | 5 | 4.5590 | 0.0142 | 4.0802 | 0.4788 |
|
| 109 |
| 3 | 2,000,000 | `hold_30_then_decay` | 0.100 | 5 | 4.5599 | 0.0161 | 4.0445 | 0.5154 |
|
| 110 |
| 3 | 2,000,000 | `mild_30_to_08` | 0.120 | 5 | 4.5631 | 0.0155 | 4.0441 | 0.5190 |
|
| 111 |
| 3 | 2,000,000 | `fitted_l16_static_law` | 0.140 | 5 | 4.5806 | 0.0153 | 4.1471 | 0.4334 |
|
|
|
|
| 113 |
| 3 | 2,000,000 | `static_dropout_0.14` | 0.140 | 5 | 4.6048 | 0.0136 | 4.0399 | 0.5648 |
|
| 114 |
| 3 | 2,000,000 | `static_dropout_0.02` | 0.020 | 5 | 4.7847 | 0.0196 | 3.8405 | 0.9442 |
|
| 115 |
| 3 | 2,000,000 | `static_dropout_0` | 0.000 | 5 | 4.8472 | 0.0171 | 3.7786 | 1.0687 |
|
| 116 |
+
| 4 | 4,000,000 | `openwebtext10k_interaction` | 0.066 | 5 | 4.3981 | 0.0095 | 4.0805 | 0.3177 |
|
| 117 |
| 4 | 4,000,000 | `hold_30_then_decay` | 0.020 | 5 | 4.4052 | 0.0112 | 4.0488 | 0.3565 |
|
| 118 |
| 4 | 4,000,000 | `mild_30_to_08` | 0.080 | 5 | 4.4073 | 0.0085 | 4.0736 | 0.3337 |
|
| 119 |
| 4 | 4,000,000 | `fitted_l16_static_law` | 0.020 | 5 | 4.4124 | 0.0084 | 4.0987 | 0.3137 |
|
|
|
|
| 124 |
|
| 125 |
## Interpretation
|
| 126 |
|
| 127 |
+
- `openwebtext10k_interaction` has the best 5-seed mean final validation loss: 4.3981 +/- 0.0095.
|
| 128 |
- The second-best final condition is `hold_30_then_decay` at 4.4052 +/- 0.0112.
|
| 129 |
- The best static baseline by mean final loss is `static_dropout_0.14` at 4.4455 +/- 0.0120.
|
| 130 |
+
- `openwebtext10k_interaction` beats the per-seed best static baseline in 5/5 seeds; worst paired delta is -0.0328.
|
| 131 |
- `hold_30_then_decay` beats the per-seed best static baseline in 5/5 seeds; worst paired delta is -0.0183.
|
| 132 |
- `mild_30_to_08` beats the per-seed best static baseline in 5/5 seeds; worst paired delta is -0.0206.
|
| 133 |
- `fitted_l16_static_law` beats the per-seed best static baseline in 5/5 seeds; worst paired delta is -0.0211.
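The paired comparisons in the bullets above can be re-derived from the per-seed table. The following is a minimal sketch, assuming the rounded four-decimal values printed in this report as inputs, so a recomputed delta may differ from the reported one in the last digit. The names `BEST_STATIC`, `FINAL_VAL`, and `paired_summary` are hypothetical and do not come from the report script.

```python
# Sketch: recompute paired final-loss deltas versus the per-seed best static
# baseline. Inputs are the rounded values printed in the per-seed table, so
# the last digit may differ from the report's figures.

# Final validation loss of the best static baseline (static_dropout_0.14), by seed.
BEST_STATIC = {1: 4.4418, 2: 4.4602, 3: 4.4356, 4: 4.4337, 5: 4.4560}

# Final validation loss of each decay schedule, by seed.
FINAL_VAL = {
    "openwebtext10k_interaction": {1: 4.4023, 2: 4.4020, 3: 4.4029, 4: 4.3811, 5: 4.4024},
    "hold_30_then_decay": {1: 4.3939, 2: 4.4068, 3: 4.4174, 4: 4.3936, 5: 4.4145},
    "mild_30_to_08": {1: 4.3995, 2: 4.4080, 3: 4.4151, 4: 4.3978, 5: 4.4161},
    "fitted_l16_static_law": {1: 4.4207, 2: 4.4136, 3: 4.4134, 4: 4.3983, 5: 4.4161},
}


def paired_summary(final_by_seed, static_by_seed):
    """Return (number of seeds won, worst paired delta) for one schedule.

    A negative delta means the schedule has lower final loss than the static
    baseline for that seed; the worst delta is the largest (least negative) one.
    """
    deltas = [final_by_seed[seed] - static_by_seed[seed] for seed in static_by_seed]
    wins = sum(delta < 0 for delta in deltas)
    return wins, max(deltas)


SUMMARY = {name: paired_summary(vals, BEST_STATIC) for name, vals in FINAL_VAL.items()}

if __name__ == "__main__":
    for name, (wins, worst) in SUMMARY.items():
        print(f"{name}: {wins}/{len(BEST_STATIC)} wins, worst paired delta {worst:+.4f}")
```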
|
docs/plan.md
CHANGED
|
@@ -281,11 +281,11 @@ Use this order for every regime.
|
|
| 281 |
|
| 282 |
| Regime | Status | Role |
|
| 283 |
|---|---|---|
|
| 284 |
-
|
|
| 285 |
| TinyStories static/coefficient regime | active | main coefficient evidence |
|
| 286 |
| TinyStories streaming regime | 5-seed validation complete | current main streaming evidence; interaction decay beats best static in 5/5 paired final-loss comparisons |
|
| 287 |
-
|
|
| 288 |
-
|
|
| 289 |
|
| 290 |
## Current Formula Status
|
| 291 |
|
|
@@ -320,7 +320,7 @@ p_t = clamp(p_min, p_max,
|
|
| 320 |
```
|
| 321 |
|
| 322 |
Use these only as the current TinyStories-regime coefficients. They are not
|
| 323 |
-
assumed to transfer numerically to the
|
| 324 |
corpus regime. The cross-regime claim we are testing is that the pressure-law
|
| 325 |
structure transfers, while coefficients may be regime-specific.
|
| 326 |
|
|
@@ -328,12 +328,12 @@ structure transfers, while coefficients may be regime-specific.
|
|
| 328 |
|
| 329 |
| Evidence item | Current reading |
|
| 330 |
|---|---|
|
| 331 |
-
|
|
| 332 |
| TinyStories static optima | interaction form fits static dropout optima better than base ABC |
|
| 333 |
| TinyStories held-out prefix | supports pressure dependence on unique tokens |
|
| 334 |
| TinyStories held-out model | supports pressure dependence on model size |
|
| 335 |
| TinyStories streaming, 5 seeds | interaction has best mean final loss; interaction beats best static in 5/5 paired final-loss comparisons |
|
| 336 |
-
|
|
| 337 |
| cross-regime raw coefficient transfer | weaker than within-regime fit; supports regime-specific coefficients rather than universal numeric coefficients |
|
| 338 |
|
| 339 |
Latest TinyStories 5-seed streaming final-loss table:
|
|
@@ -355,7 +355,7 @@ Paired final-loss result:
|
|
| 355 |
| `baseabc` | 5/5 |
|
| 356 |
| `smooth_low` | 4/5, with the one miss only `+0.0003` |
|
| 357 |
|
| 358 |
-
The immediate risk is no longer seed count for TinyStories or
|
| 359 |
The main remaining risk is external validity beyond two tested regimes. The
|
| 360 |
current defensible claim is:
|
| 361 |
|
|
@@ -370,15 +370,15 @@ The stronger claim:
|
|
| 370 |
Formula-derived dropout decay beats the best static dropout.
|
| 371 |
```
|
| 372 |
|
| 373 |
-
is supported at `n=5` in both the TinyStories and
|
| 374 |
setups, with interaction decay beating the per-seed best static baseline in all
|
| 375 |
five seeds in both regimes.
|
| 376 |
|
| 377 |
-
Latest
|
| 378 |
|
| 379 |
| Condition | Mean final 4M validation loss | Std |
|
| 380 |
|---|---:|---:|
|
| 381 |
-
| `
|
| 382 |
| `hold_30_then_decay` | 4.4052 | 0.0112 |
|
| 383 |
| `mild_30_to_08` | 4.4073 | 0.0085 |
|
| 384 |
| `fitted_l16_static_law` | 4.4124 | 0.0084 |
|
|
@@ -391,14 +391,14 @@ Paired final-loss result:
|
|
| 391 |
|
| 392 |
| Decay schedule | Paired wins vs best static |
|
| 393 |
|---|---:|
|
| 394 |
-
| `
|
| 395 |
| `hold_30_then_decay` | 5/5 |
|
| 396 |
| `mild_30_to_08` | 5/5 |
|
| 397 |
| `fitted_l16_static_law` | 5/5 |
|
| 398 |
|
| 399 |
-
The best static baseline in the clean
|
| 400 |
`0.14`. The interaction schedule improves mean final validation loss by about
|
| 401 |
-
`0.0473` and wins every paired seed comparison. This promotes
|
| 402 |
from exploratory support to a second multi-seed streaming validation regime.
|
| 403 |
|
| 404 |
## Completed Static Backtest Gate
|
|
@@ -413,7 +413,7 @@ runs/coefficient_calibration/cross_regime_backtest/
|
|
| 413 |
Main reading:
|
| 414 |
|
| 415 |
```text
|
| 416 |
-
the interaction pressure-law structure is supported in both the
|
| 417 |
regime and the current TinyStories regime, but coefficient values are
|
| 418 |
regime-specific.
|
| 419 |
```
|
|
@@ -423,7 +423,7 @@ streaming multi-seed reports for each regime.
|
|
| 423 |
|
| 424 |
## Immediate Next Action
|
| 425 |
|
| 426 |
-
Reconcile the TinyStories five-seed report and
|
| 427 |
into the paper outline. The seed-count gap is now closed. The next empirical
|
| 428 |
weakness is external validity, so the preferred next experiment is a third
|
| 429 |
held-out regime with minimal coefficient calibration followed by narrowed
|
|
@@ -432,12 +432,12 @@ multi-seed streaming validation.
|
|
| 432 |
## Next Training After Current Gate
|
| 433 |
|
| 434 |
No MPS training should launch until the two completed five-seed streaming
|
| 435 |
-
reports are read together. Since
|
| 436 |
limiting issue, use a third held-out regime for the next validation step:
|
| 437 |
|
| 438 |
```text
|
| 439 |
completed: TinyStories 5-seed streaming report
|
| 440 |
-
completed:
|
| 441 |
next: third held-out regime with minimal calibration
|
| 442 |
avoid: broad new sweep before cross-regime report reconciliation
|
| 443 |
```
|
|
@@ -452,7 +452,7 @@ decay minus best-static delta per seed
|
|
| 452 |
rank consistency across seeds
|
| 453 |
```
|
| 454 |
|
| 455 |
-
Because
|
| 456 |
streaming claim to "supported in two regimes." Do not yet claim universal
|
| 457 |
numeric coefficients. The next claim to test is whether the pressure-law
|
| 458 |
structure and regime-specific fitting procedure reproduce the win in a third
|
|
@@ -461,8 +461,8 @@ held-out regime.
|
|
| 461 |
Latest streaming report:
|
| 462 |
|
| 463 |
```text
|
| 464 |
-
docs/
|
| 465 |
-
docs/
|
| 466 |
runs/streaming_tinystories_multiseed_validation_l12/combined_5seed_summary/
|
| 467 |
-
runs/
|
| 468 |
```
|
|
|
|
| 281 |
|
| 282 |
| Regime | Status | Role |
|
| 283 |
|---|---|---|
|
| 284 |
+
| OpenWebText10K static/coefficient regime | offline backtest complete | retrospective support for interaction pressure law; do not rerun unless necessary |
|
| 285 |
| TinyStories static/coefficient regime | active | main coefficient evidence |
|
| 286 |
| TinyStories streaming regime | 5-seed validation complete | current main streaming evidence; interaction decay beats best static in 5/5 paired final-loss comparisons |
|
| 287 |
+
| OpenWebText10K streaming regime | 5-seed clean validation complete | OpenWebText10K interaction decay beats best static in 5/5 paired final-loss comparisons |
|
| 288 |
+
| WikiText-103 streaming regime | pending | start only after TinyStories and OpenWebText10K streaming reports are reconciled |
|
| 289 |
|
| 290 |
## Current Formula Status
|
| 291 |
|
|
|
|
| 320 |
```
|
| 321 |
|
| 322 |
Use these only as the current TinyStories-regime coefficients. They are not
|
| 323 |
+
assumed to transfer numerically to the OpenWebText10K regime or any future
|
| 324 |
corpus regime. The cross-regime claim we are testing is that the pressure-law
|
| 325 |
structure transfers, while coefficients may be regime-specific.
|
| 326 |
|
|
|
|
| 328 |
|
| 329 |
| Evidence item | Current reading |
|
| 330 |
|---|---|
|
| 331 |
+
| OpenWebText10K static/coefficient regime | backtest complete; interaction MAE `0.0148` on OpenWebText10K+5M versus base ABC MAE `0.0389` |
|
| 332 |
| TinyStories static optima | interaction form fits static dropout optima better than base ABC |
|
| 333 |
| TinyStories held-out prefix | supports pressure dependence on unique tokens |
|
| 334 |
| TinyStories held-out model | supports pressure dependence on model size |
|
| 335 |
| TinyStories streaming, 5 seeds | interaction has best mean final loss; interaction beats best static in 5/5 paired final-loss comparisons |
|
| 336 |
+
| OpenWebText10K streaming, 5 seeds | interaction decay has best mean final loss; top decay schedules beat best static in 5/5 paired comparisons |
|
| 337 |
| cross-regime raw coefficient transfer | weaker than within-regime fit; supports regime-specific coefficients rather than universal numeric coefficients |
|
| 338 |
|
| 339 |
Latest TinyStories 5-seed streaming final-loss table:
|
|
|
|
| 355 |
| `baseabc` | 5/5 |
|
| 356 |
| `smooth_low` | 4/5, with the one miss only `+0.0003` |
|
| 357 |
|
| 358 |
+
The immediate risk is no longer seed count for TinyStories or OpenWebText10K.
|
| 359 |
The main remaining risk is external validity beyond two tested regimes. The
|
| 360 |
current defensible claim is:
|
| 361 |
|
|
|
|
| 370 |
Formula-derived dropout decay beats the best static dropout.
|
| 371 |
```
|
| 372 |
|
| 373 |
+
is supported at `n=5` in both the TinyStories and OpenWebText10K streaming
|
| 374 |
setups, with interaction decay beating the per-seed best static baseline in all
|
| 375 |
five seeds in both regimes.
|
| 376 |
|
| 377 |
+
Latest OpenWebText10K 5-seed streaming final-loss table:
|
| 378 |
|
| 379 |
| Condition | Mean final 4M validation loss | Std |
|
| 380 |
|---|---:|---:|
|
| 381 |
+
| `openwebtext10k_interaction` decay | 4.3981 | 0.0095 |
|
| 382 |
| `hold_30_then_decay` | 4.4052 | 0.0112 |
|
| 383 |
| `mild_30_to_08` | 4.4073 | 0.0085 |
|
| 384 |
| `fitted_l16_static_law` | 4.4124 | 0.0084 |
|
|
|
|

| Decay schedule | Paired wins vs best static |
|---|---:|
+| `openwebtext10k_interaction` | 5/5 |
| `hold_30_then_decay` | 5/5 |
| `mild_30_to_08` | 5/5 |
| `fitted_l16_static_law` | 5/5 |

+The best static baseline in the clean OpenWebText10K run is static dropout
`0.14`. The interaction schedule improves mean final validation loss by about
+`0.0473` and wins every paired seed comparison. This promotes OpenWebText10K
from exploratory support to a second multi-seed streaming validation regime.
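
A paired comparison of this kind can be sketched as follows. This is a minimal illustration, not the repository's evaluation script: the helper name `paired_wins` and the per-seed losses in the usage example are hypothetical placeholders, because the per-seed static-baseline losses are not listed here. Only the two means (4.4455 for static dropout `0.14`, which appears to be the final validation loss in the run summary, and 4.3981 for the interaction schedule) are taken from the reported tables.

```python
def paired_wins(candidate, baseline):
    """Count seeds where the candidate's final loss is strictly lower.

    Both arguments map seed -> final validation loss; only seeds present
    in both mappings are compared.
    """
    shared = sorted(set(candidate) & set(baseline))
    wins = sum(candidate[seed] < baseline[seed] for seed in shared)
    return wins, len(shared)


# Hypothetical per-seed losses (placeholders, NOT values from the run).
decay_losses = {1: 4.40, 2: 4.41, 3: 4.39, 4: 4.38, 5: 4.40}
static_losses = {1: 4.45, 2: 4.44, 3: 4.46, 4: 4.43, 5: 4.45}

wins, n_seeds = paired_wins(decay_losses, static_losses)
print(f"{wins}/{n_seeds} paired wins")

# Mean improvement recomputed from the rounded table means.
reported_static_mean = 4.4455
reported_decay_mean = 4.3981
improvement = reported_static_mean - reported_decay_mean
print(f"mean improvement from rounded means: {improvement:.4f}")
```

The rounded means give a difference of roughly 0.047; a small discrepancy in the last digit against the quoted `0.0473` would be expected if that figure was computed from unrounded per-seed losses.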

## Completed Static Backtest Gate
|
|
|
|
Main reading:

```text
+the interaction pressure-law structure is supported in both the OpenWebText10K
regime and the current TinyStories regime, but coefficient values are
regime-specific.
```
|
|
|
|

## Immediate Next Action

+Reconcile the TinyStories five-seed report and OpenWebText10K five-seed report
into the paper outline. The seed-count gap is now closed. The next empirical
weakness is external validity, so the preferred next experiment is a third
held-out regime with minimal coefficient calibration followed by narrowed
|
|
|
|
## Next Training After Current Gate

No MPS training should launch until the two completed five-seed streaming
+reports are read together. Since OpenWebText10K seed count is no longer the
limiting issue, use a third held-out regime for the next validation step:

```text
completed: TinyStories 5-seed streaming report
+completed: OpenWebText10K 5-seed clean streaming report
next: third held-out regime with minimal calibration
avoid: broad new sweep before cross-regime report reconciliation
```
|
|
|
|
rank consistency across seeds
```

+Because OpenWebText10K decay wins across paired seeds, promote the cross-regime
streaming claim to "supported in two regimes." Do not yet claim universal
numeric coefficients. The next claim to test is whether the pressure-law
structure and regime-specific fitting procedure reproduce the win in a third
|
|
|
|
Latest streaming report:

```text
+docs/tinystories_streaming_report.md
+docs/openwebtext10k_streaming_report.md
runs/streaming_tinystories_multiseed_validation_l12/combined_5seed_summary/
+runs/openwebtext10k_streaming_report/l16_updated_formula_clean_5seed/
```
|
docs/{streaming_multiseed_validation_report.md → tinystories_streaming_report.md}
RENAMED
|
File without changes
|
paper/dropout_decay_pressure_law.tex
CHANGED
|
@@ -242,7 +242,7 @@ falls as unique-token count grows and rises with model size at fixed prefix.
|
|

\begin{table}[h]
\centering
-\caption{Static dropout optima in the
\label{tab:static_screen}
\resizebox{\textwidth}{!}{%
\begin{tabular}{lrrrrrr}
|
|
@@ -417,7 +417,7 @@ identity can shift the optimum independently of the two pressure ratios.
|
|

The evidence supports a pressure-law structure, not a universal coefficient
triplet. The static fit shows that dropout optima can be approximated from
-$\log_{10}(\params/U)$ and $\log_{10}(C/U)$ within the
streaming experiments show that a schedule based on those variables can beat
fixed dropout across model sizes and architecture-shape holdouts.

|
|
|

\begin{table}[h]
\centering
+\caption{Static dropout optima in the OpenWebText10K cached-corpus regime.}
\label{tab:static_screen}
\resizebox{\textwidth}{!}{%
\begin{tabular}{lrrrrrr}
|
|
|
|

The evidence supports a pressure-law structure, not a universal coefficient
triplet. The static fit shows that dropout optima can be approximated from
+$\log_{10}(\params/U)$ and $\log_{10}(C/U)$ within the OpenWebText10K cached-corpus regime. The
streaming experiments show that a schedule based on those variables can beat
fixed dropout across model sizes and architecture-shape holdouts.

runs/coefficient_calibration/cross_regime_backtest/cross_regime_transfer.csv
CHANGED
|
@@ -1,7 +1,7 @@
|
|
fit,test,feature_set,n,rmse,mae,bias
-
-tinystories_all_base,
-
-tinystories_all_interaction,
pooled_previous_plus_corpus_probes_interaction,tinystories_all_interaction,interaction,16,0.054486145469252165,0.039881828643724325,0.034998333737162594
tinystories_all_interaction,pooled_previous_plus_corpus_probes_interaction,interaction,21,0.057481960591220224,0.049286664509871674,-0.00871732248416439
|
|
|
|
fit,test,feature_set,n,rmse,mae,bias
+openwebtext10k_plus_5m_base,tinystories_all_base,base,16,0.07386955299897678,0.06472190875277323,0.03788621409309852
+tinystories_all_base,openwebtext10k_plus_5m_base,base,18,0.06957982081354992,0.054804945072472085,-0.05140145139355316
+openwebtext10k_plus_5m_interaction,tinystories_all_interaction,interaction,16,0.05329059269271824,0.043855722984289955,0.03858432254089361
+tinystories_all_interaction,openwebtext10k_plus_5m_interaction,interaction,18,0.04900067592752971,0.04224224097829212,-0.02542907718141662
pooled_previous_plus_corpus_probes_interaction,tinystories_all_interaction,interaction,16,0.054486145469252165,0.039881828643724325,0.034998333737162594
tinystories_all_interaction,pooled_previous_plus_corpus_probes_interaction,interaction,21,0.057481960591220224,0.049286664509871674,-0.00871732248416439
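
The four renamed transfer rows can also be read programmatically. The sketch below is only an illustration: it copies the rows shown above into a string, and the helper names `load_transfer` and `regime` are hypothetical. The exact definition of `bias` (including its sign convention) is not stated here, so the example compares only `rmse`.

```python
import csv
import io

# The four OpenWebText10K <-> TinyStories rows from cross_regime_transfer.csv.
TRANSFER_CSV = """fit,test,feature_set,n,rmse,mae,bias
openwebtext10k_plus_5m_base,tinystories_all_base,base,16,0.07386955299897678,0.06472190875277323,0.03788621409309852
tinystories_all_base,openwebtext10k_plus_5m_base,base,18,0.06957982081354992,0.054804945072472085,-0.05140145139355316
openwebtext10k_plus_5m_interaction,tinystories_all_interaction,interaction,16,0.05329059269271824,0.043855722984289955,0.03858432254089361
tinystories_all_interaction,openwebtext10k_plus_5m_interaction,interaction,18,0.04900067592752971,0.04224224097829212,-0.02542907718141662
"""


def load_transfer(text):
    """Parse the CSV text into dicts with numeric columns converted."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["n"] = int(row["n"])
        for key in ("rmse", "mae", "bias"):
            row[key] = float(row[key])
        rows.append(row)
    return rows


def regime(fit_name):
    """Regime label taken as the text before the first underscore."""
    return fit_name.split("_", 1)[0]


# Index transfer RMSE by (regime the coefficients were fitted on, feature set).
rmse_table = {
    (regime(row["fit"]), row["feature_set"]): row["rmse"]
    for row in load_transfer(TRANSFER_CSV)
}

for fit_regime in ("openwebtext10k", "tinystories"):
    base = rmse_table[(fit_regime, "base")]
    inter = rmse_table[(fit_regime, "interaction")]
    print(f"fit on {fit_regime}: base rmse={base:.4f}, interaction rmse={inter:.4f}")
```

In these four rows the interaction feature set has the lower transfer RMSE in both directions, which is what the comparison loop makes visible.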
|
runs/coefficient_calibration/cross_regime_backtest/{previous_local_main_base → openwebtext10k_main_base}/calibration_cells.csv
RENAMED
|
File without changes
|
runs/coefficient_calibration/cross_regime_backtest/{previous_local_main_base → openwebtext10k_main_base}/coefficients.json
RENAMED
|
File without changes
|
runs/coefficient_calibration/cross_regime_backtest/{previous_local_main_base → openwebtext10k_main_base}/fit_diagnostics.md
RENAMED
|
File without changes
|
runs/coefficient_calibration/cross_regime_backtest/{previous_local_main_base → openwebtext10k_main_base}/next_dropout_suggestions.csv
RENAMED
|
File without changes
|
runs/coefficient_calibration/cross_regime_backtest/{previous_local_main_interaction → openwebtext10k_main_interaction}/calibration_cells.csv
RENAMED
|
File without changes
|
runs/coefficient_calibration/cross_regime_backtest/{previous_local_main_interaction → openwebtext10k_main_interaction}/coefficients.json
RENAMED
|
File without changes
|
runs/coefficient_calibration/cross_regime_backtest/{previous_local_main_interaction → openwebtext10k_main_interaction}/fit_diagnostics.md
RENAMED
|
File without changes
|
runs/coefficient_calibration/cross_regime_backtest/{previous_local_main_interaction → openwebtext10k_main_interaction}/next_dropout_suggestions.csv
RENAMED
|
File without changes
|
runs/coefficient_calibration/cross_regime_backtest/{previous_local_plus_5m_base → openwebtext10k_plus_5m_base}/calibration_cells.csv
RENAMED
|
File without changes
|
runs/coefficient_calibration/cross_regime_backtest/{previous_local_plus_5m_base → openwebtext10k_plus_5m_base}/coefficients.json
RENAMED
|
File without changes
|
runs/coefficient_calibration/cross_regime_backtest/{previous_local_plus_5m_base → openwebtext10k_plus_5m_base}/fit_diagnostics.md
RENAMED
|
File without changes
|
runs/coefficient_calibration/cross_regime_backtest/{previous_local_plus_5m_base → openwebtext10k_plus_5m_base}/next_dropout_suggestions.csv
RENAMED
|
File without changes
|
runs/coefficient_calibration/cross_regime_backtest/{previous_local_plus_5m_interaction → openwebtext10k_plus_5m_interaction}/calibration_cells.csv
RENAMED
|
File without changes
|
runs/coefficient_calibration/cross_regime_backtest/{previous_local_plus_5m_interaction → openwebtext10k_plus_5m_interaction}/coefficients.json
RENAMED
|
File without changes
|
runs/coefficient_calibration/cross_regime_backtest/{previous_local_plus_5m_interaction → openwebtext10k_plus_5m_interaction}/fit_diagnostics.md
RENAMED
|
File without changes
|
runs/coefficient_calibration/cross_regime_backtest/{previous_local_plus_5m_interaction → openwebtext10k_plus_5m_interaction}/next_dropout_suggestions.csv
RENAMED
|
File without changes
|
runs/{previous_local_updated_formula_clean_l16 → openwebtext10k_l16_updated_formula_clean_5seed}/locked_stream/20260530-174525/RESULT_SUMMARY.md
RENAMED
|
@@ -1,6 +1,6 @@
|
|
# Locked Streaming Dropout Summary

-Run directory: `runs/

Model: `L16_H8_D384` causal Transformer, 31,457,280 parameters, 16 layers, 8 heads, 384 embedding dim.
Training per stage: 1,000 steps. Sampled tokens are cumulative in each stage row. Seeds present: 1, 2, 3, 4, 5.
|
|
@@ -11,7 +11,7 @@ Training per stage: 1,000 steps. Sampled tokens are cumulative in each stage row
|
|
|---|---|---:|---:|---:|---:|---|
| `mild_30_to_08` | anchor_decay | 0.08 | 4.8509 | 4.4073 | 0.3337 | 0.30 -> 0.24 -> 0.18 -> 0.12 -> 0.08 |
| `hold_30_then_decay` | anchor_decay | 0.02 | 4.8512 | 4.4052 | 0.3565 | 0.30 -> 0.30 -> 0.20 -> 0.10 -> 0.02 |
-| `
| `static_dropout_0.3` | static | 0.30 | 4.8767 | 4.4668 | 0.2349 | 0.30 -> 0.30 -> 0.30 -> 0.30 -> 0.30 |
| `static_dropout_0.14` | static | 0.14 | 4.9051 | 4.4455 | 0.3289 | 0.14 -> 0.14 -> 0.14 -> 0.14 -> 0.14 |
| `fitted_l16_static_law` | anchor_decay | 0.02 | 4.9521 | 4.4124 | 0.3137 | 0.60 -> 0.40 -> 0.30 -> 0.14 -> 0.02 |
|
|
@@ -28,7 +28,7 @@ Training per stage: 1,000 steps. Sampled tokens are cumulative in each stage row
|
|
| `hold_30_then_decay` | 0.30 | 5.4483 | 4.4429 | 1.0054 | 5 |
| `static_dropout_0.3` | 0.30 | 5.4483 | 4.4429 | 1.0054 | 5 |
| `static_dropout_0.14` | 0.14 | 5.4773 | 4.0298 | 1.4475 | 5 |
-| `
| `static_dropout_0.02` | 0.02 | 5.7426 | 3.5371 | 2.2055 | 5 |
| `fitted_l16_static_law` | 0.60 | 5.7842 | 5.1640 | 0.6202 | 5 |
| `static_dropout_0` | 0.00 | 5.8330 | 3.4443 | 2.3887 | 5 |
|
|
@@ -40,7 +40,7 @@ Training per stage: 1,000 steps. Sampled tokens are cumulative in each stage row
|
|
| `mild_30_to_08` | 0.24 | 5.0582 | 4.0349 | 1.0233 | 5 |
| `static_dropout_0.3` | 0.30 | 5.0667 | 4.1383 | 0.9284 | 5 |
| `hold_30_then_decay` | 0.30 | 5.0667 | 4.1383 | 0.9284 | 5 |
-| `
| `static_dropout_0.14` | 0.14 | 5.1492 | 3.7143 | 1.4349 | 5 |
| `fitted_l16_static_law` | 0.40 | 5.1507 | 4.4632 | 0.6875 | 5 |
| `static_dropout_0.02` | 0.02 | 5.5754 | 3.1246 | 2.4508 | 5 |
|
|
@@ -52,7 +52,7 @@ Training per stage: 1,000 steps. Sampled tokens are cumulative in each stage row
|
|
|---|---:|---:|---:|---:|---:|
| `hold_30_then_decay` | 0.20 | 4.7757 | 4.0378 | 0.7379 | 5 |
| `mild_30_to_08` | 0.18 | 4.7774 | 3.9886 | 0.7888 | 5 |
-| `
| `static_dropout_0.3` | 0.30 | 4.7983 | 4.1501 | 0.6481 | 5 |
| `fitted_l16_static_law` | 0.30 | 4.8326 | 4.2632 | 0.5694 | 5 |
| `static_dropout_0.14` | 0.14 | 4.8490 | 3.8712 | 0.9779 | 5 |
|
|
@@ -63,7 +63,7 @@ Training per stage: 1,000 steps. Sampled tokens are cumulative in each stage row
|
|

| Condition | Dropout | Mean val loss | Mean train loss | Mean gap | N |
|---|---:|---:|---:|---:|---:|
-| `
| `hold_30_then_decay` | 0.10 | 4.5599 | 4.0445 | 0.5154 | 5 |
| `mild_30_to_08` | 0.12 | 4.5631 | 4.0441 | 0.5190 | 5 |
| `fitted_l16_static_law` | 0.14 | 4.5806 | 4.1471 | 0.4334 | 5 |
|
|
@@ -76,7 +76,7 @@ Training per stage: 1,000 steps. Sampled tokens are cumulative in each stage row
|
|

| Condition | Dropout | Mean val loss | Mean train loss | Mean gap | N |
|---|---:|---:|---:|---:|---:|
-| `
| `hold_30_then_decay` | 0.02 | 4.4052 | 4.0488 | 0.3565 | 5 |
| `mild_30_to_08` | 0.08 | 4.4073 | 4.0736 | 0.3337 | 5 |
| `fitted_l16_static_law` | 0.02 | 4.4124 | 4.0987 | 0.3137 | 5 |
|
|
|
|
# Locked Streaming Dropout Summary

+Run directory: `runs/openwebtext10k_l16_updated_formula_clean_5seed/locked_stream/20260530-174525`

Model: `L16_H8_D384` causal Transformer, 31,457,280 parameters, 16 layers, 8 heads, 384 embedding dim.
Training per stage: 1,000 steps. Sampled tokens are cumulative in each stage row. Seeds present: 1, 2, 3, 4, 5.
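
The reported parameter count is consistent with a common back-of-the-envelope estimate for a Transformer. The sketch below is an assumption about how the number could arise, not the training code's own counting routine: it counts `12 * d^2` weights per block plus separate input and output embedding matrices, and takes `vocab_size = 4096` from the `model_config` logged in `metrics.jsonl`. The function name `approx_transformer_params` is hypothetical.

```python
def approx_transformer_params(n_layer, n_embd, vocab_size):
    """Rough weight count, ignoring biases, layer norms and positional embeddings.

    Per block: 4*d^2 for the attention projections plus 8*d^2 for an MLP
    with a 4x hidden expansion. Embeddings: token embedding plus an untied
    output head, each vocab_size * d.
    """
    per_block = 12 * n_embd * n_embd
    embeddings = 2 * vocab_size * n_embd
    return n_layer * per_block + embeddings


count = approx_transformer_params(n_layer=16, n_embd=384, vocab_size=4096)
print(f"{count:,}")
```

Whether the actual model uses an untied head and omits the smaller terms is not confirmed here; the estimate simply reproduces the figure quoted for `L16_H8_D384`.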
|
|
|
|
|---|---|---:|---:|---:|---:|---|
| `mild_30_to_08` | anchor_decay | 0.08 | 4.8509 | 4.4073 | 0.3337 | 0.30 -> 0.24 -> 0.18 -> 0.12 -> 0.08 |
| `hold_30_then_decay` | anchor_decay | 0.02 | 4.8512 | 4.4052 | 0.3565 | 0.30 -> 0.30 -> 0.20 -> 0.10 -> 0.02 |
+| `openwebtext10k_interaction` | anchor_decay | 0.07 | 4.8609 | 4.3981 | 0.3177 | 0.39 -> 0.32 -> 0.23 -> 0.14 -> 0.07 |
| `static_dropout_0.3` | static | 0.30 | 4.8767 | 4.4668 | 0.2349 | 0.30 -> 0.30 -> 0.30 -> 0.30 -> 0.30 |
| `static_dropout_0.14` | static | 0.14 | 4.9051 | 4.4455 | 0.3289 | 0.14 -> 0.14 -> 0.14 -> 0.14 -> 0.14 |
| `fitted_l16_static_law` | anchor_decay | 0.02 | 4.9521 | 4.4124 | 0.3137 | 0.60 -> 0.40 -> 0.30 -> 0.14 -> 0.02 |
|
|
|
|
| `hold_30_then_decay` | 0.30 | 5.4483 | 4.4429 | 1.0054 | 5 |
| `static_dropout_0.3` | 0.30 | 5.4483 | 4.4429 | 1.0054 | 5 |
| `static_dropout_0.14` | 0.14 | 5.4773 | 4.0298 | 1.4475 | 5 |
+| `openwebtext10k_interaction` | 0.39 | 5.4947 | 4.6016 | 0.8930 | 5 |
| `static_dropout_0.02` | 0.02 | 5.7426 | 3.5371 | 2.2055 | 5 |
| `fitted_l16_static_law` | 0.60 | 5.7842 | 5.1640 | 0.6202 | 5 |
| `static_dropout_0` | 0.00 | 5.8330 | 3.4443 | 2.3887 | 5 |
|
|
|
|
| `mild_30_to_08` | 0.24 | 5.0582 | 4.0349 | 1.0233 | 5 |
| `static_dropout_0.3` | 0.30 | 5.0667 | 4.1383 | 0.9284 | 5 |
| `hold_30_then_decay` | 0.30 | 5.0667 | 4.1383 | 0.9284 | 5 |
+| `openwebtext10k_interaction` | 0.32 | 5.0715 | 4.2065 | 0.8650 | 5 |
| `static_dropout_0.14` | 0.14 | 5.1492 | 3.7143 | 1.4349 | 5 |
| `fitted_l16_static_law` | 0.40 | 5.1507 | 4.4632 | 0.6875 | 5 |
| `static_dropout_0.02` | 0.02 | 5.5754 | 3.1246 | 2.4508 | 5 |
|
|
|
|
|---|---:|---:|---:|---:|---:|
| `hold_30_then_decay` | 0.20 | 4.7757 | 4.0378 | 0.7379 | 5 |
| `mild_30_to_08` | 0.18 | 4.7774 | 3.9886 | 0.7888 | 5 |
+| `openwebtext10k_interaction` | 0.23 | 4.7811 | 4.0826 | 0.6984 | 5 |
| `static_dropout_0.3` | 0.30 | 4.7983 | 4.1501 | 0.6481 | 5 |
| `fitted_l16_static_law` | 0.30 | 4.8326 | 4.2632 | 0.5694 | 5 |
| `static_dropout_0.14` | 0.14 | 4.8490 | 3.8712 | 0.9779 | 5 |
|
|
|
|

| Condition | Dropout | Mean val loss | Mean train loss | Mean gap | N |
|---|---:|---:|---:|---:|---:|
+| `openwebtext10k_interaction` | 0.14 | 4.5590 | 4.0802 | 0.4788 | 5 |
| `hold_30_then_decay` | 0.10 | 4.5599 | 4.0445 | 0.5154 | 5 |
| `mild_30_to_08` | 0.12 | 4.5631 | 4.0441 | 0.5190 | 5 |
| `fitted_l16_static_law` | 0.14 | 4.5806 | 4.1471 | 0.4334 | 5 |
|
|
|
|

| Condition | Dropout | Mean val loss | Mean train loss | Mean gap | N |
|---|---:|---:|---:|---:|---:|
+| `openwebtext10k_interaction` | 0.07 | 4.3981 | 4.0805 | 0.3177 | 5 |
| `hold_30_then_decay` | 0.02 | 4.4052 | 4.0488 | 0.3565 | 5 |
| `mild_30_to_08` | 0.08 | 4.4073 | 4.0736 | 0.3337 | 5 |
| `fitted_l16_static_law` | 0.02 | 4.4124 | 4.0987 | 0.3137 | 5 |
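
The `Mean gap` column can be sanity-checked against the other two loss columns. The sketch below assumes the gap is validation loss minus training loss, which matches the per-record `generalization_gap` field in `metrics.jsonl` (for example `5.500026 - 4.615461 = 0.884566` for seed 1, stage 0). The helper name `gap_mismatches` and the tolerance are hypothetical choices; the tolerance allows for the table having been rounded from unrounded per-seed values.

```python
# (condition, mean val loss, mean train loss, reported mean gap),
# copied from the four final-stage rows shown above.
FINAL_STAGE_ROWS = [
    ("openwebtext10k_interaction", 4.3981, 4.0805, 0.3177),
    ("hold_30_then_decay", 4.4052, 4.0488, 0.3565),
    ("mild_30_to_08", 4.4073, 4.0736, 0.3337),
    ("fitted_l16_static_law", 4.4124, 4.0987, 0.3137),
]


def gap_mismatches(rows, tol=2e-4):
    """Return conditions whose reported gap differs from val - train by more than tol."""
    mismatched = []
    for name, val_loss, train_loss, reported_gap in rows:
        if abs((val_loss - train_loss) - reported_gap) > tol:
            mismatched.append(name)
    return mismatched


print(gap_mismatches(FINAL_STAGE_ROWS))
```

An empty list means every row agrees with `val - train` to within rounding.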
|
runs/{previous_local_updated_formula_clean_l16 → openwebtext10k_l16_updated_formula_clean_5seed}/locked_stream/20260530-174525/config.json
RENAMED
|
@@ -5,7 +5,7 @@
|
|
"corpus_glob": null,
"text_column": "text",
"use_cached_data": true,
-"output_dir": "runs/
"resume_from": null,
"cache_dir": ".cache/dropout_decay",
"models": [
|
|
@@ -46,7 +46,7 @@
|
|
"decays": [],
"anchor_decays": [
{
-"name": "
"kind": "anchor_decay",
"initial": 0.385,
"final": 0.066,
|
|
|
|
"corpus_glob": null,
"text_column": "text",
"use_cached_data": true,
+"output_dir": "runs/openwebtext10k_l16_updated_formula_clean_5seed",
"resume_from": null,
"cache_dir": ".cache/dropout_decay",
"models": [
|
|
|
|
"decays": [],
"anchor_decays": [
{
+"name": "openwebtext10k_interaction",
"kind": "anchor_decay",
"initial": 0.385,
"final": 0.066,
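
The `initial` and `final` values here line up with the schedule string in `RESULT_SUMMARY.md`. The sketch below takes the per-stage `dropout_active_final` values logged in `metrics.jsonl` for `openwebtext10k_interaction` and rounds them to two decimals. Half-up rounding is an assumption about how the summary was formatted, and the intermediate stage values are copied from the log rather than recomputed, because the `log_prefix_anchor` interpolation rule is not shown here. The helper name `two_decimals` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# dropout_active_final for stages 0..4, as logged for openwebtext10k_interaction.
LOGGED_DROPOUT = ["0.385", "0.319", "0.227", "0.139", "0.066"]


def two_decimals(value):
    """Round a decimal string to two places using half-up rounding."""
    return str(Decimal(value).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))


schedule = " -> ".join(two_decimals(v) for v in LOGGED_DROPOUT)
print(schedule)
```

Using `Decimal` on the string form avoids binary floating-point surprises at the `0.385` boundary, where the built-in `round` is not guaranteed to round upward.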
|
runs/{previous_local_updated_formula_clean_l16 → openwebtext10k_l16_updated_formula_clean_5seed}/locked_stream/20260530-174525/metrics.jsonl
RENAMED
|
@@ -1,28 +1,28 @@
|
|
| 1 |
-
{"condition": "
|
| 2 |
-
{"condition": "
|
| 3 |
-
{"condition": "
|
| 4 |
-
{"condition": "
|
| 5 |
-
{"condition": "
|
| 6 |
-
{"condition": "
|
| 7 |
-
{"condition": "
|
| 8 |
-
{"condition": "
|
| 9 |
-
{"condition": "
|
| 10 |
-
{"condition": "
|
| 11 |
-
{"condition": "
|
| 12 |
-
{"condition": "
|
| 13 |
-
{"condition": "
|
| 14 |
-
{"condition": "
|
| 15 |
-
{"condition": "
|
| 16 |
-
{"condition": "
|
| 17 |
-
{"condition": "
|
| 18 |
-
{"condition": "
|
| 19 |
-
{"condition": "
|
| 20 |
-
{"condition": "
|
| 21 |
-
{"condition": "
|
| 22 |
-
{"condition": "
|
| 23 |
-
{"condition": "
|
| 24 |
-
{"condition": "
|
| 25 |
-
{"condition": "
|
| 26 |
{"condition": "hold_30_then_decay", "condition_kind": "anchor_decay", "dropout_active_final": 0.3, "dropout_final": 0.02, "dropout_initial": 0.3, "dropout_schedule": "log_prefix_anchor", "elapsed_sec": 159.01670002937317, "eval_loss": 5.46150516718626, "generalization_gap": 0.9761984124779701, "model_config": {"block_size": 128, "dropout": 0.3, "n_embd": 384, "n_head": 8, "n_layer": 16, "vocab_size": 4096}, "model_name": "L16_H8_D384", "n_embd": 384, "n_head": 8, "n_layer": 16, "parameters": 31457280, "run_mode": "locked_stream", "seed": 1, "stage": 0, "steps": 1000, "token_limit": 250000, "tokens_seen": 2048000, "train_eval_loss": 4.48530675470829, "train_loss_last": 4.836114883422852, "val_eval_loss": 5.46150516718626}
{"condition": "hold_30_then_decay", "condition_kind": "anchor_decay", "dropout_active_final": 0.3, "dropout_final": 0.02, "dropout_initial": 0.3, "dropout_schedule": "log_prefix_anchor", "elapsed_sec": 159.13048791885376, "eval_loss": 5.0823986530303955, "generalization_gap": 0.9970467239618301, "model_config": {"block_size": 128, "dropout": 0.3, "n_embd": 384, "n_head": 8, "n_layer": 16, "vocab_size": 4096}, "model_name": "L16_H8_D384", "n_embd": 384, "n_head": 8, "n_layer": 16, "parameters": 31457280, "run_mode": "locked_stream", "seed": 1, "stage": 1, "steps": 1000, "token_limit": 500000, "tokens_seen": 4096000, "train_eval_loss": 4.085351929068565, "train_loss_last": 4.226001739501953, "val_eval_loss": 5.0823986530303955}
{"condition": "hold_30_then_decay", "condition_kind": "anchor_decay", "dropout_active_final": 0.2, "dropout_final": 0.02, "dropout_initial": 0.3, "dropout_schedule": "log_prefix_anchor", "elapsed_sec": 159.02239680290222, "eval_loss": 4.7529231160879135, "generalization_gap": 0.737479031085968, "model_config": {"block_size": 128, "dropout": 0.3, "n_embd": 384, "n_head": 8, "n_layer": 16, "vocab_size": 4096}, "model_name": "L16_H8_D384", "n_embd": 384, "n_head": 8, "n_layer": 16, "parameters": 31457280, "run_mode": "locked_stream", "seed": 1, "stage": 2, "steps": 1000, "token_limit": 1000000, "tokens_seen": 6144000, "train_eval_loss": 4.0154440850019455, "train_loss_last": 4.443882942199707, "val_eval_loss": 4.7529231160879135}
|
|
|
|
| 1 |
+
{"condition": "openwebtext10k_interaction", "condition_kind": "anchor_decay", "dropout_active_final": 0.385, "dropout_final": 0.066, "dropout_initial": 0.385, "dropout_schedule": "log_prefix_anchor", "elapsed_sec": 145.00203609466553, "eval_loss": 5.500026352703571, "generalization_gap": 0.8845655247569084, "model_config": {"block_size": 128, "dropout": 0.385, "n_embd": 384, "n_head": 8, "n_layer": 16, "vocab_size": 4096}, "model_name": "L16_H8_D384", "n_embd": 384, "n_head": 8, "n_layer": 16, "parameters": 31457280, "run_mode": "locked_stream", "seed": 1, "stage": 0, "steps": 1000, "token_limit": 250000, "tokens_seen": 2048000, "train_eval_loss": 4.615460827946663, "train_loss_last": 4.844176292419434, "val_eval_loss": 5.500026352703571}
|
| 2 |
+
{"condition": "openwebtext10k_interaction", "condition_kind": "anchor_decay", "dropout_active_final": 0.319, "dropout_final": 0.066, "dropout_initial": 0.385, "dropout_schedule": "log_prefix_anchor", "elapsed_sec": 145.0216257572174, "eval_loss": 5.0825527757406235, "generalization_gap": 0.8991758674383163, "model_config": {"block_size": 128, "dropout": 0.385, "n_embd": 384, "n_head": 8, "n_layer": 16, "vocab_size": 4096}, "model_name": "L16_H8_D384", "n_embd": 384, "n_head": 8, "n_layer": 16, "parameters": 31457280, "run_mode": "locked_stream", "seed": 1, "stage": 1, "steps": 1000, "token_limit": 500000, "tokens_seen": 4096000, "train_eval_loss": 4.183376908302307, "train_loss_last": 4.244839668273926, "val_eval_loss": 5.0825527757406235}
|
| 3 |
+
{"condition": "openwebtext10k_interaction", "condition_kind": "anchor_decay", "dropout_active_final": 0.227, "dropout_final": 0.066, "dropout_initial": 0.385, "dropout_schedule": "log_prefix_anchor", "elapsed_sec": 148.04349493980408, "eval_loss": 4.776466831564903, "generalization_gap": 0.664736419916153, "model_config": {"block_size": 128, "dropout": 0.385, "n_embd": 384, "n_head": 8, "n_layer": 16, "vocab_size": 4096}, "model_name": "L16_H8_D384", "n_embd": 384, "n_head": 8, "n_layer": 16, "parameters": 31457280, "run_mode": "locked_stream", "seed": 1, "stage": 2, "steps": 1000, "token_limit": 1000000, "tokens_seen": 6144000, "train_eval_loss": 4.11173041164875, "train_loss_last": 4.434404373168945, "val_eval_loss": 4.776466831564903}
|
| 4 |
+
{"condition": "openwebtext10k_interaction", "condition_kind": "anchor_decay", "dropout_active_final": 0.139, "dropout_final": 0.066, "dropout_initial": 0.385, "dropout_schedule": "log_prefix_anchor", "elapsed_sec": 154.4656720161438, "eval_loss": 4.57609198987484, "generalization_gap": 0.491582490503788, "model_config": {"block_size": 128, "dropout": 0.385, "n_embd": 384, "n_head": 8, "n_layer": 16, "vocab_size": 4096}, "model_name": "L16_H8_D384", "n_embd": 384, "n_head": 8, "n_layer": 16, "parameters": 31457280, "run_mode": "locked_stream", "seed": 1, "stage": 3, "steps": 1000, "token_limit": 2000000, "tokens_seen": 8192000, "train_eval_loss": 4.084509499371052, "train_loss_last": 4.123593330383301, "val_eval_loss": 4.57609198987484}
|
| 5 |
+
{"condition": "openwebtext10k_interaction", "condition_kind": "anchor_decay", "dropout_active_final": 0.066, "dropout_final": 0.066, "dropout_initial": 0.385, "dropout_schedule": "log_prefix_anchor", "elapsed_sec": 156.33115100860596, "eval_loss": 4.402347795665264, "generalization_gap": 0.3080834597349167, "model_config": {"block_size": 128, "dropout": 0.385, "n_embd": 384, "n_head": 8, "n_layer": 16, "vocab_size": 4096}, "model_name": "L16_H8_D384", "n_embd": 384, "n_head": 8, "n_layer": 16, "parameters": 31457280, "run_mode": "locked_stream", "seed": 1, "stage": 4, "steps": 1000, "token_limit": 4000000, "tokens_seen": 10240000, "train_eval_loss": 4.094264335930347, "train_loss_last": 4.194511413574219, "val_eval_loss": 4.402347795665264}
|
| 6 |
+
{"condition": "openwebtext10k_interaction", "condition_kind": "anchor_decay", "dropout_active_final": 0.385, "dropout_final": 0.066, "dropout_initial": 0.385, "dropout_schedule": "log_prefix_anchor", "elapsed_sec": 157.64197206497192, "eval_loss": 5.496512919664383, "generalization_gap": 0.8909523785114288, "model_config": {"block_size": 128, "dropout": 0.385, "n_embd": 384, "n_head": 8, "n_layer": 16, "vocab_size": 4096}, "model_name": "L16_H8_D384", "n_embd": 384, "n_head": 8, "n_layer": 16, "parameters": 31457280, "run_mode": "locked_stream", "seed": 2, "stage": 0, "steps": 1000, "token_limit": 250000, "tokens_seen": 2048000, "train_eval_loss": 4.605560541152954, "train_loss_last": 4.836062908172607, "val_eval_loss": 5.496512919664383}
|
| 7 |
+
{"condition": "openwebtext10k_interaction", "condition_kind": "anchor_decay", "dropout_active_final": 0.319, "dropout_final": 0.066, "dropout_initial": 0.385, "dropout_schedule": "log_prefix_anchor", "elapsed_sec": 158.58259201049805, "eval_loss": 5.052774332463741, "generalization_gap": 0.8912238702178001, "model_config": {"block_size": 128, "dropout": 0.385, "n_embd": 384, "n_head": 8, "n_layer": 16, "vocab_size": 4096}, "model_name": "L16_H8_D384", "n_embd": 384, "n_head": 8, "n_layer": 16, "parameters": 31457280, "run_mode": "locked_stream", "seed": 2, "stage": 1, "steps": 1000, "token_limit": 500000, "tokens_seen": 4096000, "train_eval_loss": 4.161550462245941, "train_loss_last": 4.347846984863281, "val_eval_loss": 5.052774332463741}
|
| 8 |
+
{"condition": "openwebtext10k_interaction", "condition_kind": "anchor_decay", "dropout_active_final": 0.227, "dropout_final": 0.066, "dropout_initial": 0.385, "dropout_schedule": "log_prefix_anchor", "elapsed_sec": 158.94828081130981, "eval_loss": 4.7864382192492485, "generalization_gap": 0.689209908246994, "model_config": {"block_size": 128, "dropout": 0.385, "n_embd": 384, "n_head": 8, "n_layer": 16, "vocab_size": 4096}, "model_name": "L16_H8_D384", "n_embd": 384, "n_head": 8, "n_layer": 16, "parameters": 31457280, "run_mode": "locked_stream", "seed": 2, "stage": 2, "steps": 1000, "token_limit": 1000000, "tokens_seen": 6144000, "train_eval_loss": 4.0972283110022545, "train_loss_last": 4.249028205871582, "val_eval_loss": 4.7864382192492485}
|
| 9 |
+
{"condition": "openwebtext10k_interaction", "condition_kind": "anchor_decay", "dropout_active_final": 0.139, "dropout_final": 0.066, "dropout_initial": 0.385, "dropout_schedule": "log_prefix_anchor", "elapsed_sec": 158.9284291267395, "eval_loss": 4.558441236615181, "generalization_gap": 0.4488407149910927, "model_config": {"block_size": 128, "dropout": 0.385, "n_embd": 384, "n_head": 8, "n_layer": 16, "vocab_size": 4096}, "model_name": "L16_H8_D384", "n_embd": 384, "n_head": 8, "n_layer": 16, "parameters": 31457280, "run_mode": "locked_stream", "seed": 2, "stage": 3, "steps": 1000, "token_limit": 2000000, "tokens_seen": 8192000, "train_eval_loss": 4.109600521624088, "train_loss_last": 4.3183135986328125, "val_eval_loss": 4.558441236615181}
|
| 10 |
+
{"condition": "openwebtext10k_interaction", "condition_kind": "anchor_decay", "dropout_active_final": 0.066, "dropout_final": 0.066, "dropout_initial": 0.385, "dropout_schedule": "log_prefix_anchor", "elapsed_sec": 159.1756501197815, "eval_loss": 4.401971310377121, "generalization_gap": 0.32543984055519104, "model_config": {"block_size": 128, "dropout": 0.385, "n_embd": 384, "n_head": 8, "n_layer": 16, "vocab_size": 4096}, "model_name": "L16_H8_D384", "n_embd": 384, "n_head": 8, "n_layer": 16, "parameters": 31457280, "run_mode": "locked_stream", "seed": 2, "stage": 4, "steps": 1000, "token_limit": 4000000, "tokens_seen": 10240000, "train_eval_loss": 4.07653146982193, "train_loss_last": 4.001583576202393, "val_eval_loss": 4.401971310377121}
|
| 11 |
+
{"condition": "openwebtext10k_interaction", "condition_kind": "anchor_decay", "dropout_active_final": 0.385, "dropout_final": 0.066, "dropout_initial": 0.385, "dropout_schedule": "log_prefix_anchor", "elapsed_sec": 159.56775212287903, "eval_loss": 5.48211994022131, "generalization_gap": 0.911006785929203, "model_config": {"block_size": 128, "dropout": 0.385, "n_embd": 384, "n_head": 8, "n_layer": 16, "vocab_size": 4096}, "model_name": "L16_H8_D384", "n_embd": 384, "n_head": 8, "n_layer": 16, "parameters": 31457280, "run_mode": "locked_stream", "seed": 3, "stage": 0, "steps": 1000, "token_limit": 250000, "tokens_seen": 2048000, "train_eval_loss": 4.571113154292107, "train_loss_last": 4.735123634338379, "val_eval_loss": 5.48211994022131}
|
| 12 |
+
{"condition": "openwebtext10k_interaction", "condition_kind": "anchor_decay", "dropout_active_final": 0.319, "dropout_final": 0.066, "dropout_initial": 0.385, "dropout_schedule": "log_prefix_anchor", "elapsed_sec": 159.47346210479736, "eval_loss": 5.078636907041073, "generalization_gap": 0.8553216382861137, "model_config": {"block_size": 128, "dropout": 0.385, "n_embd": 384, "n_head": 8, "n_layer": 16, "vocab_size": 4096}, "model_name": "L16_H8_D384", "n_embd": 384, "n_head": 8, "n_layer": 16, "parameters": 31457280, "run_mode": "locked_stream", "seed": 3, "stage": 1, "steps": 1000, "token_limit": 500000, "tokens_seen": 4096000, "train_eval_loss": 4.223315268754959, "train_loss_last": 4.428989410400391, "val_eval_loss": 5.078636907041073}
|
| 13 |
+
{"condition": "openwebtext10k_interaction", "condition_kind": "anchor_decay", "dropout_active_final": 0.227, "dropout_final": 0.066, "dropout_initial": 0.385, "dropout_schedule": "log_prefix_anchor", "elapsed_sec": 159.37687611579895, "eval_loss": 4.770952560007572, "generalization_gap": 0.6954368501901627, "model_config": {"block_size": 128, "dropout": 0.385, "n_embd": 384, "n_head": 8, "n_layer": 16, "vocab_size": 4096}, "model_name": "L16_H8_D384", "n_embd": 384, "n_head": 8, "n_layer": 16, "parameters": 31457280, "run_mode": "locked_stream", "seed": 3, "stage": 2, "steps": 1000, "token_limit": 1000000, "tokens_seen": 6144000, "train_eval_loss": 4.0755157098174095, "train_loss_last": 4.148219108581543, "val_eval_loss": 4.770952560007572}
|
| 14 |
+
{"condition": "openwebtext10k_interaction", "condition_kind": "anchor_decay", "dropout_active_final": 0.139, "dropout_final": 0.066, "dropout_initial": 0.385, "dropout_schedule": "log_prefix_anchor", "elapsed_sec": 159.01151704788208, "eval_loss": 4.559795759618282, "generalization_gap": 0.5070498064160347, "model_config": {"block_size": 128, "dropout": 0.385, "n_embd": 384, "n_head": 8, "n_layer": 16, "vocab_size": 4096}, "model_name": "L16_H8_D384", "n_embd": 384, "n_head": 8, "n_layer": 16, "parameters": 31457280, "run_mode": "locked_stream", "seed": 3, "stage": 3, "steps": 1000, "token_limit": 2000000, "tokens_seen": 8192000, "train_eval_loss": 4.052745953202248, "train_loss_last": 4.133602142333984, "val_eval_loss": 4.559795759618282}
|
| 15 |
+
{"condition": "openwebtext10k_interaction", "condition_kind": "anchor_decay", "dropout_active_final": 0.066, "dropout_final": 0.066, "dropout_initial": 0.385, "dropout_schedule": "log_prefix_anchor", "elapsed_sec": 159.0887017250061, "eval_loss": 4.402896843850613, "generalization_gap": 0.3256060555577278, "model_config": {"block_size": 128, "dropout": 0.385, "n_embd": 384, "n_head": 8, "n_layer": 16, "vocab_size": 4096}, "model_name": "L16_H8_D384", "n_embd": 384, "n_head": 8, "n_layer": 16, "parameters": 31457280, "run_mode": "locked_stream", "seed": 3, "stage": 4, "steps": 1000, "token_limit": 4000000, "tokens_seen": 10240000, "train_eval_loss": 4.077290788292885, "train_loss_last": 4.316803932189941, "val_eval_loss": 4.402896843850613}
|
| 16 |
+
{"condition": "openwebtext10k_interaction", "condition_kind": "anchor_decay", "dropout_active_final": 0.385, "dropout_final": 0.066, "dropout_initial": 0.385, "dropout_schedule": "log_prefix_anchor", "elapsed_sec": 159.2399332523346, "eval_loss": 5.4856060445308685, "generalization_gap": 0.9066294282674789, "model_config": {"block_size": 128, "dropout": 0.385, "n_embd": 384, "n_head": 8, "n_layer": 16, "vocab_size": 4096}, "model_name": "L16_H8_D384", "n_embd": 384, "n_head": 8, "n_layer": 16, "parameters": 31457280, "run_mode": "locked_stream", "seed": 4, "stage": 0, "steps": 1000, "token_limit": 250000, "tokens_seen": 2048000, "train_eval_loss": 4.57897661626339, "train_loss_last": 4.885035514831543, "val_eval_loss": 5.4856060445308685}
|
| 17 |
+
{"condition": "openwebtext10k_interaction", "condition_kind": "anchor_decay", "dropout_active_final": 0.319, "dropout_final": 0.066, "dropout_initial": 0.385, "dropout_schedule": "log_prefix_anchor", "elapsed_sec": 159.3847131729126, "eval_loss": 5.0676345229148865, "generalization_gap": 0.8415863737463951, "model_config": {"block_size": 128, "dropout": 0.385, "n_embd": 384, "n_head": 8, "n_layer": 16, "vocab_size": 4096}, "model_name": "L16_H8_D384", "n_embd": 384, "n_head": 8, "n_layer": 16, "parameters": 31457280, "run_mode": "locked_stream", "seed": 4, "stage": 1, "steps": 1000, "token_limit": 500000, "tokens_seen": 4096000, "train_eval_loss": 4.226048149168491, "train_loss_last": 4.507782459259033, "val_eval_loss": 5.0676345229148865}
|
| 18 |
+
{"condition": "openwebtext10k_interaction", "condition_kind": "anchor_decay", "dropout_active_final": 0.227, "dropout_final": 0.066, "dropout_initial": 0.385, "dropout_schedule": "log_prefix_anchor", "elapsed_sec": 159.1736650466919, "eval_loss": 4.7791073098778725, "generalization_gap": 0.7003955617547035, "model_config": {"block_size": 128, "dropout": 0.385, "n_embd": 384, "n_head": 8, "n_layer": 16, "vocab_size": 4096}, "model_name": "L16_H8_D384", "n_embd": 384, "n_head": 8, "n_layer": 16, "parameters": 31457280, "run_mode": "locked_stream", "seed": 4, "stage": 2, "steps": 1000, "token_limit": 1000000, "tokens_seen": 6144000, "train_eval_loss": 4.078711748123169, "train_loss_last": 4.1192522048950195, "val_eval_loss": 4.7791073098778725}
|
| 19 |
+
{"condition": "openwebtext10k_interaction", "condition_kind": "anchor_decay", "dropout_active_final": 0.139, "dropout_final": 0.066, "dropout_initial": 0.385, "dropout_schedule": "log_prefix_anchor", "elapsed_sec": 159.0590569972992, "eval_loss": 4.563728243112564, "generalization_gap": 0.47459470480680466, "model_config": {"block_size": 128, "dropout": 0.385, "n_embd": 384, "n_head": 8, "n_layer": 16, "vocab_size": 4096}, "model_name": "L16_H8_D384", "n_embd": 384, "n_head": 8, "n_layer": 16, "parameters": 31457280, "run_mode": "locked_stream", "seed": 4, "stage": 3, "steps": 1000, "token_limit": 2000000, "tokens_seen": 8192000, "train_eval_loss": 4.089133538305759, "train_loss_last": 4.137112140655518, "val_eval_loss": 4.563728243112564}
|
| 20 |
+
{"condition": "openwebtext10k_interaction", "condition_kind": "anchor_decay", "dropout_active_final": 0.066, "dropout_final": 0.066, "dropout_initial": 0.385, "dropout_schedule": "log_prefix_anchor", "elapsed_sec": 159.01433420181274, "eval_loss": 4.381064593791962, "generalization_gap": 0.31129128485918045, "model_config": {"block_size": 128, "dropout": 0.385, "n_embd": 384, "n_head": 8, "n_layer": 16, "vocab_size": 4096}, "model_name": "L16_H8_D384", "n_embd": 384, "n_head": 8, "n_layer": 16, "parameters": 31457280, "run_mode": "locked_stream", "seed": 4, "stage": 4, "steps": 1000, "token_limit": 4000000, "tokens_seen": 10240000, "train_eval_loss": 4.069773308932781, "train_loss_last": 3.990117311477661, "val_eval_loss": 4.381064593791962}
|
| 21 |
+
{"condition": "openwebtext10k_interaction", "condition_kind": "anchor_decay", "dropout_active_final": 0.385, "dropout_final": 0.066, "dropout_initial": 0.385, "dropout_schedule": "log_prefix_anchor", "elapsed_sec": 159.29387283325195, "eval_loss": 5.509035885334015, "generalization_gap": 0.8720249384641647, "model_config": {"block_size": 128, "dropout": 0.385, "n_embd": 384, "n_head": 8, "n_layer": 16, "vocab_size": 4096}, "model_name": "L16_H8_D384", "n_embd": 384, "n_head": 8, "n_layer": 16, "parameters": 31457280, "run_mode": "locked_stream", "seed": 5, "stage": 0, "steps": 1000, "token_limit": 250000, "tokens_seen": 2048000, "train_eval_loss": 4.63701094686985, "train_loss_last": 4.927996635437012, "val_eval_loss": 5.509035885334015}
|
| 22 |
+
{"condition": "openwebtext10k_interaction", "condition_kind": "anchor_decay", "dropout_active_final": 0.319, "dropout_final": 0.066, "dropout_initial": 0.385, "dropout_schedule": "log_prefix_anchor", "elapsed_sec": 159.22580695152283, "eval_loss": 5.07570381462574, "generalization_gap": 0.8376745954155922, "model_config": {"block_size": 128, "dropout": 0.385, "n_embd": 384, "n_head": 8, "n_layer": 16, "vocab_size": 4096}, "model_name": "L16_H8_D384", "n_embd": 384, "n_head": 8, "n_layer": 16, "parameters": 31457280, "run_mode": "locked_stream", "seed": 5, "stage": 1, "steps": 1000, "token_limit": 500000, "tokens_seen": 4096000, "train_eval_loss": 4.238029219210148, "train_loss_last": 4.416054725646973, "val_eval_loss": 5.07570381462574}
|
| 23 |
+
{"condition": "openwebtext10k_interaction", "condition_kind": "anchor_decay", "dropout_active_final": 0.227, "dropout_final": 0.066, "dropout_initial": 0.385, "dropout_schedule": "log_prefix_anchor", "elapsed_sec": 159.1633379459381, "eval_loss": 4.792380161583424, "generalization_gap": 0.7424318492412567, "model_config": {"block_size": 128, "dropout": 0.385, "n_embd": 384, "n_head": 8, "n_layer": 16, "vocab_size": 4096}, "model_name": "L16_H8_D384", "n_embd": 384, "n_head": 8, "n_layer": 16, "parameters": 31457280, "run_mode": "locked_stream", "seed": 5, "stage": 2, "steps": 1000, "token_limit": 1000000, "tokens_seen": 6144000, "train_eval_loss": 4.049948312342167, "train_loss_last": 4.2396697998046875, "val_eval_loss": 4.792380161583424}
|
| 24 |
+
{"condition": "openwebtext10k_interaction", "condition_kind": "anchor_decay", "dropout_active_final": 0.139, "dropout_final": 0.066, "dropout_initial": 0.385, "dropout_schedule": "log_prefix_anchor", "elapsed_sec": 159.0420618057251, "eval_loss": 4.536897249519825, "generalization_gap": 0.47198397666215897, "model_config": {"block_size": 128, "dropout": 0.385, "n_embd": 384, "n_head": 8, "n_layer": 16, "vocab_size": 4096}, "model_name": "L16_H8_D384", "n_embd": 384, "n_head": 8, "n_layer": 16, "parameters": 31457280, "run_mode": "locked_stream", "seed": 5, "stage": 3, "steps": 1000, "token_limit": 2000000, "tokens_seen": 8192000, "train_eval_loss": 4.064913272857666, "train_loss_last": 4.11806583404541, "val_eval_loss": 4.536897249519825}
|
| 25 |
+
{"condition": "openwebtext10k_interaction", "condition_kind": "anchor_decay", "dropout_active_final": 0.066, "dropout_final": 0.066, "dropout_initial": 0.385, "dropout_schedule": "log_prefix_anchor", "elapsed_sec": 158.83358788490295, "eval_loss": 4.402371659874916, "generalization_gap": 0.3178385868668556, "model_config": {"block_size": 128, "dropout": 0.385, "n_embd": 384, "n_head": 8, "n_layer": 16, "vocab_size": 4096}, "model_name": "L16_H8_D384", "n_embd": 384, "n_head": 8, "n_layer": 16, "parameters": 31457280, "run_mode": "locked_stream", "seed": 5, "stage": 4, "steps": 1000, "token_limit": 4000000, "tokens_seen": 10240000, "train_eval_loss": 4.0845330730080605, "train_loss_last": 4.270941734313965, "val_eval_loss": 4.402371659874916}
|
| 26 |
{"condition": "hold_30_then_decay", "condition_kind": "anchor_decay", "dropout_active_final": 0.3, "dropout_final": 0.02, "dropout_initial": 0.3, "dropout_schedule": "log_prefix_anchor", "elapsed_sec": 159.01670002937317, "eval_loss": 5.46150516718626, "generalization_gap": 0.9761984124779701, "model_config": {"block_size": 128, "dropout": 0.3, "n_embd": 384, "n_head": 8, "n_layer": 16, "vocab_size": 4096}, "model_name": "L16_H8_D384", "n_embd": 384, "n_head": 8, "n_layer": 16, "parameters": 31457280, "run_mode": "locked_stream", "seed": 1, "stage": 0, "steps": 1000, "token_limit": 250000, "tokens_seen": 2048000, "train_eval_loss": 4.48530675470829, "train_loss_last": 4.836114883422852, "val_eval_loss": 5.46150516718626}
|
| 27 |
{"condition": "hold_30_then_decay", "condition_kind": "anchor_decay", "dropout_active_final": 0.3, "dropout_final": 0.02, "dropout_initial": 0.3, "dropout_schedule": "log_prefix_anchor", "elapsed_sec": 159.13048791885376, "eval_loss": 5.0823986530303955, "generalization_gap": 0.9970467239618301, "model_config": {"block_size": 128, "dropout": 0.3, "n_embd": 384, "n_head": 8, "n_layer": 16, "vocab_size": 4096}, "model_name": "L16_H8_D384", "n_embd": 384, "n_head": 8, "n_layer": 16, "parameters": 31457280, "run_mode": "locked_stream", "seed": 1, "stage": 1, "steps": 1000, "token_limit": 500000, "tokens_seen": 4096000, "train_eval_loss": 4.085351929068565, "train_loss_last": 4.226001739501953, "val_eval_loss": 5.0823986530303955}
|
| 28 |
{"condition": "hold_30_then_decay", "condition_kind": "anchor_decay", "dropout_active_final": 0.2, "dropout_final": 0.02, "dropout_initial": 0.3, "dropout_schedule": "log_prefix_anchor", "elapsed_sec": 159.02239680290222, "eval_loss": 4.7529231160879135, "generalization_gap": 0.737479031085968, "model_config": {"block_size": 128, "dropout": 0.3, "n_embd": 384, "n_head": 8, "n_layer": 16, "vocab_size": 4096}, "model_name": "L16_H8_D384", "n_embd": 384, "n_head": 8, "n_layer": 16, "parameters": 31457280, "run_mode": "locked_stream", "seed": 1, "stage": 2, "steps": 1000, "token_limit": 1000000, "tokens_seen": 6144000, "train_eval_loss": 4.0154440850019455, "train_loss_last": 4.443882942199707, "val_eval_loss": 4.7529231160879135}
|
runs/{previous_local_updated_formula_clean_l16 → openwebtext10k_l16_updated_formula_clean_5seed}/locked_stream/20260530-174525/summary.csv
RENAMED
|
@@ -2,7 +2,7 @@ run_mode,condition,condition_kind,stage,token_limit,model_name,n_layer,n_head,n_
|
|
| 2 |
locked_stream,fitted_l16_static_law,anchor_decay,0,250000,L16_H8_D384,16,8,384,31457280,0.6,0.02,log_prefix_anchor,5,5.164006796479225,0.02748612153330559,5.7842145070433615,0.009632183754286684,0.6202077105641365,0.018181362630120823
|
| 3 |
locked_stream,hold_30_then_decay,anchor_decay,0,250000,L16_H8_D384,16,8,384,31457280,0.3,0.02,log_prefix_anchor,5,4.442901518940926,0.027340510763309508,5.4483301296830176,0.013828501308583057,1.0054286107420922,0.02219730946529904
|
| 4 |
locked_stream,mild_30_to_08,anchor_decay,0,250000,L16_H8_D384,16,8,384,31457280,0.3,0.08,log_prefix_anchor,5,4.442901518940926,0.027340586099289153,5.448330116271973,0.013828516502701142,1.005428597331047,0.022197359989343385
|
| 5 |
-
locked_stream,
|
| 6 |
locked_stream,static_dropout_0,static,0,250000,L16_H8_D384,16,8,384,31457280,0.0,0.0,constant,5,3.4442542552948,0.022358399496724347,5.8329681470990185,0.019809207037273006,2.388713891804218,0.038334133145443657
|
| 7 |
locked_stream,static_dropout_0.02,static,0,250000,L16_H8_D384,16,8,384,31457280,0.02,0.02,constant,5,3.537110958993435,0.008037117123073168,5.742638063430786,0.024161263410536992,2.2055271044373512,0.030737843551395496
|
| 8 |
locked_stream,static_dropout_0.14,static,0,250000,L16_H8_D384,16,8,384,31457280,0.14,0.14,constant,5,4.029827673733235,0.018556819977249093,5.477323499321938,0.02236835486589015,1.4474958255887032,0.03092474074054602
|
|
@@ -10,7 +10,7 @@ locked_stream,static_dropout_0.3,static,0,250000,L16_H8_D384,16,8,384,31457280,0
|
|
| 10 |
locked_stream,fitted_l16_static_law,anchor_decay,1,500000,L16_H8_D384,16,8,384,31457280,0.6,0.02,log_prefix_anchor,5,4.463223123550415,0.029267257511679485,5.150681225955486,0.010164023432481408,0.6874581024050712,0.024496105147219012
|
| 11 |
locked_stream,hold_30_then_decay,anchor_decay,1,500000,L16_H8_D384,16,8,384,31457280,0.3,0.02,log_prefix_anchor,5,4.1383186161518095,0.03875357135925004,5.066737350821495,0.017273545737457947,0.9284187346696854,0.04002925354224623
|
| 12 |
locked_stream,mild_30_to_08,anchor_decay,1,500000,L16_H8_D384,16,8,384,31457280,0.3,0.08,log_prefix_anchor,5,4.034893324971199,0.04033083916799125,5.058184179663658,0.015882720199114145,1.023290854692459,0.04098800520602419
|
| 13 |
-
locked_stream,
|
| 14 |
locked_stream,static_dropout_0,static,1,500000,L16_H8_D384,16,8,384,31457280,0.0,0.0,constant,5,2.958326259255409,0.044781060309162554,5.717529235780239,0.05024223752386389,2.75920297652483,0.07102439530887096
|
| 15 |
locked_stream,static_dropout_0.02,static,1,500000,L16_H8_D384,16,8,384,31457280,0.02,0.02,constant,5,3.124619247019291,0.031814549489392455,5.575391733646393,0.024791398740622035,2.450772486627102,0.030503049251572257
|
| 16 |
locked_stream,static_dropout_0.14,static,1,500000,L16_H8_D384,16,8,384,31457280,0.14,0.14,constant,5,3.714307613670826,0.03238913748160129,5.149166536331177,0.007010026540791338,1.4348589226603508,0.031243440426199517
|
|
@@ -18,7 +18,7 @@ locked_stream,static_dropout_0.3,static,1,500000,L16_H8_D384,16,8,384,31457280,0
|
|
| 18 |
locked_stream,fitted_l16_static_law,anchor_decay,2,1000000,L16_H8_D384,16,8,384,31457280,0.6,0.02,log_prefix_anchor,5,4.263189716637134,0.02333674196202296,4.832601730525494,0.010169544124120607,0.569412013888359,0.023004537548591726
|
| 19 |
locked_stream,hold_30_then_decay,anchor_decay,2,1000000,L16_H8_D384,16,8,384,31457280,0.3,0.02,log_prefix_anchor,5,4.037793649733066,0.02368477035230831,4.775730343163014,0.014352387307903692,0.7379366934299469,0.01882967372675974
|
| 20 |
locked_stream,mild_30_to_08,anchor_decay,2,1000000,L16_H8_D384,16,8,384,31457280,0.3,0.08,log_prefix_anchor,5,3.9886452093720437,0.02349137402419598,4.777442049980164,0.013845858727658497,0.7887968406081199,0.018652082916074838
|
| 21 |
-
locked_stream,
|
| 22 |
locked_stream,static_dropout_0,static,2,1000000,L16_H8_D384,16,8,384,31457280,0.0,0.0,constant,5,3.3260142356157303,0.03607156293344983,5.26366505920887,0.027353946222948587,1.9376508235931396,0.03553067411055354
|
| 23 |
locked_stream,static_dropout_0.02,static,2,1000000,L16_H8_D384,16,8,384,31457280,0.02,0.02,constant,5,3.4615398421883583,0.03992270092195685,5.14697041362524,0.022233878343551068,1.685430571436882,0.04092951267469098
|
| 24 |
locked_stream,static_dropout_0.14,static,2,1000000,L16_H8_D384,16,8,384,31457280,0.14,0.14,constant,5,3.8711691960692405,0.02974306105040781,4.849037018418312,0.020208736415348236,0.9778678223490715,0.023799818088071894
|
|
@@ -26,7 +26,7 @@ locked_stream,static_dropout_0.3,static,2,1000000,L16_H8_D384,16,8,384,31457280,
|
|
| 26 |
locked_stream,fitted_l16_static_law,anchor_decay,3,2000000,L16_H8_D384,16,8,384,31457280,0.6,0.02,log_prefix_anchor,5,4.14712455868721,0.01706029159496315,4.58056578040123,0.01532149630405117,0.4334412217140198,0.019914111395845077
|
| 27 |
locked_stream,hold_30_then_decay,anchor_decay,3,2000000,L16_H8_D384,16,8,384,31457280,0.3,0.02,log_prefix_anchor,5,4.044496415555477,0.019708741233353012,4.559869511425495,0.016051317749301037,0.515373095870018,0.020379012527272283
|
| 28 |
locked_stream,mild_30_to_08,anchor_decay,3,2000000,L16_H8_D384,16,8,384,31457280,0.3,0.08,log_prefix_anchor,5,4.044088624417782,0.020441976517745996,4.563060106337071,0.015509498762185112,0.5189714819192887,0.022529376522631556
|
| 29 |
-
locked_stream,
|
| 30 |
locked_stream,static_dropout_0,static,3,2000000,L16_H8_D384,16,8,384,31457280,0.0,0.0,constant,5,3.778580814599991,0.03536285448761605,4.847234210371971,0.0170992476167825,1.0686533957719804,0.025604091638377884
|
| 31 |
locked_stream,static_dropout_0.02,static,3,2000000,L16_H8_D384,16,8,384,31457280,0.02,0.02,constant,5,3.840523959696293,0.03097454466954304,4.784735175967216,0.019582585992709827,0.9442112162709236,0.02147121638277758
|
| 32 |
locked_stream,static_dropout_0.14,static,3,2000000,L16_H8_D384,16,8,384,31457280,0.14,0.14,constant,5,4.039909638464451,0.025550506633378975,4.6047517821192745,0.013619996903704912,0.5648421436548233,0.015970945478988943
|
|
@@ -34,7 +34,7 @@ locked_stream,static_dropout_0.3,static,3,2000000,L16_H8_D384,16,8,384,31457280,
|
|
| 34 |
locked_stream,fitted_l16_static_law,anchor_decay,4,4000000,L16_H8_D384,16,8,384,31457280,0.6,0.02,log_prefix_anchor,5,4.098657152056694,0.01111204513074185,4.412404176592827,0.00843791675235308,0.3137470245361328,0.007204760471400837
|
| 35 |
locked_stream,hold_30_then_decay,anchor_decay,4,4000000,L16_H8_D384,16,8,384,31457280,0.3,0.02,log_prefix_anchor,5,4.0487526342272755,0.007824452256379268,4.405232906341553,0.011151070705538514,0.3564802721142769,0.01297330703929578
|
| 36 |
locked_stream,mild_30_to_08,anchor_decay,4,4000000,L16_H8_D384,16,8,384,31457280,0.3,0.08,log_prefix_anchor,5,4.07358001768589,0.0063536190340169095,4.40728645324707,0.008502541215009067,0.3337064355611801,0.010359634321755684
|
| 37 |
-
locked_stream,
|
| 38 |
locked_stream,static_dropout_0,static,4,4000000,L16_H8_D384,16,8,384,31457280,0.0,0.0,constant,5,4.041403333842754,0.017193152802814336,4.594272664189338,0.021638340853154137,0.5528693303465844,0.029132548047629703
|
| 39 |
locked_stream,static_dropout_0.02,static,4,4000000,L16_H8_D384,16,8,384,31457280,0.02,0.02,constant,5,4.052870315313339,0.02163703576438587,4.535757505893708,0.00908401354385357,0.48288719058036805,0.020126181497736668
|
| 40 |
locked_stream,static_dropout_0.14,static,4,4000000,L16_H8_D384,16,8,384,31457280,0.14,0.14,constant,5,4.116507206857205,0.014037194709348206,4.44545366615057,0.012017216742245517,0.32894645929336547,0.01603071874172604
|
|
|
|
| 2 |
locked_stream,fitted_l16_static_law,anchor_decay,0,250000,L16_H8_D384,16,8,384,31457280,0.6,0.02,log_prefix_anchor,5,5.164006796479225,0.02748612153330559,5.7842145070433615,0.009632183754286684,0.6202077105641365,0.018181362630120823
|
| 3 |
locked_stream,hold_30_then_decay,anchor_decay,0,250000,L16_H8_D384,16,8,384,31457280,0.3,0.02,log_prefix_anchor,5,4.442901518940926,0.027340510763309508,5.4483301296830176,0.013828501308583057,1.0054286107420922,0.02219730946529904
|
| 4 |
locked_stream,mild_30_to_08,anchor_decay,0,250000,L16_H8_D384,16,8,384,31457280,0.3,0.08,log_prefix_anchor,5,4.442901518940926,0.027340586099289153,5.448330116271973,0.013828516502701142,1.005428597331047,0.022197359989343385
|
| 5 |
+
locked_stream,openwebtext10k_interaction,anchor_decay,0,250000,L16_H8_D384,16,8,384,31457280,0.385,0.066,log_prefix_anchor,5,4.6016244173049925,0.026939774812057612,5.4946602284908295,0.01093302726132647,0.8930358111858367,0.016010040404479016
|
| 6 |
locked_stream,static_dropout_0,static,0,250000,L16_H8_D384,16,8,384,31457280,0.0,0.0,constant,5,3.4442542552948,0.022358399496724347,5.8329681470990185,0.019809207037273006,2.388713891804218,0.038334133145443657
|
| 7 |
locked_stream,static_dropout_0.02,static,0,250000,L16_H8_D384,16,8,384,31457280,0.02,0.02,constant,5,3.537110958993435,0.008037117123073168,5.742638063430786,0.024161263410536992,2.2055271044373512,0.030737843551395496
|
| 8 |
locked_stream,static_dropout_0.14,static,0,250000,L16_H8_D384,16,8,384,31457280,0.14,0.14,constant,5,4.029827673733235,0.018556819977249093,5.477323499321938,0.02236835486589015,1.4474958255887032,0.03092474074054602
|
|
|
|
| 10 |
locked_stream,fitted_l16_static_law,anchor_decay,1,500000,L16_H8_D384,16,8,384,31457280,0.6,0.02,log_prefix_anchor,5,4.463223123550415,0.029267257511679485,5.150681225955486,0.010164023432481408,0.6874581024050712,0.024496105147219012
|
| 11 |
locked_stream,hold_30_then_decay,anchor_decay,1,500000,L16_H8_D384,16,8,384,31457280,0.3,0.02,log_prefix_anchor,5,4.1383186161518095,0.03875357135925004,5.066737350821495,0.017273545737457947,0.9284187346696854,0.04002925354224623
|
| 12 |
locked_stream,mild_30_to_08,anchor_decay,1,500000,L16_H8_D384,16,8,384,31457280,0.3,0.08,log_prefix_anchor,5,4.034893324971199,0.04033083916799125,5.058184179663658,0.015882720199114145,1.023290854692459,0.04098800520602419
|
| 13 |
+
locked_stream,openwebtext10k_interaction,anchor_decay,1,500000,L16_H8_D384,16,8,384,31457280,0.385,0.066,log_prefix_anchor,5,4.20646400153637,0.03245807641301778,5.071460470557213,0.01179360076463939,0.8649964690208435,0.028479060454710485
|
| 14 |
locked_stream,static_dropout_0,static,1,500000,L16_H8_D384,16,8,384,31457280,0.0,0.0,constant,5,2.958326259255409,0.044781060309162554,5.717529235780239,0.05024223752386389,2.75920297652483,0.07102439530887096
|
| 15 |
locked_stream,static_dropout_0.02,static,1,500000,L16_H8_D384,16,8,384,31457280,0.02,0.02,constant,5,3.124619247019291,0.031814549489392455,5.575391733646393,0.024791398740622035,2.450772486627102,0.030503049251572257
|
| 16 |
locked_stream,static_dropout_0.14,static,1,500000,L16_H8_D384,16,8,384,31457280,0.14,0.14,constant,5,3.714307613670826,0.03238913748160129,5.149166536331177,0.007010026540791338,1.4348589226603508,0.031243440426199517
|
|
|
|
| 18 |
locked_stream,fitted_l16_static_law,anchor_decay,2,1000000,L16_H8_D384,16,8,384,31457280,0.6,0.02,log_prefix_anchor,5,4.263189716637134,0.02333674196202296,4.832601730525494,0.010169544124120607,0.569412013888359,0.023004537548591726
|
| 19 |
locked_stream,hold_30_then_decay,anchor_decay,2,1000000,L16_H8_D384,16,8,384,31457280,0.3,0.02,log_prefix_anchor,5,4.037793649733066,0.02368477035230831,4.775730343163014,0.014352387307903692,0.7379366934299469,0.01882967372675974
|
| 20 |
locked_stream,mild_30_to_08,anchor_decay,2,1000000,L16_H8_D384,16,8,384,31457280,0.3,0.08,log_prefix_anchor,5,3.9886452093720437,0.02349137402419598,4.777442049980164,0.013845858727658497,0.7887968406081199,0.018652082916074838
|
| 21 |
+
locked_stream,openwebtext10k_interaction,anchor_decay,2,1000000,L16_H8_D384,16,8,384,31457280,0.385,0.066,log_prefix_anchor,5,4.08262689858675,0.023420093535722695,4.781069016456604,0.008428247627355752,0.698442117869854,0.028148054160246055
|
| 22 |
locked_stream,static_dropout_0,static,2,1000000,L16_H8_D384,16,8,384,31457280,0.0,0.0,constant,5,3.3260142356157303,0.03607156293344983,5.26366505920887,0.027353946222948587,1.9376508235931396,0.03553067411055354
|
| 23 |
locked_stream,static_dropout_0.02,static,2,1000000,L16_H8_D384,16,8,384,31457280,0.02,0.02,constant,5,3.4615398421883583,0.03992270092195685,5.14697041362524,0.022233878343551068,1.685430571436882,0.04092951267469098
|
| 24 |
locked_stream,static_dropout_0.14,static,2,1000000,L16_H8_D384,16,8,384,31457280,0.14,0.14,constant,5,3.8711691960692405,0.02974306105040781,4.849037018418312,0.020208736415348236,0.9778678223490715,0.023799818088071894
|
|
|
|
| 26 |
locked_stream,fitted_l16_static_law,anchor_decay,3,2000000,L16_H8_D384,16,8,384,31457280,0.6,0.02,log_prefix_anchor,5,4.14712455868721,0.01706029159496315,4.58056578040123,0.01532149630405117,0.4334412217140198,0.019914111395845077
|
| 27 |
locked_stream,hold_30_then_decay,anchor_decay,3,2000000,L16_H8_D384,16,8,384,31457280,0.3,0.02,log_prefix_anchor,5,4.044496415555477,0.019708741233353012,4.559869511425495,0.016051317749301037,0.515373095870018,0.020379012527272283
|
| 28 |
locked_stream,mild_30_to_08,anchor_decay,3,2000000,L16_H8_D384,16,8,384,31457280,0.3,0.08,log_prefix_anchor,5,4.044088624417782,0.020441976517745996,4.563060106337071,0.015509498762185112,0.5189714819192887,0.022529376522631556
|
| 29 |
+
locked_stream,openwebtext10k_interaction,anchor_decay,3,2000000,L16_H8_D384,16,8,384,31457280,0.385,0.066,log_prefix_anchor,5,4.080180557072163,0.022080406390158354,4.558990895748138,0.014177173687439483,0.4788103386759758,0.021926835907345392
|
| 30 |
locked_stream,static_dropout_0,static,3,2000000,L16_H8_D384,16,8,384,31457280,0.0,0.0,constant,5,3.778580814599991,0.03536285448761605,4.847234210371971,0.0170992476167825,1.0686533957719804,0.025604091638377884
|
| 31 |
locked_stream,static_dropout_0.02,static,3,2000000,L16_H8_D384,16,8,384,31457280,0.02,0.02,constant,5,3.840523959696293,0.03097454466954304,4.784735175967216,0.019582585992709827,0.9442112162709236,0.02147121638277758
|
| 32 |
locked_stream,static_dropout_0.14,static,3,2000000,L16_H8_D384,16,8,384,31457280,0.14,0.14,constant,5,4.039909638464451,0.025550506633378975,4.6047517821192745,0.013619996903704912,0.5648421436548233,0.015970945478988943
|
|
|
|
| 34 |
locked_stream,fitted_l16_static_law,anchor_decay,4,4000000,L16_H8_D384,16,8,384,31457280,0.6,0.02,log_prefix_anchor,5,4.098657152056694,0.01111204513074185,4.412404176592827,0.00843791675235308,0.3137470245361328,0.007204760471400837
|
| 35 |
locked_stream,hold_30_then_decay,anchor_decay,4,4000000,L16_H8_D384,16,8,384,31457280,0.3,0.02,log_prefix_anchor,5,4.0487526342272755,0.007824452256379268,4.405232906341553,0.011151070705538514,0.3564802721142769,0.01297330703929578
|
| 36 |
locked_stream,mild_30_to_08,anchor_decay,4,4000000,L16_H8_D384,16,8,384,31457280,0.3,0.08,log_prefix_anchor,5,4.07358001768589,0.0063536190340169095,4.40728645324707,0.008502541215009067,0.3337064355611801,0.010359634321755684
|
| 37 |
+
locked_stream,openwebtext10k_interaction,anchor_decay,4,4000000,L16_H8_D384,16,8,384,31457280,0.385,0.066,log_prefix_anchor,5,4.080478595197201,0.009311692964638253,4.3981304407119755,0.009545784836147743,0.3176518455147743,0.007999498965173152
|
| 38 |
locked_stream,static_dropout_0,static,4,4000000,L16_H8_D384,16,8,384,31457280,0.0,0.0,constant,5,4.041403333842754,0.017193152802814336,4.594272664189338,0.021638340853154137,0.5528693303465844,0.029132548047629703
|
| 39 |
locked_stream,static_dropout_0.02,static,4,4000000,L16_H8_D384,16,8,384,31457280,0.02,0.02,constant,5,4.052870315313339,0.02163703576438587,4.535757505893708,0.00908401354385357,0.48288719058036805,0.020126181497736668
|
| 40 |
locked_stream,static_dropout_0.14,static,4,4000000,L16_H8_D384,16,8,384,31457280,0.14,0.14,constant,5,4.116507206857205,0.014037194709348206,4.44545366615057,0.012017216742245517,0.32894645929336547,0.01603071874172604
|
runs/{previous_local_updated_formula_clean_l16 → openwebtext10k_l16_updated_formula_clean_5seed}/locked_stream/20260530-174525/summary.json
RENAMED
|
@@ -67,7 +67,7 @@
|
|
| 67 |
},
|
| 68 |
{
|
| 69 |
"run_mode": "locked_stream",
|
| 70 |
-
"condition": "
|
| 71 |
"condition_kind": "anchor_decay",
|
| 72 |
"stage": 0,
|
| 73 |
"token_limit": 250000,
|
|
@@ -243,7 +243,7 @@
|
|
| 243 |
},
|
| 244 |
{
|
| 245 |
"run_mode": "locked_stream",
|
| 246 |
-
"condition": "
|
| 247 |
"condition_kind": "anchor_decay",
|
| 248 |
"stage": 1,
|
| 249 |
"token_limit": 500000,
|
|
@@ -419,7 +419,7 @@
|
|
| 419 |
},
|
| 420 |
{
|
| 421 |
"run_mode": "locked_stream",
|
| 422 |
-
"condition": "
|
| 423 |
"condition_kind": "anchor_decay",
|
| 424 |
"stage": 2,
|
| 425 |
"token_limit": 1000000,
|
|
@@ -595,7 +595,7 @@
|
|
| 595 |
},
|
| 596 |
{
|
| 597 |
"run_mode": "locked_stream",
|
| 598 |
-
"condition": "
|
| 599 |
"condition_kind": "anchor_decay",
|
| 600 |
"stage": 3,
|
| 601 |
"token_limit": 2000000,
|
|
@@ -771,7 +771,7 @@
|
|
| 771 |
},
|
| 772 |
{
|
| 773 |
"run_mode": "locked_stream",
|
| 774 |
-
"condition": "
|
| 775 |
"condition_kind": "anchor_decay",
|
| 776 |
"stage": 4,
|
| 777 |
"token_limit": 4000000,
|
|
|
|
| 67 |
},
|
| 68 |
{
|
| 69 |
"run_mode": "locked_stream",
|
| 70 |
+
"condition": "openwebtext10k_interaction",
|
| 71 |
"condition_kind": "anchor_decay",
|
| 72 |
"stage": 0,
|
| 73 |
"token_limit": 250000,
|
|
|
|
| 243 |
},
|
| 244 |
{
|
| 245 |
"run_mode": "locked_stream",
|
| 246 |
+
"condition": "openwebtext10k_interaction",
|
| 247 |
"condition_kind": "anchor_decay",
|
| 248 |
"stage": 1,
|
| 249 |
"token_limit": 500000,
|
|
|
|
| 419 |
},
|
| 420 |
{
|
| 421 |
"run_mode": "locked_stream",
|
| 422 |
+
"condition": "openwebtext10k_interaction",
|
| 423 |
"condition_kind": "anchor_decay",
|
| 424 |
"stage": 2,
|
| 425 |
"token_limit": 1000000,
|
|
|
|
| 595 |
},
|
| 596 |
{
|
| 597 |
"run_mode": "locked_stream",
|
| 598 |
+
"condition": "openwebtext10k_interaction",
|
| 599 |
"condition_kind": "anchor_decay",
|
| 600 |
"stage": 3,
|
| 601 |
"token_limit": 2000000,
|
|
|
|
| 771 |
},
|
| 772 |
{
|
| 773 |
"run_mode": "locked_stream",
|
| 774 |
+
"condition": "openwebtext10k_interaction",
|
| 775 |
"condition_kind": "anchor_decay",
|
| 776 |
"stage": 4,
|
| 777 |
"token_limit": 4000000,
|
runs/{previous_local_updated_formula_clean_l16 → openwebtext10k_l16_updated_formula_clean_5seed}/locked_stream/20260530-174525/trace.jsonl
RENAMED
|
@@ -1,103 +1,103 @@
|
|
| 1 |
-
{"condition": "
|
| 2 |
-
{"condition": "
|
| 3 |
-
{"condition": "
|
| 4 |
-
{"condition": "
|
| 5 |
-
{"condition": "
|
| 6 |
-
{"condition": "
|
| 7 |
-
{"condition": "
|
| 8 |
-
{"condition": "
|
| 9 |
-
{"condition": "
|
| 10 |
-
{"condition": "
|
| 11 |
-
{"condition": "
|
| 12 |
-
{"condition": "
|
| 13 |
-
{"condition": "
|
| 14 |
-
{"condition": "
|
| 15 |
-
{"condition": "
|
| 16 |
-
{"condition": "
|
| 17 |
-
{"condition": "
|
| 18 |
-
{"condition": "
|
| 19 |
-
{"condition": "
|
| 20 |
-
{"condition": "
|
| 21 |
-
{"condition": "
|
| 22 |
-
{"condition": "
|
| 23 |
-
{"condition": "
|
| 24 |
-
{"condition": "
|
| 25 |
-
{"condition": "
|
| 26 |
-
{"condition": "
|
| 27 |
-
{"condition": "
|
| 28 |
-
{"condition": "
|
| 29 |
-
{"condition": "
|
| 30 |
-
{"condition": "
|
| 31 |
-
{"condition": "
|
| 32 |
-
{"condition": "
|
| 33 |
-
{"condition": "
|
| 34 |
-
{"condition": "
|
| 35 |
-
{"condition": "
|
| 36 |
-
{"condition": "
|
| 37 |
-
{"condition": "
|
| 38 |
-
{"condition": "
|
| 39 |
-
{"condition": "
|
| 40 |
-
{"condition": "
|
| 41 |
-
{"condition": "
|
| 42 |
-
{"condition": "
|
| 43 |
-
{"condition": "
|
| 44 |
-
{"condition": "
|
| 45 |
-
{"condition": "
|
| 46 |
-
{"condition": "
|
| 47 |
-
{"condition": "
|
| 48 |
-
{"condition": "
|
| 49 |
-
{"condition": "
|
| 50 |
-
{"condition": "
|
| 51 |
-
{"condition": "
|
| 52 |
-
{"condition": "
|
| 53 |
-
{"condition": "
|
| 54 |
-
{"condition": "
|
| 55 |
-
{"condition": "
|
| 56 |
-
{"condition": "
|
| 57 |
-
{"condition": "
|
| 58 |
-
{"condition": "
|
| 59 |
-
{"condition": "
|
| 60 |
-
{"condition": "
|
| 61 |
-
{"condition": "
|
| 62 |
-
{"condition": "
|
| 63 |
-
{"condition": "
|
| 64 |
-
{"condition": "
|
| 65 |
-
{"condition": "
|
| 66 |
-
{"condition": "
|
| 67 |
-
{"condition": "
|
| 68 |
-
{"condition": "
|
| 69 |
-
{"condition": "
|
| 70 |
-
{"condition": "
|
| 71 |
-
{"condition": "
|
| 72 |
-
{"condition": "
|
| 73 |
-
{"condition": "
|
| 74 |
-
{"condition": "
|
| 75 |
-
{"condition": "
|
| 76 |
-
{"condition": "
|
| 77 |
-
{"condition": "
|
| 78 |
-
{"condition": "
|
| 79 |
-
{"condition": "
|
| 80 |
-
{"condition": "
|
| 81 |
-
{"condition": "
|
| 82 |
-
{"condition": "
|
| 83 |
-
{"condition": "
|
| 84 |
-
{"condition": "
|
| 85 |
-
{"condition": "
|
| 86 |
-
{"condition": "
|
| 87 |
-
{"condition": "
|
| 88 |
-
{"condition": "
|
| 89 |
-
{"condition": "
|
| 90 |
-
{"condition": "
|
| 91 |
-
{"condition": "
|
| 92 |
-
{"condition": "
|
| 93 |
-
{"condition": "
|
| 94 |
-
{"condition": "
|
| 95 |
-
{"condition": "
|
| 96 |
-
{"condition": "
|
| 97 |
-
{"condition": "
|
| 98 |
-
{"condition": "
|
| 99 |
-
{"condition": "
|
| 100 |
-
{"condition": "
|
| 101 |
{"condition": "hold_30_then_decay", "dropout": 0.3, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 1, "stage": 0, "step": 250, "steps": 1000, "token_limit": 250000, "tokens_seen": 512000, "train_batch_loss": 6.104031562805176}
|
| 102 |
{"condition": "hold_30_then_decay", "dropout": 0.3, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 1, "stage": 0, "step": 500, "steps": 1000, "token_limit": 250000, "tokens_seen": 1024000, "train_batch_loss": 5.276653289794922}
|
| 103 |
{"condition": "hold_30_then_decay", "dropout": 0.3, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 1, "stage": 0, "step": 750, "steps": 1000, "token_limit": 250000, "tokens_seen": 1536000, "train_batch_loss": 4.9929280281066895}
|
|
|
|
| 1 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.385, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 1, "stage": 0, "step": 250, "steps": 1000, "token_limit": 250000, "tokens_seen": 512000, "train_batch_loss": 6.171935558319092}
|
| 2 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.385, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 1, "stage": 0, "step": 500, "steps": 1000, "token_limit": 250000, "tokens_seen": 1024000, "train_batch_loss": 5.48825740814209}
|
| 3 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.385, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 1, "stage": 0, "step": 750, "steps": 1000, "token_limit": 250000, "tokens_seen": 1536000, "train_batch_loss": 4.841618061065674}
|
| 4 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.385, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 1, "stage": 0, "step": 1000, "steps": 1000, "token_limit": 250000, "tokens_seen": 2048000, "train_batch_loss": 4.844176292419434}
|
| 5 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.319, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 1, "stage": 1, "step": 250, "steps": 1000, "token_limit": 500000, "tokens_seen": 2560000, "train_batch_loss": 4.8820390701293945}
|
| 6 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.319, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 1, "stage": 1, "step": 500, "steps": 1000, "token_limit": 500000, "tokens_seen": 3072000, "train_batch_loss": 4.497588157653809}
|
| 7 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.319, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 1, "stage": 1, "step": 750, "steps": 1000, "token_limit": 500000, "tokens_seen": 3584000, "train_batch_loss": 4.518710136413574}
|
| 8 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.319, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 1, "stage": 1, "step": 1000, "steps": 1000, "token_limit": 500000, "tokens_seen": 4096000, "train_batch_loss": 4.244839668273926}
|
| 9 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.227, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 1, "stage": 2, "step": 250, "steps": 1000, "token_limit": 1000000, "tokens_seen": 4608000, "train_batch_loss": 4.528886795043945}
|
| 10 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.227, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 1, "stage": 2, "step": 500, "steps": 1000, "token_limit": 1000000, "tokens_seen": 5120000, "train_batch_loss": 4.406923770904541}
|
| 11 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.227, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 1, "stage": 2, "step": 750, "steps": 1000, "token_limit": 1000000, "tokens_seen": 5632000, "train_batch_loss": 4.215362548828125}
|
| 12 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.227, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 1, "stage": 2, "step": 1000, "steps": 1000, "token_limit": 1000000, "tokens_seen": 6144000, "train_batch_loss": 4.434404373168945}
|
| 13 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.139, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 1, "stage": 3, "step": 250, "steps": 1000, "token_limit": 2000000, "tokens_seen": 6656000, "train_batch_loss": 4.3406572341918945}
|
| 14 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.139, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 1, "stage": 3, "step": 500, "steps": 1000, "token_limit": 2000000, "tokens_seen": 7168000, "train_batch_loss": 4.156265735626221}
|
| 15 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.139, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 1, "stage": 3, "step": 750, "steps": 1000, "token_limit": 2000000, "tokens_seen": 7680000, "train_batch_loss": 4.173320770263672}
|
| 16 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.139, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 1, "stage": 3, "step": 1000, "steps": 1000, "token_limit": 2000000, "tokens_seen": 8192000, "train_batch_loss": 4.123593330383301}
|
| 17 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.066, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 1, "stage": 4, "step": 250, "steps": 1000, "token_limit": 4000000, "tokens_seen": 8704000, "train_batch_loss": 4.2380781173706055}
|
| 18 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.066, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 1, "stage": 4, "step": 500, "steps": 1000, "token_limit": 4000000, "tokens_seen": 9216000, "train_batch_loss": 4.078634262084961}
|
| 19 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.066, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 1, "stage": 4, "step": 750, "steps": 1000, "token_limit": 4000000, "tokens_seen": 9728000, "train_batch_loss": 4.236572265625}
|
| 20 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.066, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 1, "stage": 4, "step": 1000, "steps": 1000, "token_limit": 4000000, "tokens_seen": 10240000, "train_batch_loss": 4.194511413574219}
|
| 21 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.385, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 2, "stage": 0, "step": 250, "steps": 1000, "token_limit": 250000, "tokens_seen": 512000, "train_batch_loss": 6.254297256469727}
|
| 22 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.385, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 2, "stage": 0, "step": 500, "steps": 1000, "token_limit": 250000, "tokens_seen": 1024000, "train_batch_loss": 5.503683567047119}
|
| 23 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.385, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 2, "stage": 0, "step": 750, "steps": 1000, "token_limit": 250000, "tokens_seen": 1536000, "train_batch_loss": 5.088714122772217}
|
| 24 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.385, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 2, "stage": 0, "step": 1000, "steps": 1000, "token_limit": 250000, "tokens_seen": 2048000, "train_batch_loss": 4.836062908172607}
|
| 25 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.319, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 2, "stage": 1, "step": 250, "steps": 1000, "token_limit": 500000, "tokens_seen": 2560000, "train_batch_loss": 4.710488319396973}
|
| 26 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.319, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 2, "stage": 1, "step": 500, "steps": 1000, "token_limit": 500000, "tokens_seen": 3072000, "train_batch_loss": 4.585000991821289}
|
| 27 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.319, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 2, "stage": 1, "step": 750, "steps": 1000, "token_limit": 500000, "tokens_seen": 3584000, "train_batch_loss": 4.286666393280029}
|
| 28 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.319, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 2, "stage": 1, "step": 1000, "steps": 1000, "token_limit": 500000, "tokens_seen": 4096000, "train_batch_loss": 4.347846984863281}
|
| 29 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.227, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 2, "stage": 2, "step": 250, "steps": 1000, "token_limit": 1000000, "tokens_seen": 4608000, "train_batch_loss": 4.413064002990723}
|
| 30 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.227, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 2, "stage": 2, "step": 500, "steps": 1000, "token_limit": 1000000, "tokens_seen": 5120000, "train_batch_loss": 4.370894432067871}
|
| 31 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.227, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 2, "stage": 2, "step": 750, "steps": 1000, "token_limit": 1000000, "tokens_seen": 5632000, "train_batch_loss": 4.284385681152344}
|
| 32 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.227, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 2, "stage": 2, "step": 1000, "steps": 1000, "token_limit": 1000000, "tokens_seen": 6144000, "train_batch_loss": 4.249028205871582}
|
| 33 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.139, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 2, "stage": 3, "step": 250, "steps": 1000, "token_limit": 2000000, "tokens_seen": 6656000, "train_batch_loss": 4.31729793548584}
|
| 34 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.139, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 2, "stage": 3, "step": 500, "steps": 1000, "token_limit": 2000000, "tokens_seen": 7168000, "train_batch_loss": 4.12711238861084}
|
| 35 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.139, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 2, "stage": 3, "step": 750, "steps": 1000, "token_limit": 2000000, "tokens_seen": 7680000, "train_batch_loss": 4.441766738891602}
|
| 36 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.139, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 2, "stage": 3, "step": 1000, "steps": 1000, "token_limit": 2000000, "tokens_seen": 8192000, "train_batch_loss": 4.3183135986328125}
|
| 37 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.066, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 2, "stage": 4, "step": 250, "steps": 1000, "token_limit": 4000000, "tokens_seen": 8704000, "train_batch_loss": 4.346931457519531}
|
| 38 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.066, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 2, "stage": 4, "step": 500, "steps": 1000, "token_limit": 4000000, "tokens_seen": 9216000, "train_batch_loss": 4.198246479034424}
|
| 39 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.066, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 2, "stage": 4, "step": 750, "steps": 1000, "token_limit": 4000000, "tokens_seen": 9728000, "train_batch_loss": 4.135946273803711}
|
| 40 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.066, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 2, "stage": 4, "step": 1000, "steps": 1000, "token_limit": 4000000, "tokens_seen": 10240000, "train_batch_loss": 4.001583576202393}
|
| 41 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.385, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 3, "stage": 0, "step": 250, "steps": 1000, "token_limit": 250000, "tokens_seen": 512000, "train_batch_loss": 6.144522666931152}
|
| 42 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.385, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 3, "stage": 0, "step": 500, "steps": 1000, "token_limit": 250000, "tokens_seen": 1024000, "train_batch_loss": 5.652010917663574}
|
| 43 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.385, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 3, "stage": 0, "step": 750, "steps": 1000, "token_limit": 250000, "tokens_seen": 1536000, "train_batch_loss": 4.912023544311523}
|
| 44 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.385, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 3, "stage": 0, "step": 1000, "steps": 1000, "token_limit": 250000, "tokens_seen": 2048000, "train_batch_loss": 4.735123634338379}
|
| 45 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.319, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 3, "stage": 1, "step": 250, "steps": 1000, "token_limit": 500000, "tokens_seen": 2560000, "train_batch_loss": 4.92975378036499}
|
| 46 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.319, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 3, "stage": 1, "step": 500, "steps": 1000, "token_limit": 500000, "tokens_seen": 3072000, "train_batch_loss": 4.763935089111328}
|
| 47 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.319, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 3, "stage": 1, "step": 750, "steps": 1000, "token_limit": 500000, "tokens_seen": 3584000, "train_batch_loss": 4.548640251159668}
|
| 48 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.319, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 3, "stage": 1, "step": 1000, "steps": 1000, "token_limit": 500000, "tokens_seen": 4096000, "train_batch_loss": 4.428989410400391}
|
| 49 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.227, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 3, "stage": 2, "step": 250, "steps": 1000, "token_limit": 1000000, "tokens_seen": 4608000, "train_batch_loss": 4.43540096282959}
|
| 50 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.227, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 3, "stage": 2, "step": 500, "steps": 1000, "token_limit": 1000000, "tokens_seen": 5120000, "train_batch_loss": 4.226118087768555}
|
| 51 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.227, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 3, "stage": 2, "step": 750, "steps": 1000, "token_limit": 1000000, "tokens_seen": 5632000, "train_batch_loss": 4.112758636474609}
|
| 52 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.227, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 3, "stage": 2, "step": 1000, "steps": 1000, "token_limit": 1000000, "tokens_seen": 6144000, "train_batch_loss": 4.148219108581543}
|
| 53 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.139, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 3, "stage": 3, "step": 250, "steps": 1000, "token_limit": 2000000, "tokens_seen": 6656000, "train_batch_loss": 4.413056373596191}
|
| 54 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.139, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 3, "stage": 3, "step": 500, "steps": 1000, "token_limit": 2000000, "tokens_seen": 7168000, "train_batch_loss": 4.175434112548828}
|
| 55 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.139, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 3, "stage": 3, "step": 750, "steps": 1000, "token_limit": 2000000, "tokens_seen": 7680000, "train_batch_loss": 4.2766265869140625}
|
| 56 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.139, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 3, "stage": 3, "step": 1000, "steps": 1000, "token_limit": 2000000, "tokens_seen": 8192000, "train_batch_loss": 4.133602142333984}
|
| 57 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.066, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 3, "stage": 4, "step": 250, "steps": 1000, "token_limit": 4000000, "tokens_seen": 8704000, "train_batch_loss": 4.40939998626709}
|
| 58 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.066, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 3, "stage": 4, "step": 500, "steps": 1000, "token_limit": 4000000, "tokens_seen": 9216000, "train_batch_loss": 4.337164878845215}
|
| 59 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.066, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 3, "stage": 4, "step": 750, "steps": 1000, "token_limit": 4000000, "tokens_seen": 9728000, "train_batch_loss": 4.1670074462890625}
|
| 60 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.066, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 3, "stage": 4, "step": 1000, "steps": 1000, "token_limit": 4000000, "tokens_seen": 10240000, "train_batch_loss": 4.316803932189941}
|
| 61 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.385, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 4, "stage": 0, "step": 250, "steps": 1000, "token_limit": 250000, "tokens_seen": 512000, "train_batch_loss": 6.117185592651367}
|
| 62 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.385, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 4, "stage": 0, "step": 500, "steps": 1000, "token_limit": 250000, "tokens_seen": 1024000, "train_batch_loss": 5.526948928833008}
|
| 63 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.385, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 4, "stage": 0, "step": 750, "steps": 1000, "token_limit": 250000, "tokens_seen": 1536000, "train_batch_loss": 5.118289947509766}
|
| 64 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.385, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 4, "stage": 0, "step": 1000, "steps": 1000, "token_limit": 250000, "tokens_seen": 2048000, "train_batch_loss": 4.885035514831543}
|
| 65 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.319, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 4, "stage": 1, "step": 250, "steps": 1000, "token_limit": 500000, "tokens_seen": 2560000, "train_batch_loss": 4.551980018615723}
|
| 66 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.319, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 4, "stage": 1, "step": 500, "steps": 1000, "token_limit": 500000, "tokens_seen": 3072000, "train_batch_loss": 4.621177673339844}
|
| 67 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.319, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 4, "stage": 1, "step": 750, "steps": 1000, "token_limit": 500000, "tokens_seen": 3584000, "train_batch_loss": 4.473027229309082}
|
| 68 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.319, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 4, "stage": 1, "step": 1000, "steps": 1000, "token_limit": 500000, "tokens_seen": 4096000, "train_batch_loss": 4.507782459259033}
|
| 69 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.227, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 4, "stage": 2, "step": 250, "steps": 1000, "token_limit": 1000000, "tokens_seen": 4608000, "train_batch_loss": 4.615119934082031}
|
| 70 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.227, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 4, "stage": 2, "step": 500, "steps": 1000, "token_limit": 1000000, "tokens_seen": 5120000, "train_batch_loss": 4.372061729431152}
|
| 71 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.227, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 4, "stage": 2, "step": 750, "steps": 1000, "token_limit": 1000000, "tokens_seen": 5632000, "train_batch_loss": 4.355410099029541}
|
| 72 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.227, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 4, "stage": 2, "step": 1000, "steps": 1000, "token_limit": 1000000, "tokens_seen": 6144000, "train_batch_loss": 4.1192522048950195}
|
| 73 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.139, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 4, "stage": 3, "step": 250, "steps": 1000, "token_limit": 2000000, "tokens_seen": 6656000, "train_batch_loss": 4.4725799560546875}
|
| 74 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.139, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 4, "stage": 3, "step": 500, "steps": 1000, "token_limit": 2000000, "tokens_seen": 7168000, "train_batch_loss": 4.226810455322266}
|
| 75 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.139, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 4, "stage": 3, "step": 750, "steps": 1000, "token_limit": 2000000, "tokens_seen": 7680000, "train_batch_loss": 4.25181770324707}
|
| 76 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.139, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 4, "stage": 3, "step": 1000, "steps": 1000, "token_limit": 2000000, "tokens_seen": 8192000, "train_batch_loss": 4.137112140655518}
|
| 77 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.066, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 4, "stage": 4, "step": 250, "steps": 1000, "token_limit": 4000000, "tokens_seen": 8704000, "train_batch_loss": 4.0571441650390625}
|
| 78 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.066, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 4, "stage": 4, "step": 500, "steps": 1000, "token_limit": 4000000, "tokens_seen": 9216000, "train_batch_loss": 4.087485313415527}
|
| 79 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.066, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 4, "stage": 4, "step": 750, "steps": 1000, "token_limit": 4000000, "tokens_seen": 9728000, "train_batch_loss": 4.146313190460205}
|
| 80 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.066, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 4, "stage": 4, "step": 1000, "steps": 1000, "token_limit": 4000000, "tokens_seen": 10240000, "train_batch_loss": 3.990117311477661}
|
| 81 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.385, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 5, "stage": 0, "step": 250, "steps": 1000, "token_limit": 250000, "tokens_seen": 512000, "train_batch_loss": 6.368095397949219}
|
| 82 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.385, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 5, "stage": 0, "step": 500, "steps": 1000, "token_limit": 250000, "tokens_seen": 1024000, "train_batch_loss": 5.487208366394043}
|
| 83 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.385, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 5, "stage": 0, "step": 750, "steps": 1000, "token_limit": 250000, "tokens_seen": 1536000, "train_batch_loss": 5.186356544494629}
|
| 84 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.385, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 5, "stage": 0, "step": 1000, "steps": 1000, "token_limit": 250000, "tokens_seen": 2048000, "train_batch_loss": 4.927996635437012}
|
| 85 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.319, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 5, "stage": 1, "step": 250, "steps": 1000, "token_limit": 500000, "tokens_seen": 2560000, "train_batch_loss": 4.755383014678955}
|
| 86 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.319, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 5, "stage": 1, "step": 500, "steps": 1000, "token_limit": 500000, "tokens_seen": 3072000, "train_batch_loss": 4.4085564613342285}
|
| 87 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.319, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 5, "stage": 1, "step": 750, "steps": 1000, "token_limit": 500000, "tokens_seen": 3584000, "train_batch_loss": 4.513108253479004}
|
| 88 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.319, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 5, "stage": 1, "step": 1000, "steps": 1000, "token_limit": 500000, "tokens_seen": 4096000, "train_batch_loss": 4.416054725646973}
|
| 89 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.227, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 5, "stage": 2, "step": 250, "steps": 1000, "token_limit": 1000000, "tokens_seen": 4608000, "train_batch_loss": 4.688089370727539}
|
| 90 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.227, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 5, "stage": 2, "step": 500, "steps": 1000, "token_limit": 1000000, "tokens_seen": 5120000, "train_batch_loss": 4.242029190063477}
|
| 91 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.227, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 5, "stage": 2, "step": 750, "steps": 1000, "token_limit": 1000000, "tokens_seen": 5632000, "train_batch_loss": 4.343423843383789}
|
| 92 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.227, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 5, "stage": 2, "step": 1000, "steps": 1000, "token_limit": 1000000, "tokens_seen": 6144000, "train_batch_loss": 4.2396697998046875}
|
| 93 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.139, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 5, "stage": 3, "step": 250, "steps": 1000, "token_limit": 2000000, "tokens_seen": 6656000, "train_batch_loss": 4.3881611824035645}
|
| 94 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.139, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 5, "stage": 3, "step": 500, "steps": 1000, "token_limit": 2000000, "tokens_seen": 7168000, "train_batch_loss": 4.271755218505859}
|
| 95 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.139, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 5, "stage": 3, "step": 750, "steps": 1000, "token_limit": 2000000, "tokens_seen": 7680000, "train_batch_loss": 4.025024890899658}
|
| 96 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.139, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 5, "stage": 3, "step": 1000, "steps": 1000, "token_limit": 2000000, "tokens_seen": 8192000, "train_batch_loss": 4.11806583404541}
|
| 97 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.066, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 5, "stage": 4, "step": 250, "steps": 1000, "token_limit": 4000000, "tokens_seen": 8704000, "train_batch_loss": 3.955113649368286}
|
| 98 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.066, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 5, "stage": 4, "step": 500, "steps": 1000, "token_limit": 4000000, "tokens_seen": 9216000, "train_batch_loss": 4.345088005065918}
|
| 99 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.066, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 5, "stage": 4, "step": 750, "steps": 1000, "token_limit": 4000000, "tokens_seen": 9728000, "train_batch_loss": 4.051096439361572}
|
| 100 |
+
{"condition": "openwebtext10k_interaction", "dropout": 0.066, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 5, "stage": 4, "step": 1000, "steps": 1000, "token_limit": 4000000, "tokens_seen": 10240000, "train_batch_loss": 4.270941734313965}
|
| 101 |
{"condition": "hold_30_then_decay", "dropout": 0.3, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 1, "stage": 0, "step": 250, "steps": 1000, "token_limit": 250000, "tokens_seen": 512000, "train_batch_loss": 6.104031562805176}
|
| 102 |
{"condition": "hold_30_then_decay", "dropout": 0.3, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 1, "stage": 0, "step": 500, "steps": 1000, "token_limit": 250000, "tokens_seen": 1024000, "train_batch_loss": 5.276653289794922}
|
| 103 |
{"condition": "hold_30_then_decay", "dropout": 0.3, "event": "train_step", "model_name": "L16_H8_D384", "run_mode": "locked_stream", "seed": 1, "stage": 0, "step": 750, "steps": 1000, "token_limit": 250000, "tokens_seen": 1536000, "train_batch_loss": 4.9929280281066895}
|
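A quick consistency check on the `train_step` records above, offered as a sketch and not as part of the repository: in every record shown, `tokens_seen` equals the cumulative step count times 2048. The constant `TOKENS_PER_STEP` and the helper `expected_tokens_seen` are hypothetical names introduced here; the value 2048 is only inferred from the logged numbers (512000 tokens at step 250), and the sample records are abbreviated copies of three logged lines.

```python
import json

# Assumption: 2048 tokens per optimizer step, inferred from the log
# (tokens_seen == 512000 at stage 0, step 250). Not confirmed by the source.
TOKENS_PER_STEP = 2048


def expected_tokens_seen(record, tokens_per_step=TOKENS_PER_STEP):
    """Cumulative tokens implied by the stage index, steps per stage, and step."""
    global_step = record["stage"] * record["steps"] + record["step"]
    return global_step * tokens_per_step


# Abbreviated copies of three logged records (unused fields dropped).
sample_lines = [
    '{"condition": "openwebtext10k_interaction", "seed": 1, "stage": 0, "step": 250, "steps": 1000, "tokens_seen": 512000}',
    '{"condition": "openwebtext10k_interaction", "seed": 1, "stage": 1, "step": 250, "steps": 1000, "tokens_seen": 2560000}',
    '{"condition": "openwebtext10k_interaction", "seed": 1, "stage": 4, "step": 1000, "steps": 1000, "tokens_seen": 10240000}',
]

records = [json.loads(line) for line in sample_lines]

# Collect any record whose logged token count disagrees with the implied one.
mismatches = [r for r in records if expected_tokens_seen(r) != r["tokens_seen"]]
print(f"{len(records)} records checked, {len(mismatches)} mismatches")
```

The same loop could be pointed at a full JSONL log file by reading it line by line instead of using `sample_lines`.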
runs/{previous_local_streaming_report → openwebtext10k_streaming_report}/l16_multiseed_confirm/condition_summary.csv
RENAMED
|
File without changes
|
runs/{previous_local_streaming_report → openwebtext10k_streaming_report}/l16_multiseed_confirm/paired_final_deltas.csv
RENAMED
|
File without changes
|
runs/{previous_local_streaming_report → openwebtext10k_streaming_report}/l16_multiseed_confirm/stage_summary.csv
RENAMED
|
File without changes
|
runs/{previous_local_streaming_report → openwebtext10k_streaming_report}/l16_updated_formula_clean_5seed/condition_summary.csv
RENAMED
|
@@ -1,5 +1,5 @@
|
|
| 1 |
condition,kind,n,mean_trajectory_val,std_trajectory_val,mean_final_val,std_final_val,mean_final_gap,std_final_gap,dropout_path
|
| 2 |
-
|
| 3 |
hold_30_then_decay,anchor_decay,5,4.851180048286915,0.0016753687570399134,4.405232906341553,0.011151070705538514,0.3564802721142769,0.01297330703929578,0.30 -> 0.30 -> 0.20 -> 0.10 -> 0.02
|
| 4 |
mild_30_to_08,anchor_decay,5,4.850860581099987,0.0014618995680224028,4.40728645324707,0.008502541215009067,0.3337064355611801,0.010359634321755684,0.30 -> 0.24 -> 0.18 -> 0.12 -> 0.08
|
| 5 |
fitted_l16_static_law,anchor_decay,5,4.952093484103679,0.0038574646544463683,4.412404176592827,0.00843791675235308,0.3137470245361328,0.007204760471400837,0.60 -> 0.40 -> 0.30 -> 0.14 -> 0.02
|
|
|
|
| 1 |
condition,kind,n,mean_trajectory_val,std_trajectory_val,mean_final_val,std_final_val,mean_final_gap,std_final_gap,dropout_path
|
| 2 |
+
openwebtext10k_interaction,anchor_decay,5,4.860862210392952,0.0046364279557658235,4.3981304407119755,0.009545784836147743,0.3176518455147743,0.007999498965173152,0.39 -> 0.32 -> 0.23 -> 0.14 -> 0.07
|
| 3 |
hold_30_then_decay,anchor_decay,5,4.851180048286915,0.0016753687570399134,4.405232906341553,0.011151070705538514,0.3564802721142769,0.01297330703929578,0.30 -> 0.30 -> 0.20 -> 0.10 -> 0.02
|
| 4 |
mild_30_to_08,anchor_decay,5,4.850860581099987,0.0014618995680224028,4.40728645324707,0.008502541215009067,0.3337064355611801,0.010359634321755684,0.30 -> 0.24 -> 0.18 -> 0.12 -> 0.08
|
| 5 |
fitted_l16_static_law,anchor_decay,5,4.952093484103679,0.0038574646544463683,4.412404176592827,0.00843791675235308,0.3137470245361328,0.007204760471400837,0.60 -> 0.40 -> 0.30 -> 0.14 -> 0.02
|
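The `mean_final_val` and `std_final_val` columns in `condition_summary.csv` above appear to be the per-seed mean and the sample standard deviation (n - 1 denominator) of the `final_val` values listed in `paired_final_deltas.csv`. The sketch below checks this for `hold_30_then_decay` using the five per-seed values from this diff. That the repository computes the columns this way is an assumption; only the numbers are taken from the files.

```python
import statistics

# Per-seed final_val for hold_30_then_decay (seeds 1..5),
# copied from paired_final_deltas.csv in this diff.
hold_30_final_vals = [
    4.393906190991402,
    4.406779877841473,
    4.417374566197395,
    4.3936478942632675,
    4.4144560024142265,
]

# Values reported for the same condition in condition_summary.csv.
reported_mean = 4.405232906341553
reported_std = 0.011151070705538514

mean_final_val = statistics.fmean(hold_30_final_vals)
# statistics.stdev is the sample standard deviation (divides by n - 1).
std_final_val = statistics.stdev(hold_30_final_vals)

# Seed 1: delta_vs_best_static as final_val minus best_static_final_val.
delta_seed1 = hold_30_final_vals[0] - 4.4417688846588135

print(f"mean diff: {abs(mean_final_val - reported_mean):.2e}")
print(f"std diff:  {abs(std_final_val - reported_std):.2e}")
print(f"delta seed 1: {delta_seed1:.6f}")
```

Under this reading, a negative `delta_vs_best_static` means the schedule finished with a lower validation loss than the best static dropout setting for that seed.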
runs/{previous_local_streaming_report → openwebtext10k_streaming_report}/l16_updated_formula_clean_5seed/paired_final_deltas.csv
RENAMED
|
@@ -1,5 +1,5 @@
|
|
| 1 |
seed,condition,final_val,best_static_condition,best_static_final_val,delta_vs_best_static
|
| 2 |
-
1,
|
| 3 |
1,hold_30_then_decay,4.393906190991402,static_dropout_0.14,4.4417688846588135,-0.047862693667411804
|
| 4 |
1,mild_30_to_08,4.399485997855663,static_dropout_0.14,4.4417688846588135,-0.04228288680315018
|
| 5 |
1,fitted_l16_static_law,4.420692957937717,static_dropout_0.14,4.4417688846588135,-0.02107592672109604
|
|
@@ -7,7 +7,7 @@ seed,condition,final_val,best_static_condition,best_static_final_val,delta_vs_be
|
|
| 7 |
1,static_dropout_0.3,4.460195399820805,static_dropout_0.14,4.4417688846588135,0.01842651516199112
|
| 8 |
1,static_dropout_0.02,4.5401855930686,static_dropout_0.14,4.4417688846588135,0.09841670840978622
|
| 9 |
1,static_dropout_0,4.570374272763729,static_dropout_0.14,4.4417688846588135,0.12860538810491562
|
| 10 |
-
2,
|
| 11 |
2,hold_30_then_decay,4.406779877841473,static_dropout_0.14,4.460222490131855,-0.053442612290382385
|
| 12 |
2,mild_30_to_08,4.4080275148153305,static_dropout_0.14,4.460222490131855,-0.052194975316524506
|
| 13 |
2,fitted_l16_static_law,4.41358345746994,static_dropout_0.14,4.460222490131855,-0.046639032661914825
|
|
@@ -15,7 +15,7 @@ seed,condition,final_val,best_static_condition,best_static_final_val,delta_vs_be
|
|
| 15 |
2,static_dropout_0.3,4.4719239845871925,static_dropout_0.14,4.460222490131855,0.011701494455337524
|
| 16 |
2,static_dropout_0.02,4.546629846096039,static_dropout_0.14,4.460222490131855,0.08640735596418381
|
| 17 |
2,static_dropout_0,4.609437867999077,static_dropout_0.14,4.460222490131855,0.14921537786722183
|
| 18 |
-
3,
|
| 19 |
3,hold_30_then_decay,4.417374566197395,static_dropout_0.14,4.43564984947443,-0.01827528327703476
|
| 20 |
3,mild_30_to_08,4.415062002837658,static_dropout_0.14,4.43564984947443,-0.020587846636772156
|
| 21 |
3,fitted_l16_static_law,4.413399815559387,static_dropout_0.14,4.43564984947443,-0.022250033915042877
|
|
@@ -23,7 +23,7 @@ seed,condition,final_val,best_static_condition,best_static_final_val,delta_vs_be
|
|
| 23 |
3,static_dropout_0.3,4.475773207843304,static_dropout_0.14,4.43564984947443,0.040123358368873596
|
| 24 |
3,static_dropout_0.02,4.534482300281525,static_dropout_0.14,4.43564984947443,0.09883245080709457
|
| 25 |
3,static_dropout_0,4.592755533754826,static_dropout_0.14,4.43564984947443,0.1571056842803955
|
| 26 |
-
4,
|
| 27 |
4,hold_30_then_decay,4.3936478942632675,static_dropout_0.14,4.433655060827732,-0.04000716656446457
|
| 28 |
4,mild_30_to_08,4.397788874804974,static_dropout_0.14,4.433655060827732,-0.035866186022758484
|
| 29 |
4,fitted_l16_static_law,4.398257076740265,static_dropout_0.14,4.433655060827732,-0.035397984087467194
|
|
@@ -31,7 +31,7 @@ seed,condition,final_val,best_static_condition,best_static_final_val,delta_vs_be
|
|
| 31 |
4,static_dropout_0.3,4.445499815046787,static_dropout_0.14,4.433655060827732,0.011844754219055176
|
| 32 |
4,static_dropout_0.02,4.52195218205452,static_dropout_0.14,4.433655060827732,0.08829712122678757
|
| 33 |
4,static_dropout_0,4.576848782598972,static_dropout_0.14,4.433655060827732,0.14319372177124023
|
| 34 |
-
5,
|
| 35 |
5,hold_30_then_decay,4.4144560024142265,static_dropout_0.14,4.455972045660019,-0.04151604324579239
|
| 36 |
5,mild_30_to_08,4.416067875921726,static_dropout_0.14,4.455972045660019,-0.039904169738292694
|
| 37 |
5,fitted_l16_static_law,4.4160875752568245,static_dropout_0.14,4.455972045660019,-0.03988447040319443
|
|
|
|
| 1 |
seed,condition,final_val,best_static_condition,best_static_final_val,delta_vs_best_static
|
| 2 |
+
1,openwebtext10k_interaction,4.402347795665264,static_dropout_0.14,4.4417688846588135,-0.03942108899354935
|
| 3 |
1,hold_30_then_decay,4.393906190991402,static_dropout_0.14,4.4417688846588135,-0.047862693667411804
|
| 4 |
1,mild_30_to_08,4.399485997855663,static_dropout_0.14,4.4417688846588135,-0.04228288680315018
|
| 5 |
1,fitted_l16_static_law,4.420692957937717,static_dropout_0.14,4.4417688846588135,-0.02107592672109604
|
|
|
|
| 7 |
1,static_dropout_0.3,4.460195399820805,static_dropout_0.14,4.4417688846588135,0.01842651516199112
|
| 8 |
1,static_dropout_0.02,4.5401855930686,static_dropout_0.14,4.4417688846588135,0.09841670840978622
|
| 9 |
1,static_dropout_0,4.570374272763729,static_dropout_0.14,4.4417688846588135,0.12860538810491562
|
| 10 |
+
2,openwebtext10k_interaction,4.401971310377121,static_dropout_0.14,4.460222490131855,-0.05825117975473404
|
| 11 |
2,hold_30_then_decay,4.406779877841473,static_dropout_0.14,4.460222490131855,-0.053442612290382385
|
| 12 |
2,mild_30_to_08,4.4080275148153305,static_dropout_0.14,4.460222490131855,-0.052194975316524506
|
| 13 |
2,fitted_l16_static_law,4.41358345746994,static_dropout_0.14,4.460222490131855,-0.046639032661914825
|
|
|
|
| 15 |
2,static_dropout_0.3,4.4719239845871925,static_dropout_0.14,4.460222490131855,0.011701494455337524
|
| 16 |
2,static_dropout_0.02,4.546629846096039,static_dropout_0.14,4.460222490131855,0.08640735596418381
|
| 17 |
2,static_dropout_0,4.609437867999077,static_dropout_0.14,4.460222490131855,0.14921537786722183
|
| 18 |
+
3,openwebtext10k_interaction,4.402896843850613,static_dropout_0.14,4.43564984947443,-0.032753005623817444
|
| 19 |
3,hold_30_then_decay,4.417374566197395,static_dropout_0.14,4.43564984947443,-0.01827528327703476
|
| 20 |
3,mild_30_to_08,4.415062002837658,static_dropout_0.14,4.43564984947443,-0.020587846636772156
|
| 21 |
3,fitted_l16_static_law,4.413399815559387,static_dropout_0.14,4.43564984947443,-0.022250033915042877
|
|
|
|
| 23 |
3,static_dropout_0.3,4.475773207843304,static_dropout_0.14,4.43564984947443,0.040123358368873596
|
| 24 |
3,static_dropout_0.02,4.534482300281525,static_dropout_0.14,4.43564984947443,0.09883245080709457
|
| 25 |
3,static_dropout_0,4.592755533754826,static_dropout_0.14,4.43564984947443,0.1571056842803955
|
| 26 |
+
4,openwebtext10k_interaction,4.381064593791962,static_dropout_0.14,4.433655060827732,-0.052590467035770416
|
| 27 |
4,hold_30_then_decay,4.3936478942632675,static_dropout_0.14,4.433655060827732,-0.04000716656446457
|
| 28 |
4,mild_30_to_08,4.397788874804974,static_dropout_0.14,4.433655060827732,-0.035866186022758484
|
| 29 |
4,fitted_l16_static_law,4.398257076740265,static_dropout_0.14,4.433655060827732,-0.035397984087467194
|
|
|
|
| 31 |
4,static_dropout_0.3,4.445499815046787,static_dropout_0.14,4.433655060827732,0.011844754219055176
|
| 32 |
4,static_dropout_0.02,4.52195218205452,static_dropout_0.14,4.433655060827732,0.08829712122678757
|
| 33 |
4,static_dropout_0,4.576848782598972,static_dropout_0.14,4.433655060827732,0.14319372177124023
|
| 34 |
+
5,openwebtext10k_interaction,4.402371659874916,static_dropout_0.14,4.455972045660019,-0.053600385785102844
|
| 35 |
5,hold_30_then_decay,4.4144560024142265,static_dropout_0.14,4.455972045660019,-0.04151604324579239
|
| 36 |
5,mild_30_to_08,4.416067875921726,static_dropout_0.14,4.455972045660019,-0.039904169738292694
|
| 37 |
5,fitted_l16_static_law,4.4160875752568245,static_dropout_0.14,4.455972045660019,-0.03988447040319443
|
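The rows added to `paired_final_deltas.csv` are consistent with `delta_vs_best_static = final_val - best_static_final_val`, so a negative value means the schedule beat the best static dropout for that seed. The sketch below recomputes that column for the five new `openwebtext10k_interaction` rows copied from the diff above. The relation is inferred from the data shown here, not from the script that produced the file, and the helper name `recompute_deltas` is hypothetical.

```python
import csv
import io
from statistics import mean

# The five rows added in this commit, copied verbatim from the diff above.
ROWS = """seed,condition,final_val,best_static_condition,best_static_final_val,delta_vs_best_static
1,openwebtext10k_interaction,4.402347795665264,static_dropout_0.14,4.4417688846588135,-0.03942108899354935
2,openwebtext10k_interaction,4.401971310377121,static_dropout_0.14,4.460222490131855,-0.05825117975473404
3,openwebtext10k_interaction,4.402896843850613,static_dropout_0.14,4.43564984947443,-0.032753005623817444
4,openwebtext10k_interaction,4.381064593791962,static_dropout_0.14,4.433655060827732,-0.052590467035770416
5,openwebtext10k_interaction,4.402371659874916,static_dropout_0.14,4.455972045660019,-0.053600385785102844
"""


def recompute_deltas(text: str) -> list[tuple[int, float, float]]:
    """Return (seed, recomputed_delta, stored_delta) for each CSV row.

    Assumption: delta = final_val - best_static_final_val. This matches the
    rows shown in the diff but is not confirmed by the generating script.
    """
    results = []
    for row in csv.DictReader(io.StringIO(text)):
        recomputed = float(row["final_val"]) - float(row["best_static_final_val"])
        stored = float(row["delta_vs_best_static"])
        results.append((int(row["seed"]), recomputed, stored))
    return results


deltas = recompute_deltas(ROWS)
mean_delta = mean(recomputed for _, recomputed, _ in deltas)

for seed, recomputed, stored in deltas:
    print(f"seed {seed}: recomputed={recomputed:.6f} stored={stored:.6f}")
print(f"mean delta over {len(deltas)} seeds: {mean_delta:.6f}")
```

The stored values agree with the recomputed ones to within floating-point rounding, and every seed shows a negative delta for this condition.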
runs/{previous_local_streaming_report → openwebtext10k_streaming_report}/l16_updated_formula_clean_5seed/stage_summary.csv
RENAMED
|
@@ -1,9 +1,9 @@
|
|
| 1 |
condition,stage,token_limit,dropout,n,mean_val,std_val,mean_train,std_train,mean_gap,std_gap
|
| 2 |
-
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
|
| 7 |
hold_30_then_decay,0,250000,0.3,5,5.4483301296830176,0.013828501308583057,4.442901518940926,0.027340510763309508,1.0054286107420922,0.02219730946529904
|
| 8 |
hold_30_then_decay,1,500000,0.3,5,5.066737350821495,0.017273545737457947,4.1383186161518095,0.03875357135925004,0.9284187346696854,0.04002925354224623
|
| 9 |
hold_30_then_decay,2,1000000,0.2,5,4.775730343163014,0.014352387307903692,4.037793649733066,0.02368477035230831,0.7379366934299469,0.01882967372675974
|
|
|
|
| 1 |
condition,stage,token_limit,dropout,n,mean_val,std_val,mean_train,std_train,mean_gap,std_gap
|
| 2 |
+
openwebtext10k_interaction,0,250000,0.385,5,5.4946602284908295,0.01093302726132647,4.6016244173049925,0.026939774812057612,0.8930358111858367,0.016010040404479016
|
| 3 |
+
openwebtext10k_interaction,1,500000,0.319,5,5.071460470557213,0.01179360076463939,4.20646400153637,0.03245807641301778,0.8649964690208435,0.028479060454710485
|
| 4 |
+
openwebtext10k_interaction,2,1000000,0.227,5,4.781069016456604,0.008428247627355752,4.08262689858675,0.023420093535722695,0.698442117869854,0.028148054160246055
|
| 5 |
+
openwebtext10k_interaction,3,2000000,0.139,5,4.558990895748138,0.014177173687439483,4.080180557072163,0.022080406390158354,0.4788103386759758,0.021926835907345392
|
| 6 |
+
openwebtext10k_interaction,4,4000000,0.066,5,4.3981304407119755,0.009545784836147743,4.080478595197201,0.009311692964638253,0.3176518455147743,0.007999498965173152
|
| 7 |
hold_30_then_decay,0,250000,0.3,5,5.4483301296830176,0.013828501308583057,4.442901518940926,0.027340510763309508,1.0054286107420922,0.02219730946529904
|
| 8 |
hold_30_then_decay,1,500000,0.3,5,5.066737350821495,0.017273545737457947,4.1383186161518095,0.03875357135925004,0.9284187346696854,0.04002925354224623
|
| 9 |
hold_30_then_decay,2,1000000,0.2,5,4.775730343163014,0.014352387307903692,4.037793649733066,0.02368477035230831,0.7379366934299469,0.01882967372675974
|
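In the `stage_summary.csv` rows above, `mean_gap` matches `mean_val - mean_train` at each stage, and the `dropout` column gives the per-stage schedule. The following sketch checks that relation on the five new `openwebtext10k_interaction` rows and prints the schedule in the arrow notation used elsewhere in these reports. The gap definition is inferred from the values shown, and the `Stage` type is a hypothetical container introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One stage_summary.csv row (subset of columns, hypothetical container)."""
    token_limit: int
    dropout: float
    mean_val: float
    mean_train: float
    mean_gap: float


# Values copied from the openwebtext10k_interaction rows in the diff above.
STAGES = [
    Stage(250_000, 0.385, 5.4946602284908295, 4.6016244173049925, 0.8930358111858367),
    Stage(500_000, 0.319, 5.071460470557213, 4.20646400153637, 0.8649964690208435),
    Stage(1_000_000, 0.227, 4.781069016456604, 4.08262689858675, 0.698442117869854),
    Stage(2_000_000, 0.139, 4.558990895748138, 4.080180557072163, 0.4788103386759758),
    Stage(4_000_000, 0.066, 4.3981304407119755, 4.080478595197201, 0.3176518455147743),
]

# Assumption: mean_gap is the validation/train loss gap, mean_val - mean_train.
gap_errors = [abs((s.mean_val - s.mean_train) - s.mean_gap) for s in STAGES]

# Render the dropout schedule in the "a -> b -> c" style used by the reports.
schedule = " -> ".join(f"{s.dropout:.3f}" for s in STAGES)

print(f"max |recomputed gap - stored gap| = {max(gap_errors):.2e}")
print(f"dropout schedule: {schedule}")
```

For this condition the dropout rate decreases at every stage while the token limit doubles, and the stored gap shrinks along with it.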
scripts/summarize_cross_regime_backtest.py
CHANGED
|
@@ -31,10 +31,10 @@ REPORT_PATH = Path("docs/cross_regime_backtest_report.md")
|
|
| 31 |
|
| 32 |
|
| 33 |
FIT_DIRS = {
|
| 34 |
-
"
|
| 35 |
-
"
|
| 36 |
-
"
|
| 37 |
-
"
|
| 38 |
"pooled_previous_plus_corpus_probes_base": ROOT
|
| 39 |
/ "pooled_previous_plus_corpus_probes_base",
|
| 40 |
"pooled_previous_plus_corpus_probes_interaction": ROOT
|
|
@@ -172,10 +172,10 @@ def write_transfer_csv(items: list[dict[str, float | str]]) -> None:
|
|
| 172 |
|
| 173 |
def main() -> None:
|
| 174 |
transfer_items = [
|
| 175 |
-
error_metrics("
|
| 176 |
-
error_metrics("tinystories_all_base", "
|
| 177 |
-
error_metrics("
|
| 178 |
-
error_metrics("tinystories_all_interaction", "
|
| 179 |
error_metrics(
|
| 180 |
"pooled_previous_plus_corpus_probes_interaction",
|
| 181 |
"tinystories_all_interaction",
|
|
@@ -200,8 +200,8 @@ def main() -> None:
|
|
| 200 |
"",
|
| 201 |
"| Regime/source | Run directories | Role |",
|
| 202 |
"|---|---|---|",
|
| 203 |
-
"|
|
| 204 |
-
"|
|
| 205 |
"| corpus probes | `runs/corpus_difficulty_pressure_{local,tinystories,wikitext103}` | diagnostic rows for corpus sensitivity |",
|
| 206 |
"| current TinyStories | `runs/regime_calibration_tinystories_*` | current coefficient evidence, 16 cells |",
|
| 207 |
"",
|
|
@@ -209,17 +209,17 @@ def main() -> None:
|
|
| 209 |
"",
|
| 210 |
"| Fit | Feature set | Cells | MAE | RMSE | Leave-model MAE | Leave-prefix MAE | Leave-source MAE | Coefficients |",
|
| 211 |
"|---|---|---:|---:|---:|---:|---:|---:|---|",
|
| 212 |
-
metrics_row("
|
| 213 |
-
metrics_row("
|
| 214 |
-
metrics_row("
|
| 215 |
-
metrics_row("
|
| 216 |
metrics_row("tinystories_all_base", "current TinyStories"),
|
| 217 |
metrics_row("tinystories_all_interaction", "current TinyStories"),
|
| 218 |
metrics_row("tinystories_all_quadratic", "current TinyStories"),
|
| 219 |
"",
|
| 220 |
"### Reading",
|
| 221 |
"",
|
| 222 |
-
"- The
|
| 223 |
" reduces MAE from `0.0389` to `0.0148` on the local+5M cells.",
|
| 224 |
"- The current TinyStories regime also supports the interaction form: MAE",
|
| 225 |
" drops from `0.0435` to `0.0180`.",
|
|
@@ -271,7 +271,7 @@ def main() -> None:
|
|
| 271 |
"## Decision",
|
| 272 |
"",
|
| 273 |
"The previous regime validation does not block the current plan. It strengthens",
|
| 274 |
-
"the formula-family claim because both the
|
| 275 |
"TinyStories regime prefer the interaction pressure law over first-order ABC.",
|
| 276 |
"",
|
| 277 |
"The validated claim remains:",
|
|
|
|
| 31 |
|
| 32 |
|
| 33 |
FIT_DIRS = {
|
| 34 |
+
"openwebtext10k_main_base": ROOT / "openwebtext10k_main_base",
|
| 35 |
+
"openwebtext10k_main_interaction": ROOT / "openwebtext10k_main_interaction",
|
| 36 |
+
"openwebtext10k_plus_5m_base": ROOT / "openwebtext10k_plus_5m_base",
|
| 37 |
+
"openwebtext10k_plus_5m_interaction": ROOT / "openwebtext10k_plus_5m_interaction",
|
| 38 |
"pooled_previous_plus_corpus_probes_base": ROOT
|
| 39 |
/ "pooled_previous_plus_corpus_probes_base",
|
| 40 |
"pooled_previous_plus_corpus_probes_interaction": ROOT
|
|
|
|
| 172 |
|
| 173 |
def main() -> None:
|
| 174 |
transfer_items = [
|
| 175 |
+
error_metrics("openwebtext10k_plus_5m_base", "tinystories_all_base"),
|
| 176 |
+
error_metrics("tinystories_all_base", "openwebtext10k_plus_5m_base"),
|
| 177 |
+
error_metrics("openwebtext10k_plus_5m_interaction", "tinystories_all_interaction"),
|
| 178 |
+
error_metrics("tinystories_all_interaction", "openwebtext10k_plus_5m_interaction"),
|
| 179 |
error_metrics(
|
| 180 |
"pooled_previous_plus_corpus_probes_interaction",
|
| 181 |
"tinystories_all_interaction",
|
|
|
|
| 200 |
"",
|
| 201 |
"| Regime/source | Run directories | Role |",
|
| 202 |
"|---|---|---|",
|
| 203 |
+
"| OpenWebText10K main | `runs/screen_static/20260525-133008` | OpenWebText10K static screen, 15 cells |",
|
| 204 |
+
"| OpenWebText10K + 5M | OpenWebText10K main + `runs/screen_static/20260525-122824` | OpenWebText10K regime plus low-pressure 5M extension, 18 cells |",
|
| 205 |
"| corpus probes | `runs/corpus_difficulty_pressure_{local,tinystories,wikitext103}` | diagnostic rows for corpus sensitivity |",
|
| 206 |
"| current TinyStories | `runs/regime_calibration_tinystories_*` | current coefficient evidence, 16 cells |",
|
| 207 |
"",
|
|
|
|
| 209 |
"",
|
| 210 |
"| Fit | Feature set | Cells | MAE | RMSE | Leave-model MAE | Leave-prefix MAE | Leave-source MAE | Coefficients |",
|
| 211 |
"|---|---|---:|---:|---:|---:|---:|---:|---|",
|
| 212 |
+
metrics_row("openwebtext10k_main_base", "OpenWebText10K main"),
|
| 213 |
+
metrics_row("openwebtext10k_main_interaction", "OpenWebText10K main"),
|
| 214 |
+
metrics_row("openwebtext10k_plus_5m_base", "OpenWebText10K + 5M"),
|
| 215 |
+
metrics_row("openwebtext10k_plus_5m_interaction", "OpenWebText10K + 5M"),
|
| 216 |
metrics_row("tinystories_all_base", "current TinyStories"),
|
| 217 |
metrics_row("tinystories_all_interaction", "current TinyStories"),
|
| 218 |
metrics_row("tinystories_all_quadratic", "current TinyStories"),
|
| 219 |
"",
|
| 220 |
"### Reading",
|
| 221 |
"",
|
| 222 |
+
"- The OpenWebText10K regime supports the interaction form: adding `D*x*y`",
|
| 223 |
" reduces MAE from `0.0389` to `0.0148` on the local+5M cells.",
|
| 224 |
"- The current TinyStories regime also supports the interaction form: MAE",
|
| 225 |
" drops from `0.0435` to `0.0180`.",
|
|
|
|
| 271 |
"## Decision",
|
| 272 |
"",
|
| 273 |
"The previous regime validation does not block the current plan. It strengthens",
|
| 274 |
+
"the formula-family claim because both the OpenWebText10K regime and the current",
|
| 275 |
"TinyStories regime prefer the interaction pressure law over first-order ABC.",
|
| 276 |
"",
|
| 277 |
"The validated claim remains:",
|