Baladithya Balamurugan
Wave 2: 4 new modules (kill-switch, EKS/SageMaker executors, DockerSandbox) + B4/B7 completion
7a55e1e
{
"area": "composer_replication/safety/kill_switch.py + test_kill_switch.py (Wave-2 C1)",
"verdict": "material-issues",
"findings": [
{
"severity": "high",
"what": "C1 was scoped as 'Held-out disjoint eval + depth/generation kill-switch' but ONLY the kill-switch half (HeldOutGuard) was built. The HeldoutSplit disjointness-enforcer does not exist anywhere in the tree (no composer_replication/safety/holdout.py, no HeldoutSplit class). The guard's heldout_score is an unvalidated caller-supplied float; nothing enforces that the held-out pool is actually disjoint from the train/generator set. The module's own docstring (kill_switch.py:41-43, 214-216) states this is load-bearing: 'if held-out drifts with the train set the gap signal is meaningless.' So the kill-switch's central proxy-real-gap and decline-streak signals can be silently meaningless with no guard rail.",
"where": "composer_replication/safety/ (missing holdout.py / HeldoutSplit); referenced at kill_switch.py:43, kill_switch.py:214-216",
"recommendation": "Build the HeldoutSplit disjointness enforcer (hash/id-based set-difference check that the held-out eval IDs never intersect the generator/train IDs, raising on overlap) as the second half of C1, OR explicitly re-scope C1 to two items and track the disjointness enforcer as a distinct OPEN backlog item. Do not mark C1 done with only the guard built."
},
{
"severity": "high",
"what": "HeldOutGuard is NOT wired into the trainer. Zero references to HeldOutGuard / kill_switch / CollapseStopError / should_halt / raise_if_fired in composer_replication/trainer/composer_trainer.py (or anywhere outside the safety package + its own test). The 'most load-bearing collapse safeguard (#2)' for the self-evolving flywheel exists as dead, never-invoked code. The trainer's GRPO loop never calls update() per checkpoint, so the run-level tripwire cannot fire in production.",
"where": "composer_replication/trainer/composer_trainer.py (no integration); HeldOutGuard defined composer_replication/safety/kill_switch.py:117",
"recommendation": "Wire HeldOutGuard.update(round_idx, in_loop_reward, heldout_score, kl_to_init=token_mean_kl(...)) into the trainer loop at the same checkpoint cadence DifficultyCurriculum.update is called (curriculum.py:78), and convert a fired verdict to a halt via raise_if_fired / should_halt. token_mean_kl already exists (kl_logging.py:53) to supply the per-token KL. Until wired, C1's safety claim is unrealized."
},
{
"severity": "low",
"what": "calibrate_kl_threshold does not re-validate the > 0 invariant that __post_init__ enforces. A negative factor (or negative baseline_kls) yields min(negative, 0.08) = a NEGATIVE kl_hard_stop, after which the KL tripwire fires on EVERY healthy step (any positive KL EMA > negative ceiling). Verified empirically: factor=-3.0 on baseline [0.01] sets kl_hard_stop=-0.03 and a healthy KL of 0.01 then fires. The min() 'tighten-only' clamp is satisfied in the literal numeric sense but violates the documented collapse-band semantics.",
"where": "composer_replication/safety/kill_switch.py:412-418 (calibrate_kl_threshold)",
"recommendation": "Validate factor > 0 and all(k >= 0 for k in baseline_kls) at the top of calibrate_kl_threshold, and/or clamp the result to a small positive floor (e.g. assert calibrated > 0). KL values are non-negative by definition so a negative factor is nonsensical input, but the invariant should be guarded since the method mutates a field __post_init__ otherwise protects."
},
{
"severity": "low",
"what": "Dangling cross-references in docstrings to artifacts that do not exist: safety/__init__.py:17-18 cites 'docs/adrs/ADR-015-*.md' (highest existing ADR is ADR-014; no ADR-015 file) and a \"'holdout-killswitch' research digest\" (no such file under research/). kill_switch.py:43,214 cite composer_replication.safety.holdout / HeldoutSplit 'design notes' that do not exist (same missing module as the high finding).",
"where": "composer_replication/safety/__init__.py:17-18; composer_replication/safety/kill_switch.py:43, 214-216",
"recommendation": "Either author ADR-015 documenting the kill-switch design decision (the module is substantial enough to warrant one and the docstring already promises it), or drop the dangling citations. Keep doc references honest to avoid the stale-cross-ref foot-guns the backlog (B5/B6/B8) is already cleaning up."
},
{
"severity": "low",
"what": "Gap-blowout path (c) fires when the proxy gain exceeds real gain by max_proxy_real_gap EVEN WHEN the held-out (real) score is still genuinely RISING. Verified: with both rising but proxy faster, it halts the run while real improvement is ongoing. This is defensible per the docstring ('fast single-generation divergence', lines 144-145), and the reason string is accurate, but it is a potential false-positive halt on a healthy-but-fast-proxy run and is not covered by a test asserting the desired behavior in the both-rising case (only the proxy-flat-real case is tested at test_kill_switch.py:143).",
"where": "composer_replication/safety/kill_switch.py:326-335 (path c); test gap test_kill_switch.py:143-158 only exercises real-flat",
"recommendation": "Add a test pinning the intended behavior when BOTH rise but proxy outpaces real beyond the ceiling (assert whether it should fire), and document in the docstring that path (c) is a divergence-RATE gate, not a real-decline gate, so future readers do not mistake a fired path-(c) for confirmed real regression."
}
],
"confirmed_good": [
"All 23 tests in composer_replication/safety pass (.venv pytest, 23 passed).",
"Latched-fire is correct and cannot un-halt: _fired flips True in update() (line 277-278) and _evaluate() short-circuits with a 'latched:' verdict carrying the original reason before any threshold re-check (lines 294-296). Verified a full KL/gap recovery after fire stays fire=True.",
"Three halt conditions are individually correct: (b) KL EMA > kl_hard_stop checked first; (a) held-out-declines-while-in-loop-rises only increments the streak when BOTH conditions hold (a both-declining 'hard batch' correctly does NOT count, verified), fires at decline_patience; (c) proxy-real gap > ceiling. min_steps warm-up gate uses the internal _n counter (robust to non-contiguous round_idx, tested).",
"EMA denoising is sound: _fold seeds on first sample (no warm-up bias), alpha is weight-on-prior validated to [0,1); first-sample baseline seeding makes proxy_real_gap a gain-since-baseline quantity exactly matching the RSI Hacking-Gap definition. proxy_real_gap math verified (0.15 expected case) and returns 0.0 before first update.",
"CollapseStopError raise path: raise_if_fired raises the typed exception carrying .status only when fired, is a no-op when clean, and is a safe no-op before any update (last_status None). Strict > boundary on gap/KL confirmed (gap==ceiling does not fire).",
"calibrate tighten-only works for the intended (positive) inputs: min(3x baseline, current) so a drifting baseline cannot loosen past 0.08 (tested), and only tightens for a clean low baseline.",
"kl_token_trust_filter boundary correct (strict >, so threshold value itself is not masked).",
"Docstring cross-refs that DO resolve: DifficultyCurriculum.update (curriculum.py:78) and token_mean_kl (kl_logging.py:53) both exist, so the claimed cadence and KL-units convention are anchored to real code.",
"No false claim anywhere in examples/ or docs that the kill-switch is already wired/used (grep clean)."
],
"new_backlog_items": [
"Build composer_replication/safety/holdout.py with a HeldoutSplit disjointness enforcer (id/hash set-difference, raises on train/held-out overlap) — the un-built second half of C1 that the kill-switch's gap/decline signals depend on for validity.",
"Wire HeldOutGuard into composer_replication/trainer/composer_trainer.py at the per-checkpoint cadence (alongside DifficultyCurriculum.update), feeding token_mean_kl as kl_to_init and converting a fired verdict to a halt via raise_if_fired/should_halt — the C1 safeguard is currently dead code.",
"Guard calibrate_kl_threshold against factor<=0 / negative baseline_kls (or clamp result to a positive floor) so calibration cannot drive kl_hard_stop negative and make the KL tripwire fire on every healthy step.",
"Author docs/adrs/ADR-015 for the held-out kill-switch (referenced by safety/__init__.py:17 but nonexistent) or remove the dangling ADR-015 + 'holdout-killswitch research digest' citations.",
"Add a test pinning path-(c) gap-blowout behavior in the BOTH-rising case (proxy outpaces a still-rising real) to lock the intended false-positive/true-positive decision."
]
}
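
A minimal sketch of the disjointness enforcer recommended in the first high-severity finding, assuming examples can be identified by a content hash. Only the name HeldoutSplit comes from the review; HeldoutOverlapError, from_examples, and the field names are hypothetical, and none of this reflects code that exists in the tree.

```python
import hashlib
from dataclasses import dataclass
from typing import FrozenSet, Iterable


class HeldoutOverlapError(ValueError):
    """Hypothetical error: held-out IDs intersect the train/generator IDs."""


def _stable_id(example: str) -> str:
    # Hash the content so the same example gets the same ID in both pools.
    return hashlib.sha256(example.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class HeldoutSplit:
    """Hypothetical API: refuses to exist if the two ID sets overlap."""

    train_ids: FrozenSet[str]
    heldout_ids: FrozenSet[str]

    def __post_init__(self) -> None:
        overlap = self.train_ids & self.heldout_ids
        if overlap:
            sample = sorted(overlap)[:5]
            raise HeldoutOverlapError(
                f"{len(overlap)} held-out IDs also appear in train, e.g. {sample}"
            )

    @classmethod
    def from_examples(
        cls, train: Iterable[str], heldout: Iterable[str]
    ) -> "HeldoutSplit":
        return cls(
            train_ids=frozenset(_stable_id(x) for x in train),
            heldout_ids=frozenset(_stable_id(x) for x in heldout),
        )
```

Raising at construction time means a caller cannot obtain a split object, and therefore cannot compute a heldout_score from it, unless the pools are disjoint.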
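
The input guard recommended for calibrate_kl_threshold could look roughly like the sketch below. It is a standalone function, not the real method: the name calibrated_kl_hard_stop, its signature, and the use of the mean to aggregate baseline_kls are assumptions. The 0.08 default, the 3x factor, and the tighten-only min() come from the findings above.

```python
from statistics import fmean
from typing import Sequence


def calibrated_kl_hard_stop(
    baseline_kls: Sequence[float],
    current: float = 0.08,
    factor: float = 3.0,
) -> float:
    """Return a tightened KL ceiling, rejecting inputs that would make it <= 0."""
    if factor <= 0:
        raise ValueError(f"factor must be > 0, got {factor}")
    if not baseline_kls:
        raise ValueError("baseline_kls must be non-empty")
    if any(k < 0 for k in baseline_kls):
        raise ValueError("KL values are non-negative by definition")

    # Assumption: the baseline is summarized by its mean.
    candidate = factor * fmean(baseline_kls)

    # Tighten-only: never loosen past the current ceiling.
    calibrated = min(candidate, current)

    # Mirror the > 0 invariant that __post_init__ is described as enforcing.
    if calibrated <= 0:
        raise ValueError(f"calibrated ceiling must be > 0, got {calibrated}")
    return calibrated
```

With these checks, the reported reproduction (factor=-3.0 on baseline [0.01]) raises instead of producing a negative ceiling.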