"""composer_replication.safety — run-level collapse safeguards.

The #2 collapse safeguard for the self-evolving RL flywheel: a held-out disjoint
eval + a depth/generation kill-switch. The per-task controls live in
``composer_replication.datagen`` (4-gate validator, ``HackMonitor`` provenance,
sandbox denylist); this package adds the missing ACROSS-GENERATION / run-level
control that watches in-loop (proxy) reward against a disjoint held-out (real)
eval and HALTS the run when collapse / reward-hacking is caught in the act.

Public surface:
  - HeldOutGuard — the stateful kill-switch (kill_switch.py)
  - TripwireStatus — the structured per-update verdict
    (.fire / .halt / .reason / .proxy_real_gap)
  - CollapseStopError — typed exception for exception-based trainer control flow
  - kl_token_trust_filter — per-token KL trust-region mask (torchrl KL-Mask analog)
  - HeldoutSplit / HeldoutOverlapError — the train/held-out set-disjointness
    enforcer (holdout.py) that keeps the guard's proxy-real gap signal meaningful
    (a held-out set that drifts into the train set makes the gap meaningless).

Pure-Python, no torch / cloud deps. See docs/adrs/ADR-015-holdout-killswitch.md.
"""
from __future__ import annotations

from composer_replication.safety.holdout import (
    HeldoutOverlapError,
    HeldoutSplit,
)
from composer_replication.safety.kill_switch import (
    CollapseStopError,
    HeldOutGuard,
    TripwireStatus,
    kl_token_trust_filter,
)

__all__ = [
    "HeldOutGuard",
    "TripwireStatus",
    "CollapseStopError",
    "kl_token_trust_filter",
    "HeldoutSplit",
    "HeldoutOverlapError",
]