"""composer_replication.safety — run-level collapse safeguards.

The #2 collapse safeguard for the self-evolving RL flywheel: a disjoint held-out
eval plus a depth/generation kill-switch. The per-task controls live in
``composer_replication.datagen`` (4-gate validator, ``HackMonitor`` provenance,
sandbox denylist); this package adds the across-generation, run-level control:
it compares in-loop (proxy) reward against a disjoint held-out (real) eval and
halts the run when collapse or reward hacking is detected mid-run.
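
A minimal sketch of the idea, for orientation only. ``ToyGuard``,
``ToyVerdict``, ``max_gap`` and ``max_generations`` are hypothetical names
invented for this illustration; they are not the ``HeldOutGuard`` API, whose
real signature lives in kill_switch.py. The sketch assumes one proxy reward and
one held-out reward per generation, and it treats ``fire`` and ``halt`` as the
same event, which the real guard may not do.

```python
from dataclasses import dataclass


@dataclass
class ToyVerdict:
    # Mirrors the field names listed for TripwireStatus; semantics are assumed.
    fire: bool
    halt: bool
    reason: str
    proxy_real_gap: float


class ToyGuard:
    # Hypothetical stand-in, NOT the package's HeldOutGuard.
    def __init__(self, max_gap=0.2, max_generations=5):
        self.max_gap = max_gap
        self.max_generations = max_generations
        self.generation = 0

    def update(self, proxy_reward, heldout_reward):
        # One call per generation: compare in-loop reward with held-out reward.
        self.generation += 1
        gap = proxy_reward - heldout_reward
        if gap > self.max_gap:
            return ToyVerdict(True, True, "proxy-real gap exceeded", gap)
        if self.generation >= self.max_generations:
            return ToyVerdict(True, True, "generation cap reached", gap)
        return ToyVerdict(False, False, "ok", gap)


guard = ToyGuard(max_gap=0.2, max_generations=5)
# (proxy, held-out) pairs: proxy keeps climbing while held-out stalls.
history = [(0.50, 0.48), (0.70, 0.55), (0.90, 0.50)]
verdicts = [guard.update(p, r) for p, r in history]
```

In this toy run only the third update trips the guard, because the proxy
reward has pulled 0.4 ahead of the held-out reward.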

Public surface:
  - HeldOutGuard   — the stateful kill-switch (kill_switch.py)
  - TripwireStatus — the structured per-update verdict (.fire / .halt / .reason /
                     .proxy_real_gap)
  - CollapseStopError   — typed exception for exception-based trainer control flow
  - kl_token_trust_filter — per-token KL trust-region mask (torchrl KL-Mask analog)
  - HeldoutSplit / HeldoutOverlapError — the train/held-out set-disjointness
                     enforcer (holdout.py) that keeps the guard's proxy-real gap
                     signal meaningful (a held-out set that drifts into the train
                     set makes the gap meaningless).
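
The disjointness requirement can be illustrated with a short sketch.
``check_disjoint`` and ``ToyOverlapError`` are hypothetical names used only
here; they are not the ``HeldoutSplit`` / ``HeldoutOverlapError`` API, and the
sketch assumes tasks are identified by hashable ids.

```python
class ToyOverlapError(ValueError):
    # Hypothetical stand-in, NOT the package's HeldoutOverlapError.
    pass


def check_disjoint(train_ids, heldout_ids):
    # Any id present in both sets would leak train tasks into the eval.
    overlap = set(train_ids) & set(heldout_ids)
    if overlap:
        raise ToyOverlapError(
            f"{len(overlap)} task id(s) in both sets: {sorted(overlap)}"
        )
    return True


ok = check_disjoint(["t1", "t2"], ["h1", "h2"])

try:
    check_disjoint(["t1", "t2"], ["t2", "h1"])
    leaked = False
except ToyOverlapError:
    leaked = True
```

Raising instead of silently dropping the overlapping ids is a design choice of
this sketch: a leak is surfaced at split time, before the gap signal is
trusted.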

Pure-Python, no torch / cloud deps. See docs/adrs/ADR-015-holdout-killswitch.md.
"""
from __future__ import annotations

from composer_replication.safety.holdout import (
    HeldoutOverlapError,
    HeldoutSplit,
)
from composer_replication.safety.kill_switch import (
    CollapseStopError,
    HeldOutGuard,
    TripwireStatus,
    kl_token_trust_filter,
)

__all__ = [
    "HeldOutGuard",
    "TripwireStatus",
    "CollapseStopError",
    "kl_token_trust_filter",
    "HeldoutSplit",
    "HeldoutOverlapError",
]