Baladithya Balamurugan
Wave 3: close the HIGH review findings (kill-switch wiring, HeldoutSplit, EKS entrypoint bug)
bd0c358

```python
"""composer_replication.safety: run-level collapse safeguards.

The #2 collapse safeguard for the self-evolving RL flywheel: a held-out disjoint
eval + a depth/generation kill-switch. The per-task controls live in
``composer_replication.datagen`` (4-gate validator, ``HackMonitor`` provenance,
sandbox denylist); this package adds the missing ACROSS-GENERATION / run-level
control that watches in-loop (proxy) reward against a disjoint held-out (real)
eval and HALTS the run when collapse / reward-hacking is caught in the act.

Public surface:
  - HeldOutGuard: the stateful kill-switch (kill_switch.py)
  - TripwireStatus: the structured per-update verdict (.fire / .halt / .reason /
    .proxy_real_gap)
  - CollapseStopError: typed exception for exception-based trainer control flow
  - kl_token_trust_filter: per-token KL trust-region mask (torchrl KL-Mask analog)
  - HeldoutSplit / HeldoutOverlapError: the train/held-out set-disjointness
    enforcer (holdout.py) that keeps the guard's proxy-real gap signal
    meaningful (a held-out set that drifts into the train set makes the gap
    meaningless).

Pure-Python, no torch / cloud deps. See docs/adrs/ADR-015-holdout-killswitch.md.
"""
from __future__ import annotations

from composer_replication.safety.holdout import (
    HeldoutOverlapError,
    HeldoutSplit,
)
from composer_replication.safety.kill_switch import (
    CollapseStopError,
    HeldOutGuard,
    TripwireStatus,
    kl_token_trust_filter,
)

__all__ = [
    "HeldOutGuard",
    "TripwireStatus",
    "CollapseStopError",
    "kl_token_trust_filter",
    "HeldoutSplit",
    "HeldoutOverlapError",
]
```
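The module docstring describes two run-level ideas: keeping the held-out set disjoint from the training set, and halting when the in-loop (proxy) reward pulls away from the held-out (real) reward. The sketch below illustrates both in plain Python. It is not the package's implementation: the names `assert_disjoint`, `OverlapError`, `Verdict`, and `GapGuard`, the thresholds, and the consecutive-breach `patience` rule are all assumptions made for illustration. Only the verdict field names (`fire`, `halt`, `reason`, `proxy_real_gap`) are taken from the docstring.

```python
from dataclasses import dataclass
from typing import Iterable


class OverlapError(ValueError):
    """Hypothetical error: train and held-out task ids intersect."""


def assert_disjoint(train_ids: Iterable[str], heldout_ids: Iterable[str]) -> None:
    """Raise if any task id appears in both splits."""
    overlap = set(train_ids) & set(heldout_ids)
    if overlap:
        sample = sorted(overlap)[:5]
        raise OverlapError(f"{len(overlap)} task id(s) in both splits, e.g. {sample}")


@dataclass(frozen=True)
class Verdict:
    """Per-update verdict; field names mirror those listed in the docstring."""
    fire: bool             # the gap threshold was breached on this update
    halt: bool             # the run should stop
    reason: str
    proxy_real_gap: float  # proxy reward minus held-out reward


class GapGuard:
    """Toy kill-switch (assumed rule): halt after `patience` consecutive
    breaches of `max_gap`, or once `generation` exceeds `max_generation`."""

    def __init__(self, max_gap: float = 0.2, patience: int = 2,
                 max_generation: int = 10) -> None:
        self.max_gap = max_gap
        self.patience = patience
        self.max_generation = max_generation
        self._streak = 0  # consecutive updates with gap > max_gap

    def update(self, proxy_reward: float, heldout_reward: float,
               generation: int) -> Verdict:
        gap = proxy_reward - heldout_reward
        if generation > self.max_generation:
            return Verdict(True, True, "generation limit exceeded", gap)
        self._streak = self._streak + 1 if gap > self.max_gap else 0
        fire = self._streak > 0
        halt = self._streak >= self.patience
        if halt:
            reason = f"gap above {self.max_gap} for {self._streak} updates"
        elif fire:
            reason = "gap above threshold, waiting for confirmation"
        else:
            reason = "ok"
        return Verdict(fire, halt, reason, gap)


# Usage: proxy reward keeps rising while held-out reward falls.
assert_disjoint(["train-1", "train-2"], ["heldout-1", "heldout-2"])
guard = GapGuard(max_gap=0.2, patience=2, max_generation=5)
history = [(0.50, 0.48), (0.70, 0.45), (0.85, 0.40)]
halted_at = None
for generation, (proxy, real) in enumerate(history):
    verdict = guard.update(proxy, real, generation)
    if verdict.halt:
        halted_at = generation
        break
print(halted_at, verdict.reason)
```

Requiring several consecutive breaches before halting is one way to avoid stopping on a single noisy evaluation; a trainer that prefers exception-based control flow could raise a typed error at the point where `verdict.halt` is true instead of breaking out of the loop.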