{ "area": "Wave-1+2 broad sweep for NEW gaps (imports/laziness, unfinished-work markers, doc-debt, ADR-015, optional-dep eager-load)", "verdict": "minor-issues", "findings": [ { "severity": "medium", "what": "Doc-debt: the public symbols from the 4 NEW Wave-2 modules are entirely undocumented in docs/API_REFERENCE.md. Grepping for EKSExecutor / SageMakerExecutor / DockerSandbox / HeldOutGuard / TripwireStatus / CollapseStopError / kl_token_trust_filter returns 0 hits for every symbol. The §12 (serverless) module list in the API_REFERENCE header (line 23) names `.modal` and `.hf_jobs` but not `.eks` / `.sagemaker`, and §12 documents the loud-failing ModalExecutor/HFJobsExecutor stubs while omitting the two NEW *production* executors. There is no `safety` section at all, and no `datagen` section (DockerSandbox and its LocalSubprocessSandbox/FakeSandbox siblings are all undocumented). All of these symbols are real, exported public API (listed in their package's __all__), and both new executors are Protocol-conformant (isinstance(eks, ServerlessExecutor) == True).", "where": "docs/API_REFERENCE.md (§12, lines 1153-1376; header line 23); new public symbols in composer_replication/diloco/serverless/{eks,sagemaker}.py, composer_replication/datagen/docker_sandbox.py, composer_replication/safety/kill_switch.py", "recommendation": "Add API_REFERENCE entries: under §12 add `class EKSExecutor` and `class SageMakerExecutor` (and update the line-23 module list for §12 to include `.eks`, `.sagemaker`); add a `composer_replication.safety` section documenting HeldOutGuard / TripwireStatus / CollapseStopError / kl_token_trust_filter; and add a `composer_replication.datagen` section documenting DockerSandbox (alongside the existing-but-also-undocumented LocalSubprocessSandbox/FakeSandbox)." }, { "severity": "low", "what": "Dangling ADR reference: composer_replication/safety/__init__.py:17 says 'See docs/adrs/ADR-015-*.md' but no ADR-015 file exists (docs/adrs/ stops at ADR-014). 
The research plan called for ADR-015 to document the safety/kill-switch design decision. The module docstring already cites the literature (Zhao et al. RSI, EvilGenie, Gao self-evolving survey, Shumailov collapse, Catastrophic Goodhart, GRPO KL band), so the design rationale exists in-code but is not captured as an ADR, and the __init__ points readers to a file that isn't there.", "where": "composer_replication/safety/__init__.py:17 (the dangling 'docs/adrs/ADR-015-*.md' pointer); docs/adrs/ (ADR-015 absent)", "recommendation": "Either author docs/adrs/ADR-015-holdout-killswitch.md (the kill_switch.py module docstring is effectively the ADR draft already: proxy_real_gap Hacking-Gap, KL 0.08 nats/token hard stop, decline-patience collapse signature, defense-in-depth-over-HackMonitor) and index it in docs/adrs/README.md, OR remove the forward reference from safety/__init__.py until the ADR lands." }, { "severity": "low", "what": "Test-count drift re-introduced by Wave 2. docs/V1_V8_COVERAGE.md:117 still states the canonical count as '266 passed / 62 skipped / 328 collected (measured 2026-06-09)', which was the Wave-1 figure. Wave 2 added 93 tests across 4 new files (test_kill_switch 23, test_eks_executor 28, test_sagemaker_executor 14, test_docker_sandbox 28); the tree now collects 420 tests (328 -> 420, a net +92 against the 93 added). B4 closed the test-count drift in Wave 1, but the doc is stale again after Wave 2.", "where": "docs/V1_V8_COVERAGE.md:117-134 (canonical count claim) vs actual `pytest --collect-only` = 420 collected", "recommendation": "Re-run `.venv/bin/python -m pytest` to get the post-Wave-2 passed/skipped split and update the single canonical figure in V1_V8_COVERAGE.md (the doc explicitly says this line is 'the one canonical figure' that other docs reference)." 
} ], "confirmed_good": [ "Required import smoke test passes: `import composer_replication; from composer_replication.diloco.serverless import EKSExecutor, SageMakerExecutor; from composer_replication.datagen import DockerSandbox; from composer_replication.safety import HeldOutGuard` -> exit 0, 'ALL IMPORTS OK'.", "Optional-dep laziness (question 5) is CORRECT for all 4 new modules: there is no top-level `import kubernetes/boto3/docker` in eks.py / sagemaker.py / docker_sandbox.py / kill_switch.py (a grep for eager imports returns nothing). With kubernetes and docker blocked at import time, importing the new modules in isolation still succeeds. EKSExecutor lazy-imports `kubernetes` per-method, and only when no api is injected; SageMakerExecutor lazy-imports boto3 in _make_boto3_client (at construction time, not import time); DockerSandbox lazy-imports docker via _require_docker() inside its methods.", "NOTE on the whole-package blocked-import failure: blocking boto3 breaks `import composer_replication`, but the cause is PRE-EXISTING and NOT a Wave-2 regression. composer_replication/__init__.py:98 imports the trainer, which imports `trl.GRPOTrainer` -> accelerate.commands.config.sagemaker -> `import boto3`. boto3 is already a hard transitive dependency of the base trainer stack on main; Wave 2 did not introduce it.", "No NEW unfinished-work markers (question 2): all NotImplementedError/TODO/FIXME/STUB hits in composer_replication/ are PRE-EXISTING and intentional (prime_rl/composer_loss.py deferred SDPO channel-2, recipes/monarch/actors.py v0 skeleton per ADR-006, diloco/serverless/{modal,hf_jobs,modal_spawn}.py documented loud-failing stubs). The 4 new modules contain ZERO NotImplementedError/TODO/FIXME/STUB hits; they are finished, not skeletons. 
SageMakerExecutor's docstring explicitly describes it as 'fully-working, not the loud-failing modal.py/hf_jobs.py skeletons'.", "Both new executors satisfy the runtime_checkable ServerlessExecutor Protocol (isinstance checks pass), expose the correct backend_name ('eks'/'sagemaker'), and set supports_inter_replica_network=False (S3-only rendezvous).", "All Wave-2 tests pass (90 passed, 3 skipped; the skips are the tests gated on a live Docker daemon) via `pytest composer_replication/safety/tests composer_replication/diloco/serverless/tests/test_{eks,sagemaker}_executor.py composer_replication/datagen/tests/test_docker_sandbox.py`. The whole suite still collects cleanly (420 tests, no collection errors).", "The DockerSandbox.run_tests pytest-pass heuristic (`f\"{t} PASSED\" in out or (returncode==0 and not failed)`) is a faithful copy of the established LocalSubprocessSandbox.run_tests (sandbox.py:214); it is not a new bug and matches the documented sibling behavior.", "Leaving safety/ out of the top-level composer_replication.__all__ is consistent with the existing structure (the datagen/diloco subpackages aren't fully surfaced at top level either); `composer_replication.safety` imports correctly as a subpackage." ], "new_backlog_items": [ "DOC: Document the public symbols from the 4 NEW Wave-2 modules in docs/API_REFERENCE.md: add EKSExecutor + SageMakerExecutor under §12 (and add .eks/.sagemaker to the §12 module list at line 23), add a new `composer_replication.safety` section (HeldOutGuard, TripwireStatus, CollapseStopError, kl_token_trust_filter), and add a `composer_replication.datagen` section covering DockerSandbox (plus the also-undocumented LocalSubprocessSandbox/FakeSandbox).", "ADR: Author docs/adrs/ADR-015-holdout-killswitch.md (the safety kill-switch / held-out-guard design). It is currently referenced by composer_replication/safety/__init__.py:17 as 'docs/adrs/ADR-015-*.md', but the file does not exist. Index it in docs/adrs/README.md. 
The kill_switch.py module docstring is the ready-made draft.", "DOC: Refresh the canonical test count in docs/V1_V8_COVERAGE.md:117. Wave 2 added 93 tests (collection 328 -> 420), so the stated '266 passed / 62 skipped / 328 collected' is the Wave-1 figure and is now stale." ] }