Baladithya Balamurugan
docs: commit the Wave-2/3 review + Phase-8 verification audit trail (JSON findings)
b1adfc9
{
"scope": "Phase-8 FINAL-VERIFICATION (read-only) of backlog items B1-B8 + D1 (Wave 1) and C1-C3 + E3 (Wave 2) on branch backlog/goal-resolution-2026-06 (commits c11cf49, 7a55e1e, bd0c358, 7d9dbbc, ace4fac on top of main at 4e6e82e). Each item was independently verified against the actual code, tests, and git history, not against the backlog status column.",
"overall_verdict": "minor-open",
"item_verdicts": [
{
"id": "B1",
"status": "resolved",
"evidence": "The fixture is committed and tracked at spikes/007-real-trace-ingestion/fixtures/synthetic_session_with_error.jsonl (git ls-files lists it; git check-ignore reports NOT IGNORED). The adapter test resolves ERROR_FIXTURE from that spikes/007 path (test_trace_examples_adapter.py:26-27). `.venv/bin/python -m pytest composer_replication/ingestion/tests/test_trace_examples_adapter.py -q` => 19 passed in 12.24s (gate of 8 met; the file has grown to 19 tests since Wave 19). Fixture added in commit c11cf49."
},
{
"id": "B2",
"status": "resolved",
"evidence": "`uv pip install --python .venv/bin/python -e '.[dev]' --dry-run` => 'Resolved 62 packages', exit 0 on arm64 host (uname -m=arm64). dev extra now = [pytest>=8.0, ruff>=0.6, composer-replication[replay,train]] with NO torchft/torchft-nightly. torchft-nightly is isolated in a separate dev-full extra (pyproject.toml:141-142, Linux-x86_64 only) with explanatory comments at lines 131-135."
},
{
"id": "B3",
"status": "resolved",
"evidence": "pyproject.toml serverless extra = ['fsspec>=2024.6','huggingface_hub>=0.27','s3fs>=2024.6','boto3>=1.34','kubernetes>=29.0'] (verified via tomllib load). All three required deps (s3fs, boto3, kubernetes) are present at pyproject.toml:68-70."
},
{
"id": "B4",
"status": "partial",
"evidence": "Canonical reconciliation is DONE: README.md:209 and docs/V1_V8_COVERAGE.md state 266 passing / 62 skipped / 328 collected (measured 2026-06-09); the older figures 115/176/210/232 appear only in explicitly labeled historical sequences. Within the EXACT named grep scope (README/OVERVIEW/VISION/PROJECT_STATE/BACKLOG/TROUBLESHOOTING), every hit is either dated-historical or the backlog item text itself; VISION.md and PROJECT_STATE.md don't exist, and VISION_VALIDATION.md is clean. RESIDUAL (outside the named scope but in living docs): docs/USER_GUIDE.md:678 and docs/INTEGRATION_RECIPES.md:926 still use the present-tense framing 'The framework's 115-test suite (post-Wave-15)', which reads as current even though it is wave-qualified. The canonical '328 collected' is itself now stale as well: `pytest --co` actually collects 445 tests (the suite grew with Wave 2/3 + spikes), although the README dates the figure. The reconciliation deliverable is met; two minor present-tense 115 mentions linger."
},
{
"id": "B5",
"status": "resolved",
"evidence": "Zero /mnt/e in the three cited files (docs/API_REFERENCE.md, docs/USER_GUIDE.md, docs/INTEGRATION_RECIPES.md) and zero /mnt/e across README.md + docs/ excluding research/. WSL absolute-path footers removed."
},
{
"id": "B6",
"status": "resolved",
"evidence": "examples/gsm8k_grpo_with_sdpo/README.md:66 now links to docs/adrs/ADR-008-drgrpo-sdpo-live-channel.md (the correct target). The remaining ADR-002-channel2 mentions appear only in describing/historical files (docs/_refine-2026-06-SUMMARY.md, which documents the fix, and the BACKLOG item text), which satisfies the 'non-describing files' scope."
},
{
"id": "B7",
"status": "resolved",
"evidence": "All three importable from top-level: `from composer_replication import make_dr_grpo_config, make_po_config, PO_OBJECTIVES` => IMPORT OK (PO_OBJECTIVES keys = grpo,dr_grpo,bnpo,dapo,gspo,cispo). All three in composer_replication.__all__ (verified True/True/True). Documented in docs/API_REFERENCE.md: make_dr_grpo_config (line 932), make_po_config (line 955), PO_OBJECTIVES (line 971) with usage examples."
},
{
"id": "B8",
"status": "resolved",
"evidence": "docs/_refine-2026-06-SUMMARY.md header (lines 4-8) corrected: now states 'MERGED into main as of 4e6e82e ... 6 commits total in range fb13ea3..4e6e82e, not the 3 this summary originally listed. (This header was updated 2026-06-09 to reflect the merged reality.)'. The OVERVIEW->TROUBLESHOOTING cross-refs (docs/OVERVIEW.md:72,99) resolve to existing docs/TROUBLESHOOTING.md. README has no dangling TROUBLESHOOTING ref. Fix committed in c11cf49."
},
{
"id": "D1",
"status": "resolved",
"evidence": "Docker available (docker info ok). `.venv/bin/python -m pytest composer_replication/datagen/tests/test_docker_substrate_e2e.py -q` => 2 passed in 16.59s GREEN on a real python:3.11-slim container. Previously skipif-gated on docker info."
},
{
"id": "C1",
"status": "resolved",
"evidence": "Module composer_replication/safety/ exists with kill_switch.py + holdout.py. Imports OK: HeldOutGuard, HeldoutSplit, CollapseStopError, HeldoutOverlapError, TripwireStatus. Tests: `.venv/bin/python -m pytest composer_replication/safety/tests/` => 37 passed in 12.38s (test_kill_switch.py + test_holdout.py). Built in commit 7a55e1e, HeldoutSplit + wiring in bd0c358."
},
{
"id": "C2",
"status": "resolved",
"evidence": "EKSExecutor lives at composer_replication/diloco/serverless/eks.py (NOT the composer_replication.serverless path the status text implied). `from composer_replication.diloco.serverless.eks import EKSExecutor` => OK. Tests: composer_replication/diloco/serverless/tests/test_eks_executor.py => 27 passed, 1 skipped in 15.13s. Built in 7a55e1e, entrypoint bug fixed in bd0c358, R5/R6 in 7d9dbbc."
},
{
"id": "C3",
"status": "resolved",
"evidence": "composer_replication/datagen/docker_sandbox.py exists; DockerSandbox and scrub_tree import OK. Tests pass in isolation and on a clean full-module re-run: `pytest composer_replication/datagen/tests/test_docker_sandbox.py` => 27 passed, 1 skipped. NOTE: an initial full-module run showed 2 transient failures (test_live_four_inversion_gates_in_hardened_container, test_live_network_is_disabled) caused by resource contention between concurrent live-Docker containers; both PASS in isolation (2 passed in 13.13s), and the full module passes on re-run (27 passed). This is flakiness under Docker contention, not a code defect (same pattern as backlog R11). Built in 7a55e1e."
},
{
"id": "E3",
"status": "resolved",
"evidence": "SageMakerExecutor at composer_replication/diloco/serverless/sagemaker.py; `from composer_replication.diloco.serverless.sagemaker import SageMakerExecutor` => OK. Tests: composer_replication/diloco/serverless/tests/test_sagemaker_executor.py => 14 passed, 1 skipped in 13.98s. Built in 7a55e1e, cancel-narrowing (R5) in 7d9dbbc."
}
],
"remaining_open": [
"B4 (minor): docs/USER_GUIDE.md:678 and docs/INTEGRATION_RECIPES.md:926 retain present-tense 'The framework's 115-test suite (post-Wave-15)' framing (outside the named grep scope but still in living docs). The canonical README '328 collected' figure is itself now stale vs the actual 445 tests collected today, though it is explicitly dated 2026-06-09.",
"C3 (caveat, not open): 2 live-Docker tests are flaky under concurrent container resource contention (they pass in isolation and on a clean re-run); consider marking or serializing them, but there is no code defect.",
"Note (out of B1-E3 scope): the Wave-2 'top-level re-export' status claim does NOT hold at the package root. EKSExecutor/SageMakerExecutor/DockerSandbox/HeldOutGuard are neither in composer_replication.__all__ nor attributes of the package root; they import only from their submodule paths. This is an R7-area discrepancy, not part of the C1-C3/E3 'modules exist/import/tested' gate, which is satisfied."
],
"confirmed_resolved": ["B1", "B2", "B3", "B5", "B6", "B7", "B8", "D1", "C1", "C2", "C3", "E3"]
}