Baladithya Balamurugan Claude Opus 4.8 (1M context) committed on
Commit c59d939 · 1 Parent(s): ace4fac

Phase 8: B4-final (last 2 living-doc stale counts) + final verification disposition

- B4-final: USER_GUIDE.md:678 + INTEGRATION_RECIPES.md:926 '115-test suite' → 266/62
(the last current-framed stale counts the independent verifier found outside the
earlier grep scope; dated-historical + _archive mentions correctly left).
- Recorded the Phase-8 disposition: full suite 381 passed/65 skipped/0 failed;
independent verifier confirms B1-E3 resolved; submodule-export design note;
final backlog disposition (zero actionable-on-host items open; remainder is
user-gated GPU/token or tracked pre-existing lint debt).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
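
The B4-final item describes finding current-framed stale counts that fell outside an earlier grep scope, while leaving `_archive` mentions alone. A minimal sketch of such a scan is shown below. It is not the tooling used in this commit: the function name, the `*.md` glob, and the demo file contents are assumptions for illustration, and it cannot tell a dated-historical mention from a current one, which still needs a human read.

```python
import re
import tempfile
from pathlib import Path
from typing import List, Tuple

# Assumption: the stale phrase is the literal "115-test" quoted in the commit message.
STALE_PATTERN = re.compile(r"\b115-test\b")


def find_stale_counts(
    root: Path, skip_dirs: Tuple[str, ...] = ("_archive",)
) -> List[Tuple[str, int, str]]:
    """Return (relative path, 1-based line number, line text) per stale mention."""
    hits = []
    for path in sorted(root.rglob("*.md")):
        rel = path.relative_to(root)
        # Skip archived copies; their counts are historical by design.
        if any(part in skip_dirs for part in rel.parts):
            continue
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if STALE_PATTERN.search(line):
                hits.append((rel.as_posix(), lineno, line.strip()))
    return hits


# Demo on a throwaway tree (hypothetical file contents, not the real docs).
with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp)
    (docs / "_archive").mkdir()
    (docs / "USER_GUIDE.md").write_text(
        "# Guide\nThe 115-test suite covers this.\n", encoding="utf-8"
    )
    (docs / "_archive" / "OLD.md").write_text(
        "The 115-test suite (archived).\n", encoding="utf-8"
    )
    (docs / "CLEAN.md").write_text("The test suite covers this.\n", encoding="utf-8")
    HITS = find_stale_counts(docs)

for rel_path, lineno, text in HITS:
    print(f"{rel_path}:{lineno}: {text}")
```

Reporting `path:line` pairs mirrors the `USER_GUIDE.md:678` style used in the commit message, so each hit can be checked by hand before it is rewritten.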

docs/BACKLOG_RESOLUTION_2026-06-09.md CHANGED
@@ -87,3 +87,19 @@ Sandbox refactor verdict: **clean** (no regression to LocalSubprocessSandbox/Fea
  - **Concurrent review team:** audits each wave's diff, feeds findings back.
  - **Wave 3+:** reconcile review findings, fix, repeat until zero open + tests green.
  - **Final:** full suite green, docs reconciled, everything committed.
+
+
+ ## Phase 8 — Final verification (2026-06-09)
+
+ **Authoritative full suite (isolated): 381 passed / 65 skipped / 0 failed** (446 collected; skips = optional-dep/host gates: torchft Linux-only, prime-rl, data-juicer, monarch, /tmp upstream-parity clones, real-Claude-session). The R11 flaky test now passes deterministically.
+
+ **Independent verifier (research/verify-bugs.json): all B1-B8, C1-C3, D1, E3 RESOLVED.** Residual nits closed post-verify: B4-final (USER_GUIDE:678 + INTEGRATION_RECIPES:926 stale "115-test" → 266/62).
+
+ **Design note (R7-area):** EKSExecutor/SageMakerExecutor/DockerSandbox/HeldOutGuard are exported from their SUBMODULE paths (`composer_replication.diloco.serverless`, `.datagen`, `.safety`) — matching the existing convention (Modal/HFJobs executors are likewise not at package root) and keeping `import composer_replication` from force-loading every cloud-executor module. They are documented in API_REFERENCE §15-17.
+
+ ### Final disposition
+ - **CLOSED (done + tested):** B1-B8, C1, C2, C3, D1, E3, R1-R11, R12.
+ - **GATED-AS-DESIGNED (user-only, cannot execute here):** F1 (HF token rotation — audited clean, user rotates), F2/E1/E2 real 8B GPU runs (harness paths buildable; the spend is the user's go/no-go).
+ - **TRACKED tech-debt (out of scope, filed):** R13 (pre-existing serverless ruff B904 debt — do not reformat unauthored code in this effort).
+
+ **Backlog of actionable items on this host: ZERO open.** Everything executable here is done, tested, lint-clean (my files), and committed. The only remaining items are externally-gated (GPU budget / HF account) and explicitly the user's call.
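
The design note in the hunk above relies on a general property of Python packages: importing a package root does not load its submodules unless the root `__init__.py` imports them. The sketch below demonstrates that behavior on a throwaway package. The names `demo_pkg`, `serverless`, and `DemoExecutor` are hypothetical stand-ins, not the actual `composer_replication` layout.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from typing import Tuple


def build_demo_package(root: Path) -> None:
    """Write a tiny package whose root __init__ imports nothing."""
    pkg = root / "demo_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("__all__ = []\n", encoding="utf-8")
    (pkg / "serverless.py").write_text(
        "class DemoExecutor:\n    name = 'demo'\n", encoding="utf-8"
    )


def submodule_loaded_before_and_after() -> Tuple[bool, bool]:
    """Check whether the submodule is loaded after a root import, then after an explicit one."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_demo_package(root)
        sys.path.insert(0, str(root))
        try:
            importlib.invalidate_caches()
            importlib.import_module("demo_pkg")
            # Root import alone: the submodule has not been executed yet.
            before = "demo_pkg.serverless" in sys.modules
            importlib.import_module("demo_pkg.serverless")
            # Explicit submodule import: now it is loaded.
            after = "demo_pkg.serverless" in sys.modules
            return before, after
        finally:
            # Clean up so the demo leaves no trace in the interpreter.
            sys.path.remove(str(root))
            for name in [n for n in sys.modules if n.split(".")[0] == "demo_pkg"]:
                del sys.modules[name]


RESULT = submodule_loaded_before_and_after()
print(RESULT)
```

Under this convention a caller pays the import cost of a cloud-executor module only when importing its submodule path directly, which is the trade-off the design note describes.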
docs/INTEGRATION_RECIPES.md CHANGED
@@ -923,7 +923,7 @@ In Wave 14: $0 (skeleton fails fast; no compute used). Projected for v0.2+:
  ## Cross-recipe checklist
 
  Regardless of which recipe you pick, these invariants are tested across
- the 115-test suite (post-Wave-15) and should be true of your wired-up system:
+ the test suite (266 passing / 62 skipped; canonical count in docs/V1_V8_COVERAGE.md) and should be true of your wired-up system:
 
  - **`alpha_sdpo=0`** must reproduce the channel-1-only baseline
  bit-exact (`test_compose_loss_integration.py`).
docs/USER_GUIDE.md CHANGED
@@ -675,7 +675,7 @@ and `docs/adrs/ADR-006-rl-frameworks.md`.
 
  ## Common pitfalls + what tests catch them
 
- The framework's 115-test suite (post-Wave-15) is structured so each pitfall has a
+ The framework's test suite (266 passing / 62 skipped, canonical count in docs/V1_V8_COVERAGE.md) is structured so each pitfall has a
  specific test-file home. If you hit one of these in production, the
  corresponding test is your fastest reproducer.
 