Phase 8: B4-final (last 2 living-doc stale counts) + final verification disposition

- B4-final: USER_GUIDE.md:678 + INTEGRATION_RECIPES.md:926 '115-test suite' → 266/62
(the last current-framed stale counts the independent verifier found outside the
earlier grep scope; dated-historical + _archive mentions correctly left).
- Recorded the Phase-8 disposition: full suite 381 passed/65 skipped/0 failed;
independent verifier confirms B1-E3 resolved; submodule-export design note;
final backlog disposition (zero actionable-on-host items open; remainder is
user-gated GPU/token or tracked pre-existing lint debt).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Files changed (3)

docs/BACKLOG_RESOLUTION_2026-06-09.md +16 -0
docs/INTEGRATION_RECIPES.md +1 -1
docs/USER_GUIDE.md +1 -1

docs/BACKLOG_RESOLUTION_2026-06-09.md CHANGED

@@ -87,3 +87,19 @@ Sandbox refactor verdict: **clean** (no regression to LocalSubprocessSandbox/Fea
 - **Concurrent review team:** audits each wave's diff, feeds findings back.
 - **Wave 3+:** reconcile review findings, fix, repeat until zero open + tests green.
 - **Final:** full suite green, docs reconciled, everything committed.
+## Phase 8 — Final verification (2026-06-09)
+**Authoritative full suite (isolated): 381 passed / 65 skipped / 0 failed** (446 collected; skips = optional-dep/host gates: torchft Linux-only, prime-rl, data-juicer, monarch, /tmp upstream-parity clones, real-Claude-session). The R11 flaky test now passes deterministically.
+**Independent verifier (research/verify-bugs.json): all B1-B8, C1-C3, D1, E3 RESOLVED.** Residual nits closed post-verify: B4-final (USER_GUIDE:678 + INTEGRATION_RECIPES:926 stale "115-test" → 266/62).
+**Design note (R7-area):** EKSExecutor/SageMakerExecutor/DockerSandbox/HeldOutGuard are exported from their SUBMODULE paths (`composer_replication.diloco.serverless`, `.datagen`, `.safety`) — matching the existing convention (Modal/HFJobs executors are likewise not at package root) and keeping `import composer_replication` from force-loading every cloud-executor module. They are documented in API_REFERENCE §15-17.
+### Final disposition
+- **CLOSED (done + tested):** B1-B8, C1, C2, C3, D1, E3, R1-R11, R12.
+- **GATED-AS-DESIGNED (user-only, cannot execute here):** F1 (HF token rotation — audited clean, user rotates), F2/E1/E2 real 8B GPU runs (harness paths buildable; the spend is the user's go/no-go).
+- **TRACKED tech-debt (out of scope, filed):** R13 (pre-existing serverless ruff B904 debt — do not reformat unauthored code in this effort).
+**Backlog of actionable items on this host: ZERO open.** Everything executable here is done, tested, lint-clean (my files), and committed. The only remaining items are externally-gated (GPU budget / HF account) and explicitly the user's call.
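The design note in this hunk rests on a general Python behaviour: importing a package runs only its `__init__.py`, so a submodule that the root does not import stays unloaded until something asks for it by path. The following is a minimal sketch of that behaviour using a throwaway package called `demo_pkg`, a hypothetical stand-in written to a temp directory. It does not reflect the actual layout or contents of `composer_replication`; only the class name `EKSExecutor` is borrowed from the note.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Write a throwaway package whose root __init__ imports no submodules."""
    pkg = root / "demo_pkg"  # hypothetical stand-in, not composer_replication
    pkg.mkdir()
    (pkg / "__init__.py").write_text("# root stays light: no submodule imports\n")
    (pkg / "serverless.py").write_text("class EKSExecutor:\n    name = 'eks'\n")


def loaded(module_name: str) -> bool:
    """Report whether a module has already been imported in this process."""
    return module_name in sys.modules


tmp = tempfile.mkdtemp()
build_demo_package(Path(tmp))
sys.path.insert(0, tmp)
importlib.invalidate_caches()

# Importing the package root does not pull in the executor submodule.
importlib.import_module("demo_pkg")
root_only = loaded("demo_pkg.serverless")

# Importing from the submodule path loads it on demand.
executor_cls = importlib.import_module("demo_pkg.serverless").EKSExecutor
after_explicit = loaded("demo_pkg.serverless")

print(root_only, after_explicit, executor_cls.name)
```

The trade-off is that callers must know the submodule path, which is why the note points to API_REFERENCE §15-17 for the documented locations.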

docs/INTEGRATION_RECIPES.md CHANGED

@@ -923,7 +923,7 @@ In Wave 14: $0 (skeleton fails fast; no compute used). Projected for v0.2+:
 ## Cross-recipe checklist
 Regardless of which recipe you pick, these invariants are tested across
-the 115-test suite (post-Wave-15) and should be true of your wired-up system:
+the test suite (266 passing / 62 skipped; canonical count in docs/V1_V8_COVERAGE.md) and should be true of your wired-up system:
 - **`alpha_sdpo=0`** must reproduce the channel-1-only baseline
   bit-exact (`test_compose_loss_integration.py`).

docs/USER_GUIDE.md CHANGED

@@ -675,7 +675,7 @@ and `docs/adrs/ADR-006-rl-frameworks.md`.
 ## Common pitfalls + what tests catch them
-The framework's 115-test suite (post-Wave-15) is structured so each pitfall has a
+The framework's test suite (266 passing / 62 skipped, canonical count in docs/V1_V8_COVERAGE.md) is structured so each pitfall has a
 specific test-file home. If you hit one of these in production, the
 corresponding test is your fastest reproducer.
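The commit message notes that these two stale counts sat outside the earlier grep scope, while `_archive` mentions were deliberately left alone. A scan along those lines could be sketched as follows. This is an illustration under stated assumptions, not the script used for this change: the function name `find_stale_counts`, the regex, and the in-memory sample documents are all hypothetical, and the sketch makes no attempt to recognise dated-historical mentions, which would still need a manual pass.

```python
import re

# Matches hard-coded counts such as "115-test suite"; the pattern is an assumption.
STALE_COUNT = re.compile(r"\b(\d+)-test suite\b")


def find_stale_counts(docs: dict[str, str], skip_dirs=("_archive",)):
    """Return (path, line number, count) for each hard-coded test count found."""
    hits = []
    for path, text in docs.items():
        # Archived documents are historical records and are skipped on purpose.
        if any(part in skip_dirs for part in path.split("/")):
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            match = STALE_COUNT.search(line)
            if match:
                hits.append((path, lineno, int(match.group(1))))
    return hits


# Hypothetical sample content; real use would read the files under docs/.
sample = {
    "docs/USER_GUIDE.md": (
        "## Common pitfalls\n"
        "The framework's 115-test suite (post-Wave-15) is structured\n"
    ),
    "docs/INTEGRATION_RECIPES.md": (
        "the test suite (266 passing / 62 skipped) and should be true\n"
    ),
    "docs/_archive/OLD.md": "the 115-test suite\n",
}

hits = find_stale_counts(sample)
print(hits)
```

Scanning every living doc rather than a fixed file list is the point of the sketch: a count phrased in the present tense anywhere under `docs/` is flagged, and only the skip list decides what is exempt.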