Kian (Person A) Task Breakdown

Source of truth: ReplicaLab_Comprehensive_Task_Division.md


Current status

  • FND 04, FND 08, FND 09, MOD 01 to MOD 05, MOD 11, and MOD 12 are complete
  • Shared AGT 05 is now complete, so the deterministic feasibility layer exists for both the Lab Manager path and the judge feasibility score
  • SCN 01 to SCN 10 are complete, so the deterministic scenario layer exists in code
  • ENV 01 to ENV 08 are all complete: the full environment lifecycle (reset, step, validate, Lab Manager response, termination, judge scoring, state snapshot, close) works end-to-end
  • JDG 01 to JDG 06 plus JDG 08 are complete: the deterministic reward pipeline is wired and the plain-English explanation layer exists. The reward stack also has stronger regression coverage for ordering, substitution behavior, partial feasibility credit, and breakdown determinism
  • TST 01 to TST 05 are complete with 36 env tests and 40 reward tests passing
  • MOD 06, SCN 13, AGT 09, JDG 11, ENV 10, ENV 11, and OBS 04 are now complete, so the only remaining Kian work is the schema follow-on, MOD 08, which is now unblocked
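
For readers unfamiliar with the lifecycle named in the ENV 01 to ENV 08 item, the following is a minimal sketch of how reset, validate, step, termination, state snapshot, and close might fit together. The class `ToyNegotiationEnv`, its fields, the action keys `protocol` and `accept`, and the reward values are all hypothetical and chosen only for illustration; they are not the ReplicaLab environment API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNegotiationEnv:
    """Hypothetical stand-in for an episode lifecycle; not the real env."""

    max_rounds: int = 3
    round_index: int = 0
    closed: bool = False
    history: list = field(default_factory=list)

    def reset(self) -> dict:
        # Start a fresh episode with no recorded actions.
        self.round_index = 0
        self.history = []
        return {"round": self.round_index}

    def validate(self, action: dict) -> bool:
        # Assumed rule: an action needs a non-empty string "protocol".
        protocol = action.get("protocol")
        return isinstance(protocol, str) and bool(protocol)

    def step(self, action: dict) -> tuple:
        if self.closed:
            raise RuntimeError("environment is closed")
        if not self.validate(action):
            # Invalid actions are reported and do not advance the round.
            return {"round": self.round_index, "error": "invalid action"}, 0.0, False
        self.round_index += 1
        self.history.append(action)
        accepted = bool(action.get("accept", False))
        # Terminate on acceptance or when the round budget is spent.
        done = accepted or self.round_index >= self.max_rounds
        # Placeholder terminal score; a real judge would compute a breakdown.
        reward = 1.0 if (done and accepted) else 0.0
        return {"round": self.round_index}, reward, done

    def state(self) -> dict:
        # Snapshot is a copy so callers cannot mutate internal history.
        return {"round": self.round_index, "history": list(self.history)}

    def close(self) -> None:
        self.closed = True


env = ToyNegotiationEnv(max_rounds=2)
env.reset()
env.step({"protocol": "draft plan"})
obs, reward, done = env.step({"protocol": "revised plan", "accept": True})
snapshot = env.state()
env.close()
```

The point of the sketch is the shape of the loop: validation happens before state changes, scoring happens only at termination, and the snapshot can be taken independently of stepping.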

Bounded-tool scope note:

  1. Kian-owned scenario, judge, and environment tasks now need to support bounded search, code_check, and image_inspection traces without changing the outer action contract.
  2. Training reward must remain deterministic and must not depend on live web.
  3. Frozen evidence packs are the default training-time source of tool inputs.
  4. Audio remains out of scope.
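
One way to picture the constraints above is a tool runner that only accepts the three bounded tools and answers from a frozen, in-memory evidence pack instead of the live web. This is a sketch under stated assumptions: `FROZEN_PACK`, `run_tool`, `trace_digest`, and the trace fields are hypothetical names, and the pack contents are placeholder data, not real project evidence.

```python
import hashlib
import json

# The three bounded tools named in the scope note; audio is deliberately absent.
ALLOWED_TOOLS = ("search", "code_check", "image_inspection")

# Hypothetical frozen evidence pack keyed by (tool, query). Placeholder data.
FROZEN_PACK = {
    ("search", "incubator booking policy"): {"snippets": ["placeholder snippet"]},
    ("code_check", "analysis.py"): {"passed": True, "issues": []},
}


def run_tool(tool: str, query: str, pack: dict) -> dict:
    """Answer a tool call from the frozen pack only; never touch the network."""
    if tool not in ALLOWED_TOOLS:
        raise ValueError(f"tool not allowed: {tool}")
    result = pack.get((tool, query))
    # A miss is recorded explicitly so the trace stays deterministic.
    return {"tool": tool, "query": query, "hit": result is not None, "result": result}


def trace_digest(trace: list) -> str:
    """Stable hash of a trace, usable in a determinism regression check."""
    payload = json.dumps(trace, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


trace = [
    run_tool("search", "incubator booking policy", FROZEN_PACK),
    run_tool("image_inspection", "gel_photo_01", FROZEN_PACK),  # miss, still recorded
]
digest = trace_digest(trace)
```

Because every tool result comes from the same frozen mapping, replaying the same calls yields the same trace and digest, which is the property a deterministic training reward would need. How traces attach to the existing action contract is not specified here and is left out of the sketch.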

Recommended execution order

  1. MOD 08 -- expand the schema and validator unit tests

Why this order

  • SCN 13 is complete, so the normalized scenario layer now carries booking and scheduling conflicts as structured deterministic data.
  • AGT 09 is complete, so the grounded Lab Manager checker, suggestion, and response stack now has deterministic regression coverage.
  • JDG 11 is complete and ENV 11 is now integrated, so terminal env outputs and replay-facing state carry the canonical audit payload end to end.
  • ENV 10 and OBS 04 are now complete, so the environment stack has deterministic replay and broader regression coverage on top of the completed ENV 01 to ENV 08 and ENV 11 lifecycle.
  • MOD 08 is the only remaining Kian-owned implementation task, and it is now fully unblocked.
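
As a rough picture of what schema and validator unit-test expansion for MOD 08 could look like, the sketch below pairs a tiny validator with `unittest` cases for the valid path, a missing field, a wrong type, and an unknown field. The field names in `REQUIRED_FIELDS` and the function `validate_action` are hypothetical; the real schema is defined by the project's models and is not reproduced here.

```python
import unittest

# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"action_type": str, "protocol": str, "rationale": str}


def validate_action(payload: dict) -> list:
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in payload:
            errors.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected):
            errors.append(f"wrong type for {name}: expected {expected.__name__}")
    # Sort unknown fields so error order is deterministic.
    for name in sorted(set(payload) - set(REQUIRED_FIELDS)):
        errors.append(f"unknown field: {name}")
    return errors


class ValidateActionTests(unittest.TestCase):
    def setUp(self):
        self.valid = {"action_type": "propose", "protocol": "p", "rationale": "r"}

    def test_valid_payload_has_no_errors(self):
        self.assertEqual(validate_action(self.valid), [])

    def test_missing_field_is_reported(self):
        payload = dict(self.valid)
        del payload["rationale"]
        self.assertEqual(validate_action(payload), ["missing field: rationale"])

    def test_wrong_type_is_reported(self):
        payload = dict(self.valid, protocol=42)
        self.assertEqual(
            validate_action(payload), ["wrong type for protocol: expected str"]
        )

    def test_unknown_field_is_reported(self):
        payload = dict(self.valid, extra=1)
        self.assertEqual(validate_action(payload), ["unknown field: extra"])
```

Returning a list of errors rather than raising on the first problem makes each failure mode assertable on its own, which tends to keep validator tests small and independent.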