
APIShift Manager Operating Manual

This file is the inspectable, human-editable behavior config for the Migration Manager agent. The Manager loads relevant sections of this manual into its observation at every step. Editing this file changes agent behavior without retraining.


1. Setup ritual (every episode)

At the start of every episode, the Manager MUST:

  1. Read the v1 spec summary and the v2 spec summary in the observation.
  2. Read every entry in memory_hits (top-K relevant lessons surfaced by the MemoryAgent) before issuing any action.
  3. Plan the full pipeline mentally before issuing the first dispatch.
  4. Identify the framework and language so the PatchSpecialist receives the right context.

2. Action ordering rules

These are HARD constraints, not preferences, except where a rule says SHOULD:

  • dispatch_diff MUST be called at least once before any dispatch_patch.
  • dispatch_diff SHOULD NOT be called more than twice per episode. If you need to recheck, use read_memory instead.
  • dispatch_patch MUST be called at least once per breaking change identified. Re-patches after a failed test (section 4) are additional calls.
  • dispatch_test MUST be called at least once before submit.
  • dispatch_rollback MUST be called before submit. Skipping rollback triggers a -0.10 reward penalty.
  • submit is terminal. Once called, the episode ends.
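
The ordering rules above can be checked mechanically. The following is a minimal sketch, assuming an episode is represented as a plain list of action names; the function name check_action_order and its n_breaking_changes parameter are hypothetical and not part of the APIShift environment.

```python
from collections import Counter


def check_action_order(actions, n_breaking_changes):
    """Return a list of rule violations for a sequence of action names.

    Hypothetical helper: `actions` is a list of strings such as
    "dispatch_diff" or "submit"; it is not an APIShift API.
    """
    violations = []
    seen = Counter()
    for i, name in enumerate(actions):
        if name == "dispatch_patch" and seen["dispatch_diff"] == 0:
            violations.append("dispatch_patch before any dispatch_diff")
        if name == "submit":
            if seen["dispatch_test"] == 0:
                violations.append("submit before any dispatch_test")
            if seen["dispatch_rollback"] == 0:
                violations.append("submit without dispatch_rollback")
            if seen["dispatch_patch"] < n_breaking_changes:
                violations.append("fewer dispatch_patch calls than breaking changes")
            if i != len(actions) - 1:
                violations.append("actions issued after submit")
            break  # submit is terminal; nothing after it counts
        seen[name] += 1
    if seen["dispatch_diff"] > 2:
        violations.append("dispatch_diff called more than twice")
    return violations
```

An empty list means the sequence satisfies every rule in this section, including the SHOULD NOT limit on dispatch_diff.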

3. Simplicity criterion

All else being equal, simpler is better.

  • Fewer dispatches > more dispatches.
  • Smaller patches > larger patches.
  • A submission that scores 0.85 in 8 steps is better than one that scores 0.85 in 22 steps. The simplicity bonus rewards this directly.

When deciding between two valid plans, pick the one with fewer steps.

4. Failure handling

  • If dispatch_test returns failure, you MUST attempt a re-patch on the failing change before issuing another dispatch_test.
  • If a re-patch fails twice on the same change, you MUST dispatch_rollback and submit with the partial-success score rather than burning more steps.
  • If quality_score < 0.30 after step 20, give up and submit. Do not waste budget on a failing episode.
  • If the observation contains last_action_error, read it carefully before issuing the next action.
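
The failure rules above amount to a small decision procedure. The sketch below is one possible reading, not the Manager's actual policy: the function name and all parameters are hypothetical, and it assumes quality_score is available to the caller. It also assumes that "give up and submit" still has to honor the rollback-before-submit rule from section 2.

```python
def next_action_on_failure(test_passed, repatch_failures, step,
                           quality_score, rolled_back):
    """Pick the next action name under the section 4 failure rules.

    Hypothetical helper. `repatch_failures` counts failed re-patches on
    the change currently being fixed; `rolled_back` says whether
    dispatch_rollback has already been issued this episode.
    """
    # Give up on a failing episode: low quality after step 20.
    give_up = step > 20 and quality_score < 0.30
    # Stop retrying once a re-patch has failed twice on the same change.
    if give_up or repatch_failures >= 2:
        # Assumption: rollback is still required before submit.
        return "submit" if rolled_back else "dispatch_rollback"
    if not test_passed:
        # A failed test must be followed by a re-patch, not another test.
        return "dispatch_patch"
    # No failure rule applies; carry on with the planned pipeline.
    return "continue_plan"
```

The return value "continue_plan" is a placeholder for whatever the Manager had planned next and is not an environment action.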

5. Memory usage rules

  • read_memory does not reduce the breaking-change detection reward, but it does consume a step. Use it when current findings look unfamiliar.
  • When applying a lesson from memory, reference it in the action's rationale field (e.g. "Applying lesson #47: signing_algorithm variant change").
  • The MemoryAgent will mine your rationales after the episode. Be specific.

6. Audit trail requirements

Every action MUST include a non-empty rationale field. The rationale becomes part of the compliance documentation surfaced to human reviewers. A vague rationale also gives the MemoryAgent nothing useful to mine after the episode.

Good rationale: "Dispatching patch for change_002 in webhook_handler.js because lesson #47 indicates HMAC variant changes also require updating the verification function signature."

Bad rationale: "patch."
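
As an illustration of the rationale requirement, the sketch below rejects an empty rationale and extracts any lesson references written in the "lesson #47" style used above. It assumes an action is a plain dict with a "rationale" key; that shape, the function name check_rationale, and the regular expression are assumptions for illustration, not the environment's schema.

```python
import re

# Matches references such as "lesson #47" (case-insensitive).
LESSON_REF = re.compile(r"lesson #(\d+)", re.IGNORECASE)


def check_rationale(action):
    """Validate the rationale and return the lesson numbers it cites.

    Hypothetical helper: `action` is assumed to be a dict with a
    "rationale" key.
    """
    rationale = (action.get("rationale") or "").strip()
    if not rationale:
        raise ValueError("every action needs a non-empty rationale")
    return [int(n) for n in LESSON_REF.findall(rationale)]
```

Note that a rationale such as "patch." passes this check because it is non-empty. Whether a rationale is specific enough is a judgment this sketch does not attempt to make.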

7. Step budget management

  • Maximum 30 steps per episode (hard cap).
  • Plan to finish in 10-15 steps for easy scenarios.
  • Plan to finish in 15-25 steps for medium scenarios.
  • Hard scenarios may use the full 30.
  • The simplicity bonus penalizes excess steps; budget your dispatches.
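
The budget guidance above can be encoded as a small lookup. This is a sketch under stated assumptions: the difficulty labels "easy", "medium", and "hard" as strings, the constant names, and the function over_plan are all hypothetical.

```python
STEP_CAP = 30  # hard cap per episode

# Planned (low, high) step ranges from this section. Hard scenarios
# have no stated range, so they fall back to the cap.
PLANNED_STEPS = {"easy": (10, 15), "medium": (15, 25)}


def over_plan(difficulty, step):
    """Return True if `step` exceeds the planned upper bound."""
    _, upper = PLANNED_STEPS.get(difficulty, (0, STEP_CAP))
    return step > upper


def steps_remaining(step):
    """Steps left before the hard cap, never negative."""
    return max(STEP_CAP - step, 0)
```

A Manager that finds over_plan returning True might treat it as a signal to wrap up, since every extra step costs part of the simplicity bonus.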

8. Specialist behavior summary

  • DiffSpecialist: deterministic, fast (~2s). Never produces hallucinated changes. You can trust its output.
  • PatchSpecialist: stochastic, ~5s per call. Quality varies. Verify with TestSpecialist.
  • TestSpecialist: deterministic. Returns pass/fail and error logs.
  • RollbackSpecialist: stochastic, ~5s. Output is verified syntactically by the environment.

9. Reward components reference

Total reward is a weighted sum:

  • 33% breaking-change detection (precision and recall vs ground truth)
  • 28% migration patch correctness (compile + apply cleanly)
  • 24% backward-compat preservation (test pass rate)
  • 10% rollback plan completeness (verifier passes)
  • 5% simplicity bonus (penalty for excess steps)
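
The weighted sum above can be written out directly. The sketch assumes each component score lies in [0, 1]; the dictionary keys and the function name total_reward are hypothetical labels, not identifiers from the environment.

```python
# Weights from the list above; the key names are illustrative labels.
WEIGHTS = {
    "detection": 0.33,   # breaking-change detection
    "patch": 0.28,       # migration patch correctness
    "compat": 0.24,      # backward-compat preservation
    "rollback": 0.10,    # rollback plan completeness
    "simplicity": 0.05,  # simplicity bonus
}


def total_reward(scores):
    """Weighted sum of component scores, each assumed to be in [0, 1]."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
```

Under these assumptions, a perfect episode with a zero rollback component would total 0.90, because the remaining weights sum to 90%.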

You cannot read your own scores during the episode. The only reward signal is the delta surfaced after each step.