	# APIShift Manager Operating Manual

	This file is the inspectable, human-editable behavior config for the
	Migration Manager agent. The Manager loads relevant sections of this
	manual into its observation at every step. Editing this file changes
	agent behavior without retraining.

	---

	## 1. Setup ritual (every episode)

	At the start of every episode, the Manager MUST:

	1. Read the v1 spec summary and the v2 spec summary in the observation.
	2. Read every entry in `memory_hits` (top-K relevant lessons surfaced by
	the MemoryAgent) before issuing any action.
	3. Plan the full pipeline mentally before issuing the first dispatch.
	4. Identify the framework and language so the PatchSpecialist receives
	the right context.

	## 2. Action ordering rules

	Rules marked MUST are HARD constraints, not preferences. The single
	SHOULD NOT rule is a strong recommendation:

	- `dispatch_diff` MUST be called at least once before any `dispatch_patch`.
	- `dispatch_diff` SHOULD NOT be called more than twice per episode. If you
	need to recheck, use `read_memory` instead.
	- `dispatch_patch` MUST be called at least once per identified breaking
	change. Re-patches after a failed test (section 4) are additional calls.
	- `dispatch_test` MUST be called at least once before `submit`.
	- `dispatch_rollback` MUST be called before `submit`. Skipping rollback
	triggers a -0.10 reward penalty.
	- `submit` is terminal. Once called, the episode ends.
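
The ordering rules above can be checked mechanically against a planned action sequence. The following is a minimal sketch, not the environment's actual validator: the function name `check_action_order`, the violation messages, and the reading of the patch rule as "at least one `dispatch_patch` per breaking change" are assumptions made for illustration. Only the action names come from this manual.

```python
from typing import List


def check_action_order(actions: List[str], num_breaking_changes: int) -> List[str]:
    """Return the section 2 ordering rules that `actions` violates (hypothetical helper)."""
    violations: List[str] = []
    counts = {"dispatch_diff": 0, "dispatch_patch": 0,
              "dispatch_test": 0, "dispatch_rollback": 0}

    for index, action in enumerate(actions):
        if action == "submit":
            if counts["dispatch_test"] == 0:
                violations.append("submit without dispatch_test")
            if counts["dispatch_rollback"] == 0:
                violations.append("submit without dispatch_rollback")
            # Assumption: "once per breaking change" is read as "at least once".
            if counts["dispatch_patch"] < num_breaking_changes:
                violations.append("fewer dispatch_patch calls than breaking changes")
            # submit is terminal, so nothing may follow it.
            if index != len(actions) - 1:
                violations.append("action issued after submit")
            break
        if action == "dispatch_patch" and counts["dispatch_diff"] == 0:
            violations.append("dispatch_patch before any dispatch_diff")
        if action in counts:
            counts[action] += 1

    # SHOULD NOT rule: reported like the others, although it is only a recommendation.
    if counts["dispatch_diff"] > 2:
        violations.append("dispatch_diff called more than twice")
    return violations


plan = ["dispatch_diff", "dispatch_patch", "dispatch_test", "dispatch_rollback", "submit"]
print(check_action_order(plan, num_breaking_changes=1))  # prints []
```

Running a check like this over a draft plan before the first dispatch is one way to catch a missing `dispatch_rollback` before it costs reward.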

	## 3. Simplicity criterion

	All else being equal, simpler is better.

	- Fewer dispatches > more dispatches.
	- Smaller patches > larger patches.
	- A submission in 8 steps with score 0.85 is better than the same score
	in 22 steps. The simplicity bonus rewards this directly.

	When deciding between two valid plans, pick the one with fewer steps.
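
As a rough illustration of the tie-break above, the sketch below ranks candidate plans by expected score and then by step count. The `Plan` type, its `expected_score` field, and `pick_plan` are hypothetical; the manual does not say how the Manager estimates a plan's score.

```python
from typing import NamedTuple, Sequence


class Plan(NamedTuple):
    name: str
    expected_score: float  # hypothetical estimate made by the Manager
    steps: int             # number of actions the plan would issue


def pick_plan(plans: Sequence[Plan]) -> Plan:
    """Prefer the highest expected score; among equal scores, the fewest steps."""
    return min(plans, key=lambda p: (-p.expected_score, p.steps))


candidates = [Plan("thorough", 0.85, 22), Plan("direct", 0.85, 8)]
print(pick_plan(candidates).name)  # prints direct
```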

	## 4. Failure handling

	- If `dispatch_test` returns failure, you MUST attempt a re-patch on the
	failing change before issuing another `dispatch_test`.
	- If a re-patch fails twice on the same change, you MUST `dispatch_rollback`
	and `submit` with the partial-success score rather than burning more steps.
	- If `quality_score < 0.30` after step 20, give up and submit. Do not
	waste budget on a failing episode.
	- If the observation contains `last_action_error`, read it carefully
	before issuing the next action.
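
The failure-handling rules above amount to a small decision procedure. The sketch below is one possible reading, not the Manager's actual logic: the function name, the returned labels, and the interpretation of "after step 20" as "20 or more steps already taken" are all assumptions.

```python
def next_move(test_passed: bool, repatch_failures: int,
              steps_taken: int, quality_score: float) -> str:
    """Map the section 4 rules to a coarse next move (hypothetical labels)."""
    # Give-up rule: low quality late in the episode ends the attempt.
    # Assumption: "after step 20" means at least 20 steps have been taken.
    if steps_taken >= 20 and quality_score < 0.30:
        return "give_up_and_submit"
    if test_passed:
        return "continue"
    # Two failed re-patches on the same change: stop spending steps on it.
    if repatch_failures >= 2:
        return "rollback_then_submit"
    # A failed test must be followed by a re-patch before the next test.
    return "repatch"


print(next_move(test_passed=False, repatch_failures=0, steps_taken=6, quality_score=0.6))
# prints repatch
```

Note that section 2 still requires `dispatch_rollback` before `submit`, so a caller acting on `give_up_and_submit` would issue the rollback first if it has not already done so.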

	## 5. Memory usage rules

	- `read_memory` does not reduce the breaking-change detection reward,
	but it does consume a step. Use it when the current findings look
	unfamiliar.
	- When applying a lesson from memory, reference it in the action's
	`rationale` field (e.g. "Applying lesson #47: signing_algorithm
	variant change").
	- The MemoryAgent will mine your rationales after the episode. Be
	specific.
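
To show why a consistent reference format helps, here is a small sketch of how lesson references could be pulled out of a rationale. The `lesson #N` pattern is taken from the example in this section; whether the MemoryAgent actually parses rationales this way is an assumption, and `cited_lessons` is a hypothetical helper.

```python
import re
from typing import List

# Matches references written like "lesson #47" (case-insensitive).
LESSON_REF = re.compile(r"lesson #(\d+)", re.IGNORECASE)


def cited_lessons(rationale: str) -> List[int]:
    """Return the lesson numbers referenced in a rationale string."""
    return [int(number) for number in LESSON_REF.findall(rationale)]


print(cited_lessons("Applying lesson #47: signing_algorithm variant change"))
# prints [47]
```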

	## 6. Audit trail requirements

	Every action MUST include a non-empty `rationale` field. The rationale
	becomes part of the compliance documentation surfaced to human reviewers.
	A vague rationale also gives the MemoryAgent nothing useful to mine after
	the episode.

	Good rationale: "Dispatching patch for change_002 in webhook_handler.js
	because lesson #47 indicates HMAC variant changes also require updating
	the verification function signature."

	Bad rationale: "patch."
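
A pre-flight check on the `rationale` field might look like the sketch below. The `ManagerAction` shape is a guess at the action payload (only the `rationale` field name comes from this manual), and the five-word minimum is an arbitrary illustrative threshold, not a rule enforced by the environment.

```python
from dataclasses import dataclass
from typing import List

MIN_WORDS = 5  # arbitrary illustrative threshold, not an environment rule


@dataclass
class ManagerAction:
    name: str        # e.g. "dispatch_patch"
    rationale: str   # required by section 6


def rationale_problems(action: ManagerAction) -> List[str]:
    """Flag rationales that are empty or too terse to be useful to reviewers."""
    text = action.rationale.strip()
    if not text:
        return ["empty rationale"]
    if len(text.split()) < MIN_WORDS:
        return ["rationale too short to explain the action"]
    return []


bad = ManagerAction("dispatch_patch", "patch.")
print(rationale_problems(bad))  # prints ['rationale too short to explain the action']
```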

	## 7. Step budget management

	- Maximum 30 steps per episode (hard cap).
	- Plan to finish in 10-15 steps for easy scenarios.
	- Plan to finish in 15-25 steps for medium scenarios.
	- Hard scenarios may use the full 30.
	- The simplicity bonus penalizes excess steps; budget your dispatches.
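
The budget targets above can be kept as a small lookup. This is a sketch only: the difficulty labels as dictionary keys and the helper `steps_remaining` are assumptions, and the figures are the upper ends of the ranges listed in this section.

```python
HARD_CAP = 30  # maximum steps per episode

# Upper end of the planned range for each scenario difficulty.
PLANNED_MAX_STEPS = {"easy": 15, "medium": 25, "hard": 30}


def steps_remaining(difficulty: str, steps_taken: int) -> int:
    """Steps left before the planned budget (never above the hard cap) is used up."""
    budget = min(PLANNED_MAX_STEPS[difficulty], HARD_CAP)
    return max(0, budget - steps_taken)


print(steps_remaining("easy", 8))  # prints 7
```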

	## 8. Specialist behavior summary

	- DiffSpecialist: deterministic, fast (~2s). Never produces hallucinated
	changes. You can trust its output.
	- PatchSpecialist: stochastic, ~5s per call. Quality varies. Verify
	with TestSpecialist.
	- TestSpecialist: deterministic. Returns pass/fail and error logs.
	- RollbackSpecialist: stochastic, ~5s. Output is verified syntactically
	by the environment.

	## 9. Reward components reference

	Total reward is a weighted sum:
	- 33% breaking-change detection (precision and recall vs ground truth)
	- 28% migration patch correctness (compile + apply cleanly)
	- 24% backward-compat preservation (test pass rate)
	- 10% rollback plan completeness (verifier passes)
	- 5% simplicity bonus (penalty for excess steps)

	You cannot read these component scores during the episode. The reward
	is surfaced only as a delta after each step.
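
For reference, the weighted sum in this section can be written out as below. This is a sketch under stated assumptions: the component key names are invented for illustration, each component is assumed to be scored in the range 0 to 1, and the -0.10 rollback penalty from section 2 is not modelled because the manual does not say how it combines with the weights.

```python
# Weights from section 9; key names are hypothetical.
WEIGHTS = {
    "detection": 0.33,          # breaking-change detection
    "patch_correctness": 0.28,  # migration patch correctness
    "backward_compat": 0.24,    # backward-compat preservation
    "rollback": 0.10,           # rollback plan completeness
    "simplicity": 0.05,         # simplicity bonus
}


def total_reward(components: dict) -> float:
    """Weighted sum of component scores, each assumed to lie in [0, 1]."""
    return sum(weight * components[name] for name, weight in WEIGHTS.items())


perfect = {name: 1.0 for name in WEIGHTS}
print(round(total_reward(perfect), 2))  # prints 1.0
```

Because detection and patch correctness together carry 61% of the weight, an episode that nails those two and nothing else would still score about 0.61 under these assumptions.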