# APIShift Architecture

This document is the design reference for APIShift. For the high-level story, see the root README. For the operating manual that the trained Manager reads at every step, see `apishift_program.md`.

## System Diagram

```
┌──────────────────────────────────────────────────────────────────────┐
│                           APISHIFT SYSTEM                            │
│                                                                      │
│  CurriculumAgent                                                     │
│    reads results.tsv (last 40 episodes)                              │
│    edits curriculum.md (sampling weights)                            │
│          │                                                           │
│          ▼                                                           │
│  DifficultySampler ──► Scenario library (Layer 1/2/3)                │
│          │                                                           │
│          ▼                                                           │
│  APIShiftEnvironment (OpenEnv server)                                │
│    reset(scenario_id) → Observation                                  │
│    step(action)       → Observation + reward                         │
│    state()            → episode metadata                             │
│          │                                                           │
│          ▼                                                           │
│  Migration Manager (Qwen2.5-7B + LoRA, GRPO-trained)                 │
│    reads at every step:                                              │
│      1. apishift_program.md (operating manual)                       │
│      2. v1/v2 spec summary + client code                             │
│      3. top-K lessons from lessons.md                                │
│      4. recent_episode_summary from results.tsv                      │
│          │                                                           │
│  ┌───────┴────────┬─────────────┬──────────────┬─────────┐           │
│  ▼                ▼             ▼              ▼         ▼           │
│  Diff             Patch         Test           Rollback              │
│  Specialist       Specialist    Specialist     Specialist            │
│  (deterministic   (template-    (verifier-     (template +           │
│   walker)          driven)       based)         verifier)            │
│          │                                                           │
│          ▼                                                           │
│  Multi-component reward (5 verifiers)                                │
│          │                                                           │
│  ┌───────┴────────────┐                                              │
│  ▼                    ▼                                              │
│  TRL GRPO             MemoryAgent + EpisodeLogger                    │
│  updates LoRA         edits lessons.md, appends results.tsv          │
└──────────────────────────────────────────────────────────────────────┘
```

## Self-Improvement Mechanisms

APIShift improves through four independent loops:

1. **GRPO weight updates.** TRL's GRPOTrainer updates the LoRA adapters on the Manager every batch.
2. **Markdown memory.** The MemoryAgent appends/updates lessons.md after each successful episode. The Manager retrieves top-K lessons before acting on a new scenario.
3. **Adaptive curriculum.** The CurriculumAgent re-reads results.tsv every 20 episodes and shifts the sampling weights to keep training near the Manager's frontier of mastery.
4. **Operating manual evolution.** Humans (or downstream tooling) can edit apishift_program.md to change agent behavior without retraining.

## File Responsibilities

| File | Purpose |
|------|---------|
| `apishift_program.md` | Manager operating manual (sections injected into observations) |
| `lessons.md` | Cross-episode memory artifact, edited by MemoryAgent |
| `curriculum.md` | Difficulty sampling weights, edited by CurriculumAgent |
| `results.tsv` | Per-episode log, append-only |
| `models.py` | Pydantic Action/Observation/State |
| `server/environment.py` | OpenEnv Environment implementation |
| `server/reward.py` | 5-component reward function |
| `server/specialists/*.py` | Diff, Patch, Test, Rollback specialists |
| `server/memory/*.py` | MemoryAgent + retrieval |
| `server/curriculum/*.py` | CurriculumAgent + sampler |
| `server/program_loader.py` | Loads relevant section of apishift_program.md |
| `server/episode_logger.py` | Appends rows to results.tsv |

## Why this design

Inspectability is non-negotiable. Banking compliance, OSS users, and hackathon judges all need to be able to read what the agent learned and how its training distribution evolved. Markdown + TSV files satisfy this; vector stores and binary checkpoints do not. The autoresearch-style "agent reads markdown that it (or a peer agent) also writes" pattern keeps the system inspectable without sacrificing self-improvement capability.
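
To make the "agent reads a file that a peer agent writes" pattern concrete, here is a minimal sketch of the adaptive-curriculum loop: read the last 40 rows of `results.tsv`, score each layer, and render weights as markdown. It is an illustration under stated assumptions, not the code in `server/curriculum/`. The column names `layer` and `success`, the `p * (1 - p)` frontier score, the 0.05 floor, and every function name below are hypothetical.

```python
import csv
import io
from collections import defaultdict

WINDOW = 40  # matches the "last 40 episodes" window in the system diagram


def read_recent_rows(tsv_text, window=WINDOW):
    """Parse TSV text and return the last `window` rows as dicts."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    return rows[-window:]


def frontier_weights(rows, layers=("1", "2", "3")):
    """Return normalized sampling weights per layer.

    Assumed scoring rule: p * (1 - p), where p is the recent success rate.
    It peaks at p = 0.5, so layers that are neither mastered nor hopeless
    receive the most weight.
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for row in rows:
        totals[row["layer"]] += 1
        wins[row["layer"]] += int(row["success"])  # assumed 0/1 column

    raw = {}
    for layer in layers:
        # A layer with no recent episodes is treated as a coin flip.
        p = wins[layer] / totals[layer] if totals[layer] else 0.5
        raw[layer] = p * (1 - p) + 0.05  # floor keeps every layer sampled

    norm = sum(raw.values())
    return {layer: raw[layer] / norm for layer in layers}


def render_curriculum_md(weights):
    """Render the weights as a small, human-readable markdown table."""
    lines = ["# Curriculum", "", "| Layer | Weight |", "|-------|--------|"]
    lines += [f"| {layer} | {w:.3f} |" for layer, w in weights.items()]
    return "\n".join(lines) + "\n"
```

Because both the input and the output are plain text, a reviewer can diff successive versions of the rendered table to see how the training distribution moved, which is the inspectability property argued for above.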