| # APIShift Architecture | |
| This document is the design reference for APIShift. For the high-level | |
| story, see the root README. For the operating manual that the trained | |
| Manager reads at every step, see `apishift_program.md`. | |
| ## System Diagram | |
| ``` | |
| ┌──────────────────────────────────────────────────────────────────────┐ | |
| │                           APISHIFT SYSTEM                            │ | |
| │                                                                      │ | |
| │  CurriculumAgent                                                     │ | |
| │    reads results.tsv (last 40 episodes)                              │ | |
| │    edits curriculum.md (sampling weights)                            │ | |
| │         │                                                            │ | |
| │         ▼                                                            │ | |
| │  DifficultySampler ──► Scenario library (Layer 1/2/3)                │ | |
| │         │                                                            │ | |
| │         ▼                                                            │ | |
| │  APIShiftEnvironment (OpenEnv server)                                │ | |
| │    reset(scenario_id) → Observation                                  │ | |
| │    step(action)       → Observation + reward                         │ | |
| │    state()            → episode metadata                             │ | |
| │         │                                                            │ | |
| │         ▼                                                            │ | |
| │  Migration Manager (Qwen2.5-7B + LoRA, GRPO-trained)                 │ | |
| │    reads at every step:                                              │ | |
| │      1. apishift_program.md (operating manual)                       │ | |
| │      2. v1/v2 spec summary + client code                             │ | |
| │      3. top-K lessons from lessons.md                                │ | |
| │      4. recent_episode_summary from results.tsv                      │ | |
| │         │                                                            │ | |
| │      ┌──┴──────────────┬───────────────┬───────────────┐             │ | |
| │      ▼                 ▼               ▼               ▼             │ | |
| │  Diff              Patch           Test            Rollback          │ | |
| │  Specialist        Specialist      Specialist      Specialist        │ | |
| │  (deterministic    (template-      (verifier-      (template +       │ | |
| │   walker)           driven)         based)          verifier)        │ | |
| │         │                                                            │ | |
| │         ▼                                                            │ | |
| │  Multi-component reward (5 verifiers)                                │ | |
| │         │                                                            │ | |
| │      ┌──┴──────────────────┐                                         │ | |
| │      ▼                     ▼                                         │ | |
| │  TRL GRPO              MemoryAgent + EpisodeLogger                   │ | |
| │  updates LoRA          edits lessons.md, appends results.tsv         │ | |
| └──────────────────────────────────────────────────────────────────────┘ | |
| ``` | |
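
The four inputs the Manager reads at every step can be pictured as a single prompt-assembly function. The following is a minimal Python sketch under stated assumptions, not the project's actual code: the `Lesson` shape, the tag-overlap ranking in `top_k_lessons`, the `build_manager_context` name, and the section labels are all hypothetical. The real retrieval logic lives in `server/memory/*.py` and may rank lessons differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lesson:
    """One lessons.md entry. This shape is assumed for illustration."""
    tags: frozenset
    text: str


def top_k_lessons(lessons, scenario_tags, k=3):
    """Return up to k lessons ranked by tag overlap with the scenario."""
    scored = [(len(lesson.tags & scenario_tags), lesson) for lesson in lessons]
    relevant = [pair for pair in scored if pair[0] > 0]
    # list.sort is stable, so equally scored lessons keep their file order.
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [lesson for _, lesson in relevant[:k]]


def build_manager_context(program_section, spec_summary, client_code,
                          lessons, recent_episode_summary):
    """Join the per-step inputs in the order the diagram lists them."""
    lesson_lines = "\n".join(f"- {lesson.text}" for lesson in lessons) or "- (none)"
    parts = [
        ("OPERATING MANUAL", program_section),
        ("SPEC SUMMARY", spec_summary),
        ("CLIENT CODE", client_code),
        ("LESSONS", lesson_lines),
        ("RECENT EPISODES", recent_episode_summary),
    ]
    return "\n\n".join(f"## {title}\n{body}" for title, body in parts)
```

Keeping the assembly in one pure function means the exact text the Manager saw at a given step can be regenerated from the files on disk, which fits the inspectability goal of this design.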
| ## Self-Improvement Mechanisms | |
| APIShift improves through four independent loops: | |
| 1. **GRPO weight updates.** TRL's `GRPOTrainer` updates the LoRA adapters | |
| on the Manager every batch. | |
| 2. **Markdown memory.** The MemoryAgent appends to or updates `lessons.md` | |
| after each successful episode. The Manager retrieves the top-K lessons | |
| before acting on a new scenario. | |
| 3. **Adaptive curriculum.** The CurriculumAgent re-reads the last 40 | |
| episodes of `results.tsv` every 20 episodes and shifts the sampling | |
| weights to keep training near the Manager's frontier of mastery. | |
| 4. **Operating manual evolution.** Humans (or downstream tooling) can | |
| edit `apishift_program.md` to change agent behavior without retraining. | |
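
The adaptive-curriculum loop can be sketched as two small functions: one that computes per-layer success rates from the most recent rows of a `results.tsv`-style log, and one that turns those rates into sampling weights. This is an illustrative sketch only. The column names `layer` and `success`, the function names, and the scoring rule (favor layers whose success rate is closest to 50%) are assumptions, not the CurriculumAgent's documented behavior.

```python
import csv
import io

LAYERS = ("1", "2", "3")  # Layer 1/2/3 of the scenario library


def success_rates(tsv_text, window=40):
    """Per-layer success rate over the last `window` rows of the log.

    Assumes hypothetical columns `layer` ("1".."3") and `success` (0 or 1).
    """
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))[-window:]
    totals = dict.fromkeys(LAYERS, 0)
    wins = dict.fromkeys(LAYERS, 0)
    for row in rows:
        layer = row["layer"]
        if layer in totals:
            totals[layer] += 1
            wins[layer] += int(row["success"])
    # A layer with no recent episodes gets 0.5: unknown, so worth sampling.
    return {l: (wins[l] / totals[l] if totals[l] else 0.5) for l in LAYERS}


def frontier_weights(rates, floor=0.05):
    """Weight each layer by how close its success rate is to 0.5.

    The floor keeps mastered and too-hard layers from vanishing entirely.
    """
    raw = {layer: max(floor, 1.0 - 2.0 * abs(rate - 0.5))
           for layer, rate in rates.items()}
    total = sum(raw.values())
    return {layer: value / total for layer, value in raw.items()}
```

Under this rule, a Manager that always succeeds on Layer 1, succeeds half the time on Layer 2, and never succeeds on Layer 3 would see most of the sampling weight move to Layer 2, while the floor keeps the other two layers in rotation.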
| ## File Responsibilities | |
| | File | Purpose | | |
| |------|---------| | |
| | `apishift_program.md` | Manager operating manual (sections injected into observations) | | |
| | `lessons.md` | Cross-episode memory artifact, edited by MemoryAgent | | |
| | `curriculum.md` | Difficulty sampling weights, edited by CurriculumAgent | | |
| | `results.tsv` | Per-episode log, append-only | | |
| | `models.py` | Pydantic Action/Observation/State | | |
| | `server/environment.py` | OpenEnv Environment implementation | | |
| | `server/reward.py` | 5-component reward function | | |
| | `server/specialists/*.py` | Diff, Patch, Test, Rollback specialists | | |
| | `server/memory/*.py` | MemoryAgent + retrieval | | |
| | `server/curriculum/*.py` | CurriculumAgent + sampler | | |
| | `server/program_loader.py` | Loads relevant section of apishift_program.md | | |
| | `server/episode_logger.py` | Appends rows to results.tsv | | |
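
As an illustration of what "loads relevant section" could mean for `server/program_loader.py`, here is a minimal sketch that pulls the body of one level-2 heading out of a Markdown string. The function name `load_section` and the heading-matching rule are assumptions made for this example; the actual loader may select sections differently.

```python
def load_section(markdown_text, heading):
    """Return the body under the `## heading` line, or "" if it is absent.

    Matching is case-insensitive. Deeper headings (`###` and below) are
    kept as part of the section body.
    """
    wanted = heading.strip().lower()
    body = []
    in_section = False
    for line in markdown_text.splitlines():
        if line.startswith("## "):
            if in_section:
                break  # the next level-2 heading ends the section
            in_section = line[3:].strip().lower() == wanted
            continue
        if in_section:
            body.append(line)
    return "\n".join(body).strip()
```

This simple line scan does not understand fenced code blocks, so a line starting with `## ` inside a fence would be treated as a heading. That is acceptable for a sketch but worth handling in a real loader.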
| ## Why this design | |
| Inspectability is non-negotiable. Banking compliance reviewers, OSS users, | |
| and hackathon judges all need to read what the agent learned and how its | |
| training distribution evolved. Markdown and TSV files can be opened and | |
| diffed directly; vector stores and binary checkpoints cannot. | |
| The autoresearch-style "agent reads markdown that it (or a peer agent) | |
| also writes" pattern keeps the system inspectable without sacrificing | |
| self-improvement capability. | |