APIShift Architecture
This document is the design reference for APIShift. For the high-level
story, see the root README. For the operating manual that the trained Manager
reads at every step, see apishift_program.md.
System Diagram
```text
┌──────────────────────────────────────────────────────────────────────┐
│                           APISHIFT SYSTEM                            │
│                                                                      │
│  CurriculumAgent                                                     │
│    reads results.tsv (last 40 episodes)                              │
│    edits curriculum.md (sampling weights)                            │
│        │                                                             │
│        ▼                                                             │
│  DifficultySampler ──► Scenario library (Layer 1/2/3)                │
│        │                                                             │
│        ▼                                                             │
│  APIShiftEnvironment (OpenEnv server)                                │
│    reset(scenario_id) → Observation                                  │
│    step(action) → Observation + reward                               │
│    state() → episode metadata                                        │
│        │                                                             │
│        ▼                                                             │
│  Migration Manager (Qwen2.5-7B + LoRA, GRPO-trained)                 │
│    reads at every step:                                              │
│      1. apishift_program.md (operating manual)                       │
│      2. v1/v2 spec summary + client code                             │
│      3. top-K lessons from lessons.md                                │
│      4. recent_episode_summary from results.tsv                      │
│        │                                                             │
│  ┌─────┴─────────┬─────────────┬─────────────┐                       │
│  ▼               ▼             ▼             ▼                       │
│  Diff            Patch         Test          Rollback                │
│  Specialist      Specialist    Specialist    Specialist              │
│  (deterministic  (template-    (verifier-    (template +             │
│   walker)         driven)       based)        verifier)              │
│        │                                                             │
│        ▼                                                             │
│  Multi-component reward (5 verifiers)                                │
│        │                                                             │
│  ┌─────┴─────────┐                                                   │
│  ▼               ▼                                                   │
│  TRL GRPO        MemoryAgent + EpisodeLogger                         │
│  updates LoRA    edits lessons.md, appends results.tsv               │
└──────────────────────────────────────────────────────────────────────┘
```
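The reset → step → state contract in the diagram, and the way five verifier scores might collapse into one scalar reward, can be sketched as a toy in-memory Python class. This is a minimal sketch under stated assumptions, not the OpenEnv server in server/environment.py. Several pieces are hypothetical stand-ins: the class name ToyAPIShiftEnv, the five reward component names and their weights, the max_steps cap, and the convention of passing pre-computed verifier scores inside the action.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical component names and weights; the real five verifiers live in server/reward.py.
REWARD_WEIGHTS = {
    "compiles": 0.2,
    "tests_pass": 0.4,
    "behavior_preserved": 0.2,
    "minimal_diff": 0.1,
    "rollback_safe": 0.1,
}


@dataclass
class Observation:
    scenario_id: str
    text: str
    reward: float = 0.0
    done: bool = False


@dataclass
class EpisodeState:
    scenario_id: str = ""
    step_count: int = 0
    total_reward: float = 0.0


def combine_reward(scores: dict[str, float]) -> float:
    """Weighted sum of per-verifier scores in [0, 1]; a missing score counts as 0."""
    return sum(weight * scores.get(name, 0.0) for name, weight in REWARD_WEIGHTS.items())


class ToyAPIShiftEnv:
    """Hypothetical in-memory stand-in for the reset/step/state contract."""

    def __init__(self, max_steps: int = 8) -> None:
        self.max_steps = max_steps
        self._state = EpisodeState()

    def reset(self, scenario_id: str) -> Observation:
        self._state = EpisodeState(scenario_id=scenario_id)
        return Observation(scenario_id, text=f"Migrate the client for scenario {scenario_id}")

    def step(self, action: dict) -> Observation:
        self._state.step_count += 1
        # A real environment would dispatch to the Diff/Patch/Test/Rollback specialists and
        # run the verifiers; this toy reads pre-computed verifier scores off the action.
        reward = combine_reward(action.get("scores", {}))
        self._state.total_reward += reward
        done = bool(action.get("finish", False)) or self._state.step_count >= self.max_steps
        return Observation(self._state.scenario_id, text="step applied", reward=reward, done=done)

    def state(self) -> EpisodeState:
        return self._state
```

Keeping the weights in one table makes it easy to see, and to log, how much each verifier contributes to a given reward.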
Self-Improvement Mechanisms
APIShift improves through FOUR independent loops:
- GRPO weight updates. TRL's GRPOTrainer updates the LoRA adapters on the Manager every batch.
- Markdown memory. The MemoryAgent appends to or updates lessons.md after each successful episode. The Manager retrieves the top-K lessons before acting on a new scenario.
- Adaptive curriculum. The CurriculumAgent re-reads results.tsv every 20 episodes and shifts the sampling weights to keep training near the Manager's frontier of mastery.
- Operating manual evolution. Humans (or downstream tooling) can edit apishift_program.md to change agent behavior without retraining.
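The adaptive-curriculum loop could look roughly like the sketch below. It assumes, hypothetically, that results.tsv has tab-separated `layer` and `success` (0/1) columns. The real schema, and how the weights are written back into curriculum.md, are not specified here. The p * (1 - p) weighting is one simple way to keep sampling near the frontier of mastery. It is not necessarily the rule the CurriculumAgent uses.

```python
from __future__ import annotations

import csv
import io

# Hypothetical: scenario layers as they might appear in a "layer" column of results.tsv.
LAYERS = ("1", "2", "3")


def layer_success_rates(tsv_text: str, window: int = 40) -> dict[str, float]:
    """Per-layer success rate over the last `window` episodes of a results.tsv dump."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))[-window:]
    rates: dict[str, float] = {}
    for layer in LAYERS:
        outcomes = [int(row["success"]) for row in rows if row["layer"] == layer]
        # A layer with no recent episodes is treated as sitting right at the frontier.
        rates[layer] = sum(outcomes) / len(outcomes) if outcomes else 0.5
    return rates


def frontier_weights(rates: dict[str, float], floor: float = 0.05) -> dict[str, float]:
    """Normalized sampling weights that peak for layers solved about half the time."""
    # p * (1 - p) is 0.25 at p = 0.5 and 0 at p = 0 or 1; the floor keeps every layer reachable.
    raw = {layer: max(p * (1.0 - p), floor) for layer, p in rates.items()}
    total = sum(raw.values())
    return {layer: weight / total for layer, weight in raw.items()}
```

Calling frontier_weights(layer_success_rates(text)) every 20 episodes, where text is the current contents of results.tsv, moves sampling mass away from two kinds of layers. One is layers the Manager has mastered (success near 1). The other is layers it cannot yet solve (success near 0).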
File Responsibilities
| File | Purpose |
|---|---|
| apishift_program.md | Manager operating manual (sections injected into observations) |
| lessons.md | Cross-episode memory artifact, edited by MemoryAgent |
| curriculum.md | Difficulty sampling weights, edited by CurriculumAgent |
| results.tsv | Per-episode log, append-only |
| models.py | Pydantic Action/Observation/State |
| server/environment.py | OpenEnv Environment implementation |
| server/reward.py | 5-component reward function |
| server/specialists/*.py | Diff, Patch, Test, Rollback specialists |
| server/memory/*.py | MemoryAgent + retrieval |
| server/curriculum/*.py | CurriculumAgent + sampler |
| server/program_loader.py | Loads relevant section of apishift_program.md |
| server/episode_logger.py | Appends rows to results.tsv |
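The top-K lesson retrieval handled by server/memory/*.py could be as simple as lexical overlap between the scenario text and each lesson. The sketch below is hypothetical throughout. It assumes one lesson per top-level "- " bullet in lessons.md and ranks lessons by the number of shared lowercase tokens. The actual retrieval method is not described in this document. Plain token overlap has one advantage: the ranking stays as inspectable as the markdown it reads.

```python
from __future__ import annotations

import re


def parse_lessons(markdown: str) -> list[str]:
    """Hypothetical format: each lesson is a top-level '- ' bullet in lessons.md."""
    return [line[2:].strip() for line in markdown.splitlines() if line.startswith("- ")]


def tokens(text: str) -> set[str]:
    """Lowercase word tokens; underscores are kept so identifiers like user_id stay whole."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def top_k_lessons(markdown: str, query: str, k: int = 3) -> list[str]:
    """Return up to k lessons sharing the most tokens with the query; zero-overlap lessons are dropped."""
    query_tokens = tokens(query)
    scored = [(len(query_tokens & tokens(lesson)), lesson) for lesson in parse_lessons(markdown)]
    # sorted() is stable even with reverse=True, so equal scores keep their order in lessons.md.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [lesson for score, lesson in ranked if score > 0][:k]
```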
Why this design
Inspectability is non-negotiable. Banking compliance teams, OSS users, and hackathon judges all need to be able to read what the agent learned and how its training distribution evolved. Markdown and TSV files satisfy this; vector stores and binary checkpoints do not.
The autoresearch-style "agent reads markdown that it (or a peer agent) also writes" pattern keeps the system inspectable without sacrificing self-improvement capability.