APIShift Architecture

This document is the design reference for APIShift. For the high-level story, see the root README. For the operating manual that the trained Manager reads at every step, see apishift_program.md.

System Diagram

┌──────────────────────────────────────────────────────────────────────┐
│                         APISHIFT SYSTEM                              │
│                                                                      │
│   CurriculumAgent                                                    │
│     reads results.tsv (last 40 episodes)                             │
│     edits curriculum.md (sampling weights)                           │
│           │                                                          │
│           ▼                                                          │
│   DifficultySampler ──► Scenario library (Layer 1/2/3)               │
│           │                                                          │
│           ▼                                                          │
│   APIShiftEnvironment (OpenEnv server)                               │
│     reset(scenario_id) → Observation                                 │
│     step(action)       → Observation + reward                        │
│     state()            → episode metadata                            │
│           │                                                          │
│           ▼                                                          │
│   Migration Manager (Qwen2.5-7B + LoRA, GRPO-trained)                │
│     reads at every step:                                             │
│       1. apishift_program.md (operating manual)                      │
│       2. v1/v2 spec summary + client code                            │
│       3. top-K lessons from lessons.md                               │
│       4. recent_episode_summary from results.tsv                     │
│           │                                                          │
│   ┌───────┴────────┬─────────────┬──────────────┬─────────┐          │
│   ▼                ▼             ▼              ▼         ▼          │
│ Diff           Patch          Test          Rollback                 │
│ Specialist     Specialist     Specialist    Specialist               │
│ (deterministic (template-     (verifier-    (template +              │
│  walker)        driven)        based)        verifier)               │
│           │                                                          │
│           ▼                                                          │
│   Multi-component reward (5 verifiers)                               │
│           │                                                          │
│   ┌───────┴────────────┐                                             │
│   ▼                    ▼                                             │
│ TRL GRPO          MemoryAgent + EpisodeLogger                        │
│ updates LoRA      edits lessons.md, appends results.tsv              │
└──────────────────────────────────────────────────────────────────────┘
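
The diagram lists four inputs that the Manager reads at every step. As a rough illustration of how they could be combined into one prompt, here is a minimal sketch. The `ManagerContext` class, the `render_prompt` function, and the section titles are hypothetical names chosen for this example; they are not taken from the APIShift code.

```python
from dataclasses import dataclass, field


@dataclass
class ManagerContext:
    """Hypothetical container for the four per-step inputs in the diagram."""
    program_section: str                                # relevant part of apishift_program.md
    spec_and_client: str                                # v1/v2 spec summary + client code
    lessons: list[str] = field(default_factory=list)    # top-K lessons from lessons.md
    recent_summary: str = ""                            # recent_episode_summary from results.tsv


def render_prompt(ctx: ManagerContext) -> str:
    """Join the four inputs in the order the diagram lists them."""
    lessons = "\n".join(f"- {lesson}" for lesson in ctx.lessons) or "- (none yet)"
    parts = [
        ("OPERATING MANUAL", ctx.program_section),
        ("SPEC AND CLIENT", ctx.spec_and_client),
        ("LESSONS", lessons),
        ("RECENT EPISODES", ctx.recent_summary or "(no history)"),
    ]
    # Each input becomes its own titled section, separated by blank lines.
    return "\n\n".join(f"## {title}\n{body}" for title, body in parts)
```

Keeping the inputs as separate named fields makes it easy to log exactly what the Manager saw on a given step.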

Self-Improvement Mechanisms

APIShift improves through four independent loops:

  1. GRPO weight updates. TRL's GRPOTrainer updates the LoRA adapters on the Manager every batch.
  2. Markdown memory. The MemoryAgent appends/updates lessons.md after each successful episode. The Manager retrieves top-K lessons before acting on a new scenario.
  3. Adaptive curriculum. The CurriculumAgent re-reads results.tsv every 20 episodes and shifts the sampling weights to keep training near the Manager's frontier of mastery.
  4. Operating manual evolution. Humans (or downstream tooling) can edit apishift_program.md to change agent behavior without retraining.
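
To make the adaptive-curriculum loop concrete, the sketch below shows one way sampling weights could be derived from recent results. It is an assumption-laden illustration, not the CurriculumAgent's actual logic: the `layer` and `success` column names, the 0.5 target success rate, and the weighting formula are all hypothetical. Only the 40-episode window comes from the diagram.

```python
import csv
import io
from collections import defaultdict

WINDOW = 40    # most recent rows of results.tsv considered (from the diagram)
TARGET = 0.5   # assumed "frontier" success rate; not from the source
FLOOR = 0.05   # assumed floor so that no layer's weight reaches zero


def layer_weights(tsv_text: str, layers=("1", "2", "3")) -> dict[str, float]:
    """Weight each layer by how close its recent success rate is to TARGET."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))[-WINDOW:]
    wins, total = defaultdict(int), defaultdict(int)
    for row in rows:
        total[row["layer"]] += 1
        wins[row["layer"]] += int(row["success"])
    raw = {}
    for layer in layers:
        # Layers with no recent episodes are treated as sitting at the target.
        rate = wins[layer] / total[layer] if total[layer] else TARGET
        raw[layer] = 1.0 - abs(rate - TARGET) + FLOOR
    norm = sum(raw.values())
    return {layer: value / norm for layer, value in raw.items()}
```

Under these assumptions, a layer the Manager always solves (or always fails) is sampled less often than one it solves about half the time.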

File Responsibilities

| File | Purpose |
| --- | --- |
| `apishift_program.md` | Manager operating manual (sections injected into observations) |
| `lessons.md` | Cross-episode memory artifact, edited by MemoryAgent |
| `curriculum.md` | Difficulty sampling weights, edited by CurriculumAgent |
| `results.tsv` | Per-episode log, append-only |
| `models.py` | Pydantic Action/Observation/State |
| `server/environment.py` | OpenEnv Environment implementation |
| `server/reward.py` | 5-component reward function |
| `server/specialists/*.py` | Diff, Patch, Test, Rollback specialists |
| `server/memory/*.py` | MemoryAgent + retrieval |
| `server/curriculum/*.py` | CurriculumAgent + sampler |
| `server/program_loader.py` | Loads relevant section of apishift_program.md |
| `server/episode_logger.py` | Appends rows to results.tsv |
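
Since results.tsv is described as an append-only per-episode log, a logger for it might look like the following sketch. The column names and the `append_episode` function are hypothetical; the real schema and the contents of server/episode_logger.py are not shown in this document.

```python
import csv
from pathlib import Path

# Hypothetical column set; the real results.tsv schema is not given here.
COLUMNS = ["episode", "scenario_id", "layer", "success", "reward"]


def append_episode(path: Path, row: dict) -> None:
    """Append one row, writing the header only when the file is new or empty."""
    is_new = not path.exists() or path.stat().st_size == 0
    # Mode "a" never rewrites earlier rows, which keeps the log append-only.
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t")
        if is_new:
            writer.writeheader()
        writer.writerow(row)
```

Opening the file in append mode means a crashed run can lose at most the row being written, and earlier history stays untouched for the CurriculumAgent to read.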

Why this design

Inspectability is non-negotiable. Banking compliance teams, OSS users, and hackathon judges all need to read what the agent learned and how its training distribution evolved. Markdown and TSV files can be opened and diffed in any text editor; vector stores and binary checkpoints cannot.

The autoresearch-style "agent reads markdown that it (or a peer agent) also writes" pattern keeps the system inspectable without sacrificing self-improvement capability.
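
As an illustration of the read side of this pattern, the sketch below retrieves top-K lessons from a markdown file that a peer agent wrote. It assumes that each lesson is a single `- ` bullet and ranks by plain word overlap; both are stand-ins chosen for this example, and the actual MemoryAgent retrieval may work differently.

```python
import re


def parse_lessons(markdown: str) -> list[str]:
    """Treat every '- ' bullet as one lesson (assumed lessons.md layout)."""
    return [line[2:].strip() for line in markdown.splitlines() if line.startswith("- ")]


def tokens(text: str) -> set[str]:
    """Lowercase word set used for the overlap score."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def top_k_lessons(markdown: str, query: str, k: int = 3) -> list[str]:
    """Rank lessons by word overlap with the query and keep the best k."""
    wanted = tokens(query)
    scored = [(len(wanted & tokens(lesson)), lesson) for lesson in parse_lessons(markdown)]
    # sorted() is stable, so lessons with equal scores keep their file order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [lesson for score, lesson in ranked[:k] if score > 0]
```

Because the store is plain text, a reviewer can see why a lesson was retrieved by reading the same file the agent reads.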