APIShift Architecture

This document is the design reference for APIShift. For the high-level overview, see the root README. For the operating manual that the trained Manager reads at every step, see apishift_program.md.

System Diagram

┌──────────────────────────────────────────────────────────────────────┐
│                         APISHIFT SYSTEM                              │
│                                                                      │
│   CurriculumAgent                                                    │
│     reads results.tsv (last 40 episodes)                             │
│     edits curriculum.md (sampling weights)                           │
│           │                                                          │
│           ▼                                                          │
│   DifficultySampler ──► Scenario library (Layer 1/2/3)               │
│           │                                                          │
│           ▼                                                          │
│   APIShiftEnvironment (OpenEnv server)                               │
│     reset(scenario_id) → Observation                                 │
│     step(action)        → Observation + reward                       │
│     state()             → episode metadata                           │
│           │                                                          │
│           ▼                                                          │
│   Migration Manager (Qwen2.5-7B + LoRA, GRPO-trained)                │
│     reads at every step:                                             │
│       1. apishift_program.md (operating manual)                      │
│       2. v1/v2 spec summary + client code                            │
│       3. top-K lessons from lessons.md                               │
│       4. recent_episode_summary from results.tsv                     │
│           │                                                          │
│   ┌───────┴────────┬─────────────┬──────────────┐                    │
│   ▼                ▼             ▼              ▼                    │
│ Diff           Patch          Test          Rollback                 │
│ Specialist     Specialist     Specialist    Specialist               │
│ (deterministic (template-     (verifier-    (template +              │
│  walker)        driven)        based)        verifier)               │
│           │                                                          │
│           ▼                                                          │
│   Multi-component reward (5 verifiers)                               │
│           │                                                          │
│   ┌───────┴────────────┐                                             │
│   ▼                    ▼                                             │
│ TRL GRPO          MemoryAgent + EpisodeLogger                        │
│ updates LoRA      edits lessons.md, appends results.tsv              │
└──────────────────────────────────────────────────────────────────────┘
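To make the reset/step/state contract in the diagram concrete, here is a minimal in-process sketch in Python, the language of the modules listed under File Responsibilities. It is not the OpenEnv base class or the real `server/environment.py`. `ToyAPIShiftEnv`, the `Observation`/`Action` fields, `max_steps`, and the two verifiers are hypothetical. The reward is assumed to be a weighted mean of verifier scores, because this document names five verifiers but does not describe them or say how they are combined.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical verifier signature: score the patched client code in [0, 1].
Verifier = Callable[[str], float]


@dataclass
class Observation:
    scenario_id: str
    client_code: str
    reward: float = 0.0
    done: bool = False


@dataclass
class Action:
    patched_code: str


class ToyAPIShiftEnv:
    """In-process sketch of reset/step/state; not the OpenEnv Environment class."""

    def __init__(self, scenarios: Dict[str, str], verifiers: Dict[str, Verifier],
                 weights: Dict[str, float], max_steps: int = 8) -> None:
        if set(verifiers) != set(weights):
            raise ValueError("each verifier needs exactly one weight")
        self.scenarios = scenarios
        self.verifiers = verifiers
        self.weights = weights
        self.max_steps = max_steps  # hypothetical per-episode step budget
        self._scenario_id: Optional[str] = None
        self._steps = 0

    def reset(self, scenario_id: str) -> Observation:
        # Start an episode from the scenario's original (v1) client code.
        self._scenario_id = scenario_id
        self._steps = 0
        return Observation(scenario_id, self.scenarios[scenario_id])

    def step(self, action: Action) -> Observation:
        if self._scenario_id is None:
            raise RuntimeError("call reset() before step()")
        self._steps += 1
        # Multi-component reward, assumed here to be a weighted mean of verifier scores.
        total = sum(self.weights.values())
        reward = sum(self.weights[name] * verify(action.patched_code)
                     for name, verify in self.verifiers.items()) / total
        done = reward >= 1.0 or self._steps >= self.max_steps
        return Observation(self._scenario_id, action.patched_code, reward, done)

    def state(self) -> dict:
        # Episode metadata only: which scenario is running and how many steps were taken.
        return {"scenario_id": self._scenario_id, "step_count": self._steps}


if __name__ == "__main__":
    env = ToyAPIShiftEnv(
        scenarios={"rename-endpoint": "requests.get(BASE + '/v1/users')"},
        verifiers={
            "uses_v2_path": lambda code: float("/v2/" in code),
            "drops_v1_path": lambda code: float("/v1/" not in code),
        },
        weights={"uses_v2_path": 2.0, "drops_v1_path": 1.0},
    )
    env.reset("rename-endpoint")
    print(env.step(Action("requests.get(BASE + '/v2/users')")))
    print(env.state())
```

In the real system the Manager produces each action after reading its context. Here the actions are hand-written to show how a weighted multi-component reward gives partial credit for a partial migration.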

Self-Improvement Mechanisms

APIShift improves through four independent loops:

1. GRPO weight updates. TRL's GRPOTrainer updates the Manager's LoRA adapters after every batch.
2. Markdown memory. After each successful episode, the MemoryAgent appends to or updates lessons.md. Before acting on a new scenario, the Manager retrieves the top-K lessons.
3. Adaptive curriculum. Every 20 episodes, the CurriculumAgent re-reads results.tsv and shifts the sampling weights to keep training near the frontier of the Manager's mastery.
4. Operating-manual evolution. Humans (or downstream tooling) can edit apishift_program.md to change the Manager's behavior without retraining.
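The adaptive-curriculum loop can be sketched as follows. The 40-episode window comes from the system diagram. Everything else is assumed: the `layer` and `success` column names in results.tsv, the Markdown table layout for curriculum.md, and the weighting rule itself. The rule shown weights each layer by p * (1 - p), where p is its recent success rate, so layers the Manager solves about half the time are sampled most. This is one common way to target a learner's frontier; it is not necessarily what the CurriculumAgent does.

```python
import csv
import io
import random
from collections import defaultdict
from typing import Dict, Sequence


def layer_success_rates(results_tsv: str, window: int = 40) -> Dict[int, float]:
    """Success rate per scenario layer over the last `window` episodes of results.tsv."""
    rows = list(csv.DictReader(io.StringIO(results_tsv), delimiter="\t"))[-window:]
    attempts: Dict[int, int] = defaultdict(int)
    wins: Dict[int, int] = defaultdict(int)
    for row in rows:
        layer = int(row["layer"])           # hypothetical column
        attempts[layer] += 1
        wins[layer] += int(row["success"])  # hypothetical 0/1 column
    return {layer: wins[layer] / attempts[layer] for layer in attempts}


def frontier_weights(rates: Dict[int, float], layers: Sequence[int] = (1, 2, 3),
                     floor: float = 0.05) -> Dict[int, float]:
    """Weight each layer by p * (1 - p), which peaks at a 50% success rate."""
    raw = {}
    for layer in layers:
        p = rates.get(layer, 0.5)            # no recent data: treat as maximally uncertain
        raw[layer] = p * (1.0 - p) + floor   # floor keeps easy and hard layers in rotation
    total = sum(raw.values())
    return {layer: w / total for layer, w in raw.items()}


def render_curriculum_md(weights: Dict[int, float]) -> str:
    """Serialize weights as a small Markdown table (hypothetical curriculum.md layout)."""
    lines = ["| layer | weight |", "|---|---|"]
    lines += [f"| {layer} | {w:.3f} |" for layer, w in sorted(weights.items())]
    return "\n".join(lines) + "\n"


def sample_layer(weights: Dict[int, float], rng: random.Random) -> int:
    """DifficultySampler step: draw one layer in proportion to its weight."""
    layers = sorted(weights)
    return rng.choices(layers, weights=[weights[layer] for layer in layers], k=1)[0]


if __name__ == "__main__":
    log = "episode\tlayer\tsuccess\n" + "".join(
        f"{i}\t{layer}\t{ok}\n"
        for i, (layer, ok) in enumerate([(1, 1), (1, 1), (2, 1), (2, 0), (3, 0), (3, 0)])
    )
    weights = frontier_weights(layer_success_rates(log))
    print(render_curriculum_md(weights))
    print(sample_layer(weights, random.Random(0)))
```

Writing the weights back as a small table keeps the curriculum human-readable. Anyone can see why a layer is being oversampled.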

File Responsibilities

| File | Purpose |
|---|---|
| `apishift_program.md` | Manager operating manual (sections injected into observations) |
| `lessons.md` | Cross-episode memory artifact, edited by MemoryAgent |
| `curriculum.md` | Difficulty sampling weights, edited by CurriculumAgent |
| `results.tsv` | Per-episode log, append-only |
| `models.py` | Pydantic Action/Observation/State |
| `server/environment.py` | OpenEnv Environment implementation |
| `server/reward.py` | 5-component reward function |
| `server/specialists/*.py` | Diff, Patch, Test, Rollback specialists |
| `server/memory/*.py` | MemoryAgent + retrieval |
| `server/curriculum/*.py` | CurriculumAgent + sampler |
| `server/program_loader.py` | Loads relevant section of apishift_program.md |
| `server/episode_logger.py` | Appends rows to results.tsv |
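Two of the smaller modules above, `server/episode_logger.py` and `server/program_loader.py`, might look roughly like the following. Both functions are hypothetical sketches, not the project's code. The results.tsv column set is made up, and so is the assumption that apishift_program.md is split into `##` sections selected by exact title.

```python
import csv
import os
from typing import Mapping, Sequence

# Hypothetical column set for results.tsv (the real schema is not documented here).
RESULT_COLUMNS = ("episode", "scenario_id", "layer", "success", "reward")


def append_result(path: str, row: Mapping[str, object],
                  columns: Sequence[str] = RESULT_COLUMNS) -> None:
    """Append one episode row to a TSV log, writing the header only for a new file."""
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # Append mode only: existing rows are never rewritten.
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(columns), delimiter="\t",
                                lineterminator="\n")
        if needs_header:
            writer.writeheader()
        writer.writerow(row)


def load_program_section(markdown: str, heading: str) -> str:
    """Return the text under '## <heading>', up to the next level-1 or level-2 heading."""
    collected = []
    inside = False
    for line in markdown.splitlines():
        if line.startswith("#"):
            level = len(line) - len(line.lstrip("#"))
            title = line.lstrip("#").strip()
            if inside and level <= 2:
                break  # the next sibling or parent section starts here
            if level == 2 and title == heading:
                inside = True
                continue
        if inside:
            collected.append(line)
    return "\n".join(collected).strip()
```

Deeper subheadings such as `###` stay inside the returned section. Only the part of the manual relevant to the current phase would then be injected into the observation.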

Why this design

Inspectability is non-negotiable. Banking compliance reviewers, OSS users, and hackathon judges all need to be able to read what the agent learned and how its training distribution evolved. Plain Markdown and TSV files satisfy this requirement; vector stores and binary checkpoints do not.

This autoresearch-style pattern, in which an agent reads Markdown that it (or a peer agent) also writes, keeps the system inspectable without sacrificing its ability to self-improve.
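The read/write loop for lessons.md can be sketched minimally as follows, assuming (hypothetically) one lesson per `- ` bullet line. This document does not say how the Manager's top-K retrieval scores lessons. The sketch therefore substitutes plain word overlap with the current scenario description; the real retriever may work differently.

```python
import re
from typing import List, Set


def append_lesson(lessons_md: str, lesson: str) -> str:
    """Return lessons.md text with one new '- ' bullet, skipping exact duplicates."""
    bullet = f"- {lesson.strip()}"
    if bullet in lessons_md.splitlines():
        return lessons_md
    if lessons_md and not lessons_md.endswith("\n"):
        lessons_md += "\n"
    return lessons_md + bullet + "\n"


def _tokens(text: str) -> Set[str]:
    # Lowercased alphanumeric words; a deliberately simple stand-in for real retrieval.
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def top_k_lessons(lessons_md: str, query: str, k: int = 3) -> List[str]:
    """Rank bullet lessons by word overlap with the query; ties keep file order."""
    query_tokens = _tokens(query)
    lessons = [line[2:] for line in lessons_md.splitlines() if line.startswith("- ")]
    scored = [(len(query_tokens & _tokens(text)), i, text) for i, text in enumerate(lessons)]
    relevant = [item for item in scored if item[0] > 0]
    relevant.sort(key=lambda item: (-item[0], item[1]))
    return [text for _, _, text in relevant[:k]]


if __name__ == "__main__":
    memory = "# Lessons\n"
    memory = append_lesson(memory, "Auth header moved from X-Api-Key to Authorization: Bearer.")
    memory = append_lesson(memory, "Pagination switched from page/limit to cursor tokens.")
    print(top_k_lessons(memory, "v2 requires an Authorization Bearer header", k=1))
```

Because the store is a plain text file, any retrieved lesson can be traced back to a visible line in lessons.md.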