# APIShift Architecture

This document is the design reference for APIShift. For the high-level
story, see the root README. For the operating manual that the trained
Manager reads at every step, see `apishift_program.md`.

## System Diagram

```
┌──────────────────────────────────────────────────────────────────────┐
│                         APISHIFT SYSTEM                              │
│                                                                      │
│   CurriculumAgent                                                    │
│     reads results.tsv (last 40 episodes)                             │
│     edits curriculum.md (sampling weights)                           │
│           │                                                          │
│           ▼                                                          │
│   DifficultySampler ──► Scenario library (Layer 1/2/3)              │
│           │                                                          │
│           ▼                                                          │
│   APIShiftEnvironment (OpenEnv server)                               │
│     reset(scenario_id) → Observation                                 │
│     step(action)        → Observation + reward                       │
│     state()             → episode metadata                           │
│           │                                                          │
│           ▼                                                          │
│   Migration Manager (Qwen2.5-7B + LoRA, GRPO-trained)                │
│     reads at every step:                                             │
│       1. apishift_program.md (operating manual)                      │
│       2. v1/v2 spec summary + client code                            │
│       3. top-K lessons from lessons.md                               │
│       4. recent_episode_summary from results.tsv                     │
│           │                                                          │
│   ┌───────┴────────┬─────────────┬──────────────┐                   │
│   ▼                ▼             ▼              ▼                   │
│ Diff           Patch          Test          Rollback                 │
│ Specialist     Specialist     Specialist    Specialist               │
│ (deterministic (template-     (verifier-    (template +              │
│  walker)        driven)        based)        verifier)               │
│           │                                                          │
│           ▼                                                          │
│   Multi-component reward (5 verifiers)                               │
│           │                                                          │
│   ┌───────┴────────────┐                                            │
│   ▼                    ▼                                            │
│ TRL GRPO          MemoryAgent + EpisodeLogger                       │
│ updates LoRA      edits lessons.md, appends results.tsv             │
└──────────────────────────────────────────────────────────────────────┘
```
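The four inputs the Manager reads at every step can be pictured as one assembled context string. The sketch below is illustrative only: `Lesson`, `top_k_lessons`, `build_manager_context`, and the tag-overlap ranking are all hypothetical names and choices, not the code in `server/memory/` or `server/program_loader.py`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Lesson:
    text: str
    tags: frozenset  # hypothetical: keywords describing when the lesson applies


def top_k_lessons(lessons: list[Lesson], scenario_tags: set, k: int = 3) -> list[Lesson]:
    """Rank lessons by tag overlap with the scenario; keep the k best matches."""
    scored = [(len(lesson.tags & scenario_tags), lesson) for lesson in lessons]
    scored = [pair for pair in scored if pair[0] > 0]  # drop unrelated lessons
    # sort() is stable, so lessons with equal scores keep their file order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [lesson for _, lesson in scored[:k]]


def build_manager_context(
    manual_section: str,
    spec_summary: str,
    lessons: list[Lesson],
    recent_summary: str,
) -> str:
    """Concatenate the four inputs in the order shown in the diagram."""
    parts = [
        "## Operating manual\n" + manual_section,
        "## Spec summary and client code\n" + spec_summary,
        "## Lessons\n" + "\n".join(f"- {lesson.text}" for lesson in lessons),
        "## Recent episodes\n" + recent_summary,
    ]
    return "\n\n".join(parts)
```

Tag overlap is used here only because it is easy to read; any retrieval scheme that returns an ordered top-K list would slot into the same place.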

## Self-Improvement Mechanisms

APIShift improves through four independent loops:

1. **GRPO weight updates.** TRL's GRPOTrainer updates the LoRA adapters
   on the Manager every batch.
2. **Markdown memory.** The MemoryAgent appends to or updates
   `lessons.md` after each successful episode. The Manager retrieves
   the top-K lessons before acting on a new scenario.
3. **Adaptive curriculum.** The CurriculumAgent re-reads `results.tsv`
   every 20 episodes and shifts the sampling weights to keep training
   near the Manager's frontier of mastery.
4. **Operating manual evolution.** Humans (or downstream tooling) can
   edit `apishift_program.md` to change agent behavior without
   retraining.
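As a rough illustration of the adaptive-curriculum loop, the sketch below recomputes per-layer sampling weights from recent outcomes. It assumes that "frontier of mastery" means a success rate near a target value; the `target` of 0.5, the `floor`, and the names `reweight` and `sample_layer` are hypothetical and do not describe the actual CurriculumAgent.

```python
import random
from collections import defaultdict


def reweight(recent, layers=(1, 2, 3), target=0.5, floor=0.05):
    """Map recent (layer, success) pairs to normalized sampling weights.

    Layers whose success rate sits closest to `target` get the most weight;
    `floor` keeps mastered or hopeless layers from vanishing entirely.
    """
    wins = defaultdict(int)
    total = defaultdict(int)
    for layer, ok in recent:
        total[layer] += 1
        wins[layer] += int(ok)

    raw = {}
    for layer in layers:
        # An unseen layer is treated as sitting on the frontier so it gets explored.
        rate = wins[layer] / total[layer] if total[layer] else target
        distance = abs(rate - target) / max(target, 1.0 - target)
        raw[layer] = max(floor, 1.0 - distance)

    norm = sum(raw.values())
    return {layer: weight / norm for layer, weight in raw.items()}


def sample_layer(weights, rng):
    """Draw one layer according to the current weights."""
    layers = list(weights)
    return rng.choices(layers, weights=[weights[l] for l in layers], k=1)[0]


if __name__ == "__main__":
    # Layer 1 is mastered, layer 2 is on the frontier, layer 3 always fails.
    recent = [(1, True)] * 10 + [(2, True)] * 5 + [(2, False)] * 5 + [(3, False)] * 10
    weights = reweight(recent)
    print(weights, sample_layer(weights, random.Random(0)))
```

With this shaping, a layer the Manager always solves and a layer it never solves both fall to the floor weight, so most sampling mass lands on the layer with mixed results.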

## File Responsibilities

| File | Purpose |
|------|---------|
| `apishift_program.md` | Manager operating manual (sections injected into observations) |
| `lessons.md` | Cross-episode memory artifact, edited by MemoryAgent |
| `curriculum.md` | Difficulty sampling weights, edited by CurriculumAgent |
| `results.tsv` | Per-episode log, append-only |
| `models.py` | Pydantic Action/Observation/State |
| `server/environment.py` | OpenEnv Environment implementation |
| `server/reward.py` | 5-component reward function |
| `server/specialists/*.py` | Diff, Patch, Test, Rollback specialists |
| `server/memory/*.py` | MemoryAgent + retrieval |
| `server/curriculum/*.py` | CurriculumAgent + sampler |
| `server/program_loader.py` | Loads relevant section of apishift_program.md |
| `server/episode_logger.py` | Appends rows to results.tsv |

## Why This Design

Inspectability is non-negotiable. Banking compliance teams, OSS users,
and hackathon judges all need to read what the agent learned and how
its training distribution evolved. Markdown and TSV files satisfy this;
vector stores and binary checkpoints do not.

The autoresearch-style "agent reads markdown that it (or a peer agent)
also writes" pattern keeps the system inspectable without sacrificing
self-improvement capability.