	# APIShift Architecture

	This document is the design reference for APIShift. For the high-level
	story, see the root README. For the operating manual that the trained
	Manager reads at every step, see `apishift_program.md`.

	## System Diagram

	```
	┌──────────────────────────────────────────────────────────────────────┐
	│                           APISHIFT SYSTEM                            │
	│                                                                      │
	│  CurriculumAgent                                                     │
	│    reads results.tsv (last 40 episodes)                              │
	│    edits curriculum.md (sampling weights)                            │
	│          │                                                           │
	│          ▼                                                           │
	│  DifficultySampler ──► Scenario library (Layer 1/2/3)                │
	│          │                                                           │
	│          ▼                                                           │
	│  APIShiftEnvironment (OpenEnv server)                                │
	│    reset(scenario_id) → Observation                                  │
	│    step(action)       → Observation + reward                         │
	│    state()            → episode metadata                             │
	│          │                                                           │
	│          ▼                                                           │
	│  Migration Manager (Qwen2.5-7B + LoRA, GRPO-trained)                 │
	│    reads at every step:                                              │
	│      1. apishift_program.md (operating manual)                       │
	│      2. v1/v2 spec summary + client code                             │
	│      3. top-K lessons from lessons.md                                │
	│      4. recent_episode_summary from results.tsv                      │
	│          │                                                           │
	│  ┌───────┴────────┬─────────────┬──────────────┬─────────┐           │
	│  ▼                ▼             ▼              ▼         ▼           │
	│  Diff             Patch         Test           Rollback              │
	│  Specialist       Specialist    Specialist     Specialist            │
	│  (deterministic   (template-    (verifier-     (template +           │
	│  walker)          driven)       based)         verifier)             │
	│          │                                                           │
	│          ▼                                                           │
	│  Multi-component reward (5 verifiers)                                │
	│          │                                                           │
	│  ┌───────┴────────────┐                                              │
	│  ▼                    ▼                                              │
	│  TRL GRPO             MemoryAgent + EpisodeLogger                    │
	│  updates LoRA         edits lessons.md, appends results.tsv          │
	└──────────────────────────────────────────────────────────────────────┘
	```
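The four inputs the Manager reads at every step can be pictured as one assembled prompt. The following is a minimal sketch under stated assumptions, not APIShift's actual code: `Lesson`, `top_k_lessons`, `build_context`, the tag-overlap ranking, and the section headings are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lesson:
    text: str
    tags: frozenset  # hypothetical: keywords describing when the lesson applies


def top_k_lessons(lessons, scenario_tags, k=3):
    """Return up to k lessons sharing at least one tag with the scenario.

    Lessons are ranked by number of shared tags; ties keep file order.
    """
    scored = [
        (len(lesson.tags & scenario_tags), index, lesson)
        for index, lesson in enumerate(lessons)
    ]
    relevant = [item for item in scored if item[0] > 0]
    relevant.sort(key=lambda item: (-item[0], item[1]))
    return [lesson for _, _, lesson in relevant[:k]]


def build_context(program_section, spec_summary, lessons, recent_summary):
    """Join the four inputs in the order the diagram lists them."""
    lesson_lines = "\n".join(f"- {lesson.text}" for lesson in lessons)
    sections = [
        ("Operating manual", program_section),
        ("Spec summary and client code", spec_summary),
        ("Lessons", lesson_lines),
        ("Recent episodes", recent_summary),
    ]
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections)
```

Because the result is plain text, the exact context the Manager saw at any step can be logged and read back later, which fits the inspectability goal of the design.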

	## Self-Improvement Mechanisms

	APIShift improves through four independent loops:

	1. GRPO weight updates. TRL's `GRPOTrainer` updates the LoRA adapters
	on the Manager every batch.
	2. Markdown memory. The MemoryAgent appends to or updates `lessons.md`
	after each successful episode. The Manager retrieves top-K lessons
	before acting on a new scenario.
	3. Adaptive curriculum. The CurriculumAgent re-reads `results.tsv`
	every 20 episodes and shifts the sampling weights to keep training
	near the Manager's frontier of mastery.
	4. Operating manual evolution. Humans (or downstream tooling) can
	edit `apishift_program.md` to change agent behavior without retraining.
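The adaptive curriculum loop (item 3) can be sketched as follows. This is an illustration only: the source does not say how the CurriculumAgent computes its weights, so the "closest to a 50% success rate" rule, the `floor` value, and the function names `success_rates`, `reweight`, and `sample_layer` are assumptions. Only the 20-episode cadence and the 40-episode window come from this document.

```python
import random

REVIEW_EVERY = 20  # from the text: the curriculum is revisited every 20 episodes
WINDOW = 40        # from the diagram: the last 40 episodes are read


def success_rates(episodes, window=WINDOW):
    """Per-layer success rate over the most recent `window` episodes.

    `episodes` is a list of (layer, succeeded) pairs, oldest first.
    """
    totals, wins = {}, {}
    for layer, succeeded in episodes[-window:]:
        totals[layer] = totals.get(layer, 0) + 1
        wins[layer] = wins.get(layer, 0) + (1 if succeeded else 0)
    return {layer: wins[layer] / totals[layer] for layer in totals}


def reweight(rates, target=0.5, floor=0.05):
    """Favor layers whose success rate is closest to `target` (assumed rule)."""
    span = max(target, 1.0 - target)
    raw = {
        layer: max(floor, 1.0 - abs(rate - target) / span)
        for layer, rate in rates.items()
    }
    total = sum(raw.values())
    return {layer: weight / total for layer, weight in raw.items()}


def sample_layer(weights, rng):
    """Draw one layer according to the normalized sampling weights."""
    layers = sorted(weights)
    return rng.choices(layers, weights=[weights[l] for l in layers], k=1)[0]
```

The `floor` keeps mastered and currently impossible layers in rotation at a low rate, so the success-rate estimates for those layers do not go stale.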

	## File Responsibilities

	| File | Purpose |
	|------|---------|
	| `apishift_program.md` | Manager operating manual (sections injected into observations) |
	| `lessons.md` | Cross-episode memory artifact, edited by MemoryAgent |
	| `curriculum.md` | Difficulty sampling weights, edited by CurriculumAgent |
	| `results.tsv` | Per-episode log, append-only |
	| `models.py` | Pydantic Action/Observation/State |
	| `server/environment.py` | OpenEnv Environment implementation |
	| `server/reward.py` | 5-component reward function |
	| `server/specialists/*.py` | Diff, Patch, Test, Rollback specialists |
	| `server/memory/*.py` | MemoryAgent + retrieval |
	| `server/curriculum/*.py` | CurriculumAgent + sampler |
	| `server/program_loader.py` | Loads relevant section of apishift_program.md |
	| `server/episode_logger.py` | Appends rows to results.tsv |
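As a rough illustration of the append-only contract on `results.tsv`, the sketch below writes one row per episode and reads back the most recent rows. The column names in `FIELDS` and the helpers `append_episode` and `read_recent` are hypothetical; the real schema lives in `server/episode_logger.py` and is not described here.

```python
import csv
from pathlib import Path

# Hypothetical columns; the actual results.tsv schema is not given in this document.
FIELDS = ["episode", "scenario_id", "layer", "reward", "success"]


def append_episode(path, row):
    """Append one episode row, writing the header only for a new or empty file."""
    path = Path(path)
    is_new = not path.exists() or path.stat().st_size == 0
    # Mode "a" only ever adds to the end, so earlier rows are never rewritten.
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS, delimiter="\t")
        if is_new:
            writer.writeheader()
        writer.writerow(row)


def read_recent(path, n=40):
    """Return the last n rows as dicts of strings (40 matches the diagram's window)."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))
    return rows[-n:]
```

Values come back as strings, so a consumer such as the CurriculumAgent would need to convert `reward` and `success` before computing statistics.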

	## Why this design

	Inspectability is non-negotiable. Banking compliance teams, OSS users,
	and hackathon judges all need to read what the agent learned and how
	its training distribution evolved. Markdown and TSV files satisfy this
	requirement; vector stores and binary checkpoints do not.

	The autoresearch-style "agent reads markdown that it (or a peer agent)
	also writes" pattern keeps the system inspectable without sacrificing
	self-improvement capability.