# The Recursive Self-Improving Loop

How tradewatch's soft reflection (events → MEMORY.md prompt text) becomes a real gradient loop on Laguna XS.2, where improvement compounds across iterations through **both adapter weights and curriculum**.

## The two improvement channels

1. **Weights (parametric continuation):** each hosted RL run warm-starts from the prior iteration's adapter via `checkpoint_id`. The model is never reset: discipline learned in iter N carries into iter N+1. This is the thing tradewatch never had.
2. **Curriculum (reflection-driven):** between runs, `recursive_loop.py reflect` reads the prior adapter's OOS eval and shifts the next run's **objective** (sharpe → min_drawdown → balanced) and **focus symbols** (the weakest performers). This is tradewatch's `summarize_session_events` reflection, repurposed to steer RL instead of prompt notes.

## One iteration

```
┌──────────────────────────────────────────────┐
│ configs/rl/iter_N.toml                       │
│   model=poolside/Laguna-XS.2                 │
│   checkpoint_id= ← weights                   │
│   [[env]] objective=..., symbols=[weak...]   │
└───────────────────┬──────────────────────────┘
        prime train run iter_N.toml
                    │  (FREE hosted RL, GRPO, 128 rollouts/step)
                    ▼
   LoRA adapter ──► prime deployments create
                    │  base:adapter_id, OpenAI-compatible
                    ▼
python scripts/laguna_eval.py --model base:adapter_id --split oos_symbols
  (writes strategy per HELD-OUT symbol, scores via rubric)
                    │  logs/eval_*.json
                    ▼
python scripts/recursive_loop.py reflect --checkpoint-id
  (curriculum policy → objective + weak-symbol focus)
                    │
                    ▼
   configs/rl/iter_{N+1}.toml ──► loop repeats
```

Note: Prime enforces **max 1 concurrent run/user**, so iterations are sequential, which is exactly what warm-starting requires anyway (iter N+1 needs iter N's adapter to exist).
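
To make the warm-start concrete, here is a minimal Python sketch that renders the fields shown in the iteration diagram into a config for the next run. The function name `render_iter_config` is hypothetical, and the key layout only mirrors the diagram; the real Prime config schema may require other fields, so this is an illustration rather than the actual output of `recursive_loop.py`. It also assumes that the first iteration simply omits `checkpoint_id`.

```python
from typing import Optional, Sequence


def render_iter_config(
    model: str,
    checkpoint_id: Optional[str],
    objective: str,
    symbols: Sequence[str],
) -> str:
    """Render a TOML snippet using only the keys shown in the diagram.

    Hypothetical helper: the real config schema may differ.
    """
    lines = [f'model = "{model}"']
    # Assumption: iteration 1 has no prior adapter, so the key is left out.
    if checkpoint_id:
        # Warm-start: the next run continues from the prior adapter's weights.
        lines.append(f'checkpoint_id = "{checkpoint_id}"')
    symbol_list = ", ".join(f'"{s}"' for s in symbols)
    lines += [
        "",
        "[[env]]",
        f'objective = "{objective}"',
        f"symbols = [{symbol_list}]",
    ]
    return "\n".join(lines) + "\n"
```

Passing the previous run's adapter identifier as `checkpoint_id` is the only thing that distinguishes a warm-started iteration from a fresh one in this sketch; the objective and symbols carry the curriculum channel.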
## Curriculum policy (`_choose_objective`, inspectable & deterministic)

- valid≥0.8 but mean_total<0.5 → `min_drawdown` (strategies run but lose → control risk)
- pct_wrote_code<0.7 → `sharpe` + more steps (model still learning to code)
- otherwise → `balanced` (competent → broaden)
- always: next run focuses the 3 weakest OOS symbols, rotates `seed` for fresh task mixes, lengthens to 75 steps if learning stalled (<0.5).

## Closing to tradewatch (the demo)

The deployed adapter is OpenAI-compatible, so tradewatch's existing `HybrieClient` runs it live with one config change:

```
base_url = https://api.pinference.ai/api/v1
model = poolside/Laguna-XS.2:
```

**Ablation money-shot:** run the adapter with MEMORY.md stripped from the prompt. If the discipline holds, it's provably in the weights: the memory became the adapter.

## Run it

```bash
# bootstrap iteration 1
python scripts/recursive_loop.py init --env-id /stock-strategy-env --model poolside/Laguna-XS.2
prime train run configs/rl/iter_1.toml
prime deployments create
export PRIME_API_KEY=...
python scripts/laguna_eval.py --model poolside/Laguna-XS.2: --split oos_symbols
python scripts/recursive_loop.py reflect logs/eval_*.json --checkpoint-id
# -> configs/rl/iter_2.toml ready; repeat
```
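
For readers who want to see what the `reflect` step decides, the curriculum policy rules listed earlier can be sketched in a few lines of Python. This is not the code in `recursive_loop.py`; it is a minimal reconstruction under stated assumptions: rules are checked in the order listed, "learning stalled (<0.5)" refers to `mean_total`, "more steps" means the same 75-step length, `seed` rotation is a simple increment, and the names `plan_next_run`, `summary`, `per_symbol`, and `base_steps` are hypothetical.

```python
def choose_objective(valid: float, mean_total: float, pct_wrote_code: float) -> str:
    """Pick the next objective; rule order follows the list in this document."""
    if valid >= 0.8 and mean_total < 0.5:
        return "min_drawdown"  # strategies run but lose, so control risk
    if pct_wrote_code < 0.7:
        return "sharpe"  # model is still learning to write code
    return "balanced"  # competent, so broaden


def plan_next_run(summary: dict, per_symbol: dict, prev_seed: int, base_steps: int) -> dict:
    """Combine the objective choice with the 'always' rules.

    summary:    aggregate OOS metrics (keys assumed: valid, mean_total, pct_wrote_code)
    per_symbol: mapping of symbol -> OOS score (lower is weaker)
    """
    objective = choose_objective(
        summary["valid"], summary["mean_total"], summary["pct_wrote_code"]
    )
    # Focus the three weakest held-out symbols.
    weakest = sorted(per_symbol, key=per_symbol.get)[:3]
    # Assumption: both "learning stalled" and "more steps" map to 75 steps.
    needs_long_run = summary["mean_total"] < 0.5 or summary["pct_wrote_code"] < 0.7
    return {
        "objective": objective,
        "focus_symbols": weakest,
        "seed": prev_seed + 1,  # assumption: rotate by incrementing
        "steps": 75 if needs_long_run else base_steps,
    }
```

Because every branch depends only on the eval metrics, the same eval file always yields the same next config, which is what keeps the policy inspectable and deterministic.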