trade-pool / LOOP.md
tosi-n's picture
Upload folder using huggingface_hub
ce6b50a verified
|
Raw
History Blame Contribute Delete
4.01 kB
# The Recursive Self-Improving Loop
How tradewatch's soft reflection (events β†’ MEMORY.md prompt text) becomes a real
gradient loop on Laguna XS.2, where improvement compounds across iterations through
**both adapter weights and curriculum**.
## The two improvement channels
1. **Weights (parametric continuation):** each hosted RL run warm-starts from the prior
iteration's adapter via `checkpoint_id`. The model is never reset β€” discipline learned
in iter N carries into iter N+1. This is the thing tradewatch never had.
2. **Curriculum (reflection-driven):** between runs, `recursive_loop.py reflect` reads the
prior adapter's OOS eval and shifts the next run's **objective** (sharpe β†’ min_drawdown
β†’ balanced) and **focus symbols** (the weakest performers). This is tradewatch's
`summarize_session_events` reflection β€” repurposed to steer RL instead of prompt notes.
## One iteration
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ configs/rl/iter_N.toml β”‚
β”‚ model=poolside/Laguna-XS.2 β”‚
β”‚ checkpoint_id=<iter N-1 adapter> ← weights β”‚
β”‚ [[env]] objective=..., symbols=[weak...] β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
prime train run iter_N.toml β”‚ (FREE hosted RL, GRPO, 128 rollouts/step)
β–Ό
LoRA adapter ──► prime deployments create
β”‚ base:adapter_id, OpenAI-compatible
β–Ό
python scripts/laguna_eval.py --model base:adapter_id --split oos_symbols
(writes strategy per HELD-OUT symbol, scores via rubric)
β”‚ logs/eval_*.json
β–Ό
python scripts/recursive_loop.py reflect <eval.json> --checkpoint-id <adapter>
(curriculum policy β†’ objective + weak-symbol focus)
β”‚
β–Ό
configs/rl/iter_{N+1}.toml ──► loop repeats
```
Note Prime enforces **max 1 concurrent run/user**, so iterations are sequential β€” which
is exactly what warm-starting requires anyway (iter N+1 needs iter N's adapter to exist).
## Curriculum policy (`_choose_objective`, inspectable & deterministic)
- validβ‰₯0.8 but mean_total<0.5 β†’ `min_drawdown` (strategies run but lose β†’ control risk)
- pct_wrote_code<0.7 β†’ `sharpe` + more steps (model still learning to code)
- otherwise β†’ `balanced` (competent β†’ broaden)
- always: next run focuses the 3 weakest OOS symbols, rotates `seed` for fresh task mixes,
lengthens to 75 steps if learning stalled (<0.5).
## Closing to tradewatch (the demo)
The deployed adapter is OpenAI-compatible, so tradewatch's existing `HybrieClient` runs it
live with one config change:
```
base_url = https://api.pinference.ai/api/v1
model = poolside/Laguna-XS.2:<adapter_id>
```
**Ablation money-shot:** run the adapter with MEMORY.md stripped from the prompt. If the
discipline holds, it's provably in the weights β€” the memory became the adapter.
## Run it
```bash
# bootstrap iteration 1
python scripts/recursive_loop.py init --env-id <you>/stock-strategy-env --model poolside/Laguna-XS.2
prime train run configs/rl/iter_1.toml
prime deployments create <adapter_id>
export PRIME_API_KEY=...
python scripts/laguna_eval.py --model poolside/Laguna-XS.2:<adapter_id> --split oos_symbols
python scripts/recursive_loop.py reflect logs/eval_*.json --checkpoint-id <adapter_id>
# -> configs/rl/iter_2.toml ready; repeat
```