# The Recursive Self-Improving Loop
How tradewatch's soft reflection (events → MEMORY.md prompt text) becomes a real
gradient loop on Laguna XS.2, where improvement compounds across iterations through
**both adapter weights and curriculum**.
## The two improvement channels
1. **Weights (parametric continuation):** each hosted RL run warm-starts from the prior
   iteration's adapter via `checkpoint_id`. The model is never reset: discipline learned
   in iter N carries into iter N+1. This is what tradewatch never had.
2. **Curriculum (reflection-driven):** between runs, `recursive_loop.py reflect` reads the
   prior adapter's OOS eval and shifts the next run's **objective** (`sharpe`, `min_drawdown`,
   or `balanced`) and **focus symbols** (the weakest performers). This is tradewatch's
   `summarize_session_events` reflection, repurposed to steer RL instead of writing prompt notes.
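Both channels end up in one file. The sketch below shows how that file could be rendered, assuming the TOML layout from the iteration diagram: top-level `model` and `checkpoint_id`, plus an `[[env]]` table holding `objective` and `symbols`. `render_iter_config` is a hypothetical helper. The curriculum policy also changes the seed and the step count, but the diagram does not show their key names, so this sketch leaves them out. Leaving out `checkpoint_id` means a cold start on iteration 1.

```python
import json
from typing import Optional, Sequence


def _toml_str(value: str) -> str:
    # JSON string escaping is valid TOML basic-string escaping for plain
    # identifiers such as model names, adapter ids, and tickers.
    return json.dumps(value, ensure_ascii=False)


def render_iter_config(
    model: str,
    checkpoint_id: Optional[str],
    objective: str,
    symbols: Sequence[str],
) -> str:
    """Hypothetical renderer for configs/rl/iter_N.toml.

    Key layout mirrors the iteration diagram; it is not a confirmed schema.
    """
    lines = [f"model = {_toml_str(model)}"]
    # Weights channel: warm-start from the prior adapter when one exists.
    if checkpoint_id is not None:
        lines.append(f"checkpoint_id = {_toml_str(checkpoint_id)}")
    # Curriculum channel: objective and focus symbols chosen by reflection.
    lines += [
        "",
        "[[env]]",
        f"objective = {_toml_str(objective)}",
        "symbols = [" + ", ".join(_toml_str(s) for s in symbols) + "]",
    ]
    return "\n".join(lines) + "\n"
```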
## One iteration
```
┌────────────────────────────────────────────────┐
│ configs/rl/iter_N.toml                         │
│   model=poolside/Laguna-XS.2                   │
│   checkpoint_id=<iter N-1 adapter>  ← weights  │
│   [[env]] objective=..., symbols=[weak...]     │
└─────────────────────┬──────────────────────────┘
                      │ prime train run iter_N.toml  (FREE hosted RL, GRPO, 128 rollouts/step)
                      ▼
                 LoRA adapter ──► prime deployments create
                      │ base:adapter_id, OpenAI-compatible
                      ▼
python scripts/laguna_eval.py --model base:adapter_id --split oos_symbols
  (writes strategy per HELD-OUT symbol, scores via rubric)
                      │ logs/eval_*.json
                      ▼
python scripts/recursive_loop.py reflect <eval.json> --checkpoint-id <adapter>
  (curriculum policy → objective + weak-symbol focus)
                      │
                      ▼
configs/rl/iter_{N+1}.toml ──► loop repeats
```
Note: Prime enforces **at most 1 concurrent run per user**, so iterations are sequential. Warm-starting requires that order anyway, because iteration N+1 needs iteration N's adapter to exist.
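The sequential warm-start chain can be written as a plain loop in which each training call receives the previous iteration's adapter. `train`, `evaluate` and `reflect` are hypothetical callables. They stand in for `prime train run`, `laguna_eval.py` and `recursive_loop.py reflect`, and their signatures are assumptions. The sketch also assumes `train` blocks until the hosted run finishes and then returns the new adapter id.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Curriculum = Dict[str, Any]


@dataclass
class IterationRecord:
    iteration: int
    warm_start: Optional[str]  # adapter this run started from (None = base model)
    adapter_id: str            # adapter this run produced


def run_loop(
    base_model: str,
    n_iters: int,
    initial_curriculum: Curriculum,
    train: Callable[[Optional[str], Curriculum], str],
    evaluate: Callable[[str], Any],
    reflect: Callable[[Any], Curriculum],
) -> List[IterationRecord]:
    """Hypothetical driver: runs iterations strictly one after another."""
    warm_start: Optional[str] = None
    curriculum = initial_curriculum
    history: List[IterationRecord] = []
    for i in range(1, n_iters + 1):
        # Assumed to block until the hosted run finishes (one run at a time).
        adapter_id = train(warm_start, curriculum)
        report = evaluate(f"{base_model}:{adapter_id}")
        curriculum = reflect(report)  # steers the *next* run's objective/symbols
        history.append(IterationRecord(i, warm_start, adapter_id))
        warm_start = adapter_id  # iteration i+1 continues from this adapter
    return history
```

Because `warm_start` only gets a value after `train` returns, a run cannot begin before its predecessor's adapter exists. This is the same ordering the concurrency limit forces.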
## Curriculum policy (`_choose_objective`, inspectable & deterministic)
- valid ≥ 0.8 but mean_total < 0.5 → `min_drawdown` (strategies run but lose, so control risk)
- pct_wrote_code < 0.7 → `sharpe` plus more steps (the model is still learning to write code)
- otherwise → `balanced` (competent, so broaden)
- always: the next run focuses on the 3 weakest OOS symbols, rotates `seed` for fresh task mixes,
  and lengthens to 75 steps if learning stalled (< 0.5).
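Here is the policy written out as a small deterministic function. It is an illustrative sketch, not the project's `_choose_objective`. The following are all assumptions: the `EvalSummary` field layout, first-match rule order, a 50-step baseline, using 75 steps for the `sharpe` branch's "more steps", and reading "stalled" as `mean_total < 0.5`.

```python
from dataclasses import dataclass
from typing import Dict, List

DEFAULT_STEPS = 50  # hypothetical baseline run length
LONG_STEPS = 75     # "lengthens to 75 steps" from the policy text


@dataclass
class EvalSummary:
    # Hypothetical shape of the OOS eval report.
    valid: float                  # fraction of strategies that ran
    mean_total: float             # mean rubric score across held-out symbols
    pct_wrote_code: float         # fraction of rollouts that produced code
    per_symbol: Dict[str, float]  # rubric score per held-out symbol


def choose_objective(s: EvalSummary) -> str:
    # Rules checked in the listed order; first match wins (an assumption).
    if s.valid >= 0.8 and s.mean_total < 0.5:
        return "min_drawdown"  # strategies run but lose: control risk
    if s.pct_wrote_code < 0.7:
        return "sharpe"        # still learning to write code
    return "balanced"          # competent: broaden


def next_curriculum(s: EvalSummary, prev_seed: int, k: int = 3) -> dict:
    objective = choose_objective(s)
    # Focus the k lowest-scoring held-out symbols.
    weakest: List[str] = sorted(s.per_symbol, key=s.per_symbol.__getitem__)[:k]
    stalled = s.mean_total < 0.5  # assumed stall criterion
    steps = LONG_STEPS if (stalled or objective == "sharpe") else DEFAULT_STEPS
    return {
        "objective": objective,
        "symbols": weakest,
        "seed": prev_seed + 1,  # rotate seed for a fresh task mix
        "steps": steps,
    }
```

Since the function is pure, the same eval report always produces the same next curriculum. That is what makes it inspectable.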
## Closing the loop to tradewatch (the demo)
The deployed adapter is OpenAI-compatible, so tradewatch's existing `HybrieClient` can run it
live with one config change:
```
base_url = https://api.pinference.ai/api/v1
model = poolside/Laguna-XS.2:<adapter_id>
```
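The sketch below builds the request such a client would send, using only the standard library so the endpoint shape is easy to see. It assumes the deployment serves the standard OpenAI-compatible `/chat/completions` route under the `base_url` above and accepts `PRIME_API_KEY` as a bearer token. `build_chat_request` is a hypothetical name, and the code says nothing about how `HybrieClient` works internally.

```python
import json
import urllib.request

BASE_URL = "https://api.pinference.ai/api/v1"
BASE_MODEL = "poolside/Laguna-XS.2"


def build_chat_request(adapter_id: str, messages: list, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": f"{BASE_MODEL}:{adapter_id}",  # base model plus deployed adapter
        "messages": messages,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",  # assumed standard OpenAI-compatible route
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen(...)`, building it with the key from `os.environ["PRIME_API_KEY"]`. The sketch deliberately stops before the network call.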
**Ablation money-shot:** run the adapter with MEMORY.md stripped from the prompt. If the
discipline holds, that is strong evidence it lives in the weights: the memory became the adapter.
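One way to make the ablation concrete is to score the same tasks with and without MEMORY.md in the prompt, then check whether the gap stays within a tolerance. The names `build_prompt` and `ablation_holds`, the memory-then-task prompt layout, and the 0.05 default tolerance are all hypothetical choices, not values from the project.

```python
from statistics import mean
from typing import Optional, Sequence


def build_prompt(task: str, memory_md: Optional[str]) -> str:
    # Hypothetical layout: MEMORY.md text prepended to the task prompt.
    if memory_md is None:
        return task  # ablation arm: memory stripped
    return f"{memory_md}\n\n{task}"


def ablation_holds(
    with_memory: Sequence[float],
    without_memory: Sequence[float],
    tolerance: float = 0.05,
) -> bool:
    """True if removing MEMORY.md lowers the mean score by at most `tolerance`."""
    if not with_memory or not without_memory:
        raise ValueError("need scores for both arms")
    return mean(with_memory) - mean(without_memory) <= tolerance
```

Both arms should share the same tasks, symbols and adapter, so the only difference between them is the memory text.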
## Run it
```bash
# bootstrap iteration 1
python scripts/recursive_loop.py init --env-id <you>/stock-strategy-env --model poolside/Laguna-XS.2
prime train run configs/rl/iter_1.toml
prime deployments create <adapter_id>
export PRIME_API_KEY=...
python scripts/laguna_eval.py --model poolside/Laguna-XS.2:<adapter_id> --split oos_symbols
python scripts/recursive_loop.py reflect logs/eval_*.json --checkpoint-id <adapter_id>
# -> configs/rl/iter_2.toml ready; repeat
```
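After the first iteration, `logs/eval_*.json` also matches every earlier eval, while the diagram shows `reflect` taking a single `<eval.json>`. If `reflect` really expects one file, a small helper can pick the newest one by modification time. `latest_eval` is a hypothetical name and is not part of `recursive_loop.py`.

```python
import glob
import os


def latest_eval(pattern: str = "logs/eval_*.json") -> str:
    """Return the most recently modified eval file matching `pattern`."""
    matches = glob.glob(pattern)
    if not matches:
        raise FileNotFoundError(f"no eval files match {pattern!r}")
    # Newest by modification time, i.e. the eval of the latest adapter.
    return max(matches, key=os.path.getmtime)
```

Its return value would take the place of the glob in the `reflect` call.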