Instructions to use poolside-laguna-hackathon/trade-pool with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use poolside-laguna-hackathon/trade-pool with PEFT:
Task type is invalid.
- Notebooks
- Google Colab
- Kaggle
The Recursive Self-Improving Loop
How tradewatch's soft reflection (events β MEMORY.md prompt text) becomes a real gradient loop on Laguna XS.2, where improvement compounds across iterations through both adapter weights and curriculum.
The two improvement channels
- Weights (parametric continuation): each hosted RL run warm-starts from the prior
iteration's adapter via
checkpoint_id. The model is never reset β discipline learned in iter N carries into iter N+1. This is the thing tradewatch never had. - Curriculum (reflection-driven): between runs,
recursive_loop.py reflectreads the prior adapter's OOS eval and shifts the next run's objective (sharpe β min_drawdown β balanced) and focus symbols (the weakest performers). This is tradewatch'ssummarize_session_eventsreflection β repurposed to steer RL instead of prompt notes.
One iteration
ββββββββββββββββββββββββββββββββββββββββββββββββ
β configs/rl/iter_N.toml β
β model=poolside/Laguna-XS.2 β
β checkpoint_id=<iter N-1 adapter> β weights β
β [[env]] objective=..., symbols=[weak...] β
βββββββββββββββββββββ¬βββββββββββββββββββββββββββ
prime train run iter_N.toml β (FREE hosted RL, GRPO, 128 rollouts/step)
βΌ
LoRA adapter βββΊ prime deployments create
β base:adapter_id, OpenAI-compatible
βΌ
python scripts/laguna_eval.py --model base:adapter_id --split oos_symbols
(writes strategy per HELD-OUT symbol, scores via rubric)
β logs/eval_*.json
βΌ
python scripts/recursive_loop.py reflect <eval.json> --checkpoint-id <adapter>
(curriculum policy β objective + weak-symbol focus)
β
βΌ
configs/rl/iter_{N+1}.toml βββΊ loop repeats
Note Prime enforces max 1 concurrent run/user, so iterations are sequential β which is exactly what warm-starting requires anyway (iter N+1 needs iter N's adapter to exist).
Curriculum policy (_choose_objective, inspectable & deterministic)
- validβ₯0.8 but mean_total<0.5 β
min_drawdown(strategies run but lose β control risk) - pct_wrote_code<0.7 β
sharpe+ more steps (model still learning to code) - otherwise β
balanced(competent β broaden) - always: next run focuses the 3 weakest OOS symbols, rotates
seedfor fresh task mixes, lengthens to 75 steps if learning stalled (<0.5).
Closing to tradewatch (the demo)
The deployed adapter is OpenAI-compatible, so tradewatch's existing HybrieClient runs it
live with one config change:
base_url = https://api.pinference.ai/api/v1
model = poolside/Laguna-XS.2:<adapter_id>
Ablation money-shot: run the adapter with MEMORY.md stripped from the prompt. If the discipline holds, it's provably in the weights β the memory became the adapter.
Run it
# bootstrap iteration 1
python scripts/recursive_loop.py init --env-id <you>/stock-strategy-env --model poolside/Laguna-XS.2
prime train run configs/rl/iter_1.toml
prime deployments create <adapter_id>
export PRIME_API_KEY=...
python scripts/laguna_eval.py --model poolside/Laguna-XS.2:<adapter_id> --split oos_symbols
python scripts/recursive_loop.py reflect logs/eval_*.json --checkpoint-id <adapter_id>
# -> configs/rl/iter_2.toml ready; repeat