---
license: apache-2.0
base_model: poolside/Laguna-XS.2
tags:
- reinforcement-learning
- lora
- trading
- coding-agent
- verifiers
- prime-intellect
- poolside-hackathon
library_name: peft
---
# TradePool: a self-improving trading coding-agent (Laguna XS.2 LoRA)

**Poolside × Prime Intellect Research Hackathon, Foundations track.**

A LoRA adapter for `poolside/Laguna-XS.2`, trained with reinforcement learning so that the
model becomes a **coding agent that writes causal crypto trading-strategy functions**,
scored by a leak-proof out-of-sample backtest.

## The idea in one line

> Trading discipline that normally lives as *prompt text* (a memory file of rules) is
> turned into **adapter weights** by rewarding disciplined, profitable behaviour on
> held-out market data. The verifier *is* the backtest.

## How it works

1. **Environment** (`verifiers`, v0 `SingleTurnEnv`, pushed to `stimulir/trade-pool`):
   the agent is given a Base-chain token's in-sample price history plus a library of causal
   indicators (RSI, MACD, moving averages, z-score, Bollinger bands, volatility) and must write
   `def strategy(features, position) -> target_position`.
2. **Verifier / reward:** the strategy runs bar-by-bar over a **held-out** window
   (lookahead is structurally impossible because the function never sees future bars) and is
   scored by a weighted rubric:
   - OOS Sharpe (0.40) · beats buy-and-hold (0.20) · drawdown control (0.15) ·
     sane exposure (0.10) · transaction cost (0.05) · valid and actually trades (0.10)
   - Hard gates (reward 0): invalid code, lookahead, NaN equity, **do-nothing strategies**.
3. **Training:** Prime Hosted RL (GRPO) on `poolside/Laguna-XS.2`, 50 steps, batch size 128,
   `rollouts_per_example=8`, `enable_thinking=false`. This was a free hosted Laguna run.
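
The bar-by-bar evaluation in step 2 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the `trade_pool` backtester. The helpers `causal_features`, `backtest` and `sharpe`, the feature keys, the 20-bar z-score window, the 0.1% proportional cost and the clamp to [-1, 1] are all hypothetical. What it shows is the property the verifier relies on: at bar `t` the strategy only receives features built from `closes[: t + 1]`, and the position it returns earns the return from `t` to `t + 1`.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List

# Hypothetical sketch: helper names, feature keys, window, cost and clamp are
# assumptions for illustration, not trade_pool's actual API.
Features = Dict[str, float]
Strategy = Callable[[Features, float], float]


def causal_features(closes: List[float], t: int, window: int = 20) -> Features:
    """Indicators at bar t, computed from closes[: t + 1] only (never future bars)."""
    hist = closes[max(0, t - window + 1): t + 1]
    ma = mean(hist)
    sd = pstdev(hist)
    z = (closes[t] - ma) / sd if sd > 0 else 0.0
    return {"close": float(closes[t]), "ma": float(ma), "zscore": z}


def strategy(features: Features, position: float) -> float:
    """Example mean-reversion strategy with the card's signature."""
    if features["zscore"] < -1.0:
        return 1.0       # price stretched below its mean: go fully long
    if features["zscore"] > 0.0:
        return 0.0       # back above the mean: go flat
    return position      # otherwise keep the current exposure


def backtest(closes: List[float], strat: Strategy, cost: float = 0.001) -> List[float]:
    """Replay bar by bar; the target chosen at bar t earns the return from t to t + 1."""
    equity, position = 1.0, 0.0
    curve = [equity]
    for t in range(len(closes) - 1):
        # The strategy sees only bar-t features and its current position.
        target = max(-1.0, min(1.0, strat(causal_features(closes, t), position)))
        equity *= 1.0 - cost * abs(target - position)  # proportional cost on turnover
        position = target
        equity *= 1.0 + position * (closes[t + 1] / closes[t] - 1.0)
        curve.append(equity)
    return curve


def sharpe(curve: List[float]) -> float:
    """Per-bar Sharpe ratio of the equity curve (not annualised, zero risk-free rate)."""
    rets = [b / a - 1.0 for a, b in zip(curve, curve[1:])]
    if len(rets) < 2 or pstdev(rets) == 0:
        return 0.0
    return mean(rets) / pstdev(rets)
```

Passing `lambda f, p: 1.0` with `cost=0.0` reproduces buy-and-hold, which gives a convenient baseline for the "beats buy-and-hold" component. A strategy that always returns `0.0` leaves equity flat, which is the case the do-nothing gate exists to reject.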

## Results

RL produced a clear reward climb on the training environment, peaking around step 13 and
settling slightly below that peak by the end of the run:

| Stage | Total reward |
|---|---|
| step ~0 (baseline) | ~0.15 |
| step ~8 | 0.19 |
| step ~11 | 0.28 |
| step ~13 (peak) | ~0.42 |
| step ~50 (final) | ~0.34–0.41 |

Every rubric component improved together, which argues against single-metric gaming.
`reward_valid` rose from 0.30 to ~0.70 (the model writes valid trading code far more often),
`reward_sharpe` rose from 0.10 to 0.33, and the drawdown, exposure and cost components all rose
as well. Before training, a held-out-symbol eval of base Laguna scored `reward_valid` 0.75 and
`reward_sharpe` 0.45, confirming that the environment sat in a healthy, trainable difficulty band.
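
The total reward combines the per-component scores above. Below is a minimal sketch, assuming each component is scored in [0, 1] and combined as a plain weighted sum using the rubric weights listed under "How it works", and assuming any hard gate forces the rollout's reward to 0. Only `reward_valid` and `reward_sharpe` are named in this card. The other component names, `hard_gate` and `total_reward` are hypothetical. The table above presumably reports averages over many rollouts, whereas this scores a single one.

```python
import math
from typing import Dict, List

# Rubric weights from the card. Component names other than reward_valid and
# reward_sharpe are hypothetical placeholders.
WEIGHTS: Dict[str, float] = {
    "reward_sharpe": 0.40,
    "reward_beats_buy_hold": 0.20,
    "reward_drawdown": 0.15,
    "reward_exposure": 0.10,
    "reward_cost": 0.05,
    "reward_valid": 0.10,
}


def hard_gate(code_ok: bool, lookahead: bool, equity: List[float], n_trades: int) -> bool:
    """True if any hard gate fires: invalid code, lookahead, NaN equity, or no trades."""
    return (not code_ok) or lookahead or any(math.isnan(x) for x in equity) or n_trades == 0


def total_reward(components: Dict[str, float], gated: bool) -> float:
    """Weighted sum of component scores (each assumed in [0, 1]); 0 when a gate fired."""
    if gated:
        return 0.0
    return sum(w * components.get(name, 0.0) for name, w in WEIGHTS.items())
```

Because the weights sum to 1.0, a rollout that maxes out every component scores 1.0. A do-nothing strategy scores 0 no matter how good its drawdown looks.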

## The novel contribution: closing the self-improvement loop

- **Weights channel:** each RL iteration warm-starts from the prior adapter
  (`checkpoint_id`), which makes it a genuine parametric continuation.
- **Curriculum channel:** a reflection step reads the prior adapter's out-of-sample eval,
  shifts the next run's objective (sharpe → min-drawdown → balanced) and focuses on the
  weakest symbols, so the agent's own results drive its next curriculum.
- **Falsifiable proof ("memory is the adapter"):** the discipline block (distilled from
  618 real prior trading decisions) can be **stripped from the prompt**
  (`use_seed_principles=false`). If the trained adapter stays disciplined, the rules now
  live in the weights, not the prompt.
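
The curriculum channel can be sketched as a small function. Only the objective sequence and the idea of warm-starting from the prior `checkpoint_id` come from this card. Everything else here is a hypothetical illustration: `reflect`, `NextRun`, the field names, `k`, and the rule of advancing exactly one objective per iteration. The real reflection step presumably conditions on specific eval metrics that are not documented here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Objective progression named in the card; advancing one step per iteration
# is an assumption made for this sketch.
OBJECTIVES = ["sharpe", "min-drawdown", "balanced"]


@dataclass
class NextRun:
    objective: str
    focus_symbols: List[str]
    checkpoint_id: Optional[str]  # warm-start from the prior adapter (weights channel)


def reflect(prior_objective: str, oos_reward_by_symbol: Dict[str, float],
            prior_checkpoint_id: Optional[str], k: int = 3) -> NextRun:
    """Hypothetical reflection step: advance the objective, target the k weakest symbols."""
    i = OBJECTIVES.index(prior_objective)
    objective = OBJECTIVES[min(i + 1, len(OBJECTIVES) - 1)]  # stays on "balanced"
    # Lowest out-of-sample reward first: these symbols get the next run's focus.
    weakest = sorted(oos_reward_by_symbol, key=lambda s: oos_reward_by_symbol[s])[:k]
    return NextRun(objective, weakest, prior_checkpoint_id)
```

For example, `reflect("sharpe", {"TOKEN_A": 0.41, "TOKEN_B": 0.12, "TOKEN_C": 0.27}, "ckpt-iter-0", k=2)` (placeholder symbols and checkpoint) moves the next run to the min-drawdown objective. That run focuses on `TOKEN_B` and `TOKEN_C` and warm-starts from the previous adapter.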

## Files

- `trade_pool/`: the full `verifiers` environment (features, causal backtester, executor,
  rubric, data). It is installable, builds to a wheel, and bundles its own OHLCV tape.
- `adapter/`: the trained LoRA adapter weights for `poolside/Laguna-XS.2`.
- `configs/`: the RL training config(s).
- `reward_curve.txt`, `eval_*.json`: training and eval metrics.

## Reproduce

```bash
prime env push --path ./trade_pool --visibility PRIVATE   # -> <you>/trade-pool
prime eval run <you>/trade-pool -m poolside/laguna-xs.2 -n 8 -r 1
prime train run configs/iter_1.toml                       # FREE hosted Laguna RL
prime deployments create <adapter_id>                     # serve the adapter
```

Built at the Poolside London hackathon, 29–30 May 2026. Team: **TradePool** (Tosin Dairo).