---
license: mit
tags:
- reinforcement-learning
- ppo
- trading
- prediction-markets
- polymarket
- crypto
- cross-market
- temporal-encoding
language:
- en
library_name: pytorch
pipeline_tag: reinforcement-learning
thumbnail: lacuna-end.png
---
<div align="center">

# LACUNA
**Cross-Market Data Fusion for Prediction Market Trading**
[Live Results](https://humanplane.com/lacuna) · [MIT License](https://opensource.org/licenses/MIT) · [PyTorch](https://pytorch.org/)
</div>
---
An experiment in cross-market data fusion. A reinforcement learning agent trained to trade Polymarket's 15-minute crypto prediction markets by fusing Binance futures order flow with Polymarket orderbook data.
**The thesis**: read the "fast" market (Binance) and trade the "slow" market (Polymarket) before the price adjusts.
> **Note**: This represents ~10 hours of paper trading data from a single run on New Year's Eve 2025. The model traded with a $500 fixed position size and $2,000 max exposure (up to 4 concurrent positions).
## Results
| Metric | Value |
|--------|-------|
| Total PnL | $50,195 |
| Return on Exposure | 2,510% |
| Sharpe Ratio | 4.13 |
| Profit Factor | 1.21 |
| Total Trades | 29,176 |
| Win Rate | 23.9% |
| Runtime | ~10 hours |
### Learning Progression
Comparing first 25% vs last 25% of trades:
| Phase | Avg PnL/Trade | Win Rate |
|-------|---------------|----------|
| First 25% | +$1.27 | 22.5% |
| Last 25% | +$3.56 | 25.3% |
**2.8x improvement** in avg PnL per trade. Last 25% of trades generated **52%** of total profit.
> **Limitations**: Single 10-hour run. No out-of-sample validation. Results could reflect market regime, not learned behavior. We're sharing the raw data—draw your own conclusions.
### Performance by Asset
| Asset | PnL | Trades | Win Rate |
|-------|-----|--------|----------|
| BTC | +$38,794 | 8,257 | 32.6% |
| ETH | +$9,978 | 7,859 | 27.0% |
| SOL | +$1,752 | 6,310 | 16.3% |
| XRP | -$328 | 6,750 | 16.7% |
## Architecture
LACUNA (v5) uses a temporal PPO architecture:
- **Temporal Encoder**: Sees last 5 states instead of just the present
- **Asymmetric Actor-Critic**: Separate networks for policy and value
- **Feature Normalization**: Stabilizes training across different market conditions
### Model Constraints
- Fixed position size: $500 per trade
- Max exposure: $2,000 (up to 4 concurrent positions)
- Markets: 15-minute crypto prediction markets (BTC, ETH, SOL, XRP)
### Observation Space (18 dimensions)
Fuses data from two sources into an 18-dimensional state:
| Category | Features |
|----------|----------|
| Momentum | 1m/5m/10m returns |
| Order flow | L1/L5 imbalance, trade flow, CVD acceleration |
| Microstructure | Spread %, trade intensity, large trade flag |
| Volatility | 5m vol, vol expansion ratio |
| Position | Has position, side, PnL, time remaining |
| Regime | Vol regime, trend regime |
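The six feature groups above could be assembled into the 18-dimensional state roughly as follows. This is a minimal sketch: the group ordering and helper name are assumptions, not taken from the released code; the actual feature layout is defined in `config.json`.

```python
import numpy as np

# Hypothetical assembly of the 18-dim observation vector. Group order and
# function name are assumed for illustration; check config.json for the
# actual layout used by the released weights.
def build_observation(momentum, order_flow, micro, vol, position, regime):
    """Concatenate the six feature groups into one float32 state vector."""
    obs = np.concatenate([
        momentum,    # 3: 1m / 5m / 10m returns
        order_flow,  # 4: L1/L5 imbalance, trade flow, CVD acceleration
        micro,       # 3: spread %, trade intensity, large trade flag
        vol,         # 2: 5m vol, vol expansion ratio
        position,    # 4: has position, side, PnL, time remaining
        regime,      # 2: vol regime, trend regime
    ]).astype(np.float32)
    assert obs.shape == (18,)
    return obs
```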
## Training Evolution
Five phases over three days. Each taught us something. Only the last earned a name.
### Phase 1: Shaped Rewards (Failed)
**Duration**: ~52 min | **Trades**: 1,545 | **Result**: Policy collapse
Started with micro-bonuses to guide learning:
- +0.002 for trading with momentum
- +0.001 for larger positions
- -0.001 for fighting momentum
**What happened**: Entropy collapsed from 1.09 → 0.36. The agent learned to game the reward function—collect bonuses while ignoring actual profitability. Buffer showed 90% win rate while real trade win rate was 20%.
**Lesson**: Reward shaping backfired here. When shaping rewards were gameable and similar magnitude to the real signal, the agent optimized the wrong thing.
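The failure mode can be sketched concretely. The bonus magnitudes below come from the list above; the helper flags are hypothetical stand-ins for whatever predicates the real code used. Because the bonuses are comparable in size to a single step's realized PnL, the agent can farm them while ignoring profitability.

```python
# Sketch of the Phase 1 shaped reward. Bonus values are from the text;
# the boolean predicates are assumed, not from the released code.
def shaped_reward(realized_pnl, with_momentum, against_momentum, large_position):
    reward = realized_pnl      # the "real" signal
    if with_momentum:
        reward += 0.002        # micro-bonus for trading with momentum
    if large_position:
        reward += 0.001        # micro-bonus for larger positions
    if against_momentum:
        reward -= 0.001        # penalty for fighting momentum
    return reward
```

An agent that always trades with momentum at size collects +0.003 per step regardless of whether the trade makes money, which is exactly the gaming behavior Phase 1 exhibited.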
### Phase 2: Pure Realized PnL
**Duration**: ~1 hour | **Trades**: 2,000+ | **Result**: 55% ROI
Stripped everything back:
- Reward ONLY on position close
- Increased entropy coefficient (0.05 → 0.10)
- Simplified actions (7 → 3)
- Smaller buffer (2048 → 512)
| Update | Entropy | PnL | Win Rate |
|--------|---------|-----|----------|
| 1 | 0.68 | $5.20 | 33.3% |
| 36 | 1.05 | $10.93 | 21.2% |
Win rate settled at 21%—below random (33%)—but profitable. Binary markets have asymmetric payoffs. (Still using probability-based PnL at this point.)
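The asymmetry is simple arithmetic. With assumed prices (not taken from the logs): shares bought at 20¢ pay $1 each when correct, so a single win returns 4x the stake, and a 21% win rate can still have positive expectancy.

```python
# Illustrative expected-value arithmetic for a binary market.
# Prices and stake are assumed for the example, not from the trade logs.
entry = 0.20              # buy shares at 20 cents
stake = 100.0             # dollars per trade
shares = stake / entry    # 500 shares

win_pnl = shares * (1.0 - entry)   # each share pays $1 on a win -> +$400
loss_pnl = -stake                  # shares expire worthless -> -$100

win_rate = 0.21
expected_pnl = win_rate * win_pnl + (1 - win_rate) * loss_pnl
# 0.21 * 400 - 0.79 * 100 = +$5 expected per trade despite losing 79% of them
```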
### Phase 3: Scaled Up ($50 trades)
**Duration**: ~50 min | **Trades**: 4,133 | **Result**: -$64 → +$23
First update hit -$64 drawdown. But the agent recovered:
| Update | PnL | Win Rate |
|--------|-----|----------|
| 1 | -$63.75 | 29.5% |
| 36 | +$23.10 | 15.6% |
**Observation**: The agent recovered from -$64 to +$23 without policy collapse.
### Phase 4: Share-Based PnL ($500 trades)
**Duration**: ~1 hour | **Trades**: 4,873 | **Result**: 170% ROI
Changed reward signal to reflect actual market economics:
```python
# Old: probability-based
pnl = (exit_price - entry_price) * dollars

# New: share-based
shares = dollars / entry_price
pnl = (exit_price - entry_price) * shares
```
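A worked comparison makes the gap clear. With assumed prices (a $500 position entered at 25¢, exited at 35¢), the share-based formula pays 4x more, because cheap shares give leverage to the price move:

```python
# Worked comparison of the two reward formulas. Prices are assumed for
# illustration, not taken from the trade logs.
dollars, entry_price, exit_price = 500.0, 0.25, 0.35

# Probability-based (Phases 2-3): stake scales linearly with the move
pnl_prob = (exit_price - entry_price) * dollars          # $50

# Share-based (Phase 4+): $500 buys 2,000 shares at 25 cents
shares = dollars / entry_price
pnl_shares = (exit_price - entry_price) * shares         # $200
```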
| Update | PnL | Win Rate |
|--------|-----|----------|
| 1 | -$197 | 18.9% |
| 20 | -$465 | 18.5% |
| 46 | +$3,392 | 19.0% |
**4.5x improvement** over Phase 3's reward signal.
### Phase 5: LACUNA (Final)
**Duration**: ~10 hours | **Trades**: 29,176 | **Result**: 2,510% ROI
Architecture rethink:
- **Temporal encoder**: 5-state history instead of single-frame
- **Asymmetric actor-critic**: Separate network capacities
- **Feature normalization**: Stable across market regimes
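A minimal sketch of the temporal encoder idea, stacking the last 5 observations into one embedding. Layer sizes and the flatten-then-MLP design are assumptions for illustration; the actual architecture is described in the released `config.json`.

```python
import torch
import torch.nn as nn

# Minimal sketch of a 5-state temporal encoder. Hidden sizes and the
# flatten+MLP design are assumed; see config.json for the real architecture.
class TemporalEncoder(nn.Module):
    def __init__(self, obs_dim=18, history=5, hidden=64):
        super().__init__()
        # Flatten the stacked history and project it to one embedding
        self.net = nn.Sequential(
            nn.Linear(obs_dim * history, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
        )

    def forward(self, obs_history):
        # obs_history: (batch, history, obs_dim) -> (batch, hidden)
        return self.net(obs_history.flatten(start_dim=1))
```

The embedding would then feed separate actor and critic heads, matching the asymmetric actor-critic split described above.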
It started with a big loss. Seemed broken. Left it running on New Year's Eve while counting down to midnight—not out of hope, just neglect.
Checked back hours later. The equity curve had inflected. By morning: **+$50,195**.
Only this version earned a name.
---
## Observed Patterns
These patterns emerged in the data. Whether they represent learned behavior or market regime effects is unclear without further validation.
| Pattern | Observation |
|---------|-------------|
| **Low volatility preference** | $4.07/trade on calm markets vs -$1.44 on volatile |
| **Cheap outcome bias** | Cheap entries (<30¢) yield $8.63/trade vs $1.53 for expensive |
| **DOWN momentum** | 77% of trades bet DOWN when prob is falling |
| **Short hold times on winners** | 0.35x hold time vs losers |
These could reflect genuine learned strategies or simply profitable patterns in this specific market window.
## What We Observed
1. **Reward shaping backfired** - Phase 1 collapsed when the agent gamed micro-bonuses. Pure realized PnL worked better for us.
2. **Reward signal design mattered** - Share-based PnL outperformed probability-based by 4.5x. Match actual market economics.
3. **Entropy coefficient mattered** - 0.05 caused policy collapse; 0.10 maintained exploration.
4. **Buffer/trade divergence was a warning sign** - When buffer win rate diverged from actual trades, the agent was optimizing the wrong thing.
5. **Give it time** - LACUNA started deep in the red. Early performance wasn't indicative.
---
## The Story
This is our final checkpoint. We're done experimenting with LACUNA, but you don't have to be.
## Usage
```python
import torch
from safetensors.torch import load_file
import numpy as np
import json

# Load model weights
weights = load_file("model.safetensors")

# Load normalization stats (for preprocessing observations)
stats = np.load("normalization_stats.npz")
obs_mean = stats["obs_mean"]
obs_std = stats["obs_std"]

# Load config for architecture details
with open("config.json") as f:
    config = json.load(f)

# Normalize observations before inference
def normalize_obs(obs):
    return (obs - obs_mean) / (obs_std + 1e-8)
```
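Before wiring in the real `.npz` file, the preprocessing step can be sanity-checked with synthetic stats. The shapes below assume the 18-dimensional observation space; real values come from `normalization_stats.npz`.

```python
import numpy as np

# Sanity-check the normalization step with synthetic stats (shapes assumed
# to match the 18-dim observation space; real values come from the .npz).
obs_mean = np.zeros(18, dtype=np.float32)
obs_std = np.ones(18, dtype=np.float32)

def normalize_obs(obs):
    return (obs - obs_mean) / (obs_std + 1e-8)

raw = np.random.randn(18).astype(np.float32)
norm = normalize_obs(raw)
assert norm.shape == (18,)
```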
## Files
- `README.md` - This documentation
- `config.json` - Model configuration and architecture details
- `model.safetensors` - Model weights in SafeTensors format
- `normalization_stats.npz` - Observation normalization statistics
- `trades.csv` - All 29,176 trades with full details
- `updates.csv` - Training updates with metrics over time
## Links
- [Live Results](https://humanplane.com/lacuna) - Interactive visualization
- [Training Code](https://github.com/humanplane/cross-market-state-fusion) - GitHub repository
## License
MIT
## Citation
```bibtex
@misc{lacuna2025,
  author = {HumanPlane},
  title = {LACUNA: Cross-Market Data Fusion for Prediction Market Trading},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/HumanPlane/LACUNA}
}
```