---
license: mit
tags:
- reinforcement-learning
- ppo
- trading
- prediction-markets
- polymarket
- crypto
- cross-market
- temporal-encoding
language:
- en
library_name: pytorch
pipeline_tag: reinforcement-learning
thumbnail: lacuna-end.png
---

<div align="center">

# LACUNA

**Cross-Market Data Fusion for Prediction Market Trading**

[Live Results](https://humanplane.com/lacuna) · [License: MIT](https://opensource.org/licenses/MIT) · [PyTorch](https://pytorch.org/)

</div>

---

An experiment in cross-market data fusion: a reinforcement learning agent trained to trade Polymarket's 15-minute crypto prediction markets by fusing Binance futures order flow with Polymarket orderbook data.

**The thesis**: read the "fast" market (Binance) and trade the "slow" market (Polymarket) before its price adjusts.

> **Note**: These results come from ~10 hours of paper trading in a single run on New Year's Eve 2025. The model traded with a fixed $500 position size and a $2,000 maximum exposure (up to 4 concurrent positions).

## Results

| Metric | Value |
|--------|-------|
| Total PnL | $50,195 |
| Return on Exposure | 2,510% |
| Sharpe Ratio | 4.13 |
| Profit Factor | 1.21 |
| Total Trades | 29,176 |
| Win Rate | 23.9% |
| Runtime | ~10 hours |
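
The headline numbers can be recomputed from `trades.csv`. Below is a minimal sketch. It assumes the file has one row per closed trade with a realized-PnL column. The column name `pnl` is a guess, so check the actual header first.

Return on exposure here is total PnL divided by the $2,000 maximum exposure: 50,195 / 2,000 ≈ 25.1, i.e. about 2,510%. The card does not say how the 4.13 Sharpe ratio was computed (per trade or per period, annualized or not). The un-annualized per-trade Sharpe below is therefore not expected to reproduce it.

```python
import csv
import math
from typing import Sequence


def load_pnls(path: str = "trades.csv", column: str = "pnl") -> list[float]:
    # Hypothetical column name; adjust to the real header of trades.csv
    with open(path, newline="") as f:
        return [float(row[column]) for row in csv.DictReader(f)]


def summarize_trades(pnls: Sequence[float], max_exposure: float = 2_000.0) -> dict:
    """Headline metrics from per-trade realized PnL in dollars."""
    n = len(pnls)
    total = sum(pnls)
    gross_profit = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    mean = total / n
    # Sample standard deviation of per-trade PnL
    var = sum((p - mean) ** 2 for p in pnls) / (n - 1) if n > 1 else 0.0
    std = math.sqrt(var)
    return {
        "total_pnl": total,
        "return_on_exposure": total / max_exposure,
        # Zero-PnL trades count as non-wins
        "win_rate": sum(1 for p in pnls if p > 0) / n,
        "profit_factor": gross_profit / gross_loss if gross_loss > 0 else math.inf,
        # Per-trade Sharpe (mean / std), not annualized
        "per_trade_sharpe": mean / std if std > 0 else math.nan,
    }
```

Usage: `summarize_trades(load_pnls())`.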

### Learning Progression

Comparing the first 25% of trades with the last 25%:

| Phase | Avg PnL/Trade | Win Rate |
|-------|---------------|----------|
| First 25% | +$1.27 | 22.5% |
| Last 25% | +$3.56 | 25.3% |

**2.8x improvement** in avg PnL per trade. The last 25% of trades generated **52%** of total profit.
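
The split is easy to reproduce from trade-level data. This sketch assumes `pnls` is the per-trade PnL in chronological order, for example as loaded from `trades.csv`; sort by timestamp first if the file is not ordered.

The table's figures are internally consistent:

- 29,176 / 4 = 7,294 trades per quarter.
- 7,294 × $3.56 ≈ $25,970, about 52% of $50,195.
- $3.56 / $1.27 ≈ 2.8.

```python
from typing import Sequence


def _phase_stats(chunk: Sequence[float]) -> dict:
    return {
        "avg_pnl": sum(chunk) / len(chunk),
        "win_rate": sum(1 for p in chunk if p > 0) / len(chunk),
    }


def compare_quarters(pnls: Sequence[float]) -> dict:
    """Compare the first and last 25% of trades (chronological order)."""
    n = len(pnls)
    q = n // 4
    first, last = list(pnls[:q]), list(pnls[n - q:])
    first_stats, last_stats = _phase_stats(first), _phase_stats(last)
    return {
        "first": first_stats,
        "last": last_stats,
        # Ratio of average PnL per trade (only meaningful if the first is positive)
        "improvement": last_stats["avg_pnl"] / first_stats["avg_pnl"],
        "last_share_of_profit": sum(last) / sum(pnls),
    }
```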

> **Limitations**: Single 10-hour run. No out-of-sample validation. Results could reflect the market regime rather than learned behavior. We're sharing the raw data so you can draw your own conclusions.

### Performance by Asset

| Asset | PnL | Trades | Win Rate |
|-------|-----|--------|----------|
| BTC | +$38,794 | 8,257 | 32.6% |
| ETH | +$9,978 | 7,859 | 27.0% |
| SOL | +$1,752 | 6,310 | 16.3% |
| XRP | -$328 | 6,750 | 16.7% |

## Architecture

LACUNA (v5) uses a temporal PPO architecture:

- **Temporal Encoder**: Sees the last 5 states instead of only the present one
- **Asymmetric Actor-Critic**: Separate networks, with different capacities, for the policy and the value function
- **Feature Normalization**: Stabilizes training across different market conditions
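
To make the three components concrete, here is a minimal NumPy sketch of the forward pass. It is not LACUNA's actual network, which is PyTorch; its real layer sizes are in `config.json`. All of the following are hypothetical:

- The temporal encoder is approximated by flattening the last 5 normalized 18-dim states into one 90-dim input.
- At episode start, the history is padded by repeating the first state.
- The actor outputs logits over the 3 actions from Phase 2.
- "Asymmetric" is modeled as a wider critic than actor.

Weights are random, so this only illustrates shapes and data flow.

```python
from collections import deque

import numpy as np

OBS_DIM, HISTORY, N_ACTIONS = 18, 5, 3


def init_mlp(sizes, rng):
    """Random (untrained) weights for a tanh MLP with the given layer sizes."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(i), size=(i, o)), np.zeros(o))
            for i, o in zip(sizes[:-1], sizes[1:])]


def mlp_forward(params, x):
    for k, (w, b) in enumerate(params):
        x = x @ w + b
        if k < len(params) - 1:  # no activation on the output layer
            x = np.tanh(x)
    return x


class StateHistory:
    """Keeps the last HISTORY observations, oldest first."""

    def __init__(self, maxlen: int = HISTORY):
        self.buf = deque(maxlen=maxlen)

    def push(self, obs: np.ndarray) -> np.ndarray:
        if not self.buf:  # episode start: pad by repeating the first state
            self.buf.extend([obs] * self.buf.maxlen)
        else:
            self.buf.append(obs)
        return np.stack(self.buf)  # shape (HISTORY, OBS_DIM)


class TemporalActorCritic:
    def __init__(self, actor_hidden: int = 64, critic_hidden: int = 128, seed: int = 0):
        rng = np.random.default_rng(seed)
        in_dim = OBS_DIM * HISTORY
        # Separate networks; hypothetical sizes give the critic more capacity
        self.actor = init_mlp([in_dim, actor_hidden, actor_hidden, N_ACTIONS], rng)
        self.critic = init_mlp([in_dim, critic_hidden, critic_hidden, 1], rng)

    def __call__(self, window: np.ndarray):
        x = window.reshape(-1)  # flatten the (HISTORY, OBS_DIM) window
        logits = mlp_forward(self.actor, x)
        probs = np.exp(logits - logits.max())  # numerically stable softmax
        probs /= probs.sum()
        value = float(mlp_forward(self.critic, x)[0])
        return probs, value
```

To use it, push each new normalized observation through `StateHistory.push`, then pass the returned (5, 18) window to the model. The real encoder may use a convolution, recurrent layer, or attention over the window rather than plain flattening.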

### Model Constraints

- Fixed position size: $500 per trade
- Max exposure: $2,000 (up to 4 concurrent positions)
- Markets: 15-minute crypto prediction markets (BTC, ETH, SOL, XRP)
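
Together, the two limits imply the cap of 4 concurrent positions ($2,000 / $500). Below is a minimal sketch of how an environment might gate new entries under these constraints. `can_open` is a hypothetical helper, not part of the released code.

```python
POSITION_SIZE = 500.0   # fixed dollars per trade
MAX_EXPOSURE = 2_000.0  # max total dollars across open positions

MAX_CONCURRENT = int(MAX_EXPOSURE // POSITION_SIZE)


def can_open(open_positions: int) -> bool:
    """True if one more fixed-size position keeps total exposure within the cap."""
    return (open_positions + 1) * POSITION_SIZE <= MAX_EXPOSURE
```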

### Observation Space (18 dimensions)

Fuses data from both sources (Binance and Polymarket) into an 18-dimensional state:

| Category | Features |
|----------|----------|
| Momentum | 1m/5m/10m returns |
| Order flow | L1/L5 imbalance, trade flow, CVD acceleration |
| Microstructure | Spread %, trade intensity, large trade flag |
| Volatility | 5m vol, vol expansion ratio |
| Position | Has position, side, PnL, time remaining |
| Regime | Vol regime, trend regime |
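
Below is a sketch of how such a vector might be laid out, with common definitions of a few of the simpler features. The feature names, their order, and the formulas are assumptions for illustration: textbook definitions of order-book imbalance, relative spread, and simple returns. The training repository defines the real ones.

```python
import numpy as np

# Hypothetical names and ordering (6 categories, 18 features in total)
FEATURES = [
    "ret_1m", "ret_5m", "ret_10m",                                      # momentum
    "imbalance_l1", "imbalance_l5", "trade_flow", "cvd_accel",          # order flow
    "spread_pct", "trade_intensity", "large_trade",                     # microstructure
    "vol_5m", "vol_expansion",                                          # volatility
    "has_position", "position_side", "position_pnl", "time_remaining",  # position
    "vol_regime", "trend_regime",                                       # regime
]
assert len(FEATURES) == 18


def book_imbalance(bid_sizes, ask_sizes, depth=1):
    """(bid - ask) / (bid + ask) over the top `depth` levels, in [-1, 1]."""
    b = float(np.sum(bid_sizes[:depth]))
    a = float(np.sum(ask_sizes[:depth]))
    return (b - a) / (b + a) if b + a > 0 else 0.0


def spread_pct(best_bid, best_ask):
    """Quoted spread relative to the mid price."""
    mid = (best_bid + best_ask) / 2
    return (best_ask - best_bid) / mid


def simple_return(prices, lookback):
    """Return over the last `lookback` samples (e.g. 1-minute bars)."""
    return prices[-1] / prices[-1 - lookback] - 1.0


def build_obs(values: dict) -> np.ndarray:
    """Assemble the 18-dim vector in FEATURES order (missing keys become 0)."""
    return np.array([values.get(name, 0.0) for name in FEATURES], dtype=np.float32)
```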

## Training Evolution

Five phases over three days. Each taught us something. Only the last earned a name.

### Phase 1: Shaped Rewards (Failed)

**Duration**: ~52 min | **Trades**: 1,545 | **Result**: Policy collapse

Started with micro-bonuses to guide learning:

- +0.002 for trading with momentum
- +0.001 for larger positions
- -0.001 for fighting momentum

**What happened**: Entropy collapsed from 1.09 → 0.36. The agent learned to game the reward function, collecting bonuses while ignoring actual profitability. The buffer showed a 90% win rate while the real trade win rate was 20%.

**Lesson**: Reward shaping backfired here. Because the shaping rewards were gameable and of similar magnitude to the real signal, the agent optimized the wrong thing.
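
Here is a small sketch of why this went wrong. The bonus magnitudes are the Phase 1 values listed above; the base PnL term and its scale are hypothetical. When per-step PnL is of the same order as the bonuses, a losing trade can still earn positive reward. That is the kind of gap the buffer-vs-trades win rates exposed. The sparse variant is the Phase 2 alternative: zero reward until a position closes.

```python
def shaped_reward(step_pnl: float, with_momentum: bool,
                  against_momentum: bool, larger_position: bool) -> float:
    """Phase 1 style: a (hypothetically scaled) PnL term plus micro-bonuses."""
    r = step_pnl
    if with_momentum:
        r += 0.002
    if larger_position:
        r += 0.001
    if against_momentum:
        r -= 0.001
    return r


def realized_pnl_reward(position_closed: bool, realized_pnl: float) -> float:
    """Phase 2 style: no reward until a position closes, then realized PnL."""
    return realized_pnl if position_closed else 0.0


# A losing step that still looks good under shaping:
r = shaped_reward(-0.001, with_momentum=True, against_momentum=False, larger_position=True)
# r is about +0.002 even though the step lost money
```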

### Phase 2: Pure Realized PnL

**Duration**: ~1 hour | **Trades**: 2,000+ | **Result**: 55% ROI

Stripped everything back:

- Reward ONLY on position close
- Increased entropy coefficient (0.05 → 0.10)
- Simplified actions (7 → 3)
- Smaller buffer (2048 → 512)
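
The entropy coefficient enters PPO's objective as a bonus on the policy's entropy. Below is a minimal NumPy sketch of the standard clipped surrogate loss with that bonus. The clip range of 0.2 is a common default and an assumption here, not a documented LACUNA setting.

For reference, the maximum entropy of a 3-action policy is ln 3 ≈ 1.10 nats. Assuming the logged entropy is in nats over the 3 actions, the 1.05 reached in the table below is close to uniform.

```python
import numpy as np


def ppo_loss(new_logp, old_logp, advantages, probs, clip_eps=0.2, ent_coef=0.10):
    """Clipped PPO policy loss with an entropy bonus (lower is better).

    new_logp, old_logp: log-probs of the taken actions, shape (N,)
    advantages: advantage estimates, shape (N,)
    probs: action distributions under the current policy, shape (N, n_actions)
    """
    ratio = np.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    surrogate = np.minimum(unclipped, clipped).mean()
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1).mean()
    # A larger ent_coef pays the policy more for staying spread out,
    # which counteracts premature collapse onto a single action
    loss = -(surrogate + ent_coef * entropy)
    return loss, entropy
```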

| Update | Entropy | PnL | Win Rate |
|--------|---------|-----|----------|
| 1 | 0.68 | $5.20 | 33.3% |
| 36 | 1.05 | $10.93 | 21.2% |

Win rate settled at 21%, below random (33%), yet the agent was profitable: binary markets have asymmetric payoffs. (PnL was still probability-based at this point.)

### Phase 3: Scaled Up ($50 trades)

**Duration**: ~50 min | **Trades**: 4,133 | **Result**: -$64 → +$23

The first update hit a -$64 drawdown, but the agent recovered:

| Update | PnL | Win Rate |
|--------|-----|----------|
| 1 | -$63.75 | 29.5% |
| 36 | +$23.10 | 15.6% |

**Observation**: The agent recovered from -$64 to +$23 without policy collapse.

### Phase 4: Share-Based PnL ($500 trades)

**Duration**: ~1 hour | **Trades**: 4,873 | **Result**: 170% ROI

Changed the reward signal to reflect actual market economics:

```python
# Old: probability-based (price move applied to the whole dollar stake)
def pnl_probability_based(entry_price, exit_price, dollars):
    return (exit_price - entry_price) * dollars

# New: share-based (the stake buys dollars / entry_price shares)
def pnl_share_based(entry_price, exit_price, dollars):
    shares = dollars / entry_price
    return (exit_price - entry_price) * shares
```
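
Share-based PnL is what makes payoffs asymmetric. Under the old formula, a 5¢ move on a $500 stake is worth $25 whatever the entry price. Under the new one, it is worth $125 from a 20¢ entry (2,500 shares) and $50 from a 50¢ entry (1,000 shares).

The sketch below idealizes further by assuming a position is held until the market resolves to $1 or $0; LACUNA's trades may exit earlier, at `exit_price`. For win rate w and entry price p, the expected PnL per share is then w(1 - p) - (1 - w)p = w - p. So the break-even win rate equals the entry price, which is one way a sub-25% win rate can still be profitable on cheap outcomes. The function names are hypothetical.

```python
def resolution_pnl(entry_price: float, dollars: float, won: bool) -> float:
    """Share-based PnL for a position held until the market resolves to $1 or $0."""
    shares = dollars / entry_price
    payout = 1.0 if won else 0.0
    return (payout - entry_price) * shares


def expected_pnl(entry_price: float, dollars: float, win_rate: float) -> float:
    """Expected PnL at a given win rate; equals dollars * (win_rate - p) / p."""
    return (win_rate * resolution_pnl(entry_price, dollars, True)
            + (1 - win_rate) * resolution_pnl(entry_price, dollars, False))


# Same 24% win rate, different entry prices, $500 stake:
cheap = expected_pnl(0.20, 500, 0.24)      # about +$100
expensive = expected_pnl(0.50, 500, 0.24)  # about -$260
```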

| Update | PnL | Win Rate |
|--------|-----|----------|
| 1 | -$197 | 18.9% |
| 20 | -$465 | 18.5% |
| 46 | +$3,392 | 19.0% |

**4.5x improvement** over Phase 3's reward signal.

### Phase 5: LACUNA (Final)

**Duration**: ~10 hours | **Trades**: 29,176 | **Result**: 2,510% ROI

Architecture rethink:

- **Temporal encoder**: 5-state history instead of single-frame
- **Asymmetric actor-critic**: Separate network capacities
- **Feature normalization**: Stable across market regimes
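
The card does not say how `normalization_stats.npz` was produced. One common approach is a running per-feature mean and standard deviation, updated batch by batch with the parallel-variance (Chan et al.) merge. Below is a minimal sketch under that assumption. It uses the population standard deviation and the same `1e-8` epsilon as the Usage snippet. The class name is hypothetical.

```python
import numpy as np


class RunningNormalizer:
    """Running per-feature mean/std, merged batch by batch."""

    def __init__(self, dim: int):
        self.count = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros(dim)  # sum of squared deviations from the mean

    def update(self, batch: np.ndarray) -> None:
        batch = np.atleast_2d(batch)
        n = batch.shape[0]
        b_mean = batch.mean(axis=0)
        b_m2 = ((batch - b_mean) ** 2).sum(axis=0)
        delta = b_mean - self.mean
        total = self.count + n
        self.mean = self.mean + delta * n / total
        self.m2 = self.m2 + b_m2 + delta ** 2 * self.count * n / total
        self.count = total

    @property
    def std(self) -> np.ndarray:
        return np.sqrt(self.m2 / max(self.count, 1))

    def normalize(self, obs: np.ndarray) -> np.ndarray:
        return (obs - self.mean) / (self.std + 1e-8)

    def save(self, path: str) -> None:
        # Same keys the Usage section reads back
        np.savez(path, obs_mean=self.mean, obs_std=self.std)
```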

It started with a big loss. Seemed broken. Left it running on New Year's Eve while counting down to midnight, not out of hope, just neglect.

Checked back hours later. The equity curve had inflected. By morning: **+$50,195**.

Only this version earned a name.

---

## Observed Patterns

These patterns emerged in the data. Whether they reflect learned behavior or market-regime effects is unclear without further validation.

| Pattern | Observation |
|---------|-------------|
| **Low volatility preference** | $4.07/trade in calm markets vs -$1.44 in volatile ones |
| **Cheap outcome bias** | Cheap entries (<30¢) yield $8.63/trade vs $1.53 for expensive ones |
| **DOWN momentum** | 77% of trades bet DOWN when the probability is falling |
| **Short hold times on winners** | Winners are held 0.35x as long as losers |
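
Two of these patterns can be checked directly against `trades.csv`. Below is a sketch using only the standard library. It assumes columns named `pnl`, `entry_price` (in dollars, 0 to 1), and `hold_seconds`. Those names are guesses, so adapt them to the file's real header. The card does not define "expensive", so this treats every entry at or above 30¢ as expensive. Note that `statistics.mean` raises an error on an empty group.

```python
import csv
from statistics import mean


def load_trades(path: str = "trades.csv") -> list[dict]:
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def entry_price_split(trades, threshold=0.30):
    """Average PnL per trade for cheap (< threshold) vs all other entries."""
    cheap = [float(t["pnl"]) for t in trades if float(t["entry_price"]) < threshold]
    rest = [float(t["pnl"]) for t in trades if float(t["entry_price"]) >= threshold]
    return mean(cheap), mean(rest)


def hold_time_ratio(trades):
    """Mean hold time of winners divided by mean hold time of losers."""
    wins = [float(t["hold_seconds"]) for t in trades if float(t["pnl"]) > 0]
    losses = [float(t["hold_seconds"]) for t in trades if float(t["pnl"]) < 0]
    return mean(wins) / mean(losses)
```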

These could reflect genuine learned strategies or simply profitable patterns in this specific market window.

## What We Observed

1. **Reward shaping backfired** - Phase 1 collapsed when the agent gamed micro-bonuses. Pure realized PnL worked better for us.
2. **Reward signal design mattered** - Share-based PnL outperformed probability-based PnL by 4.5x. Match actual market economics.
3. **Entropy coefficient mattered** - 0.05 caused policy collapse; 0.10 maintained exploration.
4. **Buffer/trade divergence was a warning sign** - When the buffer win rate diverged from the actual trade win rate, the agent was optimizing the wrong thing.
5. **Give it time** - LACUNA started deep in the red. Early performance wasn't indicative.
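
Observation 4 can be turned into a cheap training-time check. This sketch assumes you can collect the rewards stored in the PPO buffer and the realized PnL of trades closed over the same period. The 0.15 gap threshold is an arbitrary placeholder. With Phase 1's numbers (90% in the buffer vs 20% in real trades), it would fire immediately.

```python
def win_rate(values) -> float:
    values = list(values)
    return sum(1 for v in values if v > 0) / len(values) if values else 0.0


def divergence_warning(buffer_rewards, closed_trade_pnls, max_gap=0.15):
    """Flag when the share of positive buffer rewards drifts away from the
    real win rate of closed trades, a sign the reward is being gamed."""
    buf_wr = win_rate(buffer_rewards)
    trade_wr = win_rate(closed_trade_pnls)
    return abs(buf_wr - trade_wr) > max_gap, buf_wr, trade_wr
```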

---

## The Story

This is our final checkpoint. We're done experimenting with LACUNA, but you don't have to be.

## Usage

```python
import json

import numpy as np
from safetensors.torch import load_file

# Load model weights (a dict mapping parameter names to torch tensors)
weights = load_file("model.safetensors")
for name, tensor in weights.items():
    print(name, tuple(tensor.shape))

# Load normalization stats (for preprocessing observations)
with np.load("normalization_stats.npz") as stats:
    obs_mean = stats["obs_mean"]
    obs_std = stats["obs_std"]

# Load config for architecture details
with open("config.json") as f:
    config = json.load(f)

# Normalize observations before inference (obs: array of shape (..., 18))
def normalize_obs(obs):
    return (np.asarray(obs, dtype=np.float32) - obs_mean) / (obs_std + 1e-8)
```

## Files

- `README.md` - This documentation
- `config.json` - Model configuration and architecture details
- `model.safetensors` - Model weights in SafeTensors format
- `normalization_stats.npz` - Observation normalization statistics (`obs_mean`, `obs_std`)
- `trades.csv` - All 29,176 trades with full details
- `updates.csv` - Training updates with metrics over time

## Links

- [Live Results](https://humanplane.com/lacuna) - Interactive visualization
- [Training Code](https://github.com/humanplane/cross-market-state-fusion) - GitHub repository

## License

MIT

## Citation

```bibtex
@misc{lacuna2025,
  author    = {HumanPlane},
  title     = {LACUNA: Cross-Market Data Fusion for Prediction Market Trading},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/HumanPlane/LACUNA}
}
```