Polymarket Edge Bot

[Chart: Bot vs Polymarket crowd by iteration]

A LightGBM classifier on top of frozen MiniLM embeddings + market metadata. Trained on data derived from 568M Polymarket trades across 734K resolved markets (~258K markets after universe filters). Designed to find profitable disagreements with the Polymarket crowd at the moment a market opens.


TL;DR

  • What it does: given a Polymarket binary market's question text, opening price, category, and event-level features, predicts the calibrated probability that the market resolves YES.
  • What it's for: identifying YES/NO bets where the model's probability diverges enough from the live market price to expect a positive expected return.
  • How well it works (honest, capital-constrained, with fees):
    • +7.7% return on a $1,000 bankroll over a 3-month out-of-sample test window (≈30% annualized), with risk controls of 5% max per bet and 30% max deployed capital, and 2% Polymarket fees applied.
    • 100% of 500 bootstrap resamples produce a positive PnL; the signal is not a single lucky test split.
    • +8.5% median ROI per bet across resamples (5th percentile +7.0%, 95th +10.2%), a tight distribution.
    • Profitable on 82% of trading days in the test window.
    • Tied with the Polymarket crowd on overall Brier calibration; the edge comes from selectively betting on cheap underdogs and expensive favorites where asymmetric payouts make even modest accuracy profitable.

Performance

Honest, capital-constrained backtest

[Chart: Honest backtest stats]

The chart above is what you'd actually have experienced trading this bot with $1,000 over the 3-month test window: 79 bets placed (most markets either didn't meet the edge threshold or were skipped because we were already at the 30% deployed cap), final equity $1,077 (+7.7%), and 100% of 500 bootstrap resamples positive.

[Chart: Full honest backtest]

Why the honest numbers are smaller than what you might see elsewhere

[Chart: Original vs honest backtest]

An earlier "raw" backtest summed PnL across every bet that crossed the edge threshold (5,777 bets, +9.4% ROI per bet, growing equity from $1,000 to $1,593). That number is misleading: with $1,000 you can't actually place 5,777 simultaneous $1 bets on overlapping markets. The honest, capital-constrained simulation caps single bets at 5% of bankroll and total deployed capital at 30%, which limits you to ~80 bets per quarter on this universe; the per-bet edge survives.
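The two caps can be sketched as a toy simulator. This is illustrative only, not the repo's backtest code: the event shapes, immediate-payout accounting, and function name are assumptions.

```python
def run_capped_backtest(events, cash=1000.0, per_bet_cap=0.05, deploy_cap=0.30):
    """Toy simulator for the 5%-per-bet / 30%-deployed risk caps.

    events: ("open", bet_id) opens a position; ("resolve", bet_id, payout)
    returns payout dollars per dollar staked (0.0 = lost, 1/price = won).
    """
    open_stakes, deployed, skipped = {}, 0.0, 0
    for ev in events:
        if ev[0] == "open":
            equity = cash + deployed
            # Stake is limited by the per-bet cap, the remaining deployed
            # headroom, and the cash actually on hand.
            stake = min(per_bet_cap * equity, deploy_cap * equity - deployed, cash)
            if stake <= 0:
                skipped += 1          # at the 30% deployed cap: pass on the bet
                continue
            cash -= stake
            deployed += stake
            open_stakes[ev[1]] = stake
        else:
            stake = open_stakes.pop(ev[1], 0.0)
            deployed -= stake
            cash += stake * ev[2]     # credit winnings (or nothing) back to cash
    return cash + deployed, skipped
```

With a $1,000 bankroll the seventh simultaneous $50 bet is skipped, mirroring why the constrained run places only ~80 bets per quarter.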

Win-rate decomposition

[Chart: Win rate by quartile]

The bot is approximately calibrated, not magical. For BUY YES bets at an entry price of $0.08, the model predicts 24% YES and the actual resolution rate is 21%. The edge isn't from outpredicting the crowd by a wide margin; it's from asymmetric payouts: a 21% chance to win $0.92 against a 79% chance to lose $0.08 has positive expected value. The bot finds these spots and bets on them at small scale.
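The payout arithmetic in that example is easy to verify. A minimal sketch, assuming the 2% fee is charged against the gross $1 payout at resolution (the exact fee mechanics here are an assumption):

```python
def buy_yes_ev(p_win, price, fee=0.02):
    """Expected value per share of YES bought at `price`.

    A YES share costs `price` and pays $1 at resolution; this sketch
    takes the fee out of the gross payout on a win.
    """
    win_profit = 1.0 * (1 - fee) - price   # net gain if the market resolves YES
    lose_loss = price                       # the premium is lost if it resolves NO
    return p_win * win_profit - (1 - p_win) * lose_loss

# The $0.08 underdog from the decomposition: 21% to win, 79% to lose the stake.
ev = buy_yes_ev(0.21, 0.08)
```

Even with the model's probability (21%) below a naive read of its prediction (24%), the position is positive-EV because the winning branch pays more than eleven times the losing branch's loss.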

Robustness across time windows (after 2% Polymarket fees)

[Chart: Robustness check]

| Time window | Brier-skill (+ = beats crowd) | ROI @ edge > 10% | ROI @ edge > 20% |
|---|---|---|---|
| Train ≤ 50% (Jan-Mar 2026 test, n = 103K) | +0.50% | +9.6% | +22.4% |
| Train ≤ 65% (Feb-Mar 2026 test, n = 65K) | +0.09% | +8.4% | +20.6% |
| Train ≤ 80% (last-10% test, n = 26K) | +0.10% | +8.1% | +16.5% |

How we got here (ablation study)

[Chart: ROI progression]

| Iter | Change | Brier-skill | ROI @ edge > 10% |
|---|---|---|---|
| 4 | Metadata-only (TF-IDF + tabular) | -12.91% | +3.1% |
| 5 | + market opening price as feature | -1.94% | +4.4% |
| 6 | + event-level features + per-category models | -2.22% | +5.4% |
| 7 | + frozen MiniLM sentence embeddings | -0.02% | +9.4% |
| 8 | + fine-tuning MiniLM end-to-end | -0.11% | +9.7% |

Fine-tuning the encoder (iter 8) didn't beat off-the-shelf embeddings; model capacity wasn't the bottleneck. Further gains likely require new feature sources (real-time price velocity, news, cross-market arbitrage), not bigger language models.


Architecture

[Diagram: Architecture]

The bot is a two-stage pipeline:

  1. Feature extraction
    • Question text → 384-dim embedding from sentence-transformers/all-MiniLM-L6-v2 (frozen, mean-pooled).
    • Tabular features (40-ish dims): question length / phrase indicators, market opening price + log-odds + transformations, event-level aggregates (sibling-market open prices, count of markets in event), volume, duration, creation calendar features, neg_risk flag, category one-hots.
  2. Probabilistic head
    • LightGBM binary classifier on the concatenated feature vector.
    • Isotonic regression post-hoc calibration on a chronologically-held-out fold so the output P(YES) is a real probability.

Final feature dimension: 424 (= 40 tabular + 384 embedding).
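The calibration stage is standard isotonic regression; the model itself uses sklearn's IsotonicRegression, but the underlying pool-adjacent-violators fit can be sketched in pure Python to show what "output P(YES) is a real probability" means mechanically:

```python
import bisect

def fit_isotonic(scores, labels):
    """Pool adjacent violators: fit a nondecreasing step function
    mapping raw classifier scores to empirical YES rates."""
    pairs = sorted(zip(scores, labels))
    blocks = []                      # each block: [mean, count]
    for _, y in pairs:
        blocks.append([float(y), 1])
        # Merge neighbors whenever monotonicity is violated.
        while len(blocks) > 1 and blocks[-2][0] >= blocks[-1][0]:
            m_hi, c_hi = blocks.pop()
            m_lo, c_lo = blocks.pop()
            c = c_lo + c_hi
            blocks.append([(m_lo * c_lo + m_hi * c_hi) / c, c])
    # Expand pooled blocks back to one fitted value per calibration point.
    fitted = []
    for m, c in blocks:
        fitted.extend([m] * c)
    return [s for s, _ in pairs], fitted

def isotonic_predict(xs, fitted, score):
    """Step-function lookup: calibrated probability for a raw score."""
    i = bisect.bisect_right(xs, score) - 1
    return fitted[max(i, 0)]
```

Fitting this on the chronologically held-out fold, as the pipeline does, keeps the calibration map from leaking future information into the probabilities.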


What worked vs what didn't

[Chart: What worked vs failed]


Lessons

[Chart: Lessons learned]


How to use

Installation

pip install lightgbm scikit-learn sentence-transformers numpy huggingface_hub

Minimal inference

import json, pickle, numpy as np, lightgbm as lgb
from sentence_transformers import SentenceTransformer
from huggingface_hub import hf_hub_download

REPO = "jc-builds/polymarket-edge-bot"

spec = json.loads(open(hf_hub_download(REPO, "feature_spec.json")).read())
gbm = lgb.Booster(model_file=hf_hub_download(REPO, "lightgbm_model.txt"))
iso = pickle.load(open(hf_hub_download(REPO, "isotonic_calibrator.pkl"), "rb"))
encoder = SentenceTransformer(spec["embedding_model"])

# Build features for a market.
tabular_row = {col: 0.0 for col in spec["tabular_columns_in_order"]}
tabular_row["first_yes_price"] = 0.30
tabular_row["first_yes_price_log_odds"] = float(np.log(0.30 / 0.70))
tabular_row["first_yes_price_distance_from_half"] = 0.20
tabular_row["log_total_usd"] = float(np.log1p(50_000))
tabular_row["duration_days"] = 7
question = "Will Bitcoin close above $100,000 on Friday?"

x = np.concatenate([
    np.array([tabular_row[c] for c in spec["tabular_columns_in_order"]], dtype=np.float32),
    encoder.encode([question], normalize_embeddings=True)[0],
])[None, :]

p_raw = gbm.predict(x)[0]
p_yes = float(iso.predict([p_raw])[0])
edge = p_yes - tabular_row["first_yes_price"]
print(f"P(YES) = {p_yes:.3f}, market = {tabular_row['first_yes_price']:.2f}, edge = {edge:+.3f}")

A complete end-to-end script (including computing all event-level features from the live Polymarket Gamma API) is in inference_example.py.
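Turning the printed edge into a trade decision is a thresholding step. A hypothetical helper, assuming the 10% edge threshold from the tables above and symmetric handling of the NO side (the repo's exact rule may differ):

```python
def decide(p_yes, yes_price, min_edge=0.10):
    """Return ("BUY YES" | "BUY NO" | None, edge).

    The NO side is evaluated symmetrically: NO shares cost
    (1 - yes_price) and win with probability (1 - p_yes).
    """
    yes_edge = p_yes - yes_price
    no_edge = (1 - p_yes) - (1 - yes_price)   # = -yes_edge
    if yes_edge >= min_edge:
        return "BUY YES", yes_edge
    if no_edge >= min_edge:
        return "BUY NO", no_edge
    return None, yes_edge                     # disagreement too small to bet
```

Most markets fall in the final branch, which is why only ~80 of the universe's markets per quarter turn into bets under the capital caps.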


Training data

  • Source: SII-WANGZJ/Polymarket_data, 568M trades and 734K markets sourced from the Polygon blockchain.
  • Universe filter: resolved binary markets with ≥ $1,000 lifetime volume and an opening YES price in [0.05, 0.95] (i.e., open with non-trivial uncertainty). ~258K markets after filters.
  • Per-market trade aggregates: computed by streaming the 28GB trades parquet via PyArrow (no full load), keeping last-60 trades per market and computing VWAPs at 1h / 6h / 24h / 72h / 168h windows before close.
  • Train/calibration split: chronological 85/15 on created_at. Calibration fold is the most-recent 15%, used only to fit the isotonic regressor.
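The last-60-trades and windowed-VWAP aggregation can be sketched in pure Python. The real pipeline streams the parquet with PyArrow; the field order and function names here are illustrative assumptions:

```python
from collections import defaultdict, deque

def stream_trades(batches, keep_last=60):
    """Keep only the most recent `keep_last` trades per market while
    streaming (timestamp, market_id, price, usd_size) rows batch by batch,
    so the full 28GB file never has to fit in memory."""
    last = defaultdict(lambda: deque(maxlen=keep_last))
    for batch in batches:            # each batch stands in for a parquet row group
        for ts, market_id, price, usd in batch:
            last[market_id].append((ts, price, usd))
    return last

def vwap_before_close(trades, close_ts, window_hours):
    """Volume-weighted average price over the window before market close."""
    cutoff = close_ts - window_hours * 3600
    num = den = 0.0
    for ts, price, usd in trades:
        if cutoff <= ts <= close_ts:
            num += price * usd
            den += usd
    return num / den if den else None
```

Calling `vwap_before_close` at 1h / 6h / 24h / 72h / 168h windows yields the per-market trade aggregates described above.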

Honest limitations

  • The bot does not "beat the crowd" on every market. Its Brier-skill is near zero. The edge comes from selectively betting where it disagrees with the market price by enough to be likely correct in aggregate.
  • The headline number is +7.7% over 3 months on $1,000. Annualized that's ~30%, which is excellent, but the test sample is one 3-month window: a small N. Bootstrap helps confirm robustness but does not replace live forward-testing.
  • Capital constraints matter. With $1,000 you'll only place ~80 bets a quarter on this universe. The per-bet edge is real, but absolute returns are throttled by the deployed-capital cap. A larger bankroll allows more parallel bets and smoother returns, up to liquidity limits.
  • Aggregate edge is real, individual edges lie. Live picks regularly show big edges on factual markets like "Will measles cases exceed 10,000 by EOY?" (market 10%, model 45%). The model doesn't know current real-world case counts; it's pattern-matching the question text. Bet small on many, never large on one obvious-looking mispricing.
  • Test markets skew toward 2026. The training data spans 2022–2026 but the volume is dominated by 2025–2026. Behavior on novel market types or long-future regimes is unknown.
  • No slippage / no fill modeling. At $1 bets the impact is negligible; scaled to $1,000 bets, real Polymarket fill assumptions matter and the ROI numbers will take a haircut.
  • Polymarket fees are 2% on resolution, applied in the robustness numbers and the honest backtest but not in some legacy tables. The bot survives this haircut at edge ≥ 10%.

Files

| File | Description |
|---|---|
| lightgbm_model.txt | LightGBM booster dump (text format, framework-agnostic) |
| isotonic_calibrator.pkl | sklearn IsotonicRegression for probability calibration |
| feature_spec.json | Exact column order + dtypes the model expects |
| training_meta.json | Training run metadata + headline performance |
| inference_example.py | Minimal end-to-end inference snippet |
| charts/ | The visuals on this card |

Citation

If you use this model, please cite the underlying dataset and reference the release:

@misc{polymarket_edge_bot_2026,
  author = {jc-builds},
  title  = {Polymarket Edge Bot},
  year   = {2026},
  publisher = {Hugging Face},
  url    = {https://huggingface.co/jc-builds/polymarket-edge-bot}
}

@misc{wang_2025_polymarket_data,
  author = {Wang, Zhengjie and Chao, Leiyu and Bao, Yu and Cheng, Lian and Liao, Jianhan and Li, Yikang},
  title  = {Polymarket Data: Complete Data Infrastructure for Polymarket},
  year   = {2025},
  publisher = {Hugging Face},
  url    = {https://huggingface.co/datasets/SII-WANGZJ/Polymarket_data}
}

License

MIT.

This is a research artifact. It is not financial advice. Prediction markets are gambling in many jurisdictions; check your local laws before using this model to place real bets.


Built by @jc_builds. See the full iteration log, backtest code, and live-picks scoring at the project repo.
