license: apache-2.0
base_model:
- Qwen/Qwen3-32B
Foresight-32B
A 32-billion parameter language model fine-tuned for probabilistic forecasting of real-world events.
Overview
Foresight-32B is a general-purpose forecasting model developed by Lightning Rod Labs. Built on Qwen3-32B and trained with outcome-based reinforcement learning, it achieves state-of-the-art forecasting performance among open-weight models and outperforms frontier LLMs 10-100x its size on prediction market benchmarks.
Key Results
In a forward-looking evaluation on 251 live Polymarket questions (July-August 2025):
| Model | Brier Score ↓ | ECE ↓ | Profitable |
|---|---|---|---|
| Foresight-32B | 0.199 | 6.0% | ✓ |
| OpenAI o3 | 0.205 | 7.8% | ✓ |
| Gemini 2.5 Pro | 0.213 | 8.2% | ✗ |
| Grok-4 | 0.218 | 9.1% | ✗ |
| Claude Opus | 0.221 | 8.9% | ✗ |
| Qwen3-32B (base) | 0.253 | 19.2% | ✗ |
| Polymarket (market) | 0.170 | — | — |
Foresight-32B led all tested LLMs on every metric: Brier score, expected calibration error (ECE), and profitability.
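For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how Brier score and ECE are commonly computed for binary questions. The function names and the choice of 10 equal-width bins are assumptions for illustration; the exact binning used in the evaluation above is not stated here.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and equal in length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """Weighted mean gap between average forecast and observed frequency per bin.

    Assumes equal-width probability bins (an illustrative choice).
    """
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and equal in length")
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        # A forecast of exactly 1.0 goes into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, o))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_forecast = sum(p for p, _ in members) / len(members)
        observed_rate = sum(o for _, o in members) / len(members)
        ece += (len(members) / total) * abs(mean_forecast - observed_rate)
    return ece
```

As a point of reference, always forecasting 50% yields a Brier score of exactly 0.25 on any set of binary questions, so scores are best read relative to that baseline.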
How It Works
See:
- LLMs Can Teach Themselves to Better Predict the Future (arXiv:2502.05253)
- Outcome-based Reinforcement Learning to Predict the Future (arXiv:2505.17989)
Synthetic Training Data (Foresight Learning)
We augment limited real-world prediction market data with synthetically generated forecasting questions produced by our data generation framework. The framework turns streams of data (e.g., news articles) into questions that are difficult to predict at the time they are posed but verifiable later. The model was trained on ~10,000 real Polymarket questions plus ~100,000 synthetic questions, so by question count roughly 90% of the training data is synthetic.
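One way to picture a question that is hard to predict at one point in time but verifiable later is as a record with two dates. The sketch below is illustrative only: `ForecastQuestion`, its fields, and `temporal_split` are hypothetical names, not part of the actual data generation framework.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class ForecastQuestion:
    """Hypothetical record for one real or synthetic forecasting question."""
    text: str
    asked_on: date     # the forecaster may only use information up to this date
    resolves_on: date  # the outcome becomes verifiable on this date
    outcome: int       # 1 if the event happened, 0 otherwise


def temporal_split(
    questions: List[ForecastQuestion], cutoff: date
) -> Tuple[List[ForecastQuestion], List[ForecastQuestion]]:
    """Train on questions resolved before the cutoff, test on those asked on or after it.

    Questions that straddle the cutoff are dropped from both sets, so no
    training label can reveal information about the test period.
    """
    train = [q for q in questions if q.resolves_on < cutoff]
    test = [q for q in questions if q.asked_on >= cutoff]
    return train, test
```

Keeping both dates on every record makes it possible to check that context and labels never come from after the point at which a forecast is made.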
Training Details
- Base Model: Qwen3-32B
- Training Method: GRPO
- Training Data: ~10k Polymarket questions + ~100k synthetic forecasting questions
- Evaluation: Held-out test set of 1,265 questions with temporal separation to prevent leakage
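To make the training method more concrete, here is a minimal sketch of the group-relative advantage step at the core of GRPO: several forecasts are sampled for the same question, each is scored once the outcome is known, and rewards are standardized within the group. The reward function `brier_reward` is an assumption chosen for illustration; the reward actually used to train Foresight-32B is not specified in this card.

```python
from statistics import mean, pstdev
from typing import List


def brier_reward(prob: float, outcome: int) -> float:
    """Hypothetical outcome-based reward: 1 minus squared error, in [0, 1]."""
    return 1.0 - (prob - outcome) ** 2


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize rewards within one group of samples for the same question."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every sample earns the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled forecasts for one question that resolved YES (outcome = 1).
sampled_probs = [0.9, 0.6, 0.4, 0.1]
rewards = [brier_reward(p, 1) for p in sampled_probs]
advantages = group_relative_advantages(rewards)
```

Samples that forecast closer to the realized outcome receive positive advantages and the rest receive negative ones, so the policy update needs no separate learned value model.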
Usage
Foresight-32B is available for use at dashboard.lightningrod.ai.
Input Format
The model accepts a forecasting question along with relevant context (news articles, background information) and outputs a probability estimate with reasoning. For a well-structured response, include instructions in the prompt that specify how the answer should be formatted.
Question: Will [event] happen by [date]?
Context:
[Relevant news headlines and information up to prediction date]
Output: Probability estimate (0-100%) with reasoning
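The template above can be assembled and its answer parsed programmatically. The following is a minimal sketch: `build_prompt`, `parse_probability`, and the `Probability: NN%` answer line are assumptions made for illustration, not a format the model requires.

```python
import re
from typing import List, Optional


def build_prompt(question: str, context_items: List[str]) -> str:
    """Fill the Question/Context template and append a formatting instruction."""
    context = "\n".join(f"- {item}" for item in context_items)
    return (
        f"Question: {question}\n"
        f"Context:\n{context}\n"
        "Give your reasoning, then end with a line of the form "
        "'Probability: NN%'."
    )


def parse_probability(response: str) -> Optional[float]:
    """Return the last 'Probability: NN%' value as a float in [0, 1], or None."""
    matches = re.findall(r"Probability:\s*(\d{1,3}(?:\.\d+)?)\s*%", response)
    if not matches:
        return None
    value = float(matches[-1])
    if not 0.0 <= value <= 100.0:
        return None
    return value / 100.0
```

Taking the last match rather than the first guards against percentages that appear earlier in the reasoning, and returning `None` lets the caller decide whether to retry or discard a malformed response.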
Citation
If you use Foresight-32B in your research, please cite:
@article{turtel2025outcome,
title={Outcome-based Reinforcement Learning to Predict the Future},
author={Turtel, Benjamin and others},
journal={arXiv preprint arXiv:2505.17989},
year={2025}
}
@article{turtel2025llms,
title={LLMs Can Teach Themselves to Better Predict the Future},
author={Turtel, Benjamin and Franklin, Danny and Schoenegger, Philipp},
journal={arXiv preprint arXiv:2502.05253},
year={2025}
}
Contact
If you are interested in generating training data for your own models or fine-tuning custom prediction agents on your domain-specific data, reach out to support@lightningrod.ai.