Files changed (1)
README.md +40 -48
README.md CHANGED
@@ -2,22 +2,36 @@
  license: apache-2.0
  base_model:
  - Qwen/Qwen3-32B
  ---
- # Foresight-32B
-
- A 32-billion parameter language model fine-tuned for probabilistic forecasting of real-world events.
-
- ## Overview
-
- Foresight-32B is a general-purpose forecasting model developed by [Lightning Rod Labs](https://lightningrod.ai). Built on Qwen3-32B and trained using outcome-based reinforcement learning, it achieves state-of-the-art forecasting performance among open-weight models, outperforming frontier LLMs 10-100x its size on prediction market benchmarks.

  ## Key Results

- In a forward-looking evaluation on 251 live Polymarket questions (July-August 2025):

  | Model | Brier Score ↓ | ECE ↓ | Profitable |
  |-------|---------------|-------|------------|
- | **Foresight-32B** | **0.199** | **6.0%** | ✓ |
  | OpenAI o3 | 0.205 | 7.8% | ✓ |
  | Gemini 2.5 Pro | 0.213 | 8.2% | ✗ |
  | Grok-4 | 0.218 | 9.1% | ✗ |
@@ -25,64 +39,42 @@ In a forward-looking evaluation on 251 live Polymarket questions (July-August 20
  | Qwen3-32B (base) | 0.253 | 19.2% | ✗ |
  | Polymarket (market) | 0.170 | — | — |

- Foresight-32B led all tested LLMs on every metric: Brier score, expected calibration error (ECE), and profitability.

  ## How It Works

- See: [LLMs Can Teach Themselves to Better Predict the Future](https://arxiv.org/abs/2502.05253)
- See: [Outcome-based Reinforcement Learning to Predict the Future](https://arxiv.org/abs/2505.17989)
-
- ### Synthetic Training Data (Foresight Learning)
-
- We augment limited real-world prediction market data with synthetically generated forecasting questions using our data generation framework. It generates questions from streams of data (e.g., news articles) that are difficult to predict at one point in time but verifiable later. The model was trained on ~10,000 real Polymarket questions plus ~100,000 synthetic questions, with nearly 70% of training data being synthetic.

- ## Training Details

- - **Base Model:** Qwen3-32B
- - **Training Method:** GRPO
- - **Training Data:** ~10k Polymarket questions + ~100k synthetic forecasting questions
- - **Evaluation:** Held-out test set of 1,265 questions with temporal separation to prevent leakage

- ## Usage

- Foresight-32B is available for use at [dashboard.lightningrod.ai](https://dashboard.lightningrod.ai).

- ### Input Format

- The model accepts a forecasting question along with relevant context (news articles, background information) and outputs a probability estimate with reasoning. Include instructions for how the answer should be formatted to get a well-structured response.

- ```
- Question: Will [event] happen by [date]?
-
- Context:
- [Relevant news headlines and information up to prediction date]
-
- Output: Probability estimate (0-100%) with reasoning
- ```

- ## Citation
-
- If you use Foresight-32B in your research, please cite:
-
- ```bibtex
- @article{turtel2025outcome,
-   title={Outcome-based Reinforcement Learning to Predict the Future},
-   author={Turtel, Benjamin and others},
-   journal={arXiv preprint arXiv:2505.17989},
-   year={2025}
- }
-
- @article{turtel2025llms,
-   title={LLMs Can Teach Themselves to Better Predict the Future},
-   author={Turtel, Benjamin and Franklin, Danny and Schoenegger, Philipp},
-   journal={arXiv preprint arXiv:2502.05253},
-   year={2025}
- }
- ```

- ## Contact

- If you are interested in generating training data for your own models or fine-tuning custom prediction agents on your domain-specific data, reach out to [support@lightningrod.ai](mailto:support@lightningrod.ai).

  ## License
 
  license: apache-2.0
  base_model:
  - Qwen/Qwen3-32B
+ tags:
+ - forecasting
+ - prediction
+ - reinforcement-learning
+ - calibration
+ - polymarket
+ pipeline_tag: text-generation
  ---
+ # Foresight V1 32B - Open-Source Forecasting Model
+ **Lightning Rod Labs** | [lightningrod.ai](https://lightningrod.ai/)

+ Foresight V1 32B is a forecasting model fine-tuned from Qwen3-32B via outcome-based RL. Despite being 10-100x smaller than the frontier models it was tested against, it has **outperformed them** on Brier score, ECE, and profitability.

+ Our latest model, Foresight V3, can be tested at [dashboard.lightningrod.ai](https://dashboard.lightningrod.ai/).

+ Lightning Rod Labs takes you from raw data to fine-tuned model, with automated training data generation, fine-tuning, and evaluation all in one place. No manual labeling required.

+ ### 3rd Party Benchmarks 🏆

+ Feb 2026: Foresight V1 32B ranked #1 on Prophet Arena Sports, a benchmark run by SIGMA Lab at UChicago, beating Grok-4, GPT-5.2, Gemini 3 Pro, and Claude Opus 4.5 on live prediction questions.

+ Jan 2026: Foresight V1 32B is the [only non-frontier model in the top 5](https://forecastingresearch.substack.com/p/llms-are-closing-the-gap-on-human) on ForecastBench, an independent forecasting benchmark run by the Forecasting Research Institute, where AIs compete on real-world forecasting questions.

  ## Key Results

+ Evaluated on August 25, 2025 against 251 live Polymarket questions, **Foresight V1 32B outperformed every frontier model tested** on accuracy (Brier score), calibration (ECE), and profitability.

  | Model | Brier Score ↓ | ECE ↓ | Profitable |
  |-------|---------------|-------|------------|
+ | **Foresight V1 32B** | **0.199** | **6.0%** | ✓ |
  | OpenAI o3 | 0.205 | 7.8% | ✓ |
  | Gemini 2.5 Pro | 0.213 | 8.2% | ✗ |
  | Grok-4 | 0.218 | 9.1% | ✗ |
  | Qwen3-32B (base) | 0.253 | 19.2% | ✗ |
  | Polymarket (market) | 0.170 | — | — |

+ Further details on our methodology and results are available [here](https://blog.lightningrod.ai/p/foresight-32b-beats-frontier-llms-on-live-polymarket-predictions).
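The two headline metrics can be computed from (probability, outcome) pairs: the Brier score is the mean squared error against the 0/1 outcome, and ECE bins forecasts by confidence and averages each bin's gap between mean confidence and empirical accuracy. A minimal sketch, assuming 10 equal-width bins (the evaluation's exact binning scheme is not specified and may differ):

```python
# Sketch of the table's two metrics: Brier score and expected calibration
# error (ECE). The 10-bin equal-width binning is an assumption for
# illustration, not the evaluation's documented setup.

def brier_score(preds, outcomes):
    """Mean squared error between probability and 0/1 outcome (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(preds, outcomes)) / len(preds)

def ece(preds, outcomes, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(preds, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, o))
    total = len(preds)
    err = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(p for p, _ in b) / len(b)
        avg_acc = sum(o for _, o in b) / len(b)
        err += len(b) / total * abs(avg_conf - avg_acc)
    return err

# A forecaster that always says 0.5 on a 50/50 question set:
preds = [0.5, 0.5, 0.5, 0.5]
outcomes = [1, 0, 1, 0]
print(brier_score(preds, outcomes))  # 0.25
print(ece(preds, outcomes))          # 0.0
```

Lower is better for both, and they measure different things: always forecasting the base rate is perfectly calibrated (ECE 0) but uninformative (Brier 0.25 on a 50/50 set), which is why the table reports both.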

  ## How It Works

+ Foresight V1 32B was trained using outcome-based RL. The model was shown only information available at prediction time, forced to commit to a probability, and scored against the realized outcome using the Brier score as the reward signal. Confident wrong predictions were penalized more heavily than uncertain ones, directly incentivizing calibration over overconfidence.
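The scoring rule described above is simple to state. A minimal sketch of a Brier-style reward (illustrative only, not the actual training code): on an event that resolves "no", a 95% forecast is penalized roughly 2.5x harder than a 60% forecast.

```python
# Illustrative Brier-style reward for outcome-based RL (not the actual
# training code): higher is better, and confident misses are penalized
# quadratically harder than hedged ones.

def brier_reward(p: float, outcome: int) -> float:
    """Negative squared error between the stated probability and the 0/1 outcome."""
    assert 0.0 <= p <= 1.0 and outcome in (0, 1)
    return -((p - outcome) ** 2)

# Confident miss vs hedged miss on an event that did not happen:
print(brier_reward(0.95, 0))  # -0.9025
print(brier_reward(0.60, 0))  # -0.36
```

Because the penalty is quadratic in the error, expected reward is maximized by reporting one's true belief: the Brier score is a proper scoring rule, which is what makes it usable as an RL reward without incentivizing hedging or overconfidence.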
 
 
 
 
 

+ Training data was generated using our Foresight Data platform, which automatically transformed unstructured sources into labeled training datasets, with no human annotation required.

+ The same framework has been applied across domains, including finance, healthcare, insurance, and sports analytics, to create prediction agents and domain expert models.

+ See: [LLMs Can Teach Themselves to Better Predict the Future](https://arxiv.org/abs/2502.05253) · [Outcome-based Reinforcement Learning to Predict the Future](https://arxiv.org/abs/2505.17989) · [Future-as-Label: Scalable Supervision from Real-World Outcomes](https://arxiv.org/abs/2601.06336)

+ ## Output Format

+ Our recommended usage is for predictions, but the model also works with the OpenAI API.
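An earlier version of this card documented the input as a question plus context plus an output instruction. A minimal sketch of assembling such a prompt, assuming that shape still applies (`build_prompt` is a hypothetical helper, and the exact wording is an assumption):

```python
# Sketch of a forecasting prompt in the question/context/output-instruction
# shape documented for earlier Foresight releases. The wording is an
# assumption; adapt it to your serving setup (e.g. an OpenAI-compatible
# chat endpoint).

def build_prompt(question: str, context_items: list[str]) -> str:
    """Assemble a forecasting prompt from a question and supporting headlines."""
    context = "\n".join(f"- {item}" for item in context_items)
    return (
        f"Question: {question}\n\n"
        f"Context:\n{context}\n\n"
        "Output: A probability estimate (0-100%) with reasoning."
    )

prompt = build_prompt(
    "Will [event] happen by [date]?",
    ["[Relevant news headline 1]", "[Relevant news headline 2]"],
)
print(prompt)
```

The bracketed placeholders are kept from the original template; fill them with the actual question and the news available up to the prediction date.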

+ ## About Lightning Rod Labs

+ Lightning Rod Labs takes you from raw data to fine-tuned model, with automated training data generation, fine-tuning, and evaluation all in one place. No manual labeling required. Our research is peer-reviewed and published, including in Transactions on Machine Learning Research (TMLR). Our models have been benchmarked live and have outperformed the world's best.

+ A few highlights:

+ - 🏆 #1 on Prophet Arena Sports, beating GPT-5.2, Gemini 3 Pro, and Grok-4 (Feb 2026)
+ - 📊 Top 5 on ForecastBench, outperforming Claude, o3, and Grok-4 (Jan 2026)
+ - 🔬 Published in TMLR: a 14B model matches o1 accuracy and generates >10% profit in live trading simulations [[link]](https://arxiv.org/abs/2505.17989)
+ - 🏛️ Vetted and awardable for U.S. defense procurement via the DARPA ERIS and CDAO Tradewinds marketplaces
+ - 📰 Featured in The Atlantic, TIME, and the Forecasting Research Institute

+ ## Contact

+ Interested in generating training data for your own models or building a custom prediction model?

+ - 📧 [support@lightningrod.ai](mailto:support@lightningrod.ai)
+ - 📅 [Book a demo](https://calendly.com/d/ctq4-7gd-nyq/lightning-rod-demo)
+ - 🌐 [lightningrod.ai/about](https://www.lightningrod.ai/about)

  ## License