LightningRodLabs/foresight-32B


Update README.md

#4

by gretcheny - opened Mar 13

base: refs/heads/main ← from: refs/pr/4


README.md +2 -9


@@ -29,15 +29,7 @@ Jan 2026: Foresight V1 32B is the [only non-frontier model in the top 5](https:/
 Evaluated on August 25, 2025 against 251 live Polymarket questions, **Foresight-v1 outperformed every frontier model tested** on accuracy (Brier Score), calibration (ECE), and profitability.
-| Model | Brier Score ↓ | ECE ↓ | Profitable |
-|-------|---------------|-------|------------|
-| **Foresight V1 32B** | **0.199** | **6.0%** | ✓ |
-| OpenAI o3 | 0.205 | 7.8% | ✓ |
-| Gemini 2.5 Pro | 0.213 | 8.2% | ✗ |
-| Grok-4 | 0.218 | 9.1% | ✗ |
-| Claude Opus | 0.221 | 8.9% | ✗ |
-| Qwen3-32B (base) | 0.253 | 19.2% | ✗ |
-| Polymarket (market) | 0.170 | — | — |
+<img src="image%207.png" width="1000">
 Further details on our methodology and results are available [here.](https://blog.lightningrod.ai/p/foresight-32b-beats-frontier-llms-on-live-polymarket-predictions)
@@ -54,6 +46,7 @@ See: [LLMs Can Teach Themselves to Better Predict the Future](https://arxiv.org/
 ## Output Format
 Our recommended usage is for predictions, but it also works with the OpenAI API.
+<img src="image%206.png" width="600">
 ## About Lighting Rod Labs
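
For reviewers comparing the figures in this README, the two metrics it names, Brier Score and ECE, can be sketched as below. This is a minimal illustration rather than the authors' evaluation code: the function names, the equal-width binning, and the default of 10 bins are assumptions, and the README does not say which ECE variant was used.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes (lower is better)."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and the same length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """Equal-width binned ECE (an assumed variant).

    Each forecast is placed in one of n_bins probability buckets. The result is
    the size-weighted mean of |average forecast - observed frequency| per bucket.
    """
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and the same length")
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, o))
    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue  # empty buckets contribute nothing
        avg_forecast = sum(p for p, _ in bucket) / len(bucket)
        observed_freq = sum(o for _, o in bucket) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_forecast - observed_freq)
    return ece


if __name__ == "__main__":
    # Toy forecasts, not data from the Polymarket evaluation.
    toy_probs = [0.9, 0.8, 0.2, 0.1]
    toy_outcomes = [1, 1, 0, 0]
    print(brier_score(toy_probs, toy_outcomes))
    print(expected_calibration_error(toy_probs, toy_outcomes))
```

As a reference point, a forecaster that always answers 0.5 scores a Brier Score of exactly 0.25 on any set of binary questions, which gives a baseline for reading the Brier column.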