Update README.md (#4)
opened by gretcheny

README.md CHANGED
@@ -29,15 +29,7 @@ Jan 2026: Foresight V1 32B is the [only non-frontier model in the top 5](https:/
 
 Evaluated on August 25, 2025 against 251 live Polymarket questions, **Foresight-v1 outperformed every frontier model tested** on accuracy (Brier Score), calibration (ECE), and profitability.
 
-
-|-------|---------------|-------|------------|
-| **Foresight V1 32B** | **0.199** | **6.0%** | β |
-| OpenAI o3 | 0.205 | 7.8% | β |
-| Gemini 2.5 Pro | 0.213 | 8.2% | β |
-| Grok-4 | 0.218 | 9.1% | β |
-| Claude Opus | 0.221 | 8.9% | β |
-| Qwen3-32B (base) | 0.253 | 19.2% | β |
-| Polymarket (market) | 0.170 | β | β |
+<img src="image%207.png" width="1000">
 
 Further details on our methodology and results are available [here.](https://blog.lightningrod.ai/p/foresight-32b-beats-frontier-llms-on-live-polymarket-predictions)
 
@@ -54,6 +46,7 @@ See: [LLMs Can Teach Themselves to Better Predict the Future](https://arxiv.org/
 ## Output Format
 
 Our recommended usage is for predictions, but it also works with the OpenAI API.
+<img src="image%206.png" width="600">
 
 ## About Lighting Rod Labs
 
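
The README text in this diff reports accuracy as a Brier Score and calibration as ECE. For reference, here is a minimal Python sketch of how those two metrics are commonly computed for binary forecasts. It is not taken from the Foresight evaluation code: the function names, the equal-width binning, and the choice of 10 bins are assumptions made for illustration, and the forecasts at the bottom are toy values, not results from the README.

```python
from typing import List, Sequence, Tuple


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and the same length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """ECE with equal-width probability bins (an assumed binning scheme).

    Each bin contributes |mean forecast - observed frequency|, weighted by
    the share of forecasts that fall into that bin.
    """
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and the same length")
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        # Clamp so that p == 1.0 lands in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, o))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_forecast = sum(p for p, _ in members) / len(members)
        observed_freq = sum(o for _, o in members) / len(members)
        ece += (len(members) / total) * abs(mean_forecast - observed_freq)
    return ece


if __name__ == "__main__":
    # Toy forecasts and resolutions, invented purely for illustration.
    toy_probs = [0.9, 0.7, 0.2, 0.6, 0.1]
    toy_outcomes = [1, 1, 0, 0, 0]
    print("Brier score:", brier_score(toy_probs, toy_outcomes))
    print("ECE:", expected_calibration_error(toy_probs, toy_outcomes))
```

Lower is better for both metrics. ECE values depend on the binning scheme, so numbers are only comparable when computed the same way.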