Bturtel committed on
Commit 9d23dd2 · verified · 1 Parent(s): 90c19f5

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +6 -0
README.md CHANGED
@@ -58,6 +58,12 @@ Evaluated on 855 held-out test questions (temporal split, Aug 2025+). Golf-Forec
  | gpt-oss-120b (base) | 0.218 | +12.8% | 0.083 |
  | GPT-5.1 | 0.218 | +12.8% | 0.106 |
 
+ ![Brier Skill Score](https://huggingface.co/datasets/LightningRodLabs/GolfForecasting/resolve/main/brier_skill_score.png)
+
+ ![Brier Score Comparison](https://huggingface.co/datasets/LightningRodLabs/GolfForecasting/resolve/main/brier_score_comparison.png)
+
+ ![ECE Comparison](https://huggingface.co/datasets/LightningRodLabs/GolfForecasting/resolve/main/ece_comparison.png)
+
  ### Metrics
 
  - **Brier Score**: Mean squared error between predicted probability and outcome (0 or 1). Lower is better. **Brier Skill Score (BSS)** expresses this as improvement over always predicting the base rate — positive means the model learned something useful beyond historical frequency.
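
Reviewer note, not part of the commit: the two metrics defined in the context line above could be computed roughly as follows. This is a minimal sketch, not the dataset authors' evaluation code. The function names and sample data are hypothetical, and the base rate is taken from the evaluated outcomes themselves, which is an assumption, since the README excerpt does not say how the base rate was estimated.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes, base_rate=None):
    """Improvement over always predicting the base rate (1.0 is perfect).

    Assumption: when base_rate is not given, it is estimated as the mean
    of the outcomes being scored.
    """
    if base_rate is None:
        base_rate = sum(outcomes) / len(outcomes)
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    if reference == 0:
        raise ValueError("reference Brier score is zero; BSS is undefined")
    return 1.0 - brier_score(probs, outcomes) / reference


# Hypothetical forecasts and resolved outcomes, for illustration only.
example_probs = [0.9, 0.2, 0.7, 0.1]
example_outcomes = [1, 0, 1, 0]

example_bs = brier_score(example_probs, example_outcomes)
example_bss = brier_skill_score(example_probs, example_outcomes)
print(f"Brier score: {example_bs:.4f}, BSS: {example_bss:+.1%}")
```

A positive BSS means the forecasts beat the constant base-rate prediction, zero means they match it, and a negative value means they do worse.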