Upload README.md with huggingface_hub
README.md CHANGED
@@ -58,6 +58,12 @@ Evaluated on 855 held-out test questions (temporal split, Aug 2025+). Golf-Forec
 | gpt-oss-120b (base) | 0.218 | +12.8% | 0.083 |
 | GPT-5.1 | 0.218 | +12.8% | 0.106 |
 
+
+
+
+
+
+
 ### Metrics
 
 - **Brier Score**: Mean squared error between predicted probability and outcome (0 or 1). Lower is better. **Brier Skill Score (BSS)** expresses this as improvement over always predicting the base rate; a positive value means the model learned something useful beyond historical frequency.
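
The Brier Score and BSS definitions in the Metrics section can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code behind the reported numbers: the function names and the toy inputs are hypothetical, and because the README does not say how the base rate is estimated, it is passed in as a plain parameter.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and the same length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(
    probs: Sequence[float], outcomes: Sequence[int], base_rate: float
) -> float:
    """Relative improvement over a reference that always predicts `base_rate`.

    BSS = 1 - Brier(model) / Brier(reference). Positive means the model beats
    the reference, zero means no improvement, negative means it is worse.
    """
    # Assumption: the reference forecaster predicts the same base rate everywhere.
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    if reference == 0:
        raise ValueError("reference Brier score is zero; BSS is undefined")
    return 1.0 - brier_score(probs, outcomes) / reference


# Toy data for illustration only (not taken from the evaluation set).
toy_probs = [0.9, 0.2, 0.7, 0.4]
toy_outcomes = [1, 0, 1, 0]
toy_base_rate = 0.5

model_brier = brier_score(toy_probs, toy_outcomes)
model_bss = brier_skill_score(toy_probs, toy_outcomes, toy_base_rate)
print(f"Brier: {model_brier:.3f}, BSS: {model_bss:+.1%}")
```

On the toy data the model's Brier score is about 0.075 against a reference of 0.25, so the BSS comes out near +70%. A score reported as a percentage, as in the results table, reads the same way: the fraction of the reference error that the model removes.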