---
language:
- en
license: apache-2.0
library_name: peft
tags:
- forecasting
- prediction
- reinforcement-learning
- grpo
- lora
- mixture-of-experts
- golf
- sports
- future-as-label
datasets:
- LightningRodLabs/GolfForecasting
base_model: openai/gpt-oss-120b
pipeline_tag: text-generation
model-index:
- name: Golf-Forecaster
  results:
  - task:
      type: text-generation
      name: Probabilistic Forecasting
    dataset:
      name: GolfForecasting
      type: LightningRodLabs/GolfForecasting
      split: test
    metrics:
    - type: brier_score
      value: 0.207
      name: Brier Score
    - type: ece
      value: 0.062
      name: Expected Calibration Error
---

# Golf-Forecaster

**LoRA adapter** for [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b), RL-tuned to predict professional golf outcomes: tournament winners, cuts, matchups, majors, team events, season races, world rankings, and player milestones across the major professional tours. Trained on 3,178 binary forecasting questions from [GolfForecasting](https://huggingface.co/datasets/LightningRodLabs/GolfForecasting) using the [Lightning Rod SDK](https://github.com/lightning-rod-labs/lightningrod-python-sdk). On the held-out test set it outperforms GPT-5 in accuracy (Brier score 0.207 vs. 0.218) and calibration (ECE 0.062 vs. 0.106).

[Dataset](https://huggingface.co/datasets/LightningRodLabs/GolfForecasting) · [Lightning Rod SDK](https://github.com/lightning-rod-labs/lightningrod-python-sdk) · [Future-as-Label paper](https://arxiv.org/abs/2601.06336) · [Outcome-based RL paper](https://arxiv.org/abs/2505.17989)

---

## Results

Evaluated on 855 held-out test questions (temporal split, August 2025 onward).

| Model | Brier Score | Brier Skill Score | ECE |
|-------|:---:|:---:|:---:|
| **Golf-Forecaster** | **0.207** | **+17.0%** | **0.062** |
| gpt-oss-120b (base) | 0.218 | +12.8% | 0.083 |
| GPT-5 | 0.218 | +12.8% | 0.106 |

**Brier Score**: mean squared error between the predicted probability and the binary (0/1) outcome; lower is better. **Brier Skill Score (BSS)**: relative improvement in Brier score over always predicting the base rate; higher is better. **ECE** (Expected Calibration Error): how far predicted probabilities deviate from the observed frequencies of the outcome; lower is better.

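The three metrics can be computed from predicted probabilities and 0/1 outcomes as in the minimal sketch below. This is an illustration, not the evaluation code behind the table. The card does not specify the ECE binning or which base rate BSS uses, so this sketch assumes 10 equal-width bins and takes the positive rate of the evaluated outcomes themselves as the reference forecast.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted P(Yes) and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """1 - BS / BS_ref, where the reference always predicts the base rate.

    Assumption: the base rate is the positive rate of `outcomes` itself.
    """
    base_rate = sum(outcomes) / len(outcomes)
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    if reference == 0:
        raise ValueError("BSS is undefined when all outcomes are identical")
    return 1.0 - brier_score(probs, outcomes) / reference


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """Bin-size-weighted gap between mean prediction and observed frequency.

    Assumption: equal-width bins over [0, 1]; p == 1.0 falls in the last bin.
    """
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_pred = sum(p for p, _ in bucket) / len(bucket)
        freq = sum(y for _, y in bucket) / len(bucket)
        ece += len(bucket) / total * abs(mean_pred - freq)
    return ece


if __name__ == "__main__":
    # Toy data for illustration only (not from the test set).
    probs = [0.9, 0.2, 0.7, 0.1]
    outcomes = [1, 0, 0, 0]
    print(f"Brier: {brier_score(probs, outcomes):.4f}")
    print(f"BSS:   {brier_skill_score(probs, outcomes):+.1%}")
    print(f"ECE:   {expected_calibration_error(probs, outcomes):.4f}")
```
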
---

## Training

- **Base model**: [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) (120B-parameter MoE, 5.1B active parameters per token)
- **Method**: GRPO with a Brier-score-based reward, trained via [Tinker](https://tinker.computer)
- **Hyperparameters**: LoRA rank 32, learning rate 4e-5, batch size 32, group size 8, 100 steps

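To make the objective concrete, here is a minimal sketch of how a Brier-based reward could feed GRPO's group-relative advantages. It assumes the reward is the negative Brier score of each sampled forecast and uses the common GRPO normalization (reward minus group mean, divided by group standard deviation; implementations differ on sample vs. population std). The card does not describe the actual reward shaping or the Tinker training loop, and neither is shown here.

```python
import statistics
from typing import Sequence

GROUP_SIZE = 8  # samples per question, matching "group size 8" above


def brier_reward(prob: float, outcome: int) -> float:
    """Assumed reward: negative Brier score, so better forecasts score higher."""
    return -((prob - outcome) ** 2)


def group_advantages(
    probs: Sequence[float], outcome: int, eps: float = 1e-6
) -> list[float]:
    """GRPO-style advantages for one question's group of sampled forecasts.

    Each sample's reward is standardized against the group's mean and
    population standard deviation, so no learned value function is needed.
    """
    rewards = [brier_reward(p, outcome) for p in probs]
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Eight hypothetical rollouts for one question that resolved "Yes".
    sampled = [0.8, 0.6, 0.55, 0.7, 0.3, 0.9, 0.5, 0.65]
    assert len(sampled) == GROUP_SIZE
    for p, a in zip(sampled, group_advantages(sampled, outcome=1)):
        print(f"p={p:.2f}  advantage={a:+.2f}")
```
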
---

## Usage

The adapter uses Tinker's module naming convention, so it requires a merge step before inference. A standalone `merge.py` script is included.

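For reference, a standard LoRA merge folds each low-rank pair into its base weight matrix. Assuming `merge.py` follows the usual formulation (the card does not say so, and the scaling factor $\alpha$ is not stated here), each adapted weight becomes:

$$
W_{\text{merged}} = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d_{\text{out}} \times r},\quad
A \in \mathbb{R}^{r \times d_{\text{in}}},\quad r = 32
$$

Because $W_{\text{merged}}$ has the same shape as $W_0$, the merged checkpoint can be served like the base model, with no adapter-specific code at inference time.
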
### Merge into full model

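```bash
# Install the merge script's dependencies
pip install torch transformers safetensors tqdm huggingface-hub

# Merge the adapter into the base weights and write the result to ./golf-forecaster-merged
python merge.py --output ./golf-forecaster-merged
```
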
### Inference

```python
import sglang as sgl

# Placeholder: supply real news articles relevant to the question.
news_context = "... relevant news articles ..."

prompt = f"""You are a forecasting expert. Given the question and context below, predict the probability that the answer is "Yes".

Question: Will Scottie Scheffler win the 2025 Masters?

Context:
{news_context}

Respond with your reasoning, then give your final answer as a probability between 0 and 1 inside <answer></answer> tags."""


def main() -> None:
    # Load the merged checkpoint; the tokenizer comes from the base model.
    engine = sgl.Engine(
        model_path="./golf-forecaster-merged",
        tokenizer_path="openai/gpt-oss-120b",
        trust_remote_code=True,
        dtype="bfloat16",
        tp_size=2,  # tensor-parallel degree; adjust to your GPU count and memory
    )
    # Stop once the closing tag is generated. SGLang trims the stop string by
    # default, so output["text"] will not include "</answer>".
    output = engine.generate(
        prompt,
        sampling_params={"max_new_tokens": 4096, "stop": ["</answer>"]},
    )
    print(output["text"])
    engine.shutdown()


# The offline engine launches worker processes, so guard the entry point.
if __name__ == "__main__":
    main()
```

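The probability still has to be extracted from the generated text. Below is a minimal parser; `parse_probability` is a hypothetical helper, not part of SGLang or the Lightning Rod SDK. Because the generation call stops on `</answer>` and SGLang trims stop strings by default, the parser accepts an `<answer>` tag with or without its closing tag, takes the last match, and rejects values outside [0, 1]. With the inference example, you would call it as `parse_probability(output["text"])`.

```python
import re
from typing import Optional

# Matches "<answer>0.73</answer>" or a trailing "<answer>0.73" whose closing
# tag was trimmed as a stop string. Captures a plain decimal number.
_ANSWER_RE = re.compile(r"<answer>\s*([0-9]*\.?[0-9]+)\s*(?:</answer>|$)")


def parse_probability(text: str) -> Optional[float]:
    """Return the last <answer> probability in `text`, or None if absent/invalid."""
    matches = _ANSWER_RE.findall(text)
    if not matches:
        return None
    value = float(matches[-1])
    return value if 0.0 <= value <= 1.0 else None


if __name__ == "__main__":
    # Illustrative string, not real model output.
    sample = "...reasoning... <answer>0.27"
    print(parse_probability(sample))
```
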
---

## Links

- **Dataset**: [LightningRodLabs/GolfForecasting](https://huggingface.co/datasets/LightningRodLabs/GolfForecasting)
- **Training platform**: [Tinker](https://tinker.computer)
- **Data generation**: [Lightning Rod SDK](https://github.com/lightning-rod-labs/lightningrod-python-sdk)
- **Future-as-Label paper**: [arxiv:2601.06336](https://arxiv.org/abs/2601.06336)
- **Outcome-based RL paper**: [arxiv:2505.17989](https://arxiv.org/abs/2505.17989)