	---
	language:
	- en
	license: apache-2.0
	library_name: peft
	tags:
	- forecasting
	- prediction
	- reinforcement-learning
	- grpo
	- lora
	- mixture-of-experts
	- golf
	- sports
	- future-as-label
	datasets:
	- LightningRodLabs/GolfForecasting
	base_model: openai/gpt-oss-120b
	pipeline_tag: text-generation
	model-index:
	- name: Golf-Forecaster
	  results:
	  - task:
	      type: text-generation
	      name: Probabilistic Forecasting
	    dataset:
	      name: GolfForecasting
	      type: LightningRodLabs/GolfForecasting
	      split: test
	    metrics:
	    - type: brier_score
	      value: 0.207
	      name: Brier Score
	    - type: ece
	      value: 0.062
	      name: Expected Calibration Error
	---

	# Golf-Forecaster

	LoRA adapter for [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b), RL-tuned to predict professional golf outcomes: tournament winners, cuts, matchups, majors, team events, season races, world rankings, and player milestones across every major tour. Trained on 3,178 binary forecasting questions from [GolfForecasting](https://huggingface.co/datasets/LightningRodLabs/GolfForecasting) using the [Lightning Rod SDK](https://github.com/lightning-rod-labs/lightningrod-python-sdk). On the 855-question held-out test set it beats GPT-5 on both Brier score (0.207 vs. 0.218) and ECE (0.062 vs. 0.106).

	[Dataset](https://huggingface.co/datasets/LightningRodLabs/GolfForecasting) · [Lightning Rod SDK](https://github.com/lightning-rod-labs/lightningrod-python-sdk) · [Future-as-Label paper](https://arxiv.org/abs/2601.06336) · [Outcome-based RL paper](https://arxiv.org/abs/2505.17989)

	---

	## Results

	Evaluated on 855 held-out test questions (temporal split, Aug 2025+).

	| Model | Brier Score | Brier Skill Score | ECE |
	|-------|:---:|:---:|:---:|
	| Golf-Forecaster | 0.207 | +17.0% | 0.062 |
	| gpt-oss-120b (base) | 0.218 | +12.8% | 0.083 |
	| GPT-5 | 0.218 | +12.8% | 0.106 |

	![Brier Skill Score](https://huggingface.co/datasets/LightningRodLabs/GolfForecasting/resolve/main/brier_skill_score.png)

	![Brier Score Comparison](https://huggingface.co/datasets/LightningRodLabs/GolfForecasting/resolve/main/brier_score_comparison.png)

	![ECE Comparison](https://huggingface.co/datasets/LightningRodLabs/GolfForecasting/resolve/main/ece_comparison.png)

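	- Brier Score: mean squared error between the predicted probability and the binary outcome (0 or 1). Lower is better.
	- Brier Skill Score (BSS): relative improvement in Brier score over always predicting the base rate. Higher is better.
	- ECE (Expected Calibration Error): how closely predicted probabilities match observed frequencies. Lower is better.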
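As a rough illustration of the three metrics, here is a minimal pure-Python sketch. It is not the evaluation code behind the table: the function names, the 10 equal-width ECE bins, and the choice to take the base rate from the evaluated outcomes themselves are all assumptions.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes, base_rate=None):
    """Relative improvement over a forecaster that always predicts the base rate."""
    if base_rate is None:
        # Assumption: base rate is estimated from the evaluated outcomes.
        base_rate = sum(outcomes) / len(outcomes)
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    return 1.0 - brier_score(probs, outcomes) / reference


def expected_calibration_error(probs, outcomes, n_bins=10):
    """Weighted average gap between mean confidence and observed frequency per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    ece = 0.0
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            mean_y = sum(y for _, y in members) / len(members)
            ece += len(members) / len(probs) * abs(mean_p - mean_y)
    return ece


# Toy forecasts (made up for illustration, not taken from the dataset).
probs = [0.9, 0.2, 0.7, 0.4]
outcomes = [1, 0, 1, 0]
print(brier_score(probs, outcomes))
print(brier_skill_score(probs, outcomes))
print(expected_calibration_error(probs, outcomes))
```

A BSS of 0 means no improvement over the base-rate forecaster, and a negative BSS means the forecasts are worse than it.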

	---

	## Training

	- Base model: [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) (120B MoE, 5.1B active params)
	- Method: GRPO with Brier score reward via [Tinker](https://tinker.computer)
	- Hyperparameters: LoRA rank 32, learning rate 4e-5, batch size 32, group size 8, 100 steps
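To make "GRPO with Brier score reward" concrete, the sketch below scores a group of 8 sampled forecasts for one question and converts the rewards into group-relative advantages. The exact reward shaping and normalization used in training are not stated here, so the negative squared error reward, the mean/std normalization, and the function names are assumptions.

```python
import statistics


def brier_reward(prob, outcome):
    """Assumed reward: negative squared error, so better forecasts score higher."""
    return -(prob - outcome) ** 2


def group_advantages(rewards, eps=1e-6):
    """Standard GRPO-style normalization: center and scale rewards within one group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# One question, eight sampled completions (group size 8), resolved outcome "Yes".
sampled_probs = [0.10, 0.25, 0.30, 0.40, 0.55, 0.60, 0.75, 0.90]
outcome = 1
rewards = [brier_reward(p, outcome) for p in sampled_probs]
advantages = group_advantages(rewards)

for p, a in zip(sampled_probs, advantages):
    print(f"p={p:.2f} advantage={a:+.2f}")
```

Completions whose probability is closer to the resolved outcome receive positive advantages and are reinforced; the rest are pushed down.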

	---

	## Usage

	The adapter uses Tinker's module naming convention, so it requires a merge step before inference. A standalone `merge.py` script is included.

	### Merge into full model

	```bash
	pip install torch transformers safetensors tqdm huggingface-hub
	python merge.py --output ./golf-forecaster-merged
	```
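For readers unfamiliar with what merging an adapter means, the sketch below shows the generic LoRA merge rule, `W' = W + (alpha / r) * B @ A`, on a tiny matrix. It does not describe what `merge.py` actually does (module-name remapping and MoE expert handling are not covered), and `alpha`, the helper names, and the toy values are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Fold a low-rank update into the base weight: W + (alpha / rank) * B @ A."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # shape (out, in), same as the base weight
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy shapes: base weight is 2x3, rank-1 adapter with A (1x3) and B (2x1).
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
lora_a = [[1.0, 2.0, 3.0]]
lora_b = [[0.5], [-0.5]]
merged = merge_lora(weight, lora_a, lora_b, alpha=2.0, rank=1)
print(merged)
```

After merging, the model can be served like any ordinary checkpoint, with no adapter-aware runtime needed.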

	### Inference

	```python
	import sglang as sgl

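	# Load the merged checkpoint; the tokenizer comes from the base model repo.
	engine = sgl.Engine(
	    model_path="./golf-forecaster-merged",
	    tokenizer_path="openai/gpt-oss-120b",
	    trust_remote_code=True,
	    dtype="bfloat16",
	    tp_size=2,  # tensor parallelism across 2 GPUs
	)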

	news_context = "... relevant news articles ..."

	prompt = f"""You are a forecasting expert. Given the question and context below, predict the probability that the answer is "Yes".

	Question: Will Scottie Scheffler win the 2025 Masters?

	Context:
	{news_context}

	Respond with your reasoning, then give your final answer as a probability between 0 and 1 inside <answer></answer> tags."""

	output = engine.generate(prompt, sampling_params={"max_new_tokens": 4096, "stop": ["</answer>"]})
	print(output["text"])
	```
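Because generation stops at `</answer>`, the returned text will typically end with an opening `<answer>` tag followed by the number. A small parser such as the following can extract the probability; `parse_probability` is a hypothetical helper, not part of this repository, and returning `None` for malformed or out-of-range answers is one possible design choice.

```python
import re

# Matches a number right after an opening <answer> tag; the closing tag is
# optional because the stop sequence may have been trimmed from the output.
_ANSWER_RE = re.compile(r"<answer>\s*([-+]?\d*\.?\d+)")


def parse_probability(text):
    """Return the last <answer> value as a float in [0, 1], or None if invalid."""
    matches = _ANSWER_RE.findall(text)
    if not matches:
        return None
    value = float(matches[-1])
    if not 0.0 <= value <= 1.0:
        return None
    return value


print(parse_probability("Scheffler is the favorite, but fields are deep. <answer>0.18"))
```

With the inference snippet above, this would be called as `parse_probability(output["text"])`.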

	---

	## Links

	- Dataset: [LightningRodLabs/GolfForecasting](https://huggingface.co/datasets/LightningRodLabs/GolfForecasting)
	- Training platform: [Tinker](https://tinker.computer)
	- Data generation: [Lightning Rod SDK](https://github.com/lightning-rod-labs/lightningrod-python-sdk)
	- Future-as-Label paper: [arxiv:2601.06336](https://arxiv.org/abs/2601.06336)
	- Outcome-based RL paper: [arxiv:2505.17989](https://arxiv.org/abs/2505.17989)