Bturtel committed · Commit 44ff89a · verified · 1 Parent(s): 9d23dd2

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +14 -42
README.md CHANGED
@@ -38,11 +38,7 @@ model-index:
38
 
39
  # Golf-Forecaster
40
 
41
- ### RL-Tuned gpt-oss-120b for Predicting Professional Golf Outcomes
42
-
43
- We fine-tuned [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) with reinforcement learning to predict professional golf outcomes across PGA Tour, LIV Golf, LPGA, DP World Tour, majors, and the Ryder Cup. Trained on the [GolfForecasting](https://huggingface.co/datasets/LightningRodLabs/GolfForecasting) dataset of 3,178 binary forecasting questions generated with the [Lightning Rod SDK](https://github.com/lightning-rod-labs/lightningrod-python-sdk), Golf-Forecaster beats GPT-5.1 on held-out forecasting questions.
44
-
45
- This repo contains a **LoRA adapter** (5.3 GB) for gpt-oss-120b. A standalone `merge.py` script is included to produce a full merged model if needed.
46
 
47
  [Dataset](https://huggingface.co/datasets/LightningRodLabs/GolfForecasting) · [Lightning Rod SDK](https://github.com/lightning-rod-labs/lightningrod-python-sdk) · [Future-as-Label paper](https://arxiv.org/abs/2601.06336) · [Outcome-based RL paper](https://arxiv.org/abs/2505.17989)
48
 
@@ -50,13 +46,13 @@ This repo contains a **LoRA adapter** (5.3 GB) for gpt-oss-120b. A standalone `m
50
 
51
  ## Results
52
 
53
- Evaluated on 855 held-out test questions (temporal split, Aug 2025+). Golf-Forecaster achieves the best Brier score, highest skill score, and best calibration.
54
 
55
  | Model | Brier Score | Brier Skill Score | ECE |
56
  |-------|:---:|:---:|:---:|
57
  | **Golf-Forecaster** | **0.207** | **+17.0%** | **0.062** |
58
  | gpt-oss-120b (base) | 0.218 | +12.8% | 0.083 |
59
- | GPT-5.1 | 0.218 | +12.8% | 0.106 |
60
 
61
  ![Brier Skill Score](https://huggingface.co/datasets/LightningRodLabs/GolfForecasting/resolve/main/brier_skill_score.png)
62
 
@@ -64,28 +60,21 @@ Evaluated on 855 held-out test questions (temporal split, Aug 2025+). Golf-Forec
64
 
65
  ![ECE Comparison](https://huggingface.co/datasets/LightningRodLabs/GolfForecasting/resolve/main/ece_comparison.png)
66
 
67
- ### Metrics
68
-
69
- - **Brier Score**: Mean squared error between predicted probability and outcome (0 or 1). Lower is better. **Brier Skill Score (BSS)** expresses this as improvement over always predicting the base rate — positive means the model learned something useful beyond historical frequency.
70
- - **Expected Calibration Error (ECE)**: Measures whether predicted probabilities match actual frequencies. "70%" predictions should resolve "yes" 70% of the time. Lower is better.
71
 
72
  ---
73
 
74
  ## Training
75
 
76
- - **Base model**: [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) (120B MoE, 5.1B active params, 128 experts Top-4)
77
  - **Method**: GRPO with Brier score reward via [Tinker](https://tinker.computer)
78
- - **LoRA rank**: 32
79
- - **Learning rate**: 4e-5
80
- - **Batch size**: 32, group size 8
81
- - **Training steps**: 100
82
- - **Max tokens**: 16,384
83
 
84
  ---
85
 
86
  ## Usage
87
 
88
- This repo contains a LoRA adapter trained with [Tinker](https://tinker.computer). The adapter uses Tinker's module naming convention, so it requires a merge step before inference. A standalone `merge.py` script is included.
89
 
90
  ### Merge into full model
91
 
@@ -94,11 +83,7 @@ pip install torch transformers safetensors tqdm huggingface-hub
94
  python merge.py --output ./golf-forecaster-merged
95
  ```
96
 
97
- This downloads the base model, dequantizes to bf16, applies the LoRA adapter, and saves the merged model.
98
-
99
- ### Inference with the merged model
100
-
101
- With [SGLang](https://github.com/sgl-project/sglang) (recommended for MoE):
102
 
103
  ```python
104
  import sglang as sgl
@@ -111,34 +96,21 @@ engine = sgl.Engine(
111
  tp_size=2,
112
  )
113
 
114
- prompt = """You are a forecasting expert. Given the question and context below, predict the probability that the answer is "Yes".
115
 
116
  Question: Will Scottie Scheffler win the 2025 Masters?
117
 
118
  Respond with your reasoning, then give your final answer as a probability between 0 and 1 inside <answer></answer> tags."""
119
 
120
  output = engine.generate(prompt, sampling_params={"max_new_tokens": 4096, "stop": ["</answer>"]})
121
  print(output["text"])
122
  ```
123
 
124
- Or with transformers:
125
-
126
- ```python
127
- from transformers import AutoModelForCausalLM, AutoTokenizer
128
-
129
- model = AutoModelForCausalLM.from_pretrained(
130
- "./golf-forecaster-merged",
131
- torch_dtype="auto",
132
- device_map="auto",
133
- trust_remote_code=True,
134
- )
135
- tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-120b", trust_remote_code=True)
136
-
137
- inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
138
- outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.7)
139
- print(tokenizer.decode(outputs[0], skip_special_tokens=True))
140
- ```
141
-
142
  ---
143
 
144
  ## Links
 
38
 
39
  # Golf-Forecaster
40
 
41
+ **LoRA adapter** for [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b), RL-tuned to predict professional golf outcomes across PGA Tour, LIV Golf, LPGA, DP World Tour, majors, and the Ryder Cup. Trained on 3,178 binary forecasting questions from [GolfForecasting](https://huggingface.co/datasets/LightningRodLabs/GolfForecasting) using the [Lightning Rod SDK](https://github.com/lightning-rod-labs/lightningrod-python-sdk). Beats GPT-5.
42
 
43
  [Dataset](https://huggingface.co/datasets/LightningRodLabs/GolfForecasting) · [Lightning Rod SDK](https://github.com/lightning-rod-labs/lightningrod-python-sdk) · [Future-as-Label paper](https://arxiv.org/abs/2601.06336) · [Outcome-based RL paper](https://arxiv.org/abs/2505.17989)
44
 
 
46
 
47
  ## Results
48
 
49
+ Evaluated on 855 held-out test questions (temporal split, Aug 2025+).
50
 
51
  | Model | Brier Score | Brier Skill Score | ECE |
52
  |-------|:---:|:---:|:---:|
53
  | **Golf-Forecaster** | **0.207** | **+17.0%** | **0.062** |
54
  | gpt-oss-120b (base) | 0.218 | +12.8% | 0.083 |
55
+ | GPT-5 | 0.218 | +12.8% | 0.106 |
56
 
57
  ![Brier Skill Score](https://huggingface.co/datasets/LightningRodLabs/GolfForecasting/resolve/main/brier_skill_score.png)
58
 
 
60
 
61
  ![ECE Comparison](https://huggingface.co/datasets/LightningRodLabs/GolfForecasting/resolve/main/ece_comparison.png)
62
 
63
+ **Brier Score**: Mean squared error between predicted probability and outcome. Lower is better. **BSS** measures improvement over always predicting the base rate. **ECE**: Whether predicted probabilities match actual frequencies. Lower is better.
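The three metrics defined above can be sketched in a few lines of Python. This is an illustration, not the evaluation code behind the table: the function names are hypothetical, the base rate defaults to the mean of the evaluated outcomes, and ECE uses 10 equal-width bins, none of which the README specifies.

```python
from typing import List, Optional, Sequence, Tuple


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def brier_skill_score(
    probs: Sequence[float],
    outcomes: Sequence[int],
    base_rate: Optional[float] = None,
) -> float:
    """Relative improvement over always predicting the base rate."""
    # Assumption: if no base rate is given, use the mean of the outcomes.
    if base_rate is None:
        base_rate = sum(outcomes) / len(outcomes)
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    return 1.0 - brier_score(probs, outcomes) / reference


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """Size-weighted mean gap between average prediction and observed frequency."""
    # Assumption: equal-width bins over [0, 1]; p == 1.0 goes into the last bin.
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        freq = sum(y for _, y in members) / len(members)
        ece += len(members) / len(probs) * abs(mean_p - freq)
    return ece


probs = [0.9, 0.1, 0.8, 0.3]
outcomes = [1, 0, 1, 0]
print(brier_score(probs, outcomes))
print(brier_skill_score(probs, outcomes))
print(expected_calibration_error(probs, outcomes))
```

As a rough consistency check on the table, a Brier score of 0.207 with a skill score of +17.0% implies a reference Brier score of about 0.25, which is what a constant forecast scores when the base rate is close to 50%.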
64
 
65
  ---
66
 
67
  ## Training
68
 
69
+ - **Base model**: [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) (120B MoE, 5.1B active params)
70
  - **Method**: GRPO with Brier score reward via [Tinker](https://tinker.computer)
71
+ - **LoRA rank**: 32, learning rate 4e-5, batch size 32, group size 8, 100 steps
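To make "GRPO with Brier score reward" concrete, here is a minimal sketch of how a reward and group-relative advantages could be computed for one question. It is an assumption-laden illustration, not the Tinker training code: the negative-squared-error reward, the mean/standard-deviation normalization, and the function names are all hypothetical choices.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def brier_reward(prob: float, outcome: int) -> float:
    """Reward for one sampled forecast: negative squared error (higher is better)."""
    # Assumption: the exact reward shaping used in training is not documented.
    return -((prob - outcome) ** 2)


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Standardize rewards within one group of samples for the same question."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every sample earns the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group of 8 sampled forecasts (the group size listed above) for a
# question that resolved "Yes".
sampled_probs = [0.9, 0.7, 0.6, 0.5, 0.5, 0.4, 0.2, 0.1]
rewards = [brier_reward(p, outcome=1) for p in sampled_probs]
print(group_advantages(rewards))
```

Samples that forecast closer to the realized outcome receive positive advantages and the rest negative ones, so the policy update favors better-calibrated forecasts relative to the group rather than against an absolute threshold.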
72
 
73
  ---
74
 
75
  ## Usage
76
 
77
+ The adapter uses Tinker's module naming convention, so it requires a merge step before inference. A standalone `merge.py` script is included.
78
 
79
  ### Merge into full model
80
 
 
83
  python merge.py --output ./golf-forecaster-merged
84
  ```
85
 
86
+ ### Inference
87
 
88
  ```python
89
  import sglang as sgl
 
96
  tp_size=2,
97
  )
98
 
99
+ news_context = "... relevant news articles ..."
100
+
101
+ prompt = f"""You are a forecasting expert. Given the question and context below, predict the probability that the answer is "Yes".
102
 
103
  Question: Will Scottie Scheffler win the 2025 Masters?
104
 
105
+ Context:
106
+ {news_context}
107
+
108
  Respond with your reasoning, then give your final answer as a probability between 0 and 1 inside <answer></answer> tags."""
109
 
110
  output = engine.generate(prompt, sampling_params={"max_new_tokens": 4096, "stop": ["</answer>"]})
111
  print(output["text"])
112
  ```
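Because generation stops at `</answer>`, the returned text may end right after the number with no closing tag. A small parser along these lines could pull the probability out of the generated text; `parse_probability` is a hypothetical helper, not part of the repo, and it assumes the model writes a plain decimal inside the tags.

```python
import re
from typing import Optional

# Matches a decimal after <answer>, followed by the closing tag or end of text.
_ANSWER_RE = re.compile(r"<answer>\s*([0-9]*\.?[0-9]+)\s*(?:</answer>|$)")


def parse_probability(text: str) -> Optional[float]:
    """Return the last probability found in <answer> tags, or None if invalid."""
    matches = _ANSWER_RE.findall(text)
    if not matches:
        return None
    prob = float(matches[-1])
    # Reject values outside [0, 1] instead of clipping them silently.
    return prob if 0.0 <= prob <= 1.0 else None


print(parse_probability("Scheffler is the favorite, but fields are deep. <answer>0.12"))
print(parse_probability("No tagged answer here."))
```

Returning `None` for missing or out-of-range answers lets the caller decide whether to retry, skip the question, or fall back to a default forecast.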
113
 
114
  ---
115
 
116
  ## Links