Upload README.md with huggingface_hub

README.md +14 -42

@@ -38,11 +38,7 @@ model-index:
 # Golf-Forecaster
-### RL-Tuned gpt-oss-120b for Predicting Professional Golf Outcomes
-We fine-tuned [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) with reinforcement learning to predict professional golf outcomes across PGA Tour, LIV Golf, LPGA, DP World Tour, majors, and the Ryder Cup. Trained on the [GolfForecasting](https://huggingface.co/datasets/LightningRodLabs/GolfForecasting) dataset of 3,178 binary forecasting questions generated with the [Lightning Rod SDK](https://github.com/lightning-rod-labs/lightningrod-python-sdk), Golf-Forecaster beats GPT-5.1 on held-out forecasting questions.
-This repo contains a **LoRA adapter** (5.3 GB) for gpt-oss-120b. A standalone `merge.py` script is included to produce a full merged model if needed.
 [Dataset](https://huggingface.co/datasets/LightningRodLabs/GolfForecasting) · [Lightning Rod SDK](https://github.com/lightning-rod-labs/lightningrod-python-sdk) · [Future-as-Label paper](https://arxiv.org/abs/2601.06336) · [Outcome-based RL paper](https://arxiv.org/abs/2505.17989)
@@ -50,13 +46,13 @@ This repo contains a **LoRA adapter** (5.3 GB) for gpt-oss-120b. A standalone `m
 ## Results
-Evaluated on 855 held-out test questions (temporal split, Aug 2025+). Golf-Forecaster achieves the best Brier score, highest skill score, and best calibration.
 | Model | Brier Score | Brier Skill Score | ECE |
 |-------|:---:|:---:|:---:|
 | **Golf-Forecaster** | **0.207** | **+17.0%** | **0.062** |
 | gpt-oss-120b (base) | 0.218 | +12.8% | 0.083 |
-| GPT-5.1 | 0.218 | +12.8% | 0.106 |
 ![Brier Skill Score](https://huggingface.co/datasets/LightningRodLabs/GolfForecasting/resolve/main/brier_skill_score.png)
@@ -64,28 +60,21 @@ Evaluated on 855 held-out test questions (temporal split, Aug 2025+). Golf-Forec
 ![ECE Comparison](https://huggingface.co/datasets/LightningRodLabs/GolfForecasting/resolve/main/ece_comparison.png)
-### Metrics
-- **Brier Score**: Mean squared error between predicted probability and outcome (0 or 1). Lower is better. **Brier Skill Score (BSS)** expresses this as improvement over always predicting the base rate — positive means the model learned something useful beyond historical frequency.
-- **Expected Calibration Error (ECE)**: Measures whether predicted probabilities match actual frequencies. "70%" predictions should resolve "yes" 70% of the time. Lower is better.
 ---
 ## Training
-- **Base model**: [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) (120B MoE, 5.1B active params, 128 experts Top-4)
 - **Method**: GRPO with Brier score reward via [Tinker](https://tinker.computer)
-- **LoRA rank**: 32
-- **Learning rate**: 4e-5
-- **Batch size**: 32, group size 8
-- **Training steps**: 100
-- **Max tokens**: 16,384
 ---
 ## Usage
-This repo contains a LoRA adapter trained with [Tinker](https://tinker.computer). The adapter uses Tinker's module naming convention, so it requires a merge step before inference. A standalone `merge.py` script is included.
 ### Merge into full model
@@ -94,11 +83,7 @@ pip install torch transformers safetensors tqdm huggingface-hub
 python merge.py --output ./golf-forecaster-merged
 ```
-This downloads the base model, dequantizes to bf16, applies the LoRA adapter, and saves the merged model.
-### Inference with the merged model
-With [SGLang](https://github.com/sgl-project/sglang) (recommended for MoE):
 ```python
 import sglang as sgl
@@ -111,34 +96,21 @@ engine = sgl.Engine(
     tp_size=2,
 )
-prompt = """You are a forecasting expert. Given the question and context below, predict the probability that the answer is "Yes".
 Question: Will Scottie Scheffler win the 2025 Masters?
 Respond with your reasoning, then give your final answer as a probability between 0 and 1 inside <answer></answer> tags."""
 output = engine.generate(prompt, sampling_params={"max_new_tokens": 4096, "stop": ["</answer>"]})
 print(output["text"])
 ```
-Or with transformers:
-```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-model = AutoModelForCausalLM.from_pretrained(
-    "./golf-forecaster-merged",
-    torch_dtype="auto",
-    device_map="auto",
-    trust_remote_code=True,
-)
-tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-120b", trust_remote_code=True)
-inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
-outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.7)
-print(tokenizer.decode(outputs[0], skip_special_tokens=True))
-```
 ---
 ## Links

 # Golf-Forecaster
+**LoRA adapter** for [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b), RL-tuned to predict professional golf outcomes across PGA Tour, LIV Golf, LPGA, DP World Tour, majors, and the Ryder Cup. Trained on 3,178 binary forecasting questions from [GolfForecasting](https://huggingface.co/datasets/LightningRodLabs/GolfForecasting), generated with the [Lightning Rod SDK](https://github.com/lightning-rod-labs/lightningrod-python-sdk). Beats GPT-5 on held-out forecasting questions (Brier score 0.207 vs. 0.218, ECE 0.062 vs. 0.106).
 [Dataset](https://huggingface.co/datasets/LightningRodLabs/GolfForecasting) · [Lightning Rod SDK](https://github.com/lightning-rod-labs/lightningrod-python-sdk) · [Future-as-Label paper](https://arxiv.org/abs/2601.06336) · [Outcome-based RL paper](https://arxiv.org/abs/2505.17989)
 ## Results
+Evaluated on 855 held-out test questions (temporal split, Aug 2025+).
 | Model | Brier Score | Brier Skill Score | ECE |
 |-------|:---:|:---:|:---:|
 | **Golf-Forecaster** | **0.207** | **+17.0%** | **0.062** |
 | gpt-oss-120b (base) | 0.218 | +12.8% | 0.083 |
+| GPT-5 | 0.218 | +12.8% | 0.106 |
 ![Brier Skill Score](https://huggingface.co/datasets/LightningRodLabs/GolfForecasting/resolve/main/brier_skill_score.png)
 ![ECE Comparison](https://huggingface.co/datasets/LightningRodLabs/GolfForecasting/resolve/main/ece_comparison.png)
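+**Brier Score**: Mean squared error between predicted probability and outcome (0 or 1). Lower is better. **BSS** measures improvement over always predicting the base rate; positive values mean the model beats that baseline. **ECE** measures whether predicted probabilities match observed frequencies (predictions of 70% should resolve "yes" about 70% of the time). Lower is better.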
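
The three metrics can be computed along the following lines. This is a minimal sketch, not the evaluation code behind the table: the function names are illustrative, the 10 equal-width bins for ECE are an assumption, and the README does not say which base rate the reported BSS uses, so `base_rate` defaults to the evaluation set's own "yes" frequency unless one is passed in.

```python
from typing import Optional, Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def brier_skill_score(
    probs: Sequence[float],
    outcomes: Sequence[int],
    base_rate: Optional[float] = None,
) -> float:
    """1 - BS / BS_ref, where the reference always predicts the base rate."""
    if base_rate is None:
        # Assumption: fall back to the "yes" frequency of the evaluation set.
        base_rate = sum(outcomes) / len(outcomes)
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    # Note: reference is 0 if every outcome equals the base rate exactly.
    return 1.0 - brier_score(probs, outcomes) / reference


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """Weighted mean gap between average confidence and observed frequency."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        frequency = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_prob - frequency)
    return ece
```

For example, `brier_skill_score([0.9, 0.1, 0.8, 0.3], [1, 0, 1, 0])` compares the forecasts against a constant 0.5 prediction, since half of those outcomes are "yes".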
 ---
 ## Training
+- **Base model**: [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) (120B MoE, 5.1B active params)
 - **Method**: GRPO with Brier score reward via [Tinker](https://tinker.computer)
+- **Hyperparameters**: LoRA rank 32, learning rate 4e-5, batch size 32, group size 8, 100 training steps
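
To illustrate how a Brier score reward feeds into GRPO, the sketch below scores a group of sampled forecasts for one question and converts the rewards into group-relative advantages. It is not the Tinker training code: the reward form (negative squared error), the division by the group's standard deviation, and the function names are all assumptions, and GRPO implementations differ on these details.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def brier_reward(prob: float, outcome: int) -> float:
    """Assumed reward: negative squared error, so 0 is best and -1 is worst."""
    return -((prob - outcome) ** 2)


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Center each reward on the group mean and scale by the group spread."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One question that resolved "yes" (outcome = 1), with 8 sampled forecasts
# to match the group size listed above. The probabilities are made up.
sampled_probs = [0.10, 0.25, 0.40, 0.50, 0.55, 0.70, 0.80, 0.90]
rewards = [brier_reward(p, 1) for p in sampled_probs]
advantages = group_advantages(rewards)
```

Samples closer to the resolved outcome receive positive advantages and those further away receive negative ones, which is what pushes the policy toward better-calibrated probabilities.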
 ---
 ## Usage
+The adapter uses Tinker's module naming convention, so it requires a merge step before inference. A standalone `merge.py` script is included.
 ### Merge into full model
 python merge.py --output ./golf-forecaster-merged
 ```
+### Inference
 ```python
 import sglang as sgl
     tp_size=2,
 )
+news_context = "... relevant news articles ..."
+prompt = f"""You are a forecasting expert. Given the question and context below, predict the probability that the answer is "Yes".
 Question: Will Scottie Scheffler win the 2025 Masters?
+Context:
+{news_context}
 Respond with your reasoning, then give your final answer as a probability between 0 and 1 inside <answer></answer> tags."""
 output = engine.generate(prompt, sampling_params={"max_new_tokens": 4096, "stop": ["</answer>"]})
 print(output["text"])
 ```
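
Because generation stops at `</answer>`, the returned text may end before the closing tag. A small helper can pull the final probability out of the output. This is a minimal sketch: `parse_probability` is a hypothetical name, not part of this repo or SGLang, and it assumes the model writes a plain decimal after `<answer>`.

```python
import re
from typing import Optional

# Match a decimal right after the opening tag; the closing tag is optional
# because the stop string may be trimmed from the generated text.
_ANSWER_RE = re.compile(r"<answer>\s*([0-9]*\.?[0-9]+)")


def parse_probability(text: str) -> Optional[float]:
    """Return the last probability inside <answer> tags, or None if absent or invalid."""
    matches = _ANSWER_RE.findall(text)
    if not matches:
        return None
    value = float(matches[-1])  # use the final answer if the tag appears more than once
    return value if 0.0 <= value <= 1.0 else None
```

Calling `parse_probability(output["text"])` returns a float in [0, 1], or `None` when the model produced no usable answer, in which case the caller can retry or skip the question.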
 ---
 ## Links