Bturtel committed on
Commit
d78aaa2
·
verified ·
1 Parent(s): f24460d

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +119 -0
README.md ADDED
@@ -0,0 +1,119 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ library_name: transformers
+ tags:
+ - forecasting
+ - prediction
+ - reinforcement-learning
+ - grpo
+ - lora
+ - mixture-of-experts
+ datasets:
+ - LightningRodLabs/WWTD-2025
+ base_model: openai/gpt-oss-120b
+ pipeline_tag: text-generation
+ model-index:
+ - name: Trump-Forecaster
+   results:
+   - task:
+       type: text-generation
+       name: Probabilistic Forecasting
+     dataset:
+       name: WWTD-2025
+       type: LightningRodLabs/WWTD-2025
+       split: test
+     metrics:
+     - type: brier_score
+       value: 0.194
+       name: Brier Score
+     - type: ece
+       value: 0.079
+       name: Expected Calibration Error
+ ---
+
+ # Trump-Forecaster
+
+ **RL-tuned gpt-oss-120b for predicting Trump administration actions. Scores better than GPT-5 on Brier score and ECE across 682 held-out forecasting questions.**
+
+ This model was fine-tuned with reinforcement learning (GRPO), using Brier score as the reward signal. It was trained on the 2,108-question training split of the [WWTD-2025](https://huggingface.co/datasets/LightningRodLabs/WWTD-2025) dataset, which contains 2,790 binary forecasting questions about Trump's actions from January to December 2025.
+
+ ## Results
+
+ Evaluated on 682 held-out test questions (with news context):
+
+ | Model | Brier | BSS | ECE |
+ |---|---|---|---|
+ | **gpt-oss-120b RL (this model)** | **0.194** | **0.16** | **0.079** |
+ | GPT-5 | 0.200 | 0.14 | 0.091 |
+ | gpt-oss-120b (base) | 0.213 | 0.08 | 0.111 |
+
+ Without context (question only):
+
+ | Model | Brier | BSS | ECE |
+ |---|---|---|---|
+ | **gpt-oss-120b RL (this model)** | **0.242** | **-0.04** | **0.164** |
+ | GPT-5 | 0.258 | -0.11 | 0.191 |
+ | gpt-oss-120b (base) | 0.260 | -0.12 | 0.189 |
+
+ - **Brier Score**: Mean squared error between the predicted probability and the 0/1 outcome (lower = better)
+ - **BSS (Brier Skill Score)**: Relative improvement in Brier score over always guessing the base rate (positive = better than base-rate guessing, negative = worse)
+ - **ECE (Expected Calibration Error)**: Average gap between predicted probability and observed frequency (lower = better calibrated)
+
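The three metrics defined above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the evaluation code behind the tables: the README does not say how the base rate was estimated or how ECE was binned, so the `base_rate` argument and the choice of 10 equal-width bins are assumptions, and the function names are hypothetical.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def brier_skill_score(
    probs: Sequence[float], outcomes: Sequence[int], base_rate: float
) -> float:
    """1 - BS / BS_ref, where the reference forecast always predicts `base_rate`.

    Which base rate to use (train split, test split, ...) is an assumption here.
    """
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    return 1.0 - brier_score(probs, outcomes) / reference


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """Weighted mean of |average forecast - observed frequency| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[index].append((p, y))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_prob = sum(p for p, _ in members) / len(members)
        frequency = sum(y for _, y in members) / len(members)
        ece += len(members) / len(probs) * abs(mean_prob - frequency)
    return ece
```

A negative BSS, as in the question-only table, means the forecasts scored worse than the constant base-rate forecast.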
+ ## Training
+
+ - **Base model**: [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) (120B MoE, 5.1B active params, 128 experts with top-4 routing)
+ - **Method**: GRPO with Brier score reward via [Tinker](https://tinker.computer)
+ - **LoRA rank**: 32
+ - **Learning rate**: 4e-5
+ - **Batch size**: 32, group size 8
+ - **Training steps**: 50
+ - **Max tokens**: 16,384
+
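To make the "GRPO with Brier score reward" idea concrete, here is a minimal sketch of how one group of sampled forecasts for a single question could be turned into group-relative advantages. The exact reward shaping and normalization used in training are not stated in this README, so the negative-squared-error reward, the standard-deviation normalization, and both function names are assumptions for illustration only.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def brier_reward(prob: float, outcome: int) -> float:
    """Negative squared error, so a better forecast earns a higher reward (assumed shaping)."""
    return -((prob - outcome) ** 2)


def group_advantages(
    probs: Sequence[float], outcome: int, eps: float = 1e-6
) -> List[float]:
    """Score each sampled forecast relative to the other samples in its group.

    `probs` holds the probabilities parsed from one group of completions
    (for example 8 samples) for the same question.
    """
    rewards = [brier_reward(p, outcome) for p in probs]
    baseline = mean(rewards)   # group mean acts as the baseline
    spread = pstdev(rewards)   # group standard deviation rescales the signal
    return [(r - baseline) / (spread + eps) for r in rewards]


# Eight sampled forecasts for a question that resolved "Yes" (outcome = 1).
advantages = group_advantages([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2], outcome=1)
```

Samples whose forecast is closer to the resolved outcome than the group average receive a positive advantage, and the rest receive a negative one.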
+ ## Usage
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Load the weights in their stored dtype and place them across the available devices.
+ model = AutoModelForCausalLM.from_pretrained(
+     "LightningRodLabs/Trump-Forecaster",
+     torch_dtype="auto",
+     device_map="auto",
+     trust_remote_code=True,
+ )
+ tokenizer = AutoTokenizer.from_pretrained("LightningRodLabs/Trump-Forecaster", trust_remote_code=True)
+
+ prompt = """You are a forecasting expert. Given the question and context below, predict the probability that the answer is "Yes".
+
+ Question: Will Trump impose 25% tariffs on all goods from Canada by February 1, 2025?
+
+ Respond with your reasoning, then give your final answer as a probability between 0 and 1 inside <answer></answer> tags."""
+
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.7)
+ # The decoded text contains the prompt followed by the model's reasoning and answer.
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+
+ For faster inference with the MoE architecture, use [SGLang](https://github.com/sgl-project/sglang):
+
+ ```python
+ import sglang as sgl
+
+ # Reuses the `prompt` string defined in the transformers example above.
+ engine = sgl.Engine(model_path="LightningRodLabs/Trump-Forecaster", trust_remote_code=True, dtype="bfloat16")
+ # Generation stops once the model emits the closing answer tag.
+ output = engine.generate(prompt, sampling_params={"max_new_tokens": 4096, "stop": ["</answer>"]})
+ ```
+
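Both examples above return free-form text, so the probability still has to be pulled out of the `<answer>` tags. The helper below is a hypothetical sketch of that step, not part of the model's tooling. It assumes the answer is a plain decimal number, and it accepts a missing closing tag because a stop sequence such as `</answer>` may be trimmed from the generated text.

```python
import re
from typing import Optional

# A decimal number after <answer>, followed by the closing tag or the end of the text.
_ANSWER_RE = re.compile(r"<answer>\s*([0-9]*\.?[0-9]+)\s*(?:</answer>|$)")


def extract_probability(text: str) -> Optional[float]:
    """Return the last probability found in <answer> tags, or None if absent or invalid."""
    matches = _ANSWER_RE.findall(text)
    if not matches:
        return None
    value = float(matches[-1])  # last match skips any empty tags echoed from the prompt
    return value if 0.0 <= value <= 1.0 else None


print(extract_probability("Tariffs look likely. <answer>0.35</answer>"))
```

Returning `None` instead of raising leaves it to the caller to decide how to treat unparseable generations, for example by re-sampling.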
+ ## Dataset
+
+ Trained on [LightningRodLabs/WWTD-2025](https://huggingface.co/datasets/LightningRodLabs/WWTD-2025):
+ - 2,790 binary forecasting questions about Trump administration actions
+ - Auto-generated from news (Jan-Dec 2025) using the [Lightning Rod SDK](https://lightningrod.ai/sdk)
+ - Ground-truth labels from web search verification
+ - Temporal split: 2,108 train / 682 test (no leakage)
+
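A temporal split like the one listed above puts every training question strictly before every test question in time, which is what keeps information about later outcomes out of training. The sketch below illustrates the idea on toy records. The `date` field name, the cutoff value, and the `temporal_split` helper are hypothetical and are not taken from the WWTD-2025 schema.

```python
from datetime import date
from typing import Dict, List, Tuple


def temporal_split(records: List[Dict], cutoff: date) -> Tuple[List[Dict], List[Dict]]:
    """Put records dated before `cutoff` in train and the rest in test."""
    train = [r for r in records if r["date"] < cutoff]
    test = [r for r in records if r["date"] >= cutoff]
    return train, test


# Toy records with a hypothetical `date` field; the real dataset columns may differ.
records = [
    {"question": "Q1", "date": date(2025, 2, 1)},
    {"question": "Q2", "date": date(2025, 6, 15)},
    {"question": "Q3", "date": date(2025, 11, 3)},
]
train, test = temporal_split(records, cutoff=date(2025, 9, 1))
```

Because the two sides never overlap in time, the latest training date is always earlier than the earliest test date.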
+ ## Links
+
+ - Dataset: [LightningRodLabs/WWTD-2025](https://huggingface.co/datasets/LightningRodLabs/WWTD-2025)
+ - Training platform: [Tinker](https://tinker.computer)
+ - Data generation: [Lightning Rod SDK](https://lightningrod.ai/sdk)