---
language:
- en
license: apache-2.0
library_name: peft
tags:
- forecasting
- prediction
- reinforcement-learning
- grpo
- lora
- mixture-of-experts
- politics
- trump
- future-as-label
datasets:
- LightningRodLabs/WWTD-2025
base_model: openai/gpt-oss-120b
pipeline_tag: text-generation
model-index:
- name: Trump-Forecaster
  results:
  - task:
      type: text-generation
      name: Probabilistic Forecasting
    dataset:
      name: WWTD-2025
      type: LightningRodLabs/WWTD-2025
      split: test
    metrics:
    - type: brier_score
      value: 0.194
      name: Brier Score
    - type: ece
      value: 0.079
      name: Expected Calibration Error
---
# Trump-Forecaster
### RL-Tuned gpt-oss-120b for Predicting Trump Administration Actions
We fine-tuned [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) with reinforcement learning to predict Trump administration actions. Trained on the [WWTD-2025](https://huggingface.co/datasets/LightningRodLabs/WWTD-2025) dataset of 2,108 binary forecasting questions generated with the [Lightning Rod SDK](https://github.com/lightning-rod-labs/lightningrod-python-sdk), Trump-Forecaster beats GPT-5 on held-out forecasting questions.
This repo contains a **LoRA adapter** (5.3 GB) for gpt-oss-120b. A standalone `merge.py` script is included to produce a full merged model if needed.
[Dataset](https://huggingface.co/datasets/LightningRodLabs/WWTD-2025) · [Lightning Rod SDK](https://github.com/lightning-rod-labs/lightningrod-python-sdk) · [Future-as-Label paper](https://arxiv.org/abs/2601.06336) · [Outcome-based RL paper](https://arxiv.org/abs/2505.17989)
---
## Results
We evaluated on 682 held-out test questions under two conditions: with news context, and without context (question only). The no-context condition reveals whether a model knows what it doesn't know: untrained models project false confidence, while RL training corrects that overconfidence.
| Model | Brier (With Context) | BSS (With Context) | Brier (No Context) | BSS (No Context) | ECE (With Context) | ECE (No Context) |
|-------|:---:|:---:|:---:|:---:|:---:|:---:|
| GPT-5 | 0.200 | +0.14 | 0.258 | -0.11 | 0.091 | 0.191 |
| gpt-oss-120b (base) | 0.213 | +0.08 | 0.260 | -0.12 | 0.111 | 0.190 |
| **Trump-Forecaster** | **0.194** | **+0.16** | **0.242** | **-0.04** | **0.079** | **0.164** |
### Metrics
- **Brier Score**: Mean squared error between predicted probability and outcome (0 or 1). Lower is better. **Brier Skill Score (BSS)** expresses this as improvement over always predicting the base rate: positive means the model learned something useful beyond historical frequency.
- **Expected Calibration Error (ECE)**: Measures whether predicted probabilities match actual frequencies. "70%" predictions should resolve "yes" 70% of the time. Lower is better.
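For concreteness, here are minimal NumPy versions of the three metrics. These are illustrative implementations, not our exact evaluation code:

```python
import numpy as np

def brier_score(p, y):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    p, y = np.asarray(p, float), np.asarray(y, float)
    return float(np.mean((p - y) ** 2))

def brier_skill_score(p, y):
    """1 - BS / BS_ref, where the reference always predicts the base rate."""
    y = np.asarray(y, float)
    reference = brier_score(np.full(len(y), y.mean()), y)
    return 1.0 - brier_score(p, y) / reference

def expected_calibration_error(p, y, n_bins=10):
    """Weighted average gap between mean confidence and observed frequency per bin."""
    p, y = np.asarray(p, float), np.asarray(y, float)
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(p[mask].mean() - y[mask].mean())
    return float(ece)
```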
---
## Training
- **Base model**: [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) (120B-parameter MoE, 5.1B active params, 128 experts with top-4 routing)
- **Method**: GRPO with a Brier-score reward via [Tinker](https://tinker.computer); see the reward sketch after this list
- **LoRA rank**: 32
- **Learning rate**: 4e-5
- **Batch size**: 32, group size 8
- **Training steps**: 50
- **Max tokens**: 16,384
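
The reward is what does the calibration work. Below is a minimal sketch of how such a per-rollout reward could be computed (our assumption about the setup, not the exact Tinker training code): parse the predicted probability from the `<answer>` tag and return the negative Brier score, which GRPO then normalizes within each group of 8 rollouts.

```python
import re

def brier_reward(completion: str, outcome: int) -> float:
    """Score one rollout: negative Brier score against the resolved outcome.

    `outcome` is 1 if the question resolved "Yes", else 0. Rollouts without
    a parseable probability get the worst possible reward.
    """
    match = re.search(r"<answer>\s*(\d*\.?\d+)\s*</answer>", completion)
    if match is None:
        return -1.0  # unparseable answer
    p = min(max(float(match.group(1)), 0.0), 1.0)  # clamp to [0, 1]
    return -((p - outcome) ** 2)  # higher (closer to 0) is better
```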
---
## Usage
This repo contains a LoRA adapter trained with [Tinker](https://tinker.computer). The adapter uses Tinker's module naming convention, so it requires a merge step before inference. A standalone `merge.py` script is included.
### Merge into full model
```bash
pip install torch transformers safetensors tqdm huggingface-hub
python merge.py --output ./trump-forecaster-merged
```
This downloads the base model, dequantizes to bf16, applies the LoRA adapter, and saves the merged model.
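Under the hood, the merge is the standard LoRA fold-in. Here is a sketch of the core operation, assuming conventional `lora_A`/`lora_B` naming; the actual `merge.py` also remaps Tinker's module names and iterates over the checkpoint shards:

```python
import torch

def merge_lora_weight(base_weight: torch.Tensor,
                      lora_A: torch.Tensor,
                      lora_B: torch.Tensor,
                      alpha: float,
                      rank: int = 32) -> torch.Tensor:
    """Fold a LoRA update into a dequantized base weight: W' = W + (alpha / r) * B @ A."""
    scale = alpha / rank
    update = (lora_B.to(torch.float32) @ lora_A.to(torch.float32)) * scale
    return (base_weight.to(torch.float32) + update).to(torch.bfloat16)
```

The rank matches the training config above; `alpha` should be read from the adapter's saved config rather than hard-coded.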
### Inference
```python
import sglang as sgl

# Load the merged model; tp_size=2 shards it across two GPUs
engine = sgl.Engine(
    model_path="./trump-forecaster-merged",
    tokenizer_path="openai/gpt-oss-120b",
    trust_remote_code=True,
    dtype="bfloat16",
    tp_size=2,
)
news_context = "... relevant news articles ..."
prompt = f"""You are a forecasting expert. Given the question and context below, predict the probability that the answer is "Yes".
Question: Will Trump impose 25% tariffs on all goods from Canada by February 1, 2025?
Context:
{news_context}
Respond with your reasoning, then give your final answer as a probability between 0 and 1 inside <answer></answer> tags."""
# Stop at the closing answer tag; the probability appears just before it
output = engine.generate(prompt, sampling_params={"max_new_tokens": 4096, "stop": ["</answer>"]})
print(output["text"])
```
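Since generation stops at `</answer>`, the completion ends with an opening `<answer>` tag followed by the probability, with the closing tag consumed as the stop string. A small parsing helper (our illustration, not included in the repo):

```python
import re

def extract_probability(text: str) -> float | None:
    """Parse the forecast probability from the model's output.

    Generation stops at "</answer>", so the closing tag is absent;
    match the opening tag followed by a number.
    """
    match = re.search(r"<answer>\s*(\d*\.?\d+)", text)
    if match is None:
        return None
    return min(max(float(match.group(1)), 0.0), 1.0)

print(f"P(Yes) = {extract_probability(output['text'])}")
```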
---
## Links
- **Dataset**: [LightningRodLabs/WWTD-2025](https://huggingface.co/datasets/LightningRodLabs/WWTD-2025)
- **Training platform**: [Tinker](https://tinker.computer)
- **Data generation**: [Lightning Rod SDK](https://github.com/lightning-rod-labs/lightningrod-python-sdk)
- **Future-as-Label paper**: [arxiv:2601.06336](https://arxiv.org/abs/2601.06336)
- **Outcome-based RL paper**: [arxiv:2505.17989](https://arxiv.org/abs/2505.17989)